Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-53775

FileNotFoundException for program.dat when running a Pipeline Job concurrently with the Job DSL plugin

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Minor
    • Resolution: Unresolved
    • Labels:
      None
    • Environment:
    • Similar Issues:

      Description

      I unfortunately don't have an easy reproduer, but this bug happens to me fairly regularly (at least a few times a month). For this failure to occur:

      • A Freestyle job is running and performing a "Process Job DSLs" step. Part of this involves updating an existing Pipeline job.
      • That existing Pipeline job is also running (or starting) at around the same time as the "Process Job DLSs" step is trying to update it.

      When the timing is just right, the Pipeline job fails with:

      java.io.FileNotFoundException: /var/jenkins_home/jobs/devops-gate/jobs/projects/jobs/dx4linux/jobs/delphix-build-and-snapshots/jobs/ami-snapshots/builds/1173/program.dat (No such file or directory)
      	at java.io.FileInputStream.open0(Native Method)
      	at java.io.FileInputStream.open(FileInputStream.java:195)
      	at java.io.FileInputStream.<init>(FileInputStream.java:138)
      	at org.jenkinsci.plugins.workflow.support.pickles.serialization.RiverReader.openStreamAt(RiverReader.java:188)
      	at org.jenkinsci.plugins.workflow.support.pickles.serialization.RiverReader.restorePickles(RiverReader.java:136)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.loadProgramAsync(CpsFlowExecution.java:773)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.onLoad(CpsFlowExecution.java:739)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun.getExecution(WorkflowRun.java:875)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:745)
      	at hudson.model.RunMap.retrieve(RunMap.java:225)
      	at hudson.model.RunMap.retrieve(RunMap.java:57)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:499)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:481)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:379)
      	at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:231)
      	at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:234)
      	at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:105)
      	at hudson.model.Run.fromExternalizableId(Run.java:2436)
      	at org.jenkinsci.plugins.workflow.support.steps.build.RunWrapper.getRawBuild(RunWrapper.java:71)
      	at org.jenkinsci.plugins.workflow.support.steps.build.RunWrapper.build(RunWrapper.java:75)
      	at org.jenkinsci.plugins.workflow.support.steps.build.RunWrapper.setResult(RunWrapper.java:87)
      	at sun.reflect.GeneratedMethodAccessor820.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
      	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
      	at groovy.lang.MetaClassImpl.setProperty(MetaClassImpl.java:2725)
      	at groovy.lang.MetaClassImpl.setProperty(MetaClassImpl.java:3770)
      	at org.codehaus.groovy.runtime.InvokerHelper.setProperty(InvokerHelper.java:201)
      	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.setProperty(ScriptBytecodeAdapter.java:484)
      	at org.kohsuke.groovy.sandbox.impl.Checker$7.call(Checker.java:347)
      	at org.kohsuke.groovy.sandbox.GroovyInterceptor.onSetProperty(GroovyInterceptor.java:84)
      	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onSetProperty(SandboxInterceptor.java:197)
      	at org.kohsuke.groovy.sandbox.impl.Checker$7.call(Checker.java:344)
      	at org.kohsuke.groovy.sandbox.impl.Checker.checkedSetProperty(Checker.java:351)
      	at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.setProperty(SandboxInvoker.java:33)
      	at com.cloudbees.groovy.cps.impl.PropertyAccessBlock.rawSet(PropertyAccessBlock.java:24)
      	at com.cloudbees.groovy.cps.impl.PropertyishBlock$ContinuationImpl.set(PropertyishBlock.java:88)
      	at com.cloudbees.groovy.cps.impl.AssignmentBlock$ContinuationImpl.assignAndDone(AssignmentBlock.java:70)
      	at sun.reflect.GeneratedMethodAccessor706.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
      	at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
      	at com.cloudbees.groovy.cps.Next.step(Next.java:83)
      	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
      	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
      	at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:122)
      	at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:261)
      	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$101(SandboxContinuable.java:34)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.lambda$run0$0(SandboxContinuable.java:59)
      	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:108)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:58)
      	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:174)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:332)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:83)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:244)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:232)
      	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
      Caused: java.io.IOException: Failed to load build state
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$3.onSuccess(CpsFlowExecution.java:854)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$3.onSuccess(CpsFlowExecution.java:852)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:906)
      	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      Finished: FAILURE
      [Pipeline] stage
      [Pipeline] { (Cloning repos)
      [Pipeline] node
      19:23:07 Running on jenkins-agent3 in /var/tmp/jenkins_slaves/jenkins-ops/workspace/devops-gate/projects/dx4linux/delphix-build-and-snapshots/ami-snapshots
      [Pipeline] {
      [Pipeline] checkout
      19:23:07 Wiping out workspace first.
      19:23:08 Cloning the remote Git repository
      [...]
      [Pipeline] // timestamps
      [Pipeline] End of Pipeline
      Finished: SUCCESS
      

      Several things are unusual:

      • The job continues running, even after the Pipeline plugin printed "Finished: FAILURE."
      • If the job was running from a parent job, the parent job shows the job as failed (even though the child job is still running).
      • The job gets to the finally block of my pipeline and sends me a notification that it failed, however, the value of currentBuild.result is paradoxically SUCCESS, which leads me to believe that somehow the pipeline got to my finally block but not my catch block (which sets currentBuild.result to FAILURE on any exception). While the details of this behavior might be specific to my pipeline, the general behavior is very wonky.

        Attachments

          Activity

          Hide
          basil Basil Crow added a comment -

          Sam Van Oort, does this seem similar to other issues you've investigated where jobs mysteriously resume after they've "completed"?

          Show
          basil Basil Crow added a comment - Sam Van Oort , does this seem similar to other issues you've investigated where jobs mysteriously resume after they've "completed"?
          Hide
          basil Basil Crow added a comment -

          Hit this again today. This has become an annoyance for us, since when we push a change to a Job DSL job, we frequently see this behavior on any running instances of that job.

          Show
          basil Basil Crow added a comment - Hit this again today. This has become an annoyance for us, since when we push a change to a Job DSL job, we frequently see this behavior on any running instances of that job.
          Hide
          svanoort Sam Van Oort added a comment -

          Jose Blas Camacho Taboada Could I persuade you to please take a look at this? It looks like it should be a synchronization bug around pickle loading. I'm not actually as concerned with JobDSL + Pipeline because that's kind of an edge case situation, but I strongly suspect that a threading issue here will be causing other failures that people just don't know the cause of. Thanks!

          Show
          svanoort Sam Van Oort added a comment - Jose Blas Camacho Taboada Could I persuade you to please take a look at this? It looks like it should be a synchronization bug around pickle loading. I'm not actually as concerned with JobDSL + Pipeline because that's kind of an edge case situation, but I strongly suspect that a threading issue here will be causing other failures that people just don't know the cause of. Thanks!
          Hide
          svanoort Sam Van Oort added a comment -

          Jose Blas Camacho Taboada Could you please provide some details? There should not be a race condition possible (if so, that's a bugfix we need to do). It sounds like you have what you need to fix that, from what you're saying. Thanks!

          Show
          svanoort Sam Van Oort added a comment - Jose Blas Camacho Taboada Could you please provide some details? There should not be a race condition possible (if so, that's a bugfix we need to do). It sounds like you have what you need to fix that, from what you're saying. Thanks!
          Hide
          therealwaldo Will Freeman added a comment -

          Also seeing this error pop up fairly consistently when a number of PR builds all fire at the same time on a multibranch job.  After the error, the jobs all stick around until the master is rebooted.

          Show
          therealwaldo Will Freeman added a comment - Also seeing this error pop up fairly consistently when a number of PR builds all fire at the same time on a multibranch job.  After the error, the jobs all stick around until the master is rebooted.
          Hide
          basil Basil Crow added a comment -

          Hit this again today.

          I'm not actually as concerned with JobDSL + Pipeline because that's kind of an edge case situation

          Sam Van Oort, can you explain why you consider this an edge case situation? Is it because you'd rather that users use a Jenkinsfile with the Pipeline Multibranch Plugin rather than Job DSL? I'd love to do that, but I can't. My organization centralizes company-wide build logic in one team, and one repository within that team. Due to JENKINS-43749, the Pipeline Multibranch Plugin doesn't work for my use case. In that bug Jesse recommended the use of Job DSL. So that is what I am doing. Is this an edge case situation? I'd be happy to move to a different setup if one was available, but I don't see any other options, I'm afraid. And this bug continues to affect us.

          Show
          basil Basil Crow added a comment - Hit this again today. I'm not actually as concerned with JobDSL + Pipeline because that's kind of an edge case situation Sam Van Oort , can you explain why you consider this an edge case situation? Is it because you'd rather that users use a Jenkinsfile with the Pipeline Multibranch Plugin rather than Job DSL? I'd love to do that, but I can't. My organization centralizes company-wide build logic in one team, and one repository within that team. Due to JENKINS-43749 , the Pipeline Multibranch Plugin doesn't work for my use case. In that bug Jesse recommended the use of Job DSL. So that is what I am doing. Is this an edge case situation? I'd be happy to move to a different setup if one was available, but I don't see any other options, I'm afraid. And this bug continues to affect us.
          Hide
          basil Basil Crow added a comment -

          Hit this again today.

          Show
          basil Basil Crow added a comment - Hit this again today.
          Hide
          mbochenk Nikita Bochenko added a comment - - edited

          Constantly running into the same issue. Using both Job DSL+Pipeline and Pipeline directly to generate new jobs or update their configuration. Have no idea how to reproduce it as it seems very random.

          It seems if I reload configuration from disk, the error goes away for some time.

          Show
          mbochenk Nikita Bochenko added a comment - - edited Constantly running into the same issue. Using both Job DSL+Pipeline and Pipeline directly to generate new jobs or update their configuration. Have no idea how to reproduce it as it seems very random. It seems if I reload configuration from disk, the error goes away for some time.

            People

            • Assignee:
              Unassigned
              Reporter:
              basil Basil Crow
            • Votes:
              3 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated: