Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-50199

Failed pipeline jobs stuck running after incorrect resume

    XMLWordPrintable

    Details

    • Sprint:
      Pipeline - April 2018
    • Similar Issues:
    • Released As:
      Pipeline Job 2.25

      Description

      Setup

      Jenkins v2.89.4 LTS
      Pipeline API Plugin: 2.26
      Pipeline Nodes and Processes Plugin: 2.19
      Durable Task Plugin: 1.18
      Pipeline Job Plugin: 2.17
      Pipeline Shared Groovy Libraries Plugin: 2.9
      Pipeline Supporting APIs Plugin: 2.18
      Pipeline Groovy Plugin: 2.45
      Script Security Plugin: 1.41

      Pipeline Default Speed/Durability Level: Performance-Optimized
      "Do not allow the pipeline to resume if the master restarts": Enabled on all jobs

      Problem

      I logged into a Jenkins master and saw no builds running but there was a queue of about 10 jobs. When mousing over the queued jobs, I saw "pending - Already running 2 builds across all nodes". This is strange because no jobs were showing as running and no Jenkins agents or executors were showing any running builds.

      I then ran "http://xx.xxx.xxx.xxx:8080/computer/api/xml?tree=computer[executors[currentExecutable[url]],oneOffExecutors[currentExecutable[url]]]&xpath=//url&wrapper=builds" which did show 5 builds were running. I checked these builds and they were red (failure) and were not running.

      Research

      I checked the console log of a build that showed as running but isn't and saw the line below near the top of the log.

      Resume disabled by user, switching to high-performance, low-durability mode.

      At the end of the of the log I saw the following:

      Finished: FAILURE
      Resuming build at Tue Mar 13 23:04:52 UTC 2018 after Jenkins restart

      Why Resume Build?

      The build failed on Mar 12, 2018 6:02:37 PM.
      Why did the build try to resume almost a day later? The job and system durability are configured to not resume builds. Below are some details taken from the API for the build.

      _class "hudson.model.OneOffExecutor"
      id "41"
      keepLog false
      number 41
      queueId 7178
      result "FAILURE"
      timestamp 1520877757466

      I checked the Java process on the server and it was last restarted on March 02 2018.

      What triggered the "Jenkins restart" identified on Mar 13 23:04:52 UTC 2018 since the Java process was not restarted?

      Why does this get the build stuck in a "running" state when it's not running?

      Scope

      This issue can be seen across many of our Jenkins masters. In each case we see "Resuming build at xxxxx after Jenkins restart" occur a few days after the build failure or abort even though Java was not restarted. This issue didn't occur on Jenkins 2.60.3 running the older (pre-durability configurable) Pipeline plugins.

      Logs

      I checked the jenkins.log file and saw the following when the build was attempting to be resumed.

       

      Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
       WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/42/
      
       Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
       WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
       java.lang.NullPointerException
       at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
       at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
       at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
       at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
       at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
       at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
      
      Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
       WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
       java.lang.NullPointerException
       at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
       at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
       at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
       at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
       at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
       at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
      
      Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
       WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/40/
      
       Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
       WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
       java.lang.NullPointerException
       at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
       at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
       at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
       at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
       at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
       at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
      
      Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
       WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
       java.lang.NullPointerException
       at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
       at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
       at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
       at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
       at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
       at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
      
      Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
       WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME2/41/
      
       Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
       WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
       java.lang.NullPointerException
       at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
       at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
       at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
       at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
       at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
       at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
      
      Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
       WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
       java.lang.NullPointerException
       at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
       at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
       at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
       at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
       at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
       at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
      

       

      Thread Dump

      I only saw WorkflowRun.copyLogs with workflow in the name in a thread dump so I'm not sure if it is related. I didn't see anything BLOCKING. A lot of different items were WAITING on "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject".

      "WorkflowRun.copyLogs [#3]" Id=8829 Group=main TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
       at sun.misc.Unsafe.park(Native Method)
       - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
       at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
       at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
       at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
       at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
      
      "WorkflowRun.copyLogs [#4]" Id=8830 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
       at sun.misc.Unsafe.park(Native Method)
       - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
       at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
       at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
       at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
      

      Workaround

      Running the Jenkins script below will set the running build to aborted to work around the issue.

      Jenkins.instance.getItemByFullName("JOBNAME").getBuildByNumber(JOB#).finish(hudson.model.Result.ABORTED, new java.io.IOException("Aborting build"));

        Attachments

        1. 867.tar.gz
          4 kB
        2. build.log
          0.4 kB
        3. build.xml
          10 kB
        4. build.xml
          1 kB
        5. flownode-84.xml
          21 kB
        6. flowNodeStore.xml
          4 kB
        7. workflow-cps.hpi
          548 kB
        8. workflow-cps.hpi
          540 kB
        9. workflow-cps.hpi
          540 kB
        10. workflow-fallback-flowNodeStore.xml
          7 kB
        11. workflow-job.hpi
          112 kB
        12. workflow-job.hpi
          110 kB
        13. workflow-job.hpi
          110 kB
        14. workflows.xml
          5 kB

          Issue Links

            Activity

            Show
            johnar John Arnold added a comment - Devin Nusbaum See  https://issues.jenkins-ci.org/secure/attachment/43981/867.tar.gz   for a build folder.
            Hide
            dnusbaum Devin Nusbaum added a comment -

            I just released version 2.25 of the Pipeline Job plugin which should fix the issue where completed builds are resuming after restarting Jenkins. If anyone is still seeing the issue after upgrading the plugin, please comment with any additional information you can provide, such as the build folder of the build in question.

            Note that there are other issues that may appear similar but seem to have distinct causes. For anyone newly commenting on this issue, if you are seeing stuck executors without ever restarting Jenkins, please check if JENKINS-45571JENKINS-53223, or JENKINS-51568 are closer to your situation and if so, comment there instead. Thanks!

            Show
            dnusbaum Devin Nusbaum added a comment - I just released version 2.25 of the Pipeline Job plugin which should fix the issue where completed builds are resuming after restarting Jenkins. If anyone is still seeing the issue after upgrading the plugin, please comment with any additional information you can provide, such as the build folder of the build in question. Note that there are other issues that may appear similar but seem to have distinct causes. For anyone newly commenting on this issue, if you are seeing stuck executors without ever restarting Jenkins, please check if  JENKINS-45571 ,  JENKINS-53223 , or  JENKINS-51568 are closer to your situation and if so, comment there instead. Thanks!
            Hide
            keith_ball Keith Ball added a comment -

            I just tried to update the plugin but received a FileNotFoundException: http://archives.jenkins-ci.org/plugins/workflow-job/2.25/workflow-job.hpi

            Failed to download from http://updates.jenkins-ci.org/download/plugins/workflow-job/2.25/workflow-job.hpi (redirected to: http://archives.jenkins-ci.org/plugins/workflow-job/2.25/workflow-job.hpi)

            Did someone forget to publish the latest version?

            Show
            keith_ball Keith Ball added a comment - I just tried to update the plugin but received a FileNotFoundException: http://archives.jenkins-ci.org/plugins/workflow-job/2.25/workflow-job.hpi Failed to download from http://updates.jenkins-ci.org/download/plugins/workflow-job/2.25/workflow-job.hpi (redirected to: http://archives.jenkins-ci.org/plugins/workflow-job/2.25/workflow-job.hpi) Did someone forget to publish the latest version?
            Hide
            keith_ball Keith Ball added a comment -

            Forget it.  it's there now

            Show
            keith_ball Keith Ball added a comment - Forget it.  it's there now
            Hide
            dnusbaum Devin Nusbaum added a comment -

            Keith Ball There is often a slight time delay between when a new release is visible on the update center and when it is available on all mirrors, my guess is that was what happened for you.

            Show
            dnusbaum Devin Nusbaum added a comment - Keith Ball There is often a slight time delay between when a new release is visible on the update center and when it is available on all mirrors, my guess is that was what happened for you.

              People

              • Assignee:
                dnusbaum Devin Nusbaum
                Reporter:
                mkozell Mike Kozell
              • Votes:
                5 Vote for this issue
                Watchers:
                15 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: