Jenkins / JENKINS-37730

DurableTaskStep.Execution hanging after process is dead

      Description

      Found a case where a sh step ceased to produce more output in the middle of a command, for no apparent reason, and the build did not respond to normal abort. The virtual thread dump said

      Thread #80
      	at DSL.sh(completed process (code -1) in /...@tmp/durable-... on ... (pid: ...))
      	at ...
      

      But there is no active CPS VM thread, and nothing visibly happening on the agent, and all Timer threads are idle. So it seems that a call to check would have caused the step to fail—but perhaps none came?

      Possibly stop should do its own check for a non-null Controller.exitStatus and immediately fail in such a case (but we run the risk of delivering doubled-up events if check does run later); or synchronously call check (though this runs the risk of having two such calls run simultaneously—it is not thread safe); or somehow reschedule it (same problem).
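
The doubled-up event concern above can be illustrated with a small, self-contained sketch. It is not the actual DurableTaskStep.Execution code: the class GuardedExecution, its fields, and the processExited and deliver helpers are hypothetical names introduced only for illustration. The assumption is that a single atomic flag decides which of stop or check gets to report completion, so a late check after stop (or two overlapping calls) cannot deliver a second event.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a step execution; not the real DurableTaskStep.Execution. */
class GuardedExecution {
    /** Exit status of the external process, or null while it is still running (assumed semantics). */
    private volatile Integer exitStatus;

    /** Flips to true exactly once, when the first completion event is delivered. */
    private final AtomicBoolean delivered = new AtomicBoolean();

    /** Records delivered events so that the behavior can be inspected. */
    final List<String> events = Collections.synchronizedList(new ArrayList<>());

    /** Simulates the process ending with the given exit code. */
    void processExited(int code) {
        exitStatus = code;
    }

    /** Periodic poll: reports completion if the process has exited. */
    void check() {
        Integer status = exitStatus;
        if (status != null) {
            deliver("check: process exited with code " + status);
        }
    }

    /** Abort request: fails immediately if the process is already dead. */
    void stop() {
        Integer status = exitStatus;
        if (status != null) {
            deliver("stop: process already exited with code " + status);
        }
        // Otherwise a real implementation would ask the process to terminate (omitted here).
    }

    /** Only the first caller wins; later callers are silently ignored. */
    private void deliver(String event) {
        if (delivered.compareAndSet(false, true)) {
            events.add(event);
        }
    }
}
```

With this guard, calling stop and then check after the process has died records a single event. Whether silently dropping the second event is acceptable in the real plugin is an open question that this sketch does not settle.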

      At a minimum, the virtual thread dump should indicate what the current recurrencePeriod is. And the calls to schedule could save their ScheduledFuture results in a transient field, so we can check cancelled and done flags. Such diagnostics might make it clearer next time what actually happened.
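
A minimal sketch of the diagnostics suggested above, assuming a simplified poller rather than the real plugin code. The class PollingDiagnostics and its schedule and getStatus methods are hypothetical; only recurrencePeriod and the idea of keeping the ScheduledFuture in a transient field come from the description. ScheduledFuture.isCancelled and isDone are standard java.util.concurrent calls.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical poller that remembers its last scheduled check for diagnostic purposes. */
class PollingDiagnostics {
    private final ScheduledExecutorService timer;

    /** Delay before the next check, in milliseconds. */
    private final long recurrencePeriod;

    /** Result of the most recent schedule call; transient because it is only useful at runtime. */
    private transient volatile ScheduledFuture<?> task;

    PollingDiagnostics(ScheduledExecutorService timer, long recurrencePeriodMillis) {
        this.timer = timer;
        this.recurrencePeriod = recurrencePeriodMillis;
    }

    /** Schedules the next check and keeps the future so its state can be reported later. */
    void schedule(Runnable check) {
        task = timer.schedule(check, recurrencePeriod, TimeUnit.MILLISECONDS);
    }

    /** Text that a thread dump entry could append to show whether a check is still pending. */
    String getStatus() {
        ScheduledFuture<?> t = task;
        String status = "recurrencePeriod=" + recurrencePeriod + "ms";
        if (t == null) {
            return status + "; check never scheduled";
        }
        return status + "; check cancelled=" + t.isCancelled() + ", done=" + t.isDone();
    }
}
```

A status of cancelled=false, done=true with no follow-up schedule call would point at a check that ran but never rescheduled itself, which is one of the scenarios the description cannot currently distinguish.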

      Also, a term request (WorkflowRun.doTerm) claimed to be terminating the sh step, but the build still did not finish. Again there was nothing in the physical thread dumps, and the virtual thread dump still claimed to be inside sh. The system log showed

      ... WARNING org.jenkinsci.plugins.workflow.cps.CpsStepContext onFailure
      already completed CpsStepContext[186]:Owner[...]
      java.lang.IllegalStateException: org.jenkinsci.plugins.workflow.steps.FlowInterruptedException
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.onFailure(CpsStepContext.java:325)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun$5.onSuccess(WorkflowRun.java:300)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun$5.onSuccess(WorkflowRun.java:296)
      	at org.jenkinsci.plugins.workflow.support.concurrent.Futures$1.run(Futures.java:150)
      	at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:253)
      	at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:149)
      	at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:134)
      	at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:170)
      	at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:53)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5.onSuccess(CpsFlowExecution.java:702)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5.onSuccess(CpsFlowExecution.java:689)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:626)
      	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:32)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: org.jenkinsci.plugins.workflow.steps.FlowInterruptedException
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun.doTerm(WorkflowRun.java:295)
      	at ...
      

      So the program state seems to be somehow inconsistent as well; perhaps sh did complete (it is not shown as in progress in flowGraphTable).

      Seems that the virtual thread dump needs some kind of fix TBD to better report the real state of a problematic program.

            Activity

            jglick Jesse Glick created issue -
            jglick Jesse Glick made changes -
            Field Original Value New Value
            Epic Link JENKINS-35399 [ 171192 ]
            jglick Jesse Glick made changes -
            Assignee Jesse Glick [ jglick ] Kohsuke Kawaguchi [ kohsuke ]
            abayer Andrew Bayer made changes -
            Component/s pipeline [ 21692 ]
            abayer Andrew Bayer made changes -
            Component/s workflow-plugin [ 18820 ]
            recampbell Ryan Campbell made changes -
            Labels robustness pipeline-hangs robustness
            recampbell Ryan Campbell made changes -
            Priority Minor [ 4 ] Critical [ 2 ]
            jglick Jesse Glick made changes -
            Component/s workflow-durable-task-step-plugin [ 21715 ]
            Component/s pipeline [ 21692 ]
            jglick Jesse Glick made changes -
            Priority Critical [ 2 ] Major [ 3 ]
            jglick Jesse Glick made changes -
            Link This issue relates to JENKINS-38769 [ JENKINS-38769 ]
            jglick Jesse Glick made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            jglick Jesse Glick made changes -
            Remote Link This issue links to "PR 23 (Web Link)" [ 15169 ]
            jglick Jesse Glick made changes -
            Status In Progress [ 3 ] In Review [ 10005 ]
            jglick Jesse Glick made changes -
            Status In Review [ 10005 ] Resolved [ 5 ]
            Resolution Fixed [ 1 ]
            jonasschneider Jonas Schneider made changes -
            Resolution Fixed [ 1 ]
            Status Resolved [ 5 ] Reopened [ 4 ]
            jglick Jesse Glick made changes -
            Assignee Kohsuke Kawaguchi [ kohsuke ]
            jglick Jesse Glick made changes -
            Assignee Jesse Glick [ jglick ]
            jglick Jesse Glick made changes -
            Status Reopened [ 4 ] Resolved [ 5 ]
            Resolution Fixed [ 1 ]
            cloudbees CloudBees Inc. made changes -
            Remote Link This issue links to "CloudBees Internal OSS-1783 (Web Link)" [ 18576 ]

              People

              • Assignee:
                jglick Jesse Glick
                Reporter:
                jglick Jesse Glick
              • Votes:
                1
                Watchers:
                6