JENKINS-37720

Virtual thread dump hangs waiting for ProcessLiveness

    Details


      Description

      "Handling GET /job/.../.../threadDump/ from ... : RequestHandlerThread[#9] CpsThreadDumpAction/index.jelly / waiting for hudson.remoting.Channel@..." Id=... Group=main TIMED_WAITING on hudson.remoting.UserRequest@...
      	at java.lang.Object.wait(Native Method)
      	-  waiting on hudson.remoting.UserRequest@...
      	at hudson.remoting.Request.call(Request.java:147)
      	at hudson.remoting.Channel.call(Channel.java:780)
      	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:249)
      	at com.sun.proxy.$Proxy77.join(Unknown Source)
      	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:991)
      	at hudson.Launcher$ProcStarter.join(Launcher.java:388)
      	at org.jenkinsci.plugins.durabletask.ProcessLiveness._isAlive(ProcessLiveness.java:87)
      	at org.jenkinsci.plugins.durabletask.ProcessLiveness.isAlive(ProcessLiveness.java:59)
      	at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.exitStatus(BourneShellScript.java:188)
      	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.getDiagnostics(FileMonitoringTask.java:224)
      	at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.getDiagnostics(BourneShellScript.java:204)
      	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.getStatus(DurableTaskStep.java:221)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadDump$ThreadInfo.<init>(CpsThreadDump.java:54)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadDump$ThreadInfo.<init>(CpsThreadDump.java:32)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadDump.from(CpsThreadDump.java:148)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.getThreadDump(CpsThreadGroup.java:435)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.getThreadDump(CpsFlowExecution.java:756)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadDumpAction.getThreadDump(CpsThreadDumpAction.java:57)
      

      This probably has the same ultimate cause as JENKINS-37719: a failure in the Docker daemon. In this case the build log ended with a completed shell step, followed by:

      Resuming build at Fri Aug 26 15:01:29 UTC 2016 after Jenkins restart
      Waiting to resume Unknown Pipeline node step: ???
      Ready to run at Fri Aug 26 15:01:36 UTC 2016
      

      (PipelineThreadDump for the support bundle was likewise blocked.)

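      As in JENKINS-37719, calling join() with no timeout is inappropriate: if the remote side never answers, the calling thread (here, the HTTP request handler serving the thread dump) blocks indefinitely.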
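      A minimal sketch of the alternative, in plain Java rather than the Jenkins remoting API: run the blocking call on another thread and wait for it with Future.get(timeout, unit), treating a timeout as "liveness unknown" instead of blocking the caller. The class BoundedJoin, the Liveness enum, and the convention that exit code 0 means the process is still alive are assumptions for illustration, not code from durable-task or Jenkins core.

```java
import java.util.concurrent.*;

/**
 * Hypothetical sketch (not durable-task code): bound a blocking "join"
 * with a timeout so a caller such as a thread dump cannot hang forever
 * when the remote side never answers.
 */
public class BoundedJoin {

    /** Result of a liveness probe; UNKNOWN means we gave up waiting. */
    enum Liveness { ALIVE, DEAD, UNKNOWN }

    /**
     * Runs a probe returning an exit code (assumed: 0 = process alive)
     * and waits for it at most the given time.
     */
    static Liveness probe(Callable<Integer> probe, long timeout, TimeUnit unit,
                          ExecutorService executor) {
        Future<Integer> f = executor.submit(probe);
        try {
            int exit = f.get(timeout, unit);
            return exit == 0 ? Liveness.ALIVE : Liveness.DEAD;
        } catch (TimeoutException e) {
            f.cancel(true); // interrupt the worker thread; stop waiting on it
            return Liveness.UNKNOWN;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            f.cancel(true);
            return Liveness.UNKNOWN;
        } catch (ExecutionException e) {
            return Liveness.UNKNOWN; // the probe itself failed
        }
    }

    public static void main(String[] args) {
        // Daemon threads so a stuck probe cannot keep the JVM alive.
        ExecutorService ex = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        System.out.println(probe(() -> 0, 1, TimeUnit.SECONDS, ex)); // ALIVE
        System.out.println(probe(() -> 1, 1, TimeUnit.SECONDS, ex)); // DEAD
        // Simulates a join() that does not return within the timeout.
        System.out.println(probe(() -> { Thread.sleep(60_000); return 0; },
                100, TimeUnit.MILLISECONDS, ex)); // UNKNOWN
        ex.shutdownNow();
    }
}
```

      In this sketch cancel(true) only interrupts the local worker thread; whether a real remote call actually stops depends on how it handles interruption.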

      It is not obvious why the build did not respond to anything short of a hard kill. Initially there was no CPS VM thread for it. Later, after attempts to escalate the kill, the following thread appeared:

      "Running CpsFlowExecution[Owner[.../...:... #...]] / waiting for hudson.remoting.Channel@..." Id=... Group=main TIMED_WAITING on hudson.remoting.UserRequest@...
      	at java.lang.Object.wait(Native Method)
      	-  waiting on hudson.remoting.UserRequest@...
      	at hudson.remoting.Request.call(Request.java:147)
      	at hudson.remoting.Channel.call(Channel.java:780)
      	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:249)
      	at com.sun.proxy.$Proxy77.join(Unknown Source)
      	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:991)
      	at hudson.Launcher$ProcStarter.join(Launcher.java:388)
      	at org.jenkinsci.plugins.docker.workflow.WithContainerStep$Decorator$1.kill(WithContainerStep.java:237)
      	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.stop(FileMonitoringTask.java:167)
      	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.stop(DurableTaskStep.java:211)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6.onSuccess(CpsFlowExecution.java:835)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6.onSuccess(CpsFlowExecution.java:829)
      	at ...
      

      This is very similar to JENKINS-37719, except that here join() is called directly from WithContainerStep rather than via DockerClient.


            Activity

            Jesse Glick (jglick) added a comment:

            workflow-cps PR 81 should have addressed the symptom of the virtual thread dump hanging, though a timeout on join is still desirable.

            Jesse Glick (jglick) added a comment:

            Version 1.15 did impose a timeout, but in any case JENKINS-47791 makes this obsolete.


              People

              • Assignee: Jesse Glick (jglick)
              • Reporter: Jesse Glick (jglick)
              • Votes: 0
              • Watchers: 2
