  1. Jenkins
  2. JENKINS-47868

Pipeline durability hang when slave node disconnected

      Description

      My parallel Pipeline job runs primarily on Jenkins slave nodes. I hit a case where a parallel branch was scheduled on a slave node that then disconnected from the Jenkins master because of an issue with our hosting provider. The build hung until I intervened manually. I noticed the problem after all of the other branches had completed their work and one branch was still running on the disconnected slave. Even though the Jenkins master had many idle slave nodes, this branch kept waiting on the disconnected agent.

      I restarted the instance, and it registered with the Jenkins master again. Only after the slave node reconnected did the build fail. I expected one of the following three outcomes; instead, I had to intervene manually to free the hung build.

      1.  The branch would have detected the disconnected slave node and run on another available one.

      2.  The branch would have failed immediately when the slave node disconnected, similar to a freestyle job.

      3.  The branch and build would have resumed successfully once the slave reconnected.
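
      A minimal sketch of how outcome 2 might be approximated from the Pipeline side. It assumes the `timeout` step can interrupt a `sh` step whose agent has gone away, which is not verified in this report and may depend on plugin versions. The 5-minute limit is a hypothetical value chosen for illustration.

      ```groovy
      timestamps {
          // Hypothetical guard: abort the body if it exceeds the limit
          // instead of waiting indefinitely on a disconnected agent.
          timeout(time: 5, unit: 'MINUTES') {
              node('JENKINS-SLAVE-LABEL') {
                  sh 'sleep 15s'
              }
          }
      }
      ```

      Placing `timeout` outside `node` keeps the timer on the master, so it does not rely on the agent channel being open.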

      I was able to reproduce this issue using the Pipeline code below and disconnecting the slave during the "sleep 15s" step.

      timestamps {
          node("JENKINS-SLAVE-LABEL") {
              // Disconnect the slave while the "sleep 15s" step below is running.
              sh 'echo "First task"'
              sh 'sleep 15s'
              sh 'echo "Last task"'
          }
      }
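
      The reproduction above uses a single `node` block, while the original failure was seen in a parallel job. A sketch of what a parallel variant might look like is shown here; the branch names `a` and `b` are placeholders and are not taken from the real job.

      ```groovy
      timestamps {
          parallel(
              // Branch names are hypothetical placeholders.
              a: {
                  node('JENKINS-SLAVE-LABEL') {
                      sh 'echo "Branch a done"'
                  }
              },
              b: {
                  node('JENKINS-SLAVE-LABEL') {
                      sh 'echo "First task"'
                      // Disconnect this branch's agent during the sleep.
                      sh 'sleep 15s'
                      sh 'echo "Last task"'
                  }
              }
          )
      }
      ```

      With this layout, branch `a` should finish quickly, which makes it easier to see whether branch `b` alone keeps the build waiting.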

       

      Below is the build log after disconnecting the slave during "sleep 15s" and reconnecting it about a minute later.

      [Pipeline] timestamps
      [Pipeline] {
      [Pipeline] node
      23:27:05 Running on JENKINS-SLAVE-NODE-NAME-a (i-xxxxxxxxxxxxxxxxxxx) in /home/centos/workspace/JOBNAME
      [Pipeline] {
      [Pipeline] sh
      23:27:13 [JOBNAME] Running shell script
      23:27:14 + echo 'First task'
      23:27:14 First task
      [Pipeline] sh
      23:27:14 [JOBNAME] Running shell script
      23:27:15 + sleep 15s
      23:27:25 Cannot contact JENKINS-SLAVE-NODE-NAME-a (i-xxxxxxxxxxxxxxxxxxx): java.io.IOException: remote file operation failed: /home/centos/workspace/JOBNAME at hudson.remoting.Channel@32fe452c:JENKINS-SLAVE-NODE-NAME-a (i-xxxxxxxxxxxxxxxxxxx): hudson.remoting.ChannelClosedException: channel is already closed
      [Pipeline] sh
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] }
      [Pipeline] // timestamps
      [Pipeline] End of Pipeline
      Command close created at
          at hudson.remoting.Command.<init>(Command.java:60)
          at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1123)
          at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1121)
          at hudson.remoting.Channel.close(Channel.java:1281)
          at hudson.remoting.Channel.close(Channel.java:1263)
          at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128)
      Caused: hudson.remoting.Channel$OrderlyShutdown
          at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1129)
          at hudson.remoting.Channel$1.handle(Channel.java:527)
          at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:83)
      Caused: hudson.remoting.ChannelClosedException: channel is already closed
          at hudson.remoting.Channel.send(Channel.java:605)
          at hudson.remoting.Request.call(Request.java:130)
          at hudson.remoting.Channel.call(Channel.java:829)
          at hudson.FilePath.act(FilePath.java:987)
          at hudson.FilePath.act(FilePath.java:976)
          at hudson.FilePath.mkdirs(FilePath.java:1159)
          at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.<init>(FileMonitoringTask.java:113)
          at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.<init>(BourneShellScript.java:167)
          at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.<init>(BourneShellScript.java:161)
          at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:90)
          at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:64)
          at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:177)
          at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:224)
          at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:150)
          at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:108)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
          at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
          at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1218)
          at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1027)
          at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42)
          at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
          at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
          at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:155)
          at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
          at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:133)
          at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:153)
          at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:157)
          at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:127)
          at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:127)
          at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
      Caused: java.io.IOException: remote file operation failed: /home/centos/workspace/JOBNAME at hudson.remoting.Channel@32fe452c:JENKINS-SLAVE-NODE-NAME-a (i-xxxxxxxxxxxxxxxxxxx)
          at hudson.FilePath.act(FilePath.java:994)
          at hudson.FilePath.act(FilePath.java:976)
          at hudson.FilePath.mkdirs(FilePath.java:1159)
          at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.<init>(FileMonitoringTask.java:113)
          at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.<init>(BourneShellScript.java:167)
          at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.<init>(BourneShellScript.java:161)
          at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:90)
          at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:64)
          at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:177)
          at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:224)
          at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:150)
          at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:108)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
          at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
          at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1218)
          at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1027)
          at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42)
          at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
          at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
          at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:155)
          at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
          at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:133)
          at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:153)
          at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:157)
          at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:127)
          at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:127)
          at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
          at WorkflowScript.run(WorkflowScript:6)
          at ___cps.transform___(Native Method)
          at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:57)
          at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:109)
          at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:82)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
          at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
          at com.cloudbees.groovy.cps.Next.step(Next.java:83)
          at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
          at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
          at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:122)
          at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:261)
          at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
          at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:19)
          at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:35)
          at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:32)
          at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:108)
          at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:32)
          at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:174)
          at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:330)
          at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:82)
          at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:242)
          at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:230)
          at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
          at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
      Finished: FAILURE
      

            Activity

            jglick Jesse Glick added a comment -

            This is a dupe of something I filed in workflow-durable-task-step.

            abayer Andrew Bayer added a comment -

            Jesse Glick Do you know which issue?

            mkozell Mike Kozell added a comment -

            This issue still occurs on:

            Jenkins 2.89.4
            Pipeline 2.5
            Pipeline API 2.26
            Pipeline Nodes and Processes 2.19
            Pipeline Step API 2.14
            Script Security 1.41
            durabilityHint=PERFORMANCE_OPTIMIZED
            org.jenkinsci.plugins.workflow.job.properties.DisableResumeJobProperty
            Groovy Sandbox = disabled
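
            For reference, a sketch of how the two job settings listed above could be expressed in a scripted Pipeline. This assumes they are applied through the `properties` step; the comment does not say how they were actually configured.

            ```groovy
            // Assumption: settings applied as job properties from the script itself.
            properties([
                durabilityHint('PERFORMANCE_OPTIMIZED'),  // durabilityHint=PERFORMANCE_OPTIMIZED
                disableResume()                           // DisableResumeJobProperty
            ])
            ```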


              People

              • Assignee:
                Unassigned
                Reporter:
                mkozell Mike Kozell