Jenkins / JENKINS-25727

Occasional exit status -1 with long latencies

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: durable-task-plugin
    • Labels:
    • Environment:
      Jenkins ver. 1.580.1.1-beta-6 (Jenkins Enterprise by CloudBees 14.11)

      Description

      org.jenkinsci.plugins.workflow.cps.steps.ParallelStepException: Parallel step long running test task failed
      	at org.jenkinsci.plugins.workflow.cps.steps.ParallelStep$ResultHandler$Callback.checkAllDone(ParallelStep.java:126)
      	at org.jenkinsci.plugins.workflow.cps.steps.ParallelStep$ResultHandler$Callback.onFailure(ParallelStep.java:105)
      	at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$FailureAdapter.receive(CpsBodyExecution.java:295)
      	at com.cloudbees.groovy.cps.impl.ThrowBlock$1.receive(ThrowBlock.java:68)
      	at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
      	at com.cloudbees.groovy.cps.Next.step(Next.java:58)
      	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:145)
      	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:164)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:262)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$000(CpsThreadGroup.java:70)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:174)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:172)
      	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:47)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:111)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:744)
      Caused by: hudson.AbortException: script returned exit code -1
      	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:205)
      	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:159)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
      	... 3 more
      Finished: FAILURE
      

            Activity

            varmenise valentina armenise created issue -
            kohsuke Kohsuke Kawaguchi made changes -
            Description: stack trace wrapped in {noformat} markers; text otherwise unchanged
            kohsuke Kohsuke Kawaguchi added a comment -

            The fake exit code "-1" signifies that the process has disappeared without leaving the exit code file behind.

            Looking into why this is the case.
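
A rough sketch of the convention described above, under stated assumptions: the class, method, and constant names are hypothetical and not taken from the durable-task plugin source, and the file name jenkins-result.txt is borrowed from a later comment in this issue. It only illustrates how "no result file and no live process" could be mapped to a fake -1.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration only; not the plugin's actual code.
final class ResultFileStatus {

    /** Sentinel reported when the process is gone but left no exit code behind. */
    static final int PROCESS_VANISHED = -1;

    /**
     * Returns the recorded exit code, null if the wrapper is still running,
     * or PROCESS_VANISHED if the wrapper is gone and no exit code was recorded.
     */
    static Integer exitStatus(Path resultFile, boolean wrapperAlive) {
        if (Files.exists(resultFile)) {
            try {
                String text = new String(Files.readAllBytes(resultFile), StandardCharsets.UTF_8).trim();
                if (!text.isEmpty()) {
                    return Integer.valueOf(text);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        if (wrapperAlive) {
            return null; // no exit code yet, but the wrapper is still running
        }
        return PROCESS_VANISHED; // no exit code and no process: the "fake" -1
    }

    public static void main(String[] args) {
        // Assumed path for the demo; the file is not expected to exist.
        Path missing = Paths.get("no-such-dir-25727", "jenkins-result.txt");
        System.out.println(exitStatus(missing, true));
        System.out.println(exitStatus(missing, false));
    }
}
```

In this sketch a -1 never comes from the script itself; it only records that neither an exit code nor a live process could be observed at the time of the check.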

            kohsuke Kohsuke Kawaguchi added a comment -

            valentina armenise confirmed that the master and a slave have a large latency between them, and that the issue happens about once in 10 runs. So the race condition hypothesis seems more plausible.

            jglick Jesse Glick made changes -
            Labels: parallel workflow -> workflow
            jglick Jesse Glick made changes -
            Assignee: (none) -> Jesse Glick [ jglick ]
            jglick Jesse Glick made changes -
            Summary: "parallel steps of workflow executions randomly fail" -> "Occasional exit status -1 with long latencies"
            Component/s: workflow-plugin [ 18820 ] -> durable-task-plugin [ 18622 ]
            jglick Jesse Glick added a comment -

            88aed02 was apparently not enough. We have a PID for the wrapper script, and jenkins-result.txt has not been created, yet the wrapper script does not seem to be running any more. Unclear what leads to this situation.

            jglick Jesse Glick made changes -
            Link: This issue is blocking JENKINS-22249
            jglick Jesse Glick added a comment -

            Kohsuke Kawaguchi suggests that ShellController.exitStatus hits a race condition: it calls exitStatus while the process is running, which returns null, then the process finishes, then isAlive is called and says it is not running.

            The probable fix is to recheck exitStatus before returning -1.
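
To make the suspected interleaving concrete, here is a small deterministic simulation. It is only a sketch under assumptions: the names (ExitStatusRecheck, RacyProcess, readStatus) are hypothetical and do not come from ShellController, and the "process" is a stub that finishes immediately after the first status read. Presumably, a longer round trip between the two calls widens this window on a real master/slave connection.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names come from the plugin source.
final class ExitStatusRecheck {

    /** Status first, then liveness: a process that exits in between is reported as -1. */
    static Integer naive(Supplier<Integer> readStatus, BooleanSupplier isAlive) {
        Integer status = readStatus.get();
        if (status != null) {
            return status;
        }
        if (isAlive.getAsBoolean()) {
            return null; // still running
        }
        return -1; // looks vanished, but the exit code may have appeared meanwhile
    }

    /** Same checks, but the status is read once more before giving up with -1. */
    static Integer withRecheck(Supplier<Integer> readStatus, BooleanSupplier isAlive) {
        Integer status = readStatus.get();
        if (status != null) {
            return status;
        }
        if (isAlive.getAsBoolean()) {
            return null;
        }
        Integer late = readStatus.get(); // the process is gone, so its result can no longer change
        return late != null ? late : Integer.valueOf(-1);
    }

    /** Stub process that exits with code 0 right after its status is first read. */
    static final class RacyProcess {
        private boolean finished = false;

        Integer readStatus() {
            Integer seen = finished ? Integer.valueOf(0) : null;
            finished = true; // simulate the process finishing just after this read
            return seen;
        }

        boolean isAlive() {
            return !finished;
        }
    }

    public static void main(String[] args) {
        RacyProcess a = new RacyProcess();
        System.out.println("naive:   " + naive(a::readStatus, a::isAlive));       // prints "naive:   -1"
        RacyProcess b = new RacyProcess();
        System.out.println("recheck: " + withRecheck(b::readStatus, b::isAlive)); // prints "recheck: 0"
    }
}
```

The recheck is safe in this model because once the liveness check reports the process as gone, no later write to the status can occur, so a second null really does mean the exit code was never recorded.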

            jglick Jesse Glick made changes -
            Priority: Minor [ 4 ] -> Major [ 3 ]
            jglick Jesse Glick added a comment -

            In fact he already attempted that fix in https://github.com/jenkinsci/durable-task-plugin/commit/10a3ebdc1e4825fd334cfe58ecf294c9384d5f06 though this is not complete.

            jglick Jesse Glick made changes -
            Status: Open [ 1 ] -> In Progress [ 3 ]
            jglick Jesse Glick made changes -
            Status: In Progress [ 3 ] -> Resolved [ 5 ]
            Resolution: (none) -> Fixed [ 1 ]
            jglick Jesse Glick added a comment -

            Released attempted fix in Durable Task plugin 1.0.

            varmenise valentina armenise added a comment -

            Tested with plugin version 1.0. It worked.

            sumdumgai A C added a comment -

            Reopening. A similar hang is occasionally occurring again with Jenkins 1.6.13 - 1.6.15 and Workflow 1.6 during bat steps on Windows. According to the log output, the process that Workflow is waiting for has ended successfully, and no rogue generated batch files or directories are left behind, so it still seems that a race condition exists.

            Why is this concatenating to a jenkins-result.txt file anyway? That doesn't seem very robust. Can't we just re-pipe standard out and standard error directly?

            sumdumgai A C made changes -
            Resolution: Fixed [ 1 ] -> (none)
            Status: Resolved [ 5 ] -> Reopened [ 4 ]
            jglick Jesse Glick added a comment -

            A C: I am not sure what bug you are seeing, but it sounds different from this one. Better to file it separately. Note that 1.7 included a fix in a related area.

            jglick Jesse Glick made changes -
            Status: Reopened [ 4 ] -> Resolved [ 5 ]
            Resolution: (none) -> Fixed [ 1 ]
            sumdumgai A C added a comment -

            OK. WF 1.7 seems to have made this problem worse; I am tracking a possibly related symptom in JENKINS-28604.

            rtyler R. Tyler Croy made changes -
            Workflow: JNJira [ 159704 ] -> JNJira + In-Review [ 196175 ]
            abayer Andrew Bayer made changes -
            Labels: workflow -> pipeline workflow
            abayer Andrew Bayer made changes -
            Labels: pipeline workflow -> pipeline

              People

              • Assignee:
                jglick Jesse Glick
                Reporter:
                varmenise valentina armenise
              • Votes:
                0
                Watchers:
                4
