Jenkins / JENKINS-25504

Failing a Step with a body while the body is running breaks FlowNodeGraph

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: pipeline
    • Labels:
      None

      Description

      Reported by Jesse Glick

      I found a StepEndNode of an ExecutorStep with an ErrorAction encoding the following exception:

      java.io.NotSerializableException: hudson.model.Executor
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:890)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:584)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1062)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1018)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:884)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1062)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1018)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:884)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:679)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1062)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1018)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:884)
      	at org.jboss.marshalling.AbstractObjectOutput.writeObject(AbstractObjectOutput.java:58)
      	at org.jboss.marshalling.AbstractMarshaller.writeObject(AbstractMarshaller.java:111)
      	at org.jenkinsci.plugins.workflow.support.pickles.serialization.RiverWriter.writeObject(RiverWriter.java:128)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:320)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:304)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:278)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$000(CpsThreadGroup.java:68)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:168)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:166)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      

      However, its descendant StepEndNode did not have an ErrorAction, nor did that node’s descendant StepEndNode, nor the final FlowEndNode; so FlowExecution.getCauseOfFailure returned null and there was no stack trace in the log.

      From the stack trace I assumed that the exception was caught in saveProgram and that propagateErrorToWorkflow was therefore called, but the log contained no message about the program state save having failed.

      Fairly reproducible: run a flow that allocates a docker-plugin slave, let it run a slow shell step, and restart in the middle. The exact set of errors seems to differ from run to run, but they are never printed to the log.

            Activity

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodyExecution.java
            cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodyInvoker.java
            cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsStepContext.java
            http://jenkins-ci.org/commit/workflow-plugin/1e843f10aa189d7f14f218ea642286438fce08d5
            Log:
            JENKINS-25504 Async step should continue executing until all the bodies are done and the outcome is set.

            Previously, as soon as an outcome was set the step was considered done, even while the body was still running.
            This corrupted the flow graph, because multiple CpsThreads collided when trying to update the same FlowHead.
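
The rule in this commit message can be sketched in isolation. The class below is a hypothetical model, not the workflow-plugin code: `GatedStep` and all of its members are invented for illustration. It reports a step as complete only once an outcome has been set and no body is still running.

```java
/**
 * Hypothetical model (not the workflow-plugin implementation):
 * a step counts as complete only when an outcome has been set
 * AND every body it started has finished.
 */
final class GatedStep {
    private int runningBodies;   // bodies started but not yet finished
    private String outcome;      // null until the step reports a result
    private boolean completed;   // true once both conditions hold

    synchronized void bodyStarted() {
        runningBodies++;
    }

    synchronized void bodyFinished() {
        runningBodies--;
        maybeComplete();
    }

    /** Records the first outcome; later outcomes are ignored. */
    synchronized void setOutcome(String result) {
        if (outcome == null) {
            outcome = result;
        }
        maybeComplete();
    }

    synchronized boolean isCompleted() {
        return completed;
    }

    private void maybeComplete() {
        if (!completed && outcome != null && runningBodies == 0) {
            completed = true;
        }
    }

    public static void main(String[] args) {
        GatedStep step = new GatedStep();
        step.bodyStarted();
        step.setOutcome("FAILURE");              // step fails while the body runs
        System.out.println(step.isCompleted());  // false: body still running
        step.bodyFinished();
        System.out.println(step.isCompleted());  // true: outcome set and body done
    }
}
```

Under the earlier behavior described above, the first check would already have reported completion, leaving the still-running body free to touch the same head of the graph.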

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsStepContext.java
            http://jenkins-ci.org/commit/workflow-plugin/17206dcbadc892231e43b8e11962b80cff28ff15
            Log:
            JENKINS-25504

            If the step is marked as failed while the body is still running, try to interrupt the body execution.
            This lets the step as a whole complete as a failure sooner.
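
The idea can be illustrated with plain Java threads. This is only an analogy under stated assumptions: the real body runs as a CPS program rather than a dedicated Java thread, and `InterruptOnFailure` and `failWhileBodyRuns` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical illustration; a plain Java thread stands in for a body execution. */
final class InterruptOnFailure {

    /**
     * Starts a slow body, treats the step as failed while the body is still
     * running, interrupts the body, and waits for it to stop.
     * Returns true if the body observed the interruption.
     */
    static boolean failWhileBodyRuns() {
        CountDownLatch bodyStarted = new CountDownLatch(1);
        AtomicBoolean bodyInterrupted = new AtomicBoolean(false);

        Thread body = new Thread(() -> {
            bodyStarted.countDown();
            try {
                Thread.sleep(60_000);           // stands in for a slow shell step
            } catch (InterruptedException e) {
                bodyInterrupted.set(true);      // body ends early instead of sleeping on
            }
        });
        body.setDaemon(true);
        body.start();

        try {
            bodyStarted.await();                // the body is now running
            body.interrupt();                   // step marked as failed: interrupt the body
            body.join();                        // report completion only after the body ends
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("caller was interrupted", e);
        }
        return bodyInterrupted.get();
    }

    public static void main(String[] args) {
        System.out.println(failWhileBodyRuns()); // true
    }
}
```

Without the interrupt, the caller would wait the full sleep before the failure could be reported.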

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsStepContext.java
            cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThread.java
            cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java
            http://jenkins-ci.org/commit/workflow-plugin/68de87c231d01605f61344cc3eccbafcbaf71470
            Log:
            JENKINS-25504

            The previous attempt failed because persistence can leave more than one copy of a CpsStepContext.

            This is a weaker fix that only waits for the "primary" body execution. That is, it does not work correctly if a Step, like the parallel step, executes multiple bodies at the same time.

            That said, the current ParallelStep implementation never reports itself completed until all the bodies check in, so this code should work correctly with all the known Step impls.
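
One way duplication through persistence can defeat in-memory bookkeeping is shown below with plain `java.io` serialization. This is a sketch under assumptions: the plugin persists state differently (the stack trace in the description shows JBoss Marshalling), and `DuplicatedContext`, `Context`, and `runningBodies` are hypothetical names, not the actual CpsStepContext fields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical illustration of state diverging between a saved copy and the original. */
final class DuplicatedContext {

    /** Stand-in for a context object that is saved along with the program state. */
    static final class Context implements Serializable {
        private static final long serialVersionUID = 1L;
        int runningBodies;
    }

    /** Serializes and deserializes the context, yielding an independent copy. */
    static Context roundTrip(Context c) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(c);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Context) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Context original = new Context();
        original.runningBodies = 1;

        Context copy = roundTrip(original);       // saved while the body was running
        original.runningBodies = 0;               // body finishes; only the original sees it

        System.out.println(copy == original);     // false: two distinct objects
        System.out.println(copy.runningBodies);   // 1: the copy still thinks a body is running
    }
}
```

Any completion rule that depends on counters held inside such an object can give different answers depending on which copy is consulted.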

            kohsuke Kohsuke Kawaguchi added a comment -

            Problem sufficiently patched.

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsStepContext.java
            cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThread.java
            cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java
            http://jenkins-ci.org/commit/workflow-cps-plugin/e22ae99a0994358fdb10ca66ca77c1f47ed73460
            Log:
            JENKINS-25504

            The previous attempt failed because persistence can leave more than one copy of a CpsStepContext.

            This is a weaker fix that only waits for the "primary" body execution. That is, it does not work correctly if a Step, like the parallel step, executes multiple bodies at the same time.

            That said, the current ParallelStep implementation never reports itself completed until all the bodies check in, so this code should work correctly with all the known Step impls.

            Originally-Committed-As: 68de87c231d01605f61344cc3eccbafcbaf71470


              People

              • Assignee:
                kohsuke Kohsuke Kawaguchi
                Reporter:
                kohsuke Kohsuke Kawaguchi
              • Votes:
                0
                Watchers:
                3

                Dates

                • Created:
                  Updated:
                  Resolved: