  1. Jenkins
  2. JENKINS-33761

Ability to disable Pipeline durability and "resume" build.

    Details

    • Sprint:
      Blue Ocean 1.4 - beta 2, Pipeline - December

      Description

      Because some state is generated on each node during execution, resuming builds after a Jenkins restart or a node reboot is sometimes not feasible and can result in infinite hangs. Providing durability also requires extensive writes to disk, which can severely degrade performance.

      It would be great to be able to specify that jobs do not resume after an interruption, but simply fail. Ideally this would make the system more robust: today, when nodes restart, they pick up jobs that try to resume and then hang, which quickly exhausts all available executors.

      Implementation notes:

      • Requires a new OptionalJobProperty on the job and, optionally, a new BranchProperty in workflow-multibranch-plugin that mirrors that property.
      • Needs some way to signal to storage (workflow-support) and execution (workflow-cps) that the Pipeline is running with resume OFF, as a hint that they can use faster, nondurable execution.

            Activity

            svanoort Sam Van Oort added a comment -

            Released with... uh, well take a look at the Jenkins Pipeline Handbook entry on scaling pipeline for versions.

            gregcovertsmith Greg Smith added a comment -

            For those watching, found direct link Sam mentioned:

            https://jenkins.io/doc/book/pipeline/scaling-pipeline/

            mkozell Mike Kozell added a comment - edited

            Sam Van Oort

            After upgrading Jenkins to the versions and settings below, I was not able to reproduce the issue after a build timeout, after cancelling a build, or after restarting Jenkins in the middle of a build.

            Jenkins 2.89.4
            Pipeline 2.5
            Pipeline API 2.26
            Pipeline Nodes and Processes 2.19
            Pipeline Step API 2.14
            Script Security 1.41
            durabilityHint=PERFORMANCE_OPTIMIZED
            org.jenkinsci.plugins.workflow.job.properties.DisableResumeJobProperty
            Groovy Sandbox = disabled
            Java = 1.8.0_162
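
            As a rough sketch, the durabilityHint and DisableResumeJobProperty settings in this list are typically expressed in a declarative Jenkinsfile as shown below. The option names follow the Jenkins handbook's scaling-pipeline page; the stage and step are placeholders for illustration, not part of the setup reported here.

            ```groovy
            pipeline {
                agent any
                options {
                    // Trade durability for speed: fewer writes to disk while the build runs.
                    durabilityHint('PERFORMANCE_OPTIMIZED')
                    // Do not attempt to resume this build after a Jenkins restart.
                    disableResume()
                }
                stages {
                    stage('Build') {            // placeholder stage name
                        steps {
                            echo 'building'     // placeholder step
                        }
                    }
                }
            }
            ```

            In a scripted Pipeline the same settings would likely be passed through the properties step, for example properties([durabilityHint('PERFORMANCE_OPTIMIZED'), disableResume()]). Check the handbook for the plugin versions that support each option.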

            Although my jobs correctly didn't resume after Jenkins restart, I did see the message below in the build logs.

            Resuming build at Sat Feb 24 06:38:10 UTC 2018 after Jenkins restart
             [Pipeline] End of Pipeline
             java.io.IOException: Cannot resume build – was not cleanly saved when Jenkins shut down.
            hellspam Roy Arnon added a comment -

            Hello,

            I am not sure this is related to this issue, but in our pipeline build job we recently added the disableResume step and it does not seem to work correctly:

            Jenkins 2.89.3
            Pipeline 2.5
            Pipeline API 2.27
            Pipeline Nodes and Processes 2.20
            Pipeline Step API 2.16
            Script Security 1.44
            durabilityHint=PERFORMANCE_OPTIMIZED
            org.jenkinsci.plugins.workflow.job.properties.DisableResumeJobProperty
            Groovy Sandbox = disabled

             

            Creating placeholder flownodes because failed loading originals.
            Resuming build at Thu Aug 30 12:42:45 UTC 2018 after Jenkins restart
            [Bitbucket] Notifying pull request build result
            [Bitbucket] Build result notified
            [lockable-resources] released lock on [UNIT_TEST_RESOURCE_3]
            java.io.IOException: Tried to load head FlowNodes for execution Owner[Products.Pipeline/PR-5615/7:Products.Pipeline/PR-5615 #7] but FlowNode was not found in storage for head id:FlowNodeId 1:586
            	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.initializeStorage(CpsFlowExecution.java:678)
            	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.onLoad(CpsFlowExecution.java:715)
            	at org.jenkinsci.plugins.workflow.job.WorkflowRun.getExecution(WorkflowRun.java:875)
            	at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:745)
            	at hudson.model.RunMap.retrieve(RunMap.java:225)
            	at hudson.model.RunMap.retrieve(RunMap.java:57)
            	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:500)
            	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:482)
            	at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:380)
            	at hudson.model.RunMap.getById(RunMap.java:205)
            	at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.run(WorkflowRun.java:1098)
            	at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.get(WorkflowRun.java:1109)
            	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$1.computeNext(FlowExecutionList.java:65)
            	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$1.computeNext(FlowExecutionList.java:57)
            	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
            	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
            	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ItemListenerImpl.onLoaded(FlowExecutionList.java:178)
            	at jenkins.model.Jenkins.<init>(Jenkins.java:974)
            	at hudson.model.Hudson.<init>(Hudson.java:86)
            	at hudson.model.Hudson.<init>(Hudson.java:82)
            	at hudson.WebAppMain$3.run(WebAppMain.java:233)
            Finished: SUCCESS

            This is a problem for us because the build was marked as SUCCESS in Bitbucket, which allowed a user to merge a failing test into our release branch.

            The job was definitely running with resume disabled, as this was printed at the start of the job:

            Resume disabled by user, switching to high-performance, low-durability mode.

            Any ideas? 

            rg Russell Gallop added a comment -

            We have seen the same thing. Resume is definitely disabled, and builds still hang.

              People

              • Assignee:
                svanoort Sam Van Oort
                Reporter:
                jtilander Jim Tilander
              • Votes:
                47
                Watchers:
                50
