  1. Jenkins
  2. JENKINS-34256

Preparing Jenkins For Shutdown Hangs Running Pipelines

    Details

    • Released As:
      workflow-cps 2.78

      Description

      Start a couple long-running pipelines with

      node {
        sleep 100
      }

      Queue up a few more jobs. Go to "manage jenkins" and "prepare for shutdown."

      Now pipeline jobs that would finish and unenqueue never finish and have to manually be killed (which does work). Freestyle jobs complete normally. Queued jobs aren't run, so that part of prepare-for-shutdown works.

      Even stranger: upon killing and restarting with Ctrl+C, we get this lovely conundrum:

      Those pipeline builds won't show up in the build queue on the main screen.

      Checks to do:

      • Regression in core?
      • Regression in pipeline?
      • does /safeRestart or /restart trigger it?

        Attachments

          Issue Links

            Activity

            svanoort Sam Van Oort created issue -
            swashbuck1r Spike Washburn added a comment -

            Sam will investigate if this is a 2.0 regression, which would increase priority.

            svanoort Sam Van Oort made changes -
            Description: appended a "Checks to do" list (regression in core? regression in pipeline? does /safeRestart or /restart trigger it?); the rest of the text is unchanged.
            svanoort Sam Van Oort added a comment -

            Alright, I'm stumped now. I can't reproduce the hung pipelines no matter what I try now, even though it was consistent before. The issues with the weird flashing pipeline results afterward are probably due to the standing issue with queues, so... I'm going to close this as "cannot reproduce" for now, unless it recurs.

            svanoort Sam Van Oort added a comment -

            Can't reproduce the hung jobs for whatever reason now, even though it was consistent before. The flashing and bizarre UI results with pipeline probably relate to the standing queue bugs, so I'm closing this unless it recurs.

            svanoort Sam Van Oort made changes -
            Status Open [ 1 ] Closed [ 6 ]
            Assignee Jesse Glick [ jglick ]
            Resolution Cannot Reproduce [ 5 ]
            ruoso Daniel Ruoso added a comment - - edited

            I can reproduce this issue... If I have several jobs like that scheduled, they show up as "part of ..." in the queue. The jobs get stuck waiting to be scheduled, but since jenkins is marked for shutdown they never get scheduled.

            The status of the node, however, never goes back to "idle", so it's never clear when it is safe to shut down the master.

            ruoso Daniel Ruoso made changes -
            Resolution Cannot Reproduce [ 5 ]
            Status Closed [ 6 ] Reopened [ 4 ]
            ruoso Daniel Ruoso added a comment -

            I suspect the difference is related to the fact that the jobs show up in the queue as "part of ..." as opposed to just the job name...

            When I look at the steps of the pipeline, it shows up as:

            "
            Start of Pipeline Success
            Allocate node : Start
            "

            and the "allocate node: start" has the status as 'running', which seems to make the node behave as if it was not idle.

            ruoso Daniel Ruoso made changes -
            Summary "Preparing Jenkins 2 For Shutdown Hangs Running Pipelines" -> "Preparing Jenkins For Shutdown Hangs Running Pipelines"
            ruoso Daniel Ruoso made changes -
            Labels "2.0 2.0-rc testfest" -> "2.0 2.0-rc lts testfest"
            ruoso Daniel Ruoso made changes -
            Environment "Jenkins 2.0-rc-1, running from WAR on Mac" -> "Jenkins 2.0-rc-1, running from WAR on Mac; Jenkins 1.651.1 running from WAR on Linux"
            ruoso Daniel Ruoso added a comment -

            I just realized this was referred to as a Jenkins 2 problem, but I can definitely see exactly the same problem on the current LTS version.

            ruoso Daniel Ruoso added a comment -

            One interesting aspect of this could be the fact that I have set up the master with only one executor, so the step of the pipeline that is scheduling the job that goes into the node is probably getting in a weird state.

            ruoso Daniel Ruoso added a comment -

            Also confirmed this happens with latest 1.656.1

            ruoso Daniel Ruoso made changes -
            Labels "2.0 2.0-rc lts testfest" -> "1.651.1 2.0 2.0-rc lts testfest"
            svanoort Sam Van Oort added a comment -

            Daniel Ruoso This narrows down the problem a bit. Do you have a test case (preferably a simple one) that will reproduce this situation?

            From what I'm seeing, I'm somewhat confused by the final state, because either the allocate node step should block (in which case you never see 'part of' in the build queue, the node will not be in use, and execution resumes after restart), or it should complete (in which case the pipeline finishes execution on that node and releases it).

            ruoso Daniel Ruoso added a comment -

            I'm not sure how to author the test case, but all I did was put "node('master') { sleep 90 }" in a pipeline job, set the number of executors to 1, then click "build now" several times. This shows a bunch of "part of ..." jobs in the queue. Then I mark jenkins as shutting down, and the jobs get stuck. If I kill jenkins and restart, the jobs show as running in the job status list, but are not in the executor queue at all.

            svanoort Sam Van Oort added a comment -

            Daniel Ruoso Is there any particular timing dependency? Fresh install?

            ruoso Daniel Ruoso added a comment -

            So what needs to happen is:

            1) nothing on the queue, one executor available.
            2) I click on "build now" for my "sleep job" several times
            3) the executor will be running one of them, and you will see several "part of sleep job #xxx" in the queue
            4) when I mark jenkins for shutdown, the current job will finish, the remaining jobs will stay on the queue
            5) if I click on "build now" for the job again, I see "sleep job #xx" in the queue instead of "part of sleep job #xx".
            6) I wait until the job is finished, then I send a SIGINT to the jenkins master, which comes down.
            7) when I bring jenkins up again, the jobs will not be on the executor queue, but if I look in the job history, they show up as if they were running. When I look at the details, they are waiting to be scheduled, but they will be stuck forever.

            This is a fairly fresh install, with the "Pipeline" plugin installed and all plugins up-to-date.

            ruoso Daniel Ruoso added a comment -

            I suspect this is related to how the pipeline execution seems to use the master node, but not one of the executor slots. I.e., when I click several times, I see the master node being assigned the outside of the pipeline job, which will then try to allocate a node. I see several of those in parallel, even when the master node has only one executor.

            svanoort Sam Van Oort made changes -
            Assignee Sam Van Oort [ svanoort ]
            svanoort Sam Van Oort added a comment -

            To provide an update: I have recently restarted the deeper investigation of this issue.

            svanoort Sam Van Oort added a comment -

            Daniel Ruoso Okay, I must admit to being stumped: no matter what I try, I can't reproduce this. I tested with Jenkins 2.8 and the latest pipeline plugin, whether I start prepare for shutdown before the first node block or during its execution, and whether or not I have a second node block on the job. This also applies whether or not I schedule an additional execution during prepare for shutdown mode.

            Can you provide an exact job and timing that will trigger this issue consistently? I am wondering if it is related to the queueing issues resolved in
            https://issues.jenkins-ci.org/browse/JENKINS-34281 – which are included in Jenkins 2.1.

            ruoso Daniel Ruoso added a comment -

            I'm using Jenkins 1.651.1 running from WAR on Linux, not Jenkins 2.8

            svanoort Sam Van Oort added a comment -

            Daniel Ruoso Please can you copy the JENKINS_HOME and try with the 2.8 WAR? I suspect this is linked to JENKINS-34281, which is not fixed on the Jenkins 1.651.1 release line.

            ruoso Daniel Ruoso added a comment -

            How compatible is 2.8 compared to 1.651 in terms of the API?

            svanoort Sam Van Oort added a comment -

            Daniel Ruoso It should be compatible except for dropping AJP support. You don't need to do a full upgrade anyway, just start with a fresh instance and provide a testcase that reproduces this under Jenkins 2.1+ (I suggest 2.8 as the latest). If it can't be reproduced, the issue is probably resolved by JENKINS-34281 fix - for OSS Jenkins, that means an upgrade, otherwise it would need a backport to 1.651 line.

            ruoso Daniel Ruoso added a comment -

            ok, since Jenkins2 is now actually released, I'll do the upgrade and see if I still have the problem.

            ruoso Daniel Ruoso added a comment -

            I just tested with latest Jenkins release and I can't reproduce the bug.

            When I initially reported this Jenkins2 was not really released, it is now... I'll just move to the new version.

            svanoort Sam Van Oort added a comment -

            Daniel Ruoso Excellent! I'm going to go ahead and close this one out as a duplicate of the other one then.

            svanoort Sam Van Oort added a comment -

            Appears to be resolved by the fix to the queue issue.

            svanoort Sam Van Oort made changes -
            Link This issue duplicates JENKINS-34281 [ JENKINS-34281 ]
            svanoort Sam Van Oort made changes -
            Status Reopened [ 4 ] Resolved [ 5 ]
            Resolution Fixed [ 1 ]
            rtyler R. Tyler Croy made changes -
            Workflow JNJira [ 170319 ] JNJira + In-Review [ 198838 ]
            abayer Andrew Bayer made changes -
            Component/s pipeline-general [ 21692 ]
            abayer Andrew Bayer made changes -
            Component/s workflow-plugin [ 18820 ]
            lswithenbank Luke Swithenbank added a comment -

            I seem to be having this same issue again with Jenkins 2.18, and the workflow-job plugin 2.5.

            bjanda Bartosz Janda added a comment -

            I've the same problem with Jenkins 2.57

            jyoukhana John Youkhana added a comment -

            I have the same problem with Jenkins 2.66

            estamand Eric St-Amand added a comment -

            Problem still present on Jenkins 2.73.1. Using Prepare For Shutdown breaks all currently running pipelines, which fall into a kind of deadlock. I need to manually kill the jobs through the CLI and restart Jenkins. All plugins are at the latest version.

            dantran dan tran added a comment -

            I am still seeing this issue at 2.73.* and 2.81.* LTS

            dantran dan tran made changes -
            Resolution Fixed [ 1 ]
            Status Resolved [ 5 ] Reopened [ 4 ]
            docwhat Christian Höltje added a comment -

            I had this problem when using docker containers that ran without a PID 1.  The fix for this was to add --init as an argument to docker run.

            svanoort Sam Van Oort added a comment - - edited

            It looks like I can Edit: REPRODUCE this locally like so: 

            stage ("going to bed") {
              node {
                echo 'running a sleep'
                sh 'for i in `seq 1 70`; do echo "sleep $i" && sleep 1; done'
              }
            }

            Which means it should be debuggable/fixable now.

            svanoort Sam Van Oort made changes -
            Status Reopened [ 4 ] Open [ 1 ]
            svanoort Sam Van Oort made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            reinholdfuereder Reinhold Füreder added a comment -

            Maybe JENKINS-38316 and in particular https://issues.jenkins-ci.org/browse/JENKINS-38316?focusedCommentId=332021&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-332021 explains the current situation (in case it has changed over the past almost 2 years)...
            svanoort Sam Van Oort added a comment -

            Reinhold Füreder I'm 99% sure this is where the hang originates from: https://github.com/jenkinsci/workflow-cps-plugin/blob/master/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java#L235

            This behavior seemingly is by design, because we wanted Pipelines to halt where they are (rather than completing fully) before shutdown.
            A better design might have been a separate "paused" state.

            Unfortunately AFAIK there's not a listener in Core that we can use to notify the Pipeline to wake back up when leaving QuietingDown mode. My best notion has been for halted Pipelines to poll periodically to see if we've left quietDown mode and then resume if so – doable but rather unfortunate.

            Worse, I can't seem to actually reproduce this behavior in unit test for reasons I'm still trying to ascertain, even though it's easy to demonstrate on a normal instance: https://github.com/svanoort/workflow-cps-plugin/blob/60308b567d4bff6904d6fbc3cb57fbda564eaff7/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsFlowExecutionTest.java#L249

            Jesse Glick Do you have any notions here?
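
The polling idea mentioned in the comment above could look roughly like the following. This is only a minimal sketch under stated assumptions, not code from workflow-cps: `QuietDownPoller` and `resumeWhenQuietDownEnds` are hypothetical names, the `BooleanSupplier` stands in for a check such as `Jenkins.isQuietingDown()`, and the `Runnable` stands in for whatever would re-schedule the halted Pipeline.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of any Jenkins plugin): polls until quiet-down mode ends, then resumes once. */
final class QuietDownPoller implements AutoCloseable {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "quiet-down-poller");
        t.setDaemon(true); // never keep the JVM alive just for polling
        return t;
    });

    /**
     * @param quietingDown assumption: stand-in for a check such as Jenkins.isQuietingDown()
     * @param resume       assumption: stand-in for whatever wakes the halted Pipeline
     * @param periodMillis delay between two checks
     */
    void resumeWhenQuietDownEnds(BooleanSupplier quietingDown, Runnable resume, long periodMillis) {
        timer.schedule(() -> {
            if (quietingDown.getAsBoolean()) {
                // Still in quiet-down mode: look again after another period.
                resumeWhenQuietDownEnds(quietingDown, resume, periodMillis);
            } else {
                // Quiet-down was cancelled: wake the Pipeline exactly once and stop polling.
                resume.run();
            }
        }, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        AtomicBoolean quietingDown = new AtomicBoolean(true);
        CountDownLatch resumed = new CountDownLatch(1);
        try (QuietDownPoller poller = new QuietDownPoller()) {
            poller.resumeWhenQuietDownEnds(quietingDown::get, resumed::countDown, 50);
            quietingDown.set(false); // simulates the admin cancelling the shutdown
            System.out.println("resumed: " + resumed.await(2, TimeUnit.SECONDS));
        }
    }
}
```

The trade-off is the one the comment calls "rather unfortunate": every halted Pipeline pays for a periodic check, and the resume is delayed by up to one polling period, whereas a listener in core would notify immediately.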

            svanoort Sam Van Oort made changes -
            Component/s workflow-cps-plugin [ 21713 ]
            Component/s core [ 15593 ]
            Component/s pipeline [ 21692 ]
            svanoort Sam Van Oort made changes -
            Link This issue is duplicated by JENKINS-38316 [ JENKINS-38316 ]
            svanoort Sam Van Oort added a comment -

            See JENKINS-38316 for the same issue but with additional comments/info.

            svanoort Sam Van Oort made changes -
            Link This issue is related to JENKINS-38316 [ JENKINS-38316 ]
            svanoort Sam Van Oort added a comment - - edited

            Jesse's suggestions given issues with the testcase here:

            [11:11 AM] Jesse Glick: very roughly: `semaphore 'wait'` can be the whole program; wait for it to start; `doQuietDown`; succeed step; `doCancelQuietDown`; wait for finish
            [11:11 AM] Jesse Glick: (maybe?)
            [11:11 AM] Jesse Glick: no need for `node`, `Thread.sleep`, or `waitForSuspension`
            [11:12 AM] Jesse Glick: untested, obviously, but I would try something along those lines

            svanoort Sam Van Oort added a comment -

            [11:18 AM] Jesse Glick: @Sam I suspect you are quieting down in the middle of a `sleep`, then canceling that before anything else happens in the program, so… `CpsFlowExecution` never even notices that the state flipped
            [11:19 AM] Jesse Glick: @Sam JENKINS-38316 is about the more likely scenario that the admin goes into quiet down mode, then the CPS VM thread wakes up for whatever reason, sees that it is supposed to be in quiet mode, pauses, and then never receives a notification to do anything else (unless perhaps someone manually pauses and resumes the build)
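
The difference between the two orderings described in the chat above can be illustrated with a toy model. This is a deliberately simplified sketch, assuming only what the chat states (the program refuses to advance while quiet-down mode is on, and nothing notifies it when the mode is cancelled); `ToyPipeline` and all of its members are hypothetical and are not the real `CpsThreadGroup` or `CpsFlowExecution` code.

```java
/** Toy model of the behaviour described in the chat; NOT the real workflow-cps implementation. */
final class ToyPipeline {
    private boolean quietingDown; // stand-in for Jenkins' quiet-down flag
    private boolean pendingWork;  // a step result arrived but has not been processed yet
    private boolean finished;

    void setQuietingDown(boolean value) {
        quietingDown = value; // note: nobody is notified when this flips back to false
    }

    /** A step (for example sleep or semaphore) completes and wakes the program. */
    void stepCompleted() {
        pendingWork = true;
        run();
    }

    /** One pass of the interpreter loop: refuses to make progress in quiet-down mode. */
    void run() {
        if (quietingDown || !pendingWork) {
            return;
        }
        pendingWork = false;
        finished = true;
    }

    boolean isFinished() {
        return finished;
    }

    public static void main(String[] args) {
        // Order 1: quiet down and cancel while the step is still sleeping.
        ToyPipeline a = new ToyPipeline();
        a.setQuietingDown(true);
        a.setQuietingDown(false);
        a.stepCompleted();
        System.out.println("cancel before wake-up, finished: " + a.isFinished()); // true

        // Order 2: the step completes while quiet-down mode is active.
        ToyPipeline b = new ToyPipeline();
        b.setQuietingDown(true);
        b.stepCompleted();
        b.setQuietingDown(false);
        System.out.println("wake-up during quiet-down, finished: " + b.isFinished()); // false

        // Re-running by hand (akin to pausing and resuming the build) un-sticks it.
        b.run();
        System.out.println("after manual resume, finished: " + b.isFinished()); // true
    }
}
```

In this model a test that follows order 1 never observes the hang, which matches the suspicion voiced in the chat; only order 2 leaves work pending with nothing to trigger another pass.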

            svanoort Sam Van Oort added a comment - - edited

            Jglick: so my suggested test case would first quiet down, then do something to wake up the program, then cancel quiet down

            Jesse Glick·11:21 AM you might need to do something else there, b/c I suspect there is still a race condition in that test—`SemaphoreStep.succeed` will post a task to the CPS VM thread, but you need to wait for that task to actually be processed.

            Jesse Glick·11:21 AM That might be a valid use of `waitForSuspension`.

            [11:23 AM] Jesse Glick: @sam I would suggest that https://github.com/jenkinsci/workflow-cps-plugin/blob/564a12c05eb54d5a84062cd3bf1d68deb47e1d9f/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java#L236 if the reason for pausing is quiet down mode, you print something to the build log. That is something the test could `waitForMessage` to see.
            [11:23 AM] Jesse Glick: (as well as being better UX)
            [11:23 AM] Jesse Glick: for the other reason we already print a message to the log: https://github.com/jenkinsci/workflow-cps-plugin/blob/564a12c05eb54d5a84062cd3bf1d68deb47e1d9f/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsFlowExecution.java#L1491
            [11:25 AM] Jesse Glick: thus the test would be: wait for `semaphore` step to start; set Jenkins to quiet mode; permit the step to finish; wait for the message saying that the build is paused due to quiet mode; cancel quiet mode; wait for build to complete on its own
            [11:26 AM] Jesse Glick: @Sam ^^^

            reinholdfuereder Reinhold Füreder added a comment - - edited

            More or less by accident, I just successfully resumed 3 pipelines: I put Jenkins in shutdown mode, restarted Jenkins, and then cancelled shutdown mode (cf. also JENKINS-38316):

            • and I did NOT have to wake them up manually with the "pause"-"resume" workaround
            • maybe/presumably because I entered shutdown mode in the middle of 'sh' steps, AND then waited until the end of these steps before restarting Jenkins?

            However, the following minor issues popped up – please note that there were actually two Jenkins restarts: Jenkins plugins were updated (just one, in fact) via Jenkins init.d hook scripts after the first restart, which was followed by a second restart (after the updates):

            1. Resuming after Jenkins restart is slow
              ...
              Resuming build at Fri May 04 07:52:09 CEST 2018 after Jenkins restart
              Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: ???
              Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
              Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
              Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
              Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
              Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
              Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
              Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
              Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
              Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
              Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
              Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
              Resuming build at Fri May 04 07:55:41 CEST 2018 after Jenkins restart
              Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: ???
              Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
              Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
              Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
              Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
              Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
              Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
              Ready to run at Fri May 04 07:57:42 CEST 2018
              [Pipeline] sh
              07:57:42 [ACME] Running shell script
              ...
              
              • ... maybe because of timeouts while suspending the (more or less still suspended) pipelines, since the first action in the init.d hook scripts is to put Jenkins in shutdown mode!?
                2018-05-04 07:52:09 INFO [hudson.WebAppMain$3 run]   Jenkins is fully up and running
                2018-05-04 07:52:10 SEVERE [jenkins.model.Jenkins$24 run]   Restarting VM as requested by SYSTEM
                2018-05-04 07:52:10 INFO [jenkins.model.Jenkins cleanUp]   Stopping Jenkins
                2018-05-04 07:52:10 INFO [jenkins.model.Jenkins$19 onAttained]   Started termination
                2018-05-04 07:52:10 WARNING [hudson.util.ExceptionCatchingThreadFactory uncaughtException]   Thread Computer.threadPoolForRemoting [#2] terminated unexpectedly
                java.nio.channels.ClosedSelectorException
                        at sun.nio.ch.SelectorImpl.keys(SelectorImpl.java:68)
                        at org.jenkinsci.remoting.protocol.IOHub.getThreadNameBase(IOHub.java:426)
                        at org.jenkinsci.remoting.protocol.IOHub.access$200(IOHub.java:69)
                        at org.jenkinsci.remoting.protocol.IOHub$IOHubSelectorWatcher.run(IOHub.java:536)
                        at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
                        at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
                        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                        at java.lang.Thread.run(Thread.java:748)
                
                2018-05-04 07:53:10 WARNING [org.jenkinsci.plugins.workflow.cps.CpsFlowExecution suspendAll]   Error waiting for Pipeline to suspend: CpsFlowExecution[Owner[Unattended-Upgrades/ACME2/273:Unattended-Upgrades/ACME2 #273]]
                java.util.concurrent.TimeoutException: Timeout waiting for task.
                        at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:259)
                        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:91)
                        at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.suspendAll(CpsFlowExecution.java:1555)
                        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
                        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                        at java.lang.reflect.Method.invoke(Method.java:498)
                        at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:104)
                        at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:175)
                        at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:296)
                        at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:214)
                        at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117)
                        at jenkins.model.Jenkins$18.execute(Jenkins.java:3333)
                        at org.jvnet.hudson.reactor.Reactor$Node.runIfPossible(Reactor.java:139)
                        at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:128)
                        at jenkins.model.Jenkins$18.execute(Jenkins.java:3333)
                        at org.jvnet.hudson.reactor.Reactor$Node.runIfPossible(Reactor.java:139)
                        at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:276)
                        at jenkins.model.Jenkins._cleanUpRunTerminators(Jenkins.java:3330)
                        at jenkins.model.Jenkins.cleanUp(Jenkins.java:3251)
                        at hudson.lifecycle.UnixLifecycle.restart(UnixLifecycle.java:73)
                        at jenkins.model.Jenkins$24.run(Jenkins.java:4234)
                
                2018-05-04 07:54:10 WARNING [org.jenkinsci.plugins.workflow.cps.CpsFlowExecution suspendAll]   Error waiting for Pipeline to suspend: CpsFlowExecution[Owner[Unattended-Upgrades/ACME/182:Unattended-Upgrades/ACME #182]]
                java.util.concurrent.TimeoutException: Timeout waiting for task.
                        at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:259)
                        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:91)
                        at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.suspendAll(CpsFlowExecution.java:1555)
                        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
                        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                        at java.lang.reflect.Method.invoke(Method.java:498)
                        at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:104)
                        at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:175)
                        at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:296)
                        at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:214)
                        at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117)
                        at jenkins.model.Jenkins$18.execute(Jenkins.java:3333)
                        at org.jvnet.hudson.reactor.Reactor$Node.runIfPossible(Reactor.java:139)
                        at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:128)
                        at jenkins.model.Jenkins$18.execute(Jenkins.java:3333)
                        at org.jvnet.hudson.reactor.Reactor$Node.runIfPossible(Reactor.java:139)
                        at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:276)
                        at jenkins.model.Jenkins._cleanUpRunTerminators(Jenkins.java:3330)
                        at jenkins.model.Jenkins.cleanUp(Jenkins.java:3251)
                        at hudson.lifecycle.UnixLifecycle.restart(UnixLifecycle.java:73)
                        at jenkins.model.Jenkins$24.run(Jenkins.java:4234)
                
                2018-05-04 07:55:10 WARNING [org.jenkinsci.plugins.workflow.cps.CpsFlowExecution suspendAll]   Error waiting for Pipeline to suspend: CpsFlowExecution[Owner[Unattended-Upgrades/ACME3/181:Unattended-Upgrades/ACME3 #181]]
                java.lang.InterruptedException
                        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039)
                        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
                        at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:258)
                        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:91)
                        at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.suspendAll(CpsFlowExecution.java:1555)
                        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
                        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                        at java.lang.reflect.Method.invoke(Method.java:498)
                        at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:104)
                        at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:175)
                        at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:296)
                        at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:214)
                        at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117)
                        at jenkins.model.Jenkins$18.execute(Jenkins.java:3333)
                        at org.jvnet.hudson.reactor.Reactor$Node.runIfPossible(Reactor.java:139)
                        at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:128)
                        at jenkins.model.Jenkins$18.execute(Jenkins.java:3333)
                        at org.jvnet.hudson.reactor.Reactor$Node.runIfPossible(Reactor.java:139)
                        at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:276)
                        at jenkins.model.Jenkins._cleanUpRunTerminators(Jenkins.java:3330)
                        at jenkins.model.Jenkins.cleanUp(Jenkins.java:3251)
                        at hudson.lifecycle.UnixLifecycle.restart(UnixLifecycle.java:73)
                        at jenkins.model.Jenkins$24.run(Jenkins.java:4234)
                
                2018-05-04 07:55:10 INFO [jenkins.model.Jenkins$19 onAttained]   Completed termination
                2018-05-04 07:55:10 INFO [jenkins.model.Jenkins _cleanUpDisconnectComputers]   Starting node disconnection
                2018-05-04 07:55:10 INFO [jenkins.model.Jenkins _cleanUpShutdownPluginManager]   Stopping plugin manager
                2018-05-04 07:55:10 INFO [jenkins.model.Jenkins _cleanUpPersistQueue]   Persisting build queue
                2018-05-04 07:55:10 INFO [jenkins.model.Jenkins _cleanUpAwaitDisconnects]   Waiting for node disconnection completion
                2018-05-04 07:55:10 INFO [jenkins.model.Jenkins cleanUp]   Jenkins stopped
                Listening for transport dt_socket at address: 5005
                Running from: /usr/share/jenkins/jenkins.war
                2018-05-04 07:55:11 INFO [org.eclipse.jetty.util.log.Log initialized]   Logging initialized @525ms to org.eclipse.jetty.util.log.JavaUtilLog
                2018-05-04 07:55:11 INFO [winstone.Logger logInternal]   Beginning extraction from war file
                ...
                2018-05-04 07:55:46 INFO [jenkins.InitReactorRunner$1 onAttained]   Completed initialization
                2018-05-04 07:55:46 INFO [hudson.WebAppMain$3 run]   Jenkins is fully up and running
                2018-05-04 07:58:12 INFO [org.jenkinsci.plugins.workflow.job.WorkflowRun finish]   Unattended-Upgrades/ACME2 Worker #273 completed: SUCCESS
                2018-05-04 07:58:35 INFO [org.jenkinsci.plugins.workflow.job.WorkflowRun finish]   Unattended-Upgrades/ACME #182 completed: SUCCESS
                2018-05-04 08:01:11 INFO [org.jenkinsci.plugins.workflow.job.WorkflowRun finish]   Unattended-Upgrades/ACME3 #181 completed: SUCCESS
                
              • Sam Van Oort Should I file a dedicated issue for that ("Error waiting for Pipeline to suspend: CpsFlowExecution")?
            2. And the build executor status does not stop showing the pipeline as being in-progress or so:
              • Cancelling/aborting it with this 'x' button/link finally removes it (after confirming in the pop-up dialog "Are you sure you want to abort null?")
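On the first point above: the `suspendAll` warnings in the log are spaced one minute apart (07:53:10, 07:54:10, 07:55:10), which is consistent with a bounded wait of about one minute per execution, performed one execution after another. The sketch below only illustrates that pattern; it is not the plugin's code, `SequentialSuspendSketch` and `waitForAll` are hypothetical names, and the timeout is shortened to milliseconds so the demo runs quickly. With n stuck executions and a per-execution timeout t, the total delay is roughly n × t.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class SequentialSuspendSketch {
    /** Waits for each future in turn, up to timeoutMillis each; returns how many timed out. */
    static int waitForAll(List<? extends Future<?>> suspensions, long timeoutMillis) {
        int timedOut = 0;
        for (Future<?> f : suspensions) {
            try {
                f.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                timedOut++;                          // note the failure and move on to the next one
            } catch (ExecutionException e) {
                // the task finished, albeit with an error; nothing more to wait for
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve the interrupt and stop waiting
                break;
            }
        }
        return timedOut;
    }

    public static void main(String[] args) {
        // Three futures that never complete, standing in for three stuck Pipelines.
        List<CompletableFuture<Void>> stuck = List.of(
                new CompletableFuture<Void>(),
                new CompletableFuture<Void>(),
                new CompletableFuture<Void>());
        long start = System.nanoTime();
        int timedOut = waitForAll(stuck, 50);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(timedOut + " timeouts after ~" + elapsedMs + " ms");
    }
}
```

Under that assumption, three stuck Pipelines with a one-minute limit each would add about three minutes to the shutdown, which matches the gap between "Started termination" and "Completed termination" in the log.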
            svanoort Sam Van Oort added a comment -

            Reinhold Füreder I would open a separate issue for Timeouts suspending executions – especially if you can come up with a consistent way to reproduce it. I saw it from time to time with Pipelines doing very complex processing (where we can't block the shutdown forever and shouldn't).

            My suspicion is that there's a subtle bug around the halt-at-shutdown logic, which may have been pre-existing but is visible now because the process is more closely monitored and logged (and because we actually have some test coverage for it). Unfortunately

            By the way, you will sometimes be able to resume Pipelines after going into prepare-for-shutdown if the toggle happens at the right time – but in general there's no wakeup hook to resume execution (see notes above about how we plan to add one).

            reinholdfuereder Reinhold Füreder made changes -
            Link This issue is related to JENKINS-51215 [ JENKINS-51215 ]
            reinholdfuereder Reinhold Füreder added a comment -

            Sam Van Oort OK => JENKINS-51215
            tsniatowski Tomasz Śniatowski added a comment -

            So with pipelines, what is the recommended way of completely stopping a busy Jenkins instance for maintenance? The maintenance is in part due to a broken pipeline resume à la JENKINS-50199, so I specifically don't want any additional half-done pipelines waiting to be resumed. I would also prefer to avoid having to abort jobs.

            In JENKINS-38316 there's an explicit mention that "prepare for shutdown" is not that:

            The whole idea of "Prepare for shutdown" is to [...] allow you to finish currently running freestyle (Maven, matrix, …) builds. So if you /safeRestart Jenkins will restart as soon as any of those are completed, and running Pipeline builds will be left alone.

            What should I do then?

            hentis Henti Smith added a comment -

            We're having similar issues. We use pipeline extensively to build on different platforms and types of slaves, and we're also seeing pipeline jobs finish but not get removed from the slave.

            Restarting Jenkins, which is usually the reason for the shutdown, gets the jobs even more out of shape: the pipeline job reconnects to the slave and tries to continue there, but cannot, because it is waiting for an executor on the slave it is already running on.

             

             21:13:21 Running on ella in /home/jenkins/slave/workspace/Security/SAMATE/SAMATE-java
            -- stuff happens here -- 
            -- put jenkins in shutdown mode -- 
            Waiting to resume part of Security » SALADE » SALADE-java #490: Jenkins is about to shut down
            -- Restart jenkins -- 
            Resuming build at Tue Aug 21 07:50:31 BST 2018 after Jenkins restart
            Waiting to resume part of Security » SALADE » SALADE-java #490: Waiting for next available executor on ella
            
            My expectation is that the pipeline job on that node would finish, and that the next pipeline job would be queued without being assigned to a node, to allow the restart and a connection to a new node?
            svanoort Sam Van Oort made changes -
            Assignee Sam Van Oort [ svanoort ] Jose Blas Camacho Taboada [ jtaboada ]
            svanoort Sam Van Oort added a comment -

            Mike Kozell That specific case sounds a lot like a gremlin we've been chasing on and off for quite a while. I'm assigning this to Jose Blas Camacho Taboada to investigate.

            I think what you report may be independent of what was discussed here, though, which is probably the root cause of the issue: https://issues.jenkins-ci.org/browse/JENKINS-34256?focusedCommentId=336282&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-336282 through https://issues.jenkins-ci.org/browse/JENKINS-34256?focusedCommentId=332080&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-332080

            jeubank Josiah Eubank added a comment -

            I am also experiencing this, new since around 2.140, in all pipeline jobs. These jobs are on the lowest durability setting.

            Previously I had no issue holding the queue by "preparing for shutdown", and the currently running jobs would finish. Now I have to force Jenkins to restart to get rid of the jobs.

            jtaboada Jose Blas Camacho Taboada made changes -
            Assignee Jose Blas Camacho Taboada [ jtaboada ]
            vivek Vivek Pandey made changes -
            Labels 1.651.1 2.0 2.0-rc lts testfest 1.651.1 2.0 2.0-rc lts testfest triaged-2018-11
            narenji Ali Narenji added a comment -

            We have the same problem on Jenkins 2.138.2.

            Is there a time estimate for resolving the issue?

            hgholami Hamid Gholami added a comment -

            Any update?

            We have the same issue on Jenkins.

            tknerr Torben Knerr added a comment - - edited

            Same issue here with Jenkins LTS 2.150.2

            I'm seeing this with pipeline durability set to "PERFORMANCE_OPTIMIZED" in the global configuration.

            Sam Van Oort, re-reading your comment here (https://issues.jenkins-ci.org/browse/JENKINS-34256?focusedCommentId=332080&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-332080), I am wondering why the currently executing pipeline should actually halt by design. Wouldn't it be more intuitive if any running pipelines just completed (as was the case with freestyle jobs earlier)?

            laszlog Laszlo Gaal added a comment -

            We are also seeing this on LTS 2.150.2 quite regularly.

            awkspace awk space added a comment -

            Having the same issue here on Jenkins 2.165 - even with simple 'sh "sleep 60"' test jobs.

            Attempting to work around the issue by checking "Do not allow the pipeline to resume after master restarts" and changing the pipeline to PERFORMANCE_OPTIMIZED makes the pipeline attempt to resume after restart (??) and makes me run into JENKINS-50407 instead.
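
            For reference, the two settings mentioned above can also be set per job in a declarative Jenkinsfile. The following is a minimal sketch, not taken from the original comment: it assumes the declarative options disableResume() and durabilityHint() are available in the installed Pipeline plugins, and the stage name is made up for illustration. As the comment above reports, these settings did not avoid the hang on Jenkins 2.165.

            ```groovy
            pipeline {
                agent any
                options {
                    // Corresponds to "Do not allow the pipeline to resume after master restarts"
                    disableResume()
                    // Per-job durability setting; trades resumability for speed
                    durabilityHint('PERFORMANCE_OPTIMIZED')
                }
                stages {
                    stage('test') {  // hypothetical stage name
                        steps {
                            sh 'sleep 60'
                        }
                    }
                }
            }
            ```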

            buuren Vladimir K added a comment -

            According to my observations, the bug only affects Jenkins pipelines and happens when Jenkins is put into shutdown mode while some pipelines are running in the background; those pipelines cannot proceed to the next pipeline stage(s) and stay stuck indefinitely in whatever stage they were in when shutdown mode began.

            This can be reproduced with the following pseudo-pipeline:

            pipeline {
                agent any  // assumption: the original snippet did not specify an agent

                stages {
                    stage('build') {
                        steps {
                            sh('make build')
                        }
                    }

                    stage('prepare') {
                        steps {
                            // During this stage, Jenkins is put into shutdown mode
                            sh('make prepare-for-restart')
                        }
                    }

                    stage('deploy') {
                        steps {
                            // Pipeline will not reach this stage
                            sh('make deploy')
                        }
                    }
                }

                post {
                    always {
                        sh('echo Test')
                    }
                }
            }

            The pipeline above will never reach the deploy stage or the post section.

            My guess is that shutdown mode prevents any new build threads from being executed, and since each stage runs in a separate thread (for serialization purposes), pipelines get stuck. This behavior seems to be intended, because it allows Jenkins to continue stages after a hard restart.

            In my use case I would like to conduct a safe, controlled Jenkins restart, allowing any existing workloads to finish.

            fr0 Chris Frolik added a comment -

            It is distressing that major issues like this sit for 3+ years with no resolution – and worse, not assigned to anyone.

            Since pipeline jobs are "the norm" now, the "Prepare For Shutdown" button is a trap for users to get their system into a broken state. If this issue cannot be fixed, at the very least there should be a warning label next to that button.

            dnusbaum Devin Nusbaum added a comment - - edited

            I am looking into this issue in PR 340. I am a little confused by some of the comments in this thread. As far as I can tell, this has never worked, regardless of what step is executing when quiet mode is enabled, because there is no code to tell Pipeline executions that quiet mode was cancelled and they should try to resume themselves. Maybe I am misunderstanding something or there are multiple distinct issues being discussed in the ticket, so I am going to reread all of the comments just in case and do some additional testing with the sh step.

            dnusbaum Devin Nusbaum made changes -
            Remote Link This issue links to "jenkinsci/workflow-cps-plugin#340 (Web Link)" [ 24032 ]
            dnusbaum Devin Nusbaum made changes -
            Assignee Devin Nusbaum [ dnusbaum ]
            dnusbaum Devin Nusbaum made changes -
            Status In Progress [ 3 ] In Review [ 10005 ]
            ferulee46 Ferruccio Bongianni added a comment - - edited

            I am experiencing the same problem as described in the title. This is how I reproduce it:

            • Run a container from the jenkins/jenkins:lts image (which for me is Jenkins ver. 2.190.2).
            • Create a simple pipeline:

                pipeline {
                    agent any
                    stages {
                        stage('x') {
                            steps {
                                sh 'sleep 30'
                                sh 'sleep 30'
                            }
                        }
                    }
                }

            • Run a build of the pipeline above, then go to Manage Jenkins and click on Prepare for shutdown.

            At this point Jenkins shows the red stripe "Jenkins is going to shut down", but it never does. The pipeline never proceeds (in fact, it doesn't even reach the second sh if the prepare for shutdown happened during the first sleep). The pipeline hangs and never terminates.

            I am currently at Jenkins World and showed the behaviour to Liam Newman today.

            dnusbaum Devin Nusbaum added a comment -

            Ferruccio Bongianni Yes, it's confusing, but that's the intended behavior. Clicking "Prepare for shutdown" pauses all running Pipelines. My PR prints a message to the build log of Pipelines when this happens to make it clear that the build is paused. Once the build is paused, Jenkins can be restarted, and the Pipeline will resume after the restart.

            If you are only using "Prepare for shutdown" to restart Jenkins without breaking in-progress builds of non-Pipeline jobs, you can navigate to the /safeRestart URL, which is like "Prepare for shutdown" except that it automatically restarts after all non-Pipeline jobs complete.

            The Pipeline builds should resume after Jenkins restarts with or without my PR. The main change in my PR is that today, if you cancel shutdown after clicking "Prepare for shutdown", the Pipeline builds stay paused. You have to manually pause and unpause the builds to get them to resume or restart Jenkins. After my PR, canceling shutdown will unpause the builds and resume them automatically.
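
            The quiet-mode state described above can be inspected and toggled from the Jenkins script console. This is a hedged sketch, not part of the PR: it assumes a Jenkins core recent enough to provide Jenkins.get(), and it uses the core methods isQuietingDown(), doQuietDown() and doCancelQuietDown(), which back the "Prepare for shutdown" and "Cancel shutdown" actions.

            ```groovy
            import jenkins.model.Jenkins

            def jenkins = Jenkins.get()

            // Enter quiet mode: no new builds start, and running Pipelines pause
            jenkins.doQuietDown()
            println "Quieting down: ${jenkins.isQuietingDown()}"

            // Leave quiet mode again; per the comment above, Pipelines only resume
            // automatically at this point once the change from the PR is installed
            jenkins.doCancelQuietDown()
            println "Quieting down: ${jenkins.isQuietingDown()}"
            ```

            Running this on a production instance holds the build queue while quiet mode is active, so it is best tried on a test instance first.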

            ferulee46 Ferruccio Bongianni added a comment -

            Hi Devin, thanks for your prompt reply.

            It is very confusing indeed. Shouldn't the message in red say something different from 'Jenkins is going to shut down' if that is not true (because it actually does not shut down at all, ever)?

            Also, I haven't tried it right now, but I'm fairly confident this is exactly what happens when you update a plugin that needs Jenkins to be restarted: if you've got pipelines running, it will never restart until you kill those pipelines.

            varju Alex Varju added a comment -

            In the modern world of transient agents (e.g. Kubernetes pods) that won't exist after restart, this approach of pausing the pipeline is painful.  It would sure be nice if there was a way to allow current jobs to finish while not allowing queued jobs to be started.

            dnusbaum Devin Nusbaum added a comment -

            Ferruccio Bongianni I think the intention of the message is something like how you might use wall on a multi-user Unix system, in the sense that the message is a way for admins to signify to anyone that might be using Jenkins that it will be shut down at some point (just guessing, I did not add the feature). The admin still has to actually initiate the shutdown themselves, so for admins, I agree, the message is confusing.

            carltongbrown Carlton Brown added a comment -

            4 years old serious usability issue, hasn't been fixed, won't be fixed because nobody cares.   Jenkins is dead, use something else.

            ftclausen Friedrich Clausen added a comment -

            As mentioned above, in the modern world of transient agents such as Kubernetes pods, this is quite painful. Transient agents are likely to become more prevalent.

            reinholdfuereder Reinhold Füreder added a comment -

            Devin Nusbaum Thanks! I think that change ("The main change in my PR is that today, if you cancel shutdown after clicking "Prepare for shutdown", the Pipeline builds stay paused. You have to manually pause and unpause the builds to get them to resume or restart Jenkins. After my PR, canceling shutdown will unpause the builds and resume them automatically.") should actually address one of the problems I experienced in this area many months ago! (I have a Groovy init hook script that always configures Jenkins to start in so-called quiet mode...)

            As other users have more or less diplomatically commented, there is still room for related important enhancements: maybe these can be collected, discussed, prioritised, and addressed in a future sprint/story? (And I think Jenkins is still in massive use nowadays and hopefully will not be dead for a long time...)
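
            An init hook of the kind mentioned above might look like the following. This is only a sketch under assumptions: the file name is hypothetical, the script is assumed to live in $JENKINS_HOME/init.groovy.d/, and it relies on the core method Jenkins.doQuietDown(); the actual script was not posted in this thread.

            ```groovy
            // $JENKINS_HOME/init.groovy.d/start-in-quiet-mode.groovy (hypothetical file name)
            import jenkins.model.Jenkins

            // Put Jenkins into quiet mode right after startup so that no builds
            // start until an administrator cancels it ("Cancel shutdown")
            Jenkins.get().doQuietDown()
            ```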

            dnusbaum Devin Nusbaum added a comment - - edited

            A fix for this issue was just released in Pipeline: Groovy Plugin version 2.78. I think there is/was some confusion as to the expected behavior (myself included!), so let me try to clarify: When Jenkins prepares for shutdown, all running Pipelines are paused, and this is the intended behavior. The unintended behavior was that if you canceled shutdown, Pipelines remained paused. This has been fixed in 2.78; Pipelines will now resume execution if shutdown is canceled. Before 2.78, you had to manually pause and unpause each Pipeline to get it to resume execution, or restart Jenkins. Additionally, preparing Jenkins for shutdown and canceling shutdown now each cause a message to be printed to Pipeline build logs indicating that the Pipeline is being paused or resumed due to shutdown so that it is easier to understand what is happening.

            Based on comments here and elsewhere, I think some users would prefer a variant of "Prepare for shutdown" in which Pipelines continue executing to completion, the same as other types of jobs like Freestyle. If that is something you want, please open a new ticket, describing your use case and the desired behavior.

            For anyone curious as to why Pipelines are paused when Jenkins prepares for shutdown, instead of continuing to execute and only saving at the last possible second when Jenkins is stopped, the reasoning is to avoid race conditions saving Pipeline metadata that could prevent Pipelines from resuming correctly.

            If there is some other aspect of this issue that you would like to see addressed, or a different behavior you would prefer, please open a new ticket describing your particular use case.

            Thanks!
             

            dnusbaum Devin Nusbaum made changes -
            Status In Review [ 10005 ] Resolved [ 5 ]
            Resolution Fixed [ 1 ]
            Released As workflow-cps 2.78
            reinholdfuereder Reinhold Füreder made changes -
            Link This issue is related to JENKINS-60434 [ JENKINS-60434 ]
            reinholdfuereder Reinhold Füreder added a comment -

            Thanks again Devin Nusbaum! And following your advice => JENKINS-60434

            medianick Nick Jones added a comment -

            Devin Nusbaum could you clarify whether this same logic/behavior also applies to the restart that happens after plugin installation (when checking the "Restart Jenkins when installation is complete and no jobs are running" checkbox) or when clicking the Restart Safely button under Manage Jenkins (i.e., the /safeRestart URL, as enabled by https://plugins.jenkins.io/saferestart)? Do running pipeline jobs get paused in those circumstances too and now (with Pipeline: Groovy 2.78) automatically resumed once Jenkins is back up?

            dnusbaum Devin Nusbaum added a comment -

            Could you clarify whether this same logic/behavior also applies to the restart that happens after plugin installation (when checking the "Restart Jenkins when installation is complete and no jobs are running" checkbox) or when clicking the Restart Safely button under Manage Jenkins (i.e., the /safeRestart URL

            Both of these situations use the /safeRestart URL behind the scenes, which puts Jenkins into the same state as "Prepare for shutdown", which prevents new builds from being started and causes Pipeline builds to pause. The difference between /safeRestart and "Prepare for shutdown" is that safeRestart will also automatically restart Jenkins once all non-Pipeline jobs have completed and all Pipeline jobs have been paused, whereas "Prepare for shutdown" does not actually restart Jenkins.

            Even before Pipeline: Groovy version 2.78, once Jenkins restarted due to /safeRestart, all Pipelines should have resumed automatically, and they should continue to have that behavior in Pipeline: Groovy 2.78. If your Pipelines are not resuming after the restart, please open a new ticket, including steps to reproduce the issue from scratch and any messages from your Jenkins logs or Pipeline build logs that seem relevant.
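
            The difference between the two entry points can also be seen from the script console. The sketch below is an illustration, not taken from the thread: it assumes the core methods Jenkins.doQuietDown() and Jenkins.safeRestart(), with the latter throwing RestartNotSupportedException when the installation cannot restart itself. The restartAfterwards flag is a made-up switch for this example.

            ```groovy
            import hudson.lifecycle.RestartNotSupportedException
            import jenkins.model.Jenkins

            def jenkins = Jenkins.get()
            boolean restartAfterwards = false  // hypothetical switch for this sketch

            if (restartAfterwards) {
                try {
                    // Like /safeRestart: enter quiet mode, then restart on its own
                    jenkins.safeRestart()
                } catch (RestartNotSupportedException e) {
                    println "Restart not supported here: ${e.message}"
                }
            } else {
                // Like "Prepare for shutdown": enter quiet mode and stay there
                // until someone cancels it or restarts Jenkins manually
                jenkins.doQuietDown()
            }
            ```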

            medianick Nick Jones added a comment -

            Thanks Devin Nusbaum. So the logic change in 2.78 is only for the specific situation where Jenkins is "put to sleep" (Prepare for Shutdown) and then "woken up" (Cancel Shutdown) without actually being restarted? I.e., the intended/expected behavior even prior to 2.78 is that paused pipeline builds would resume automatically after an actual service restart? I've definitely seen them not resume after a restart, so I'll endeavor to reproduce the problem and then file a new bug with details.

            dnusbaum Devin Nusbaum added a comment -

            So the logic change in 2.78 is only for the specific situation where Jenkins is "put to sleep" (Prepare for Shutdown) and then "woken up" (Cancel Shutdown) without actually being restarted? I.e., the intended/expected behavior even prior to 2.78 is that paused pipeline builds would resume automatically after an actual service restart?

            Yes, although note that you can also cancel /safeRestart before the restart happens, and the logic change fixes that case too.

            I've definitely seen them not resume after a restart, so I'll endeavor to reproduce the problem and then file a new bug with details.

            Ok, great!

            reinholdfuereder Reinhold Füreder added a comment -

            Devin Nusbaum I can confirm that your fix works really well!

            I know because (cough, red face) I accidentally restarted the Jenkins master without waiting for pipelines to complete (of course I am looking forward to JENKINS-60434), and there were some non-trivial real-world pipelines running... Just one of them failed, due to JENKINS-49365...


              People

              • Assignee:
                dnusbaum Devin Nusbaum
                Reporter:
                svanoort Sam Van Oort
              • Votes:
                44
                Watchers:
                69
