Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-33926

Jenkins no longer appears to shutdown correctly

    Details

    • Similar Issues:

      Description

      Start Jenkins with java -jar jenkins-2.0-beta-1.jar on windows

      In the console hit ctrl+c

      expected behaviour

      Jenkins does a graceful shutdown and this is appropriately logged

      actual behaviour

      it appears as though Jenkins is brutally terminated and does not do a graceful shutdown.
      There are no logs to indicate a graceful termination

      In 1.x I see the following logs
      Mar 31, 2016 12:02:39 PM winstone.Logger logInternal INFO: JVM is terminating. Shutting down Winstone

      in 2.x I do not see those entries - and there is nothing to indicate in the log that Jenkins is shutting down gracefully

        Attachments

          Issue Links

            Activity

            teilo James Nord created issue -
            Hide
            teilo James Nord added a comment - - edited

            whilst this sometimes occurs in 1.x testing (<10% of the time) during my 2.0 testing I get it > 90% of the time in 2.x implying something is worse somewhere.

            Show
            teilo James Nord added a comment - - edited whilst this sometimes occurs in 1.x testing (<10% of the time) during my 2.0 testing I get it > 90% of the time in 2.x implying something is worse somewhere.
            swashbuck1r Spike Washburn made changes -
            Field Original Value New Value
            Labels 2.0 2.0-beta regression 2.0 2.0-beta 2.0-planned community-bee regression
            swashbuck1r Spike Washburn made changes -
            Priority Minor [ 4 ] Critical [ 2 ]
            Hide
            swashbuck1r Spike Washburn added a comment -

            Critical to investigate for RC. If this is only a problem for interactive runs, we could probably live with it for GA, but if this is affecting production instances running as services, that would be really nasty.

            Show
            swashbuck1r Spike Washburn added a comment - Critical to investigate for RC. If this is only a problem for interactive runs, we could probably live with it for GA, but if this is affecting production instances running as services, that would be really nasty.
            Hide
            danielbeck Daniel Beck added a comment -

            Same issue for me on OS X so does not seem limited to Windows.

            If this means that e.g. service restarts don't do a clean restart, and e.g. lose the queue contents, that would be bad.

            If it's just a case of Jetty 9 no longer logging this, we'll survive

            Show
            danielbeck Daniel Beck added a comment - Same issue for me on OS X so does not seem limited to Windows. If this means that e.g. service restarts don't do a clean restart, and e.g. lose the queue contents, that would be bad. If it's just a case of Jetty 9 no longer logging this, we'll survive
            swashbuck1r Spike Washburn made changes -
            Assignee Kristin Whetstone [ kwhetstone ]
            danielbeck Daniel Beck made changes -
            Epic Link JENKINS-33810 [ 169285 ]
            Hide
            kwhetstone Kristin Whetstone added a comment -

            While investigating, I didn't see any error show up consistently before missing the log statement. I didn't see this occur as often on 2.0 before clicking the "Install New Features". Luckily it doesn't seem to be disruptive to my Jenkins, so that's a plus. Trying to see if there's a problem after "Installing new features" or running it under heavy stress.

            Show
            kwhetstone Kristin Whetstone added a comment - While investigating, I didn't see any error show up consistently before missing the log statement. I didn't see this occur as often on 2.0 before clicking the "Install New Features". Luckily it doesn't seem to be disruptive to my Jenkins, so that's a plus. Trying to see if there's a problem after "Installing new features" or running it under heavy stress.
            Hide
            danielbeck Daniel Beck added a comment -

            Would be interesting to know whether the shutdown procedure (that includes saving the queue to disk) gets executed every time, log statement or not.

            Show
            danielbeck Daniel Beck added a comment - Would be interesting to know whether the shutdown procedure (that includes saving the queue to disk) gets executed every time, log statement or not.
            Hide
            swashbuck1r Spike Washburn added a comment -

            Right, the investigation needs to prove/disprove if the shutdown sequence is consistently running as expected. Are there some clear signatures to this that Kristin could rely on? Examples: Some expected file update times change, a debug message that a Servlet.destroy got called, etc.

            Show
            swashbuck1r Spike Washburn added a comment - Right, the investigation needs to prove/disprove if the shutdown sequence is consistently running as expected. Are there some clear signatures to this that Kristin could rely on? Examples: Some expected file update times change, a debug message that a Servlet.destroy got called, etc.
            Hide
            danielbeck Daniel Beck added a comment -

            Spike Washburn queue.xml gets written (possibly only when there are items in the queue, not sure).

            Show
            danielbeck Daniel Beck added a comment - Spike Washburn queue.xml gets written (possibly only when there are items in the queue, not sure).
            recena Manuel Recena Soto made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            kwhetstone Kristin Whetstone made changes -
            Link This issue is related to JENKINS-30909 [ JENKINS-30909 ]
            Hide
            kwhetstone Kristin Whetstone added a comment - - edited

            I think this (JENKINS-30909) is similar since it's referring to a missing queue log.

            In testing where I physically go through the whole computer shutdown process, it appears that things run properly. (I find a properly timestamped queue.xml.bak file) When I just stop the process from the command line, I don't see any queue.xml file. I assume this case falls under this defect.

            So far, the behavior I've seen echos this work item. I'll have to look to a different mechanism (James Nord brought up TermMilestone) to see if there's another issue.

            Show
            kwhetstone Kristin Whetstone added a comment - - edited I think this ( JENKINS-30909 ) is similar since it's referring to a missing queue log. In testing where I physically go through the whole computer shutdown process, it appears that things run properly. (I find a properly timestamped queue.xml.bak file) When I just stop the process from the command line, I don't see any queue.xml file. I assume this case falls under this defect. So far, the behavior I've seen echos this work item. I'll have to look to a different mechanism (James Nord brought up TermMilestone) to see if there's another issue.
            Hide
            kwhetstone Kristin Whetstone added a comment - - edited

            After installing Jenkins 2.0 as a Windows service and running a heavy mock load, I'm still seeing the queue.xml file appear. Since the service is able to start up earlier than I can access my file system, I know this file is created because of queue.xml.bak which isn't removed & contains the same information.

            The logs left in jenkins.out.log and jenkins.err.log don't have any messages for shutting down, but I have a feeling they're overwritten on service restart, so they're likely not useful. Checking the log on the Jenkins instance also only covers the current run.

            I linked to JENKINS-30909 to highlight the behavior I'm seeing as part of restarting/destroying-recreating the service. Nothing really appears to be out of the ordinary.

            From what I can tell during these tests, I don't think this is a blocker issue for 2.0 release.

            Show
            kwhetstone Kristin Whetstone added a comment - - edited After installing Jenkins 2.0 as a Windows service and running a heavy mock load, I'm still seeing the queue.xml file appear. Since the service is able to start up earlier than I can access my file system, I know this file is created because of queue.xml.bak which isn't removed & contains the same information. The logs left in jenkins.out.log and jenkins.err.log don't have any messages for shutting down, but I have a feeling they're overwritten on service restart, so they're likely not useful. Checking the log on the Jenkins instance also only covers the current run. I linked to JENKINS-30909 to highlight the behavior I'm seeing as part of restarting/destroying-recreating the service. Nothing really appears to be out of the ordinary. From what I can tell during these tests, I don't think this is a blocker issue for 2.0 release.
            Hide
            danielbeck Daniel Beck added a comment -

            When I just stop the process from the command line, I don't see any queue.xml file. I assume this case falls under this defect.

            Would be interesting to know whether a Linux service behaves differently in shutting down. Not sure which signals each sends. Loss of the queue on every shutdown would be a regression.

            Show
            danielbeck Daniel Beck added a comment - When I just stop the process from the command line, I don't see any queue.xml file. I assume this case falls under this defect. Would be interesting to know whether a Linux service behaves differently in shutting down. Not sure which signals each sends. Loss of the queue on every shutdown would be a regression.
            Hide
            teilo James Nord added a comment - - edited

            Kristin Whetstoneno need to restart the OS, just start stop the service either via the SCM or sc stop jenkins so you always have access to the filesystem

            Show
            teilo James Nord added a comment - - edited Kristin Whetstone no need to restart the OS, just start stop the service either via the SCM or sc stop jenkins so you always have access to the filesystem
            Hide
            kwhetstone Kristin Whetstone added a comment -

            So I ran through a number of scenarios on Windows and Linux on 2.0 and 1.X and it appears that for Windows we never actually save the queue when we stop the service. Linux is fine; I started and stopped the service while every executor is used, and everything is saved and restarted. I think this might be a difference in the command signals sent to the service when stopping in Windows vs Linux, but that's just a theory at this point.

            As to other shutdown actions taken when the system is shutting down, since I'm not able to replicate the initial condition, all of my exit conditions (TermMilestones, etc) are hit. Other than this queue conditional, I don't see anything odd with the service.

            Show
            kwhetstone Kristin Whetstone added a comment - So I ran through a number of scenarios on Windows and Linux on 2.0 and 1.X and it appears that for Windows we never actually save the queue when we stop the service. Linux is fine; I started and stopped the service while every executor is used, and everything is saved and restarted. I think this might be a difference in the command signals sent to the service when stopping in Windows vs Linux, but that's just a theory at this point. As to other shutdown actions taken when the system is shutting down, since I'm not able to replicate the initial condition, all of my exit conditions (TermMilestones, etc) are hit. Other than this queue conditional, I don't see anything odd with the service.
            Hide
            swashbuck1r Spike Washburn added a comment -

            Summary of test results from Kristin:

            • Does the message get printed on shutdown? (Windows: 100% appears consistently, Linux: 100% appears consistently)
            • Does the shutdown sequence seem to complete?
            • Marker:queue.xml gets saved (yes on Linux, yes on Windows cmdline, no on Windows service). No on windows service seems consistent with 1.x line of Jenkins.
            • Marker:plugin shutdown() is being called (yes on Linux and Windows)

            Decision [Daniel, Kristin]: At this time, there's no indication of a regression on beta2 vs 1.x, so this is not a stop-ship

            Show
            swashbuck1r Spike Washburn added a comment - Summary of test results from Kristin: Does the message get printed on shutdown? (Windows: 100% appears consistently, Linux: 100% appears consistently) Does the shutdown sequence seem to complete? Marker:queue.xml gets saved (yes on Linux, yes on Windows cmdline, no on Windows service). No on windows service seems consistent with 1.x line of Jenkins. Marker:plugin shutdown() is being called (yes on Linux and Windows) Decision [Daniel, Kristin] : At this time, there's no indication of a regression on beta2 vs 1.x, so this is not a stop-ship
            swashbuck1r Spike Washburn made changes -
            Labels 2.0 2.0-beta 2.0-planned community-bee regression 2.0 2.0-beta community-bee regression
            Hide
            swashbuck1r Spike Washburn added a comment -

            Next steps:

            • Daniel will add testcase for this for the RC and future LTSs.
            • At this time, this issues appears to be not reproducible...Daniel will close.
            • Kristin will open another ticket about the problem where queue.xml doesn't get saved via Windows Service shutdown.
            Show
            swashbuck1r Spike Washburn added a comment - Next steps: Daniel will add testcase for this for the RC and future LTSs. At this time, this issues appears to be not reproducible...Daniel will close. Kristin will open another ticket about the problem where queue.xml doesn't get saved via Windows Service shutdown.
            Hide
            danielbeck Daniel Beck added a comment -

            Resolving as Cannot Reproduce for now. I'll add test cases to RC/LTS testing plans so we make sure to check this for the 2.0 RC.

            Show
            danielbeck Daniel Beck added a comment - Resolving as Cannot Reproduce for now. I'll add test cases to RC/LTS testing plans so we make sure to check this for the 2.0 RC.
            danielbeck Daniel Beck made changes -
            Status In Progress [ 3 ] Resolved [ 5 ]
            Resolution Cannot Reproduce [ 5 ]
            Hide
            teilo James Nord added a comment -

            While investigating, I didn't see any error show up consistently before missing the log statement. I didn't see this occur as often on 2.0 before clicking the "Install New Features". Luckily it doesn't seem to be disruptive to my Jenkins, so that's a plus.

            Seems to imply it is reproduceable from the command line?

            Anyway I can reproduce it so you can always assign it to me

            Show
            teilo James Nord added a comment - While investigating, I didn't see any error show up consistently before missing the log statement. I didn't see this occur as often on 2.0 before clicking the "Install New Features". Luckily it doesn't seem to be disruptive to my Jenkins, so that's a plus. Seems to imply it is reproduceable from the command line? Anyway I can reproduce it so you can always assign it to me
            Hide
            svanoort Sam Van Oort added a comment -

            100% reproducible with the 2.0-rc WAR running on Mac.

            Test case:

            • Create a pipeline job

            node

            { echo 'stuffs' sleep 100 stage 'Stage 2' }
            • Create a freestyle job, set to allow concurrent builds, with the following shell step:
              echo 'I do stuff'
              sleep 100
            • Start up several builds of the pipeline job, to consume executors.
            • Queue up the freestyle project
            • Shutdown jenkins via Ctrl+C

            Result:

            • Command line reports winstone/jetty shutting down
            • queue.xml & queue.xml.bak files are created but contain no information about the queued build or builds. Pipelines that were running will restart however.
            • if Jenkins is started again, build will not run from queue. Indeed it is as if the freestyle project was never queued: new builds will use its build #, no record it ever existed.
            Show
            svanoort Sam Van Oort added a comment - 100% reproducible with the 2.0-rc WAR running on Mac. Test case: Create a pipeline job node { echo 'stuffs' sleep 100 stage 'Stage 2' } Create a freestyle job, set to allow concurrent builds, with the following shell step: echo 'I do stuff' sleep 100 Start up several builds of the pipeline job, to consume executors. Queue up the freestyle project Shutdown jenkins via Ctrl+C Result: Command line reports winstone/jetty shutting down queue.xml & queue.xml.bak files are created but contain no information about the queued build or builds. Pipelines that were running will restart however. if Jenkins is started again, build will not run from queue. Indeed it is as if the freestyle project was never queued: new builds will use its build #, no record it ever existed.
            Hide
            svanoort Sam Van Oort added a comment -

            Reopening because it exists in a painful and reproducible way (see previous comments); the entire build queue is lost on shutdown.

            Show
            svanoort Sam Van Oort added a comment - Reopening because it exists in a painful and reproducible way (see previous comments); the entire build queue is lost on shutdown.
            svanoort Sam Van Oort made changes -
            Resolution Cannot Reproduce [ 5 ]
            Status Resolved [ 5 ] Reopened [ 4 ]
            svanoort Sam Van Oort made changes -
            Labels 2.0 2.0-beta community-bee regression 2.0 2.0-beta 2.0-rc community-bee regression testfest
            kwhetstone Kristin Whetstone made changes -
            Assignee Kristin Whetstone [ kwhetstone ] James Nord [ teilo ]
            Hide
            kwhetstone Kristin Whetstone added a comment -

            James, I'm sending it over to you since I couldn't reproduce it on beta while Ctrl-C or while stopping the service. I'm going to see if I can reproduce it on the RC, but I must not be going through the correct steps or something.

            Note: the part about losing the job queue when restarted as a Windows service looks like an older problem. I definitely think that this should be a bug if it's not already.

            Show
            kwhetstone Kristin Whetstone added a comment - James, I'm sending it over to you since I couldn't reproduce it on beta while Ctrl-C or while stopping the service. I'm going to see if I can reproduce it on the RC, but I must not be going through the correct steps or something. Note: the part about losing the job queue when restarted as a Windows service looks like an older problem. I definitely think that this should be a bug if it's not already.
            Hide
            kwhetstone Kristin Whetstone added a comment -

            Sam, it sounds like your defect is something different than what the original problem covered by this work item. While the queue is saved, it's not saving everything. That might be a bug in the queue.save(). In the Windows version it's not getting saved at all during Ctrl-C. We should open a ticket for that problem so it could be fixed.

            Show
            kwhetstone Kristin Whetstone added a comment - Sam, it sounds like your defect is something different than what the original problem covered by this work item. While the queue is saved, it's not saving everything. That might be a bug in the queue.save(). In the Windows version it's not getting saved at all during Ctrl-C. We should open a ticket for that problem so it could be fixed.
            Hide
            svanoort Sam Van Oort added a comment -

            Yeah, I can't tell for sure if it's the same issue, something similar, or these are all different manifestations of the same underlying cause.

            The previous issue with losing the build queue apparently had some sort of handling with nextBuildNumber that resolved it (I think?). This one seems to be some combination of shutdown logging & queue writing, which could also be two separate issues as well.

            Show
            svanoort Sam Van Oort added a comment - Yeah, I can't tell for sure if it's the same issue, something similar, or these are all different manifestations of the same underlying cause. The previous issue with losing the build queue apparently had some sort of handling with nextBuildNumber that resolved it (I think?). This one seems to be some combination of shutdown logging & queue writing, which could also be two separate issues as well.
            Hide
            kwhetstone Kristin Whetstone added a comment -

            To me, the file showing up in the first place and that it's successfully read on startup means the file was created correctly (there's no missing information when being written and it's not formatted incorrectly) it might just be incomplete. Obviously I'm making some assumptions on the queue behavior, and spent some time trying to go through where the queue was written so figure out how it came into being. Apparently there are 2 types of shutdown: one that writes the queue and one that's "quick" not counting the error case, the secret not-so-fun shutdown. From what I understand the error case just completely tanks jetty, et al. If it's able to save state enough to pick back up on a restart, I think that's slightly better than not having anything at all.

            I'd check out if the queue being incomplete issue exists on the latest 1.x release. That will at least give some context into whether it's something new or not. Either way it's not good, but at least it's not worse!

            Show
            kwhetstone Kristin Whetstone added a comment - To me, the file showing up in the first place and that it's successfully read on startup means the file was created correctly (there's no missing information when being written and it's not formatted incorrectly) it might just be incomplete. Obviously I'm making some assumptions on the queue behavior, and spent some time trying to go through where the queue was written so figure out how it came into being. Apparently there are 2 types of shutdown: one that writes the queue and one that's "quick" not counting the error case, the secret not-so-fun shutdown. From what I understand the error case just completely tanks jetty, et al. If it's able to save state enough to pick back up on a restart, I think that's slightly better than not having anything at all. I'd check out if the queue being incomplete issue exists on the latest 1.x release. That will at least give some context into whether it's something new or not. Either way it's not good, but at least it's not worse!
            Hide
            svanoort Sam Van Oort added a comment -

            Oh! I'd tested on CJE 1.625.16.1 - the queue is maintained when killed by Ctrl+C. Just rechecked, and that is also true on 1.642.x - so it's clearly a regression since then.

            To be clear: we're not saving state to pick back up on restart at all. The queue.xml is empty of items, this is one sample:

            <?xml version='1.0' encoding='UTF-8'?>
            <hudson.model.Queue_-State>
              <counter>42</counter>
              <items/>
            </hudson.model.Queue_-State>
            
            Show
            svanoort Sam Van Oort added a comment - Oh! I'd tested on CJE 1.625.16.1 - the queue is maintained when killed by Ctrl+C. Just rechecked, and that is also true on 1.642.x - so it's clearly a regression since then. To be clear: we're not saving state to pick back up on restart at all. The queue.xml is empty of items, this is one sample: <?xml version= '1.0' encoding= 'UTF-8' ?> <hudson.model.Queue_-State> <counter>42</counter> <items/> </hudson.model.Queue_-State>
            Hide
            danielbeck Daniel Beck added a comment -

            Hmmm… could this be related to JENKINS-34029?

            Show
            danielbeck Daniel Beck added a comment - Hmmm… could this be related to JENKINS-34029 ?
            Hide
            kwhetstone Kristin Whetstone added a comment -

            Cool, so that's actually a regression from beta2 where that information was saved. I think that's a completely separate defect since this jetty issue happened before then. Good find during TestFest!

            Show
            kwhetstone Kristin Whetstone added a comment - Cool, so that's actually a regression from beta2 where that information was saved. I think that's a completely separate defect since this jetty issue happened before then. Good find during TestFest!
            teilo James Nord made changes -
            Assignee James Nord [ teilo ]
            Hide
            teilo James Nord added a comment -

            Not saving the queue on windows was fixed recently on master.
            I'm unassigning from myself for the moment as I'm not looking at this. If I get to look at it and it's not taken I will re-assign it.

            Show
            teilo James Nord added a comment - Not saving the queue on windows was fixed recently on master. I'm unassigning from myself for the moment as I'm not looking at this. If I get to look at it and it's not taken I will re-assign it.
            Hide
            svanoort Sam Van Oort added a comment -

            Closing this since Kristin opened https://issues.jenkins-ci.org/browse/JENKINS-34281 which looks to be a completely separate issue.

            Show
            svanoort Sam Van Oort added a comment - Closing this since Kristin opened https://issues.jenkins-ci.org/browse/JENKINS-34281 which looks to be a completely separate issue.
            Hide
            svanoort Sam Van Oort added a comment -

            Closing this, since Kristin has forked this out.

            Show
            svanoort Sam Van Oort added a comment - Closing this, since Kristin has forked this out.
            svanoort Sam Van Oort made changes -
            Link This issue is related to JENKINS-34281 [ JENKINS-34281 ]
            svanoort Sam Van Oort made changes -
            Status Reopened [ 4 ] Resolved [ 5 ]
            Resolution Cannot Reproduce [ 5 ]
            rtyler R. Tyler Croy made changes -
            Workflow JNJira [ 169937 ] JNJira + In-Review [ 198735 ]

              People

              • Assignee:
                Unassigned
                Reporter:
                teilo James Nord
              • Votes:
                0 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: