  1. Jenkins
  2. JENKINS-53315

Timeout step should support a closure to execute prior to killing body

      Description

      Currently, the timeout step simply kills whatever processes were launched during execution of its body, and then throws an exception. This makes it difficult to perform any automated debugging on these processes, since they are already dead by the time the user finds out that they were hung (or slow). It would be nice to be able to get some information about the state of affairs before things are killed, and maybe even perform safe shutdown steps before the kill.

      Currently: 

      try {
        timeout(time: 1, unit: 'HOURS') {
           sh "java IntermittentlySlowProcess"
        }
      } catch (t) {
          //It's too late to, for example, send a "kill -3" to the slow/hung java process
      }
      

      What I'd propose

      (and I'm willing to try to make a PR if this seems reasonable):

      timeout(time: 1, unit: 'HOURS', beforeKill: {
         sh "killall -3 java" //for example
      }) {
         sh "java IntermittentlySlowProcess"
      }
      

      The new "beforeKill" closure can be used for clean shutdown of complex tasks, analysis of problems, etc.

      One workaround may be to wrap whatever you are running in a script that traps signals, but that's ugly and error-prone (and will likely cause zombies).

      Thoughts welcome.

          Activity

          jglick Jesse Glick added a comment -

          And no you do not need a separate pipeline-timeout-prekill.sh if you are using /usr/bin/timeout. Look at my example again. That one-liner sends SIGQUIT after ten seconds, then waits one more second for the thread dump to appear, and sends a SIGTERM.
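
The one-liner mentioned in this comment is not reproduced in this excerpt. As a rough sketch of the same sequence (SIGQUIT at the deadline, a short pause for the thread dump, then SIGTERM), a plain shell helper might look like the following. The function name run_with_escalation and its arguments are hypothetical, and this is not the /usr/bin/timeout invocation itself.

```shell
#!/bin/sh
# Hypothetical helper: run_with_escalation LIMIT GRACE CMD [ARGS...]
# Runs CMD; if it is still alive after LIMIT seconds, sends SIGQUIT, waits
# GRACE seconds, then sends SIGTERM. Returns CMD's exit status.
# Caveat: a non-interactive shell starts background commands with SIGQUIT
# ignored. A JVM normally installs its own SIGQUIT handler and still dumps
# its threads, but other programs may never react to the first signal.
run_with_escalation() {
    limit=$1
    grace=$2
    shift 2

    "$@" &
    child=$!

    # Watchdog: output is discarded so a leftover sleep cannot hold the
    # caller's stdout open.
    (
        sleep "$limit"
        kill -QUIT "$child" || exit 0   # command already finished
        sleep "$grace"
        kill -TERM "$child"
    ) >/dev/null 2>&1 &
    watchdog=$!

    status=0
    wait "$child" || status=$?

    # Cancel the watchdog if the command finished before the deadline.
    kill "$watchdog" 2>/dev/null || true
    wait "$watchdog" 2>/dev/null || true
    return "$status"
}
```

With the timings described in the comment, a call would be run_with_escalation 10 1 java IntermittentlySlowProcess.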

          akom Alexander Komarov added a comment -

          Thanks, I did see that, but a "kill -3" on the main Gradle process isn't going to help me - I need to get the thread dump from the hung tests, which are running in a separate process launched by Gradle. My current approach is to run "jstack" on every java process on the slave (we have a single executor policy).

          I also just tried setting JENKINS_SERVER_COOKIE on the child process and that doesn't help with killing subprocesses, but I haven't had time to research further.  /usr/bin/timeout does work in combination with my trap script.
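
The "jstack on every java process" approach described above could be sketched roughly as follows. The helper name dump_all is hypothetical, and the sketch assumes pgrep and jstack are on the PATH and that the single-executor policy holds, so every java process on the machine belongs to the current build.

```shell
#!/bin/sh
# Hypothetical helper: dump_all NAME CMD [ARGS...]
# Runs "CMD ARGS PID" for every process whose name is exactly NAME.
dump_all() {
    name=$1
    shift
    for pid in $(pgrep -x "$name"); do
        echo "=== $name (pid $pid) ==="
        # Keep going if one process exits before it can be inspected.
        "$@" "$pid" || echo "could not dump pid $pid" >&2
    done
}

# Example: thread dumps for every java process on a single-executor agent.
# dump_all java jstack
```

Passing the dump command as an argument keeps the loop reusable, for example with kill -3 in place of jstack.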

          jglick Jesse Glick added a comment -

          "get the thread dump from the hung tests which are running a separate process launched by gradle"

          If that is all you wanted, you may be barking up the wrong tree. The JUnit Timeout rule, for example, applies a per-test-case timeout (which is likely to be more robust and easier to manage than a per-build timeout) and automatically displays a thread dump for hung tests.

          akom Alexander Komarov added a comment -

          Sadly, that's not all I'm after. In the case of complex integration tests, the tests may be hanging due to a hung external process (i.e. a server) started as part of the test. That is the process I need the stack trace for, not the test. Otherwise, the Timeout rule would work great.

          jglick Jesse Glick added a comment -

          In the case of JUnit, you would probably want to write a simple TestRule (if one does not already exist) which is responsible for launching the external process, sending it SIGQUIT if the test fails, and finally sending it SIGTERM in all cases.


          To your original proposal, if you really have to do this inside Pipeline script for some reason, try something like

          timeout(time: 1, unit: 'HOURS') {
            def ok = false
            try {
              sh 'java IntermittentlySlowProcess'
              ok = true
            } finally {
              if (!ok) {
                sh 'killall -3 java'
              }
            }
          }
          

            People

            • Assignee:
              Unassigned
              Reporter:
              akom Alexander Komarov
            • Votes:
              0
              Watchers:
              3
