  1. Jenkins
  2. JENKINS-53315

Timeout step should support a closure to execute prior to killing body

      Description

      Currently, the timeout step simply kills whatever processes were launched during execution of its body and then throws an exception. This makes automated debugging of those processes difficult, since they are already dead by the time the user learns that they were hung (or slow). It would be nice to be able to capture some information about their state before they are killed, and perhaps even perform safe shutdown steps first.

      Currently: 

      try {
        timeout(time: 1, unit: 'HOURS') {
           sh "java IntermittentlySlowProcess"
        }
      } catch (t) {
          //It's too late to, for example, send a "kill -3" to the slow/hung java process
      }
      

      What I'd propose

      (and I'm willing to try to make a PR if this seems reasonable):

      timeout(time: 1, unit: 'HOURS', beforeKill: {
         sh "killall -3 java" //for example
      }) {
         sh "java IntermittentlySlowProcess"
      }
      

      The new "beforeKill" closure could be used for clean shutdown of complex tasks, analysis of problems, etc.

      One workaround may be to wrap whatever you are running and trap signals, but that's ugly and error-prone (and will likely cause zombies).

      Thoughts welcome.

       

       

          Activity

          akom Alexander Komarov added a comment -

          Interestingly, when I use the Linux "timeout" command that you mention instead of the timeout step, traps work fine. So the Linux solution is a combination of the shell script and timeout:

          timeout 10 bash pipeline-timeout-prekill.sh './gradlew ....'

          That is a lot of parts, but OK. Now I just need to find an equivalent for Windows (we do cross-platform testing on many platforms, which is why I wanted to solve this in pipeline code).
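The contents of pipeline-timeout-prekill.sh are not shown in this thread. As a rough sketch of what such a trap wrapper could look like (the function name run_with_prekill and its arguments are hypothetical, not taken from the actual script): it starts the command in the background and, on SIGTERM, runs a pre-kill hook before forwarding the signal to the child.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a "pre-kill" wrapper; not the script from this thread.
# Usage: run_with_prekill '<hook command>' <command> [args...]

run_with_prekill() {
  # Plain (non-local) variables so the trap handler can always read them.
  prekill_hook=$1
  shift
  got_term=0

  # Start the real command in the background so this shell stays free
  # to handle signals while the command runs.
  "$@" &
  child=$!

  # On SIGTERM: run the hook first, then pass the signal on to the child.
  trap 'got_term=1; eval "$prekill_hook"; kill -TERM "$child" 2>/dev/null' TERM

  wait "$child"
  status=$?
  # A trapped signal interrupts `wait` before the child is reaped,
  # so wait once more to collect the child's real exit status.
  if [ "$got_term" -eq 1 ]; then
    wait "$child"
    status=$?
  fi

  trap - TERM
  return "$status"
}
```

Called as `run_with_prekill 'killall -3 java' ./gradlew ....`, this would run the hook when the wrapper receives SIGTERM. Only the direct child receives the forwarded signal in this sketch; processes further down the tree would need separate handling.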

          jglick Jesse Glick added a comment -

          As I said before, yes, Jenkins might be sending SIGTERM to the whole process tree, not only the entry script. This might be worked around (untested) via

          JENKINS_SERVER_COOKIE=suppress java -jar …
          

          since it uses this environment variable to identify some processes. I have forgotten the details at this point. (JENKINS-28182)

          The “good solutions” are

          • Java shutdown hooks
          • using /usr/bin/timeout
          jglick Jesse Glick added a comment -

          And no, you do not need a separate pipeline-timeout-prekill.sh if you are using /usr/bin/timeout. Look at my example again. That one-liner sends SIGQUIT after ten seconds, then waits one more second for the thread dump to appear, and sends a SIGTERM.
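The example referred to here is not reproduced in this thread. One way to get the described sequence with GNU timeout (an assumption, not necessarily the original one-liner) is to nest two invocations: the inner one sends SIGQUIT when the first delay expires, and the outer one sends the default SIGTERM after the grace period, which the inner timeout passes on to the command. The function name dump_then_term is hypothetical.

```shell
# Hypothetical helper, assuming GNU coreutils `timeout`.
# Usage: dump_then_term <seconds before SIGQUIT> <grace seconds> <command> [args...]
# Both durations must be whole seconds because they are added with shell arithmetic.
dump_then_term() {
  local dump_after=$1 grace=$2
  shift 2
  # Inner timeout: sends SIGQUIT to the command after $dump_after seconds.
  # Outer timeout: sends SIGTERM $grace seconds later; the inner timeout
  # passes that signal on to the command.
  timeout "$((dump_after + grace))" timeout -s QUIT "$dump_after" "$@"
}
```

With values of 10 and 1 (for example `dump_then_term 10 1 java -jar …`) this should match the timing described above. When the limit is hit, GNU timeout reports exit status 124.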

          akom Alexander Komarov added a comment -

          Thanks, I did see that, but a "kill -3" on the main Gradle process isn't going to help me. I need to get the thread dump from the hung tests, which run in a separate process launched by Gradle. My current approach is to run "jstack" on every Java process on the slave (we have a single-executor policy).

          I also just tried setting JENKINS_SERVER_COOKIE on the child process and that doesn't help with killing subprocesses, but I haven't had time to research further.  /usr/bin/timeout does work in combination with my trap script.
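The "jstack on every Java process" approach mentioned above could be sketched roughly as follows. This is an illustration, not the script actually used: the function name dump_all_jvms and the JSTACK_CMD override are hypothetical, and it assumes that `pgrep` is available and that the JVMs run under the process name `java`.

```shell
# Hypothetical sketch: print a thread dump for every process named "java".
# JSTACK_CMD is an assumed override hook, defaulting to the JDK's jstack.
dump_all_jvms() {
  local jstack_cmd=${JSTACK_CMD:-jstack}
  local pid
  # pgrep -x matches the process name exactly and prints one PID per line.
  for pid in $(pgrep -x java); do
    echo "=== thread dump for PID $pid ==="
    # Keep going if one JVM cannot be dumped (e.g. it exited in the meantime).
    "$jstack_cmd" "$pid" || echo "could not dump PID $pid" >&2
  done
}
```

This only makes sense on a machine where every Java process belongs to the build, as with the single-executor setup described above.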

          jglick Jesse Glick added a comment -

          "get the thread dump from the hung tests which are running a separate process launched by gradle"

          If that is all you wanted, you may be barking up the wrong tree. The JUnit Timeout rule, for example, applies a per-test-case timeout (which is likely to be more robust and easier to manage than a per-build timeout) and automatically displays a thread dump for hung tests.


            People

            • Assignee:
              Unassigned
              Reporter:
              akom Alexander Komarov