Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-51454

Pipeline retry operation doesn't retry when there is a timeout inside of it

    Details

    • Similar Issues:

      Description

      When a timeout call is fired inside of a retry, retry is not being triggered and job execution is aborted, the only way to make it work is by surrounding the timeout operation with a try/catch.

      without try/catch

      Log output

      Cancelling nested steps due to timeout
      

      Execution result

      Timeout has been exceeded
      Finished: ABORTED
      

      with try/catch

      Log output

      Timeout set to expire after 2 sec without activity
      Sleeping for 2 sec
      Cancelling nested steps due to timeout
      ERROR: catched timeout! org.jenkinsci.plugins.workflow.steps.FlowInterruptedException
      Retrying
      

      Execution result

      Finished: SUCCESS
      

       

      Examples to reproduce the issue

      Failing example

      node {
          def timeoutSeconds = 3
          stage('Preparation') { // for display purposes
              retry(3){
                  timeout(activity: true, time: 2, unit: 'SECONDS') {
                      sleep(timeoutSeconds)
                  }
                  timeoutSeconds--
              }
         }
      }
      

      Working example

      node {
          def timeoutSeconds = 3
          stage('Preparation') { // for display purposes
              retry(3){
                  try{
                      timeout(activity: true, time: 2, unit: 'SECONDS') {
                          sleep(timeoutSeconds)
                      }
                  }catch(err){
                      timeoutSeconds--
                      script{
                        def user = err.getCauses()[0].getUser()
                        error "[${user}] catched timeout! $err"
                      }
                  }
              }
         }
      }
      

        Attachments

          Issue Links

            Activity

            Hide
            pixman20 pixman20 added a comment -

            FWIW: I did a slight modification to the initial workaround and added it to our global shared libraries as retryEvenOnTimeout.
             
            It's slightly more robust:

            def call(Map retryOptions, Closure closure) {
                retry(retryOptions) {
                    try {
                        closure()
                    } catch(org.jenkinsci.plugins.workflow.steps.FlowInterruptedException ex){
                        def causes = ex.getCauses()
                        if(causes.find { ! (it instanceof org.jenkinsci.plugins.workflow.steps.TimeoutStepExecution$ExceededTimeout) }) {
                            throw ex
                        }
                        // Wrap the real exception and throw a RuntimeException that'll trigger the retry
                        throw new RuntimeException(ex)
                    }
                }
            }
            

            Other potential solution options are to merge retry and timeout functionality to be able to allow the user to have the option of which functionality they need.

            Show
            pixman20 pixman20 added a comment - FWIW: I did a slight modification to the initial workaround and added it to our global shared libraries as retryEvenOnTimeout.   It's slightly more robust: def call(Map retryOptions, Closure closure) { retry(retryOptions) { try { closure() } catch(org.jenkinsci.plugins.workflow.steps.FlowInterruptedException ex){ def causes = ex.getCauses() if(causes.find { ! (it instanceof org.jenkinsci.plugins.workflow.steps.TimeoutStepExecution$ExceededTimeout) }) { throw ex } // Wrap the real exception and throw a RuntimeException that'll trigger the retry throw new RuntimeException(ex) } } } Other potential solution options are to merge retry and timeout functionality to be able to allow the user to have the option of which functionality they need.
            Hide
            dnusbaum Devin Nusbaum added a comment -

            I think there are two desired behaviors depending on the placement of timeout and retry relative to each other.

            First case (this ticket):

            retry(3) {
              timeout(time: 5, unit: 'MINUTES') {
                // Something that can fail
              }
            }
            

            In this case, if the timeout triggers, I think the desired behavior is for retry to run its body again. This is not the current behavior.

            Second case (not this ticket):

            timeout(time: 5, unit: 'MINUTES') {
              retry(3) {
                // Something that can fail
              }
            }
            

            In this case, if the timeout triggers, I think the desired behavior is for retry to be aborted without retrying anything. This is the current behavior, and as far as I understand, is working as-designed after JENKINS-44379.

            Switching timeout to use a different kind of exception would fix the first case, but break the second case. To support both use cases, something more complex would be needed (see PR 81 for a possible approach, although retry would need to be updated as well)). Basil Crow Noted that a fix for JENKINS-60354 might overlap a bit with this issue.

            Show
            dnusbaum Devin Nusbaum added a comment - I think there are two desired behaviors depending on the placement of timeout and retry relative to each other. First case (this ticket): retry(3) { timeout(time: 5, unit: 'MINUTES' ) { // Something that can fail } } In this case, if the timeout triggers, I think the desired behavior is for retry to run its body again. This is not the current behavior. Second case (not this ticket): timeout(time: 5, unit: 'MINUTES' ) { retry(3) { // Something that can fail } } In this case, if the timeout triggers, I think the desired behavior is for retry to be aborted without retrying anything. This is the current behavior, and as far as I understand, is working as-designed after  JENKINS-44379 . Switching timeout to use a different kind of exception would fix the first case, but break the second case. To support both use cases, something more complex would be needed (see PR 81 for a possible approach, although retry would need to be updated as well)). Basil Crow Noted that a fix for  JENKINS-60354 might overlap a bit with this issue.
            Hide
            basil Basil Crow added a comment -

            As a workaround, I was able to catch FlowInterruptedException and rethrow a more generic exception:

            import org.jenkinsci.plugins.workflow.steps.FlowInterruptedException
            
            retry(3) {
              try {
                timeout(time: 10, unit: 'MINUTES') {
                  [..]
                }
              } catch (FlowInterruptedException e) {
               // Work around https://issues.jenkins-ci.org/browse/JENKINS-51454
                error 'Timeout has been exceeded'
              }
            }
            
            Show
            basil Basil Crow added a comment - As a workaround, I was able to catch FlowInterruptedException and rethrow a more generic exception: import org.jenkinsci.plugins.workflow.steps.FlowInterruptedException retry(3) { try { timeout(time: 10, unit: 'MINUTES') { [..] } } catch (FlowInterruptedException e) { // Work around https://issues.jenkins-ci.org/browse/JENKINS-51454 error 'Timeout has been exceeded' } }
            Hide
            basil Basil Crow added a comment -

            I can reproduce this error in a scripted pipeline as well.

            Show
            basil Basil Crow added a comment - I can reproduce this error in a scripted pipeline as well.
            Hide
            luispiedra Luis Piedra-Márquez added a comment -

            If capturing the `FlowInterruptedException` it will retry also when aborting the job for any other reason, like canceling. Definitively, timeout should thow a different exception.

            Show
            luispiedra Luis Piedra-Márquez added a comment - If capturing the ` FlowInterruptedException` it will retry also when aborting the job for any other reason, like canceling. Definitively, timeout should thow a different exception.
            Hide
            nepomuk_seiler Nepomuk Seiler added a comment -

            It looks like the org.jenkinsci.plugins.workflow.steps.FlowInterruptedException exception could be the  cause. If the timeout directive would throw another exception, would this solve the issue?

            Show
            nepomuk_seiler Nepomuk Seiler added a comment - It looks like the org.jenkinsci.plugins.workflow.steps.FlowInterruptedException exception could be the  cause. If the timeout directive would throw another exception, would this solve the issue?

              People

              • Assignee:
                Unassigned
                Reporter:
                daconstenla David Constenla
              • Votes:
                11 Vote for this issue
                Watchers:
                14 Start watching this issue

                Dates

                • Created:
                  Updated: