Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-44993

Post-Step should have "try-finally" semantics

    Details

    • Epic Link:
    • Similar Issues:

      Description

      We are using a declarative Jenkins pipeline in our project that needs to react to build failures (e.g. send Slack notifications...). Basically it looks like this:

      pipeline {
         stages {
            stage("one"){
              steps {
                sh "./stepOne"
                sh "./stepTwo"
                sh "./stepThree"
              }
              post {
                success {
                  // ... stage one success handler...
                }
                failure {
                  // ... stage one failure handler...
                }
              }
            }
            // more stages here...
         }
         post {
           success {
             // ... build success handler...
           }
           failure {
             // ... build failure handler...
           }
         }
      }

      Now, let's assume that the command sh "./stepTwo" fails to execute for some reason. It can return with exit code -1, the shell might crash, or any other abnormal condition might occur. Here is what I would assume to happen:

      1) sh "./stepThree" (and the remainder of stage "one") is skipped
      2) stage one failure handler is executed
      3) build failure handler is executed
      4) build is terminated and marked as failing, with stage "one" as the "root cause".

      Here is actually happens:

      1) Jenkins detects that sh "./stepTwo" has terminated abnormally.
      2) The entire build is marked as failure and is terminated immediately. No post handlers are executed.

      The issue here is that I would expect post handlers to always be executed, even when the build fails at some stage. They need to have "try-finally" semantics in my opinion, otherwise a post failure clause is not very useful.

      My current workaround is to use catchError clauses, but those are suboptimal as they require me to mark the build as "failure" myself. Futhermore, they are just noise in this case, as I only want to execute my post handlers before the build is terminated.

        Attachments

          Issue Links

            Activity

            Hide
            abayer Andrew Bayer added a comment -

            That's really strange - I'm fairly sure that's not what happens for a standard failure (i.e., sh exiting with -1, etc), or tests like those for https://github.com/jenkinsci/pipeline-model-definition-plugin/blob/master/pipeline-model-definition/src/test/resources/postOnChangeFailed.groovy wouldn't be passing. Can you give a more concrete example that I can write a test case with?

            Show
            abayer Andrew Bayer added a comment - That's really strange - I'm fairly sure that's not what happens for a standard failure (i.e., sh exiting with -1, etc), or tests like those for https://github.com/jenkinsci/pipeline-model-definition-plugin/blob/master/pipeline-model-definition/src/test/resources/postOnChangeFailed.groovy wouldn't be passing. Can you give a more concrete example that I can write a test case with?
            Hide
            martin_haeusler Martin Hausler added a comment - - edited

            I can't tell you exactly what happened, but our script was something like ./gradlew build, and when that script execution failed (e.g. gradlew called javac, and javac produced a compile error, or gradlew executes JUnit tests, which fail), then the post commands were skipped. I haven't been able to figure out exactly how ./gradlew exits the process in this case, but it happened in a way that skipped the post commands.

            Show
            martin_haeusler Martin Hausler added a comment - - edited I can't tell you exactly what happened, but our script was something like ./gradlew build , and when that script execution failed (e.g. gradlew called javac , and javac produced a compile error, or gradlew executes JUnit tests, which fail), then the post commands were skipped. I haven't been able to figure out exactly how ./gradlew exits the process in this case, but it happened in a way that skipped the post commands.
            Hide
            abayer Andrew Bayer added a comment -

            Martin Hausler - let me know if you can get a reproduction. Once you do, I am confident I can fix it. =)

            Show
            abayer Andrew Bayer added a comment - Martin Hausler - let me know if you can get a reproduction. Once you do, I am confident I can fix it. =)
            Hide
            rpocase Robby Pocase added a comment -

            I think I'm hitting some variant of this. One minimal reproduction (in 1.1.6) is:

            pipeline {
                agent any
                options { timeout(time: 5, unit: 'SECONDS') }
                stages {
                    stage('sleep') {
                        steps {
                            sleep(6)
                        }
                    }
                    
                }
                post {
                    always  { deleteDir() }
                    failure { echo 'Will not execute' }
                }
            }
            

            This may be intentional since the state actually ends up being "aborted". In production, I have some timeouts that actually mark the build as failure, but that might be other logic interacting in unexpected ways.

            I've had other cases where a stage actually exits abnormally, but struggling to get a minimal reproduction. The rough execution flow for the production pipeline is below. I'll update if I can get something that forces the condition.

            pipeline {
                agent any
                stages {
                    stage('sleep') {
                        steps {
                            script {
                                try {
                                    sh 'exit 1'
                                } catch(err) {
                                    error "Failed while doing something"
                                }
                            }
                        }
                    }
                    
                }
                post {
                    always  { 
                        deleteDir() 
                    }
                    failure { echo 'Is running' }
                }
            }
            

            Would a full step execution trace (all grey text in console output) help? Is there a nice way to produce that?

            Show
            rpocase Robby Pocase added a comment - I think I'm hitting some variant of this. One minimal reproduction (in 1.1.6) is: pipeline { agent any options { timeout(time: 5, unit: 'SECONDS' ) } stages { stage( 'sleep' ) { steps { sleep(6) } } } post { always { deleteDir() } failure { echo 'Will not execute' } } } This may be intentional since the state actually ends up being "aborted". In production, I have some timeouts that actually mark the build as failure, but that might be other logic interacting in unexpected ways. I've had other cases where a stage actually exits abnormally, but struggling to get a minimal reproduction. The rough execution flow for the production pipeline is below. I'll update if I can get something that forces the condition. pipeline { agent any stages { stage( 'sleep' ) { steps { script { try { sh 'exit 1' } catch (err) { error "Failed while doing something" } } } } } post { always { deleteDir() } failure { echo 'Is running' } } } Would a full step execution trace (all grey text in console output) help? Is there a nice way to produce that?
            Hide
            rpocase Robby Pocase added a comment -

            I believe I found my root issue. In the below pipeline, my failure condition does something with an undefined variable as the very first step in the failure block. Instead of complaining about the undefined variable, it just silently doesn't do the rest of the post block. I discovered this by moving my failure items up into post { always {} }, which did complain about the undefined variable.

            pipeline {
                agent any
                stages {
                    stage('error') {
                        steps {
                            error "Intentional failure"
                        }
                    }
                    
                }
                post {
                    always  { 
                        echo "Always"
                    }
                    failure { 
                        script {
                            sh "reference to some ${undefined} var"
                        }
                    }
                }
            }
            

            Martin Hausler Does this possibly look like your issue as well?

            Show
            rpocase Robby Pocase added a comment - I believe I found my root issue. In the below pipeline, my failure condition does something with an undefined variable as the very first step in the failure block. Instead of complaining about the undefined variable, it just silently doesn't do the rest of the post block. I discovered this by moving my failure items up into post { always {} }, which did complain about the undefined variable. pipeline { agent any stages { stage( 'error' ) { steps { error "Intentional failure" } } } post { always { echo "Always" } failure { script { sh "reference to some ${undefined} var " } } } } Martin Hausler Does this possibly look like your issue as well?
            Hide
            martin_haeusler Martin Hausler added a comment -

            Yes I see a lot of resemblance to my case here. I think this might be the same root cause, especially the "it just silently doesn't do the rest of the post block" part. That sounds very familiar to me.

            Show
            martin_haeusler Martin Hausler added a comment - Yes I see a lot of resemblance to my case here. I think this might be the same root cause, especially the "it just silently doesn't do the rest of the post block" part. That sounds very familiar to me.
            Hide
            abayer Andrew Bayer added a comment -

            Got it! The problem here is that we only end up throwing one exception per stage - because once we throw an exception, we're out of the stage. So if your stage has already failed before you get to post, any exception thrown by a post block never gets thrown further, because we propagate the initial exception instead. This means that the post block exception never gets logged, etc.

            I don't love this fix, but https://github.com/jenkinsci/pipeline-model-definition-plugin/pull/259 - with this, we'll always log when we get an exception from a post block, with full stacktrace. I'm open to changing the exact details of what we log there, but we do still want to make sure that we're throwing the first exception encountered, so that visualization can tell where the build "failed" per se. This can lead to duplicate logging if the first error of the stage is from a post block, though, which I'm not happy about.

            Show
            abayer Andrew Bayer added a comment - Got it! The problem here is that we only end up throwing one exception per stage - because once we throw an exception, we're out of the stage. So if your stage has already failed before you get to post , any exception thrown by a post block never gets thrown further, because we propagate the initial exception instead. This means that the post block exception never gets logged, etc. I don't love this fix, but https://github.com/jenkinsci/pipeline-model-definition-plugin/pull/259 - with this, we'll always log when we get an exception from a post block, with full stacktrace. I'm open to changing the exact details of what we log there, but we do still want to make sure that we're throwing the first exception encountered, so that visualization can tell where the build "failed" per se. This can lead to duplicate logging if the first error of the stage is from a post block, though, which I'm not happy about.
            Hide
            abayer Andrew Bayer added a comment -

            Merged, releasing in 1.3 today.

            Show
            abayer Andrew Bayer added a comment - Merged, releasing in 1.3 today.

              People

              • Assignee:
                abayer Andrew Bayer
                Reporter:
                martin_haeusler Martin Hausler
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: