Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-28822

Can't distinguish between durable task abort and failure in workflow plugin

    Details

    • Similar Issues:
    • Released As:
      workflow-durable-task-step 2.24

      Description

      When a workflow is manually aborted in Jenkins it fires a hudson.AbortException. This is the same thing that happens when a step fails. Thus, it's impossible to properly set the build status to ABORTED on a manual abort and to FAILED on a failed step because you can't programmatically tell the difference.

      Jesse Glick’s recommended fix: see comment as of 2017-06-28

        Attachments

          Issue Links

            Activity

            Hide
            jglick Jesse Glick added a comment -

            Manually stopping a build should set the result to ABORTED. This uses FlowInterruptedException, not AbortException.

            Show
            jglick Jesse Glick added a comment - Manually stopping a build should set the result to ABORTED . This uses FlowInterruptedException , not AbortException .
            Hide
            jmif Joe Mifsud added a comment - - edited

            What do you mean by manually stopping?

            If you abort the build by clicking "Abort" when paused for input, a FlowInterrupedException is fired. However, if you click the Jenkins abort button, an abort exception with no cause is fired.

            
            Started by user Joe Mifsud
            Running: Build Stage
            Entering stage Build Stage
            Proceeding
            Running: Allocate node : Start
            Running on master in /var/lib/jenkins/jobs/REDACTED/workspace
            Running: Allocate node : Body : Start
            Running: Git
             > git rev-parse --is-inside-work-tree # timeout=10
            Fetching changes from the remote Git repository
             > git config remote.origin.url git@REDACTED
            Fetching upstream changes from git@REDACTED
             > git --version # timeout=10
            using GIT_SSH to set credentials Credentials for git - created by Chef
             > git -c core.askpass=true fetch --tags --progress REDACTED +refs/heads/*:refs/remotes/origin/*
             > git rev-parse refs/remotes/origin/master^{commit} # timeout=10
             > git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10
            Checking out Revision REDACTED (refs/remotes/origin/master)
             > git config core.sparsecheckout # timeout=10
             > git checkout -f REDACTED
             > git rev-list REDACTED # timeout=10
            Running: Archive Artifacts
            Running: Allocate node : Body : End
            Running: Allocate node : End
            Running: Allocate node : Start
            Running on master in /var/lib/jenkins/jobs/REDACTED/workspace
            Running: Allocate node : Body : Start
            Aborted by Joe Mifsud
            Running: Write file to workspace
            Running: Shell Script
            [workspace] Running shell script
            + flock -w 600 /var/lock/sp_js_build_npm_install.lock -c npm install
            Terminated
            Running: Allocate node : Body : End
            Running: Allocate node : End
            Running: Print Message
            Received hudson.AbortException with message script returned exit code 143      <--- e.getClass().name
            Running: Print Message
            Received org.codehaus.groovy.runtime.NullObject with message script returned exit code 143      <--- e.getCause().getClass().name
            Running: Print Message
            Build failed due to class hudson.AbortException script returned exit code 143
            
            Show
            jmif Joe Mifsud added a comment - - edited What do you mean by manually stopping? If you abort the build by clicking "Abort" when paused for input, a FlowInterrupedException is fired. However, if you click the Jenkins abort button, an abort exception with no cause is fired. Started by user Joe Mifsud Running: Build Stage Entering stage Build Stage Proceeding Running: Allocate node : Start Running on master in / var /lib/jenkins/jobs/REDACTED/workspace Running: Allocate node : Body : Start Running: Git > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url git@REDACTED Fetching upstream changes from git@REDACTED > git --version # timeout=10 using GIT_SSH to set credentials Credentials for git - created by Chef > git -c core.askpass= true fetch --tags --progress REDACTED +refs/heads/*:refs/remotes/origin/* > git rev-parse refs/remotes/origin/master^{commit} # timeout=10 > git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10 Checking out Revision REDACTED (refs/remotes/origin/master) > git config core.sparsecheckout # timeout=10 > git checkout -f REDACTED > git rev-list REDACTED # timeout=10 Running: Archive Artifacts Running: Allocate node : Body : End Running: Allocate node : End Running: Allocate node : Start Running on master in / var /lib/jenkins/jobs/REDACTED/workspace Running: Allocate node : Body : Start Aborted by Joe Mifsud Running: Write file to workspace Running: Shell Script [workspace] Running shell script + flock -w 600 / var /lock/sp_js_build_npm_install.lock -c npm install Terminated Running: Allocate node : Body : End Running: Allocate node : End Running: Print Message Received hudson.AbortException with message script returned exit code 143 <--- e.getClass().name Running: Print Message Received org.codehaus.groovy.runtime.NullObject with message script returned exit code 143 <--- e.getCause().getClass().name Running: Print Message Build failed due to class hudson.AbortException script returned exit code 143
            Hide
            jmif Joe Mifsud added a comment -

            Reopening to ensure that our definition of "manual abort" is the same. Was just able to reproduce by clicking the jenkins abort button (red x).

            Show
            jmif Joe Mifsud added a comment - Reopening to ensure that our definition of "manual abort" is the same. Was just able to reproduce by clicking the jenkins abort button (red x).
            Hide
            jmif Joe Mifsud added a comment -

            An AbortException is also fired when the job is paused for input and the Jenkins abort button is clicked.

            Show
            jmif Joe Mifsud added a comment - An AbortException is also fired when the job is paused for input and the Jenkins abort button is clicked.
            Hide
            jglick Jesse Glick added a comment -

            If you abort inside a sh step, Jenkins sends SIGTERM to the process. That will normally cause it to exit with code 143 (128 + the SIGTERM signal code IIRC), which is treated as a failed script by sh, thus throwing an error (and causing the build to fail, if you do not catch it). The process could however be trapping the signal and behaving differently.

            If the process does not respond to SIGTERM, pressing the abort button a second time forcibly aborts the build (FlowInterruptedException).

            Canceling an input step uses FlowInterruptedException and sets the build status to aborted. This is unrelated to handling of durable tasks.

            Show
            jglick Jesse Glick added a comment - If you abort inside a sh step, Jenkins sends SIGTERM to the process. That will normally cause it to exit with code 143 (128 + the SIGTERM signal code IIRC), which is treated as a failed script by sh , thus throwing an error (and causing the build to fail, if you do not catch it). The process could however be trapping the signal and behaving differently. If the process does not respond to SIGTERM , pressing the abort button a second time forcibly aborts the build ( FlowInterruptedException ). Canceling an input step uses FlowInterruptedException and sets the build status to aborted. This is unrelated to handling of durable tasks.
            Hide
            kaihowl Kai Howelmeyer added a comment -

            I am also confused by this handling:

            In https://github.com/jenkinsci/workflow-durable-task-step-plugin/blob/master/src/main/java/org/jenkinsci/plugins/workflow/steps/durable_task/DurableTaskStep.java#L319 we send AbortException. If I surround a "sh" step with try/catch, I cannot tell if Abort was clicked or if the shell script failed. Is there an easy way to distinguish this?

            Show
            kaihowl Kai Howelmeyer added a comment - I am also confused by this handling: In https://github.com/jenkinsci/workflow-durable-task-step-plugin/blob/master/src/main/java/org/jenkinsci/plugins/workflow/steps/durable_task/DurableTaskStep.java#L319  we send AbortException. If I surround a "sh" step with try/catch, I cannot tell if Abort was clicked or if the shell script failed. Is there an easy way to distinguish this?
            Hide
            jglick Jesse Glick added a comment -

            Not currently. Read my last comment.

            Show
            jglick Jesse Glick added a comment - Not currently. Read my last comment.
            Hide
            adelcast Alejandro del Castillo added a comment -

            On Windows (bat step) is even harder to determine the difference between:

            1) Aborted build by a user

            2) Aborted build by a timeout

            Both return a hudson.AbortException, with an empty cause. An in both cases the exception says: "hudson.AbortException: script returned exit code -1". I would have expected a non-NULL cause in the case of a user aborted AbortException....or am I missing an easier way to determine the difference?

             

             

            Show
            adelcast Alejandro del Castillo added a comment - On Windows (bat step) is even harder to determine the difference between: 1) Aborted build by a user 2) Aborted build by a timeout Both return a hudson.AbortException, with an empty cause. An in both cases the exception says: "hudson.AbortException: script returned exit code -1".  I would have expected a non-NULL cause in the case of a user aborted AbortException....or am I missing an easier way to determine the difference?    
            Hide
            kaihowl Kai Howelmeyer added a comment -

            What I did in the meantime is the following "hackish" solution:

            class DoneException extends Exception
            
            boolean wasAborted = false
            
            try {
              parallel {
                "main": {
                   sh "myCommand"  
                   throw DoneException
                 },
                 "abortMonitor": {
                   try {
                     sleep(10, java.util.concurrent.TimeUnit.YEARS)
                   } catch (AbortException) {
                     wasAborted = true
                   }
                 },
                 failFast: true
              }
            } catch (DoneException e) {
              // Do nothing. This is expected.
            } catch (Exception e) {
              if (wasAborted) {
                throw e // Effectively do nothing
              } else {
                // Handle your failure of the sh command here
              }
            }

            This is the rough, untested sketch of how I worked around this limitation. There are other, less convoluted approaches if you are willing to grab into the internals via means of getContext. I decided to stick with the public API instead to spare myself some headaches.

            Why this works (in theory):

            The main insight for me was that the sleep command actually tells you if it was aborted (throws an AbortException) versus if it failed because of other reasons (throws a FlowInterruptedException). The "other reasons" are in this case a failure in a parallel branch as failFast is specified. This means that aborting a build will abort both the sh and the sleep step as they are outer leaves of the execution graph. If instead the sh step fails, it will cause the sleep inside the abortMonitor branch to fail as well. Why sleep was cancelled can be determined easily then by the type of exception that is thrown.

            There is one caveat to this approach:

            The "eternal" sleep has to be cancelled once the "main" branch was successful. This is done by throwing an exception of the custom type DoneException. This is abusing exceptions for non-exceptional control flow. Apart from being a bad pattern in general, it has the concrete downside of making the "bubble" for the sleep step in the workflow steps overview of the build appear red. I.e., you will have failing sleep steps in a successful build.

            Show
            kaihowl Kai Howelmeyer added a comment - What I did in the meantime is the following "hackish" solution: class DoneException extends Exception boolean wasAborted = false try { parallel { "main" : { sh "myCommand" throw DoneException }, "abortMonitor" : { try { sleep(10, java.util.concurrent.TimeUnit.YEARS) } catch (AbortException) { wasAborted = true } }, failFast: true } } catch (DoneException e) { // Do nothing. This is expected. } catch (Exception e) { if (wasAborted) { throw e // Effectively do nothing } else { // Handle your failure of the sh command here } } This is the rough, untested sketch of how I worked around this limitation. There are other, less convoluted approaches if you are willing to grab into the internals via means of getContext . I decided to stick with the public API instead to spare myself some headaches. Why this works (in theory): The main insight for me was that the sleep command actually tells you if it was aborted (throws an AbortException ) versus if it failed because of other reasons (throws a FlowInterruptedException ). The "other reasons" are in this case a failure in a parallel branch as failFast is specified. This means that aborting a build will abort both the sh and the sleep step as they are outer leaves of the execution graph. If instead the sh step fails, it will cause the sleep inside the  abortMonitor branch to fail as well. Why sleep was cancelled can be determined easily then by the type of exception that is thrown. There is one caveat to this approach: The "eternal" sleep has to be cancelled once the "main" branch was successful. This is done by throwing an exception of the custom type DoneException . This is abusing exceptions for non-exceptional control flow. Apart from being a bad pattern in general, it has the concrete downside of making the "bubble" for the sleep  step in the workflow steps overview of the build appear red. I.e., you will have failing sleep  steps in a successful build.
            Hide
            lukaswoj Łukasz Wojciechowski added a comment -

            Lately, on a project I work with, we replaced try/catch/catch/... blocks that was checking exception "class" type and return code to guess if the build was aborted by Jenkins user.

            Such approach has its flaws as you can not be sure about consistency of exit code if your pipeline depends on many `sh` step calls, unless those are all your processes (you control them) which I believe is not a very often case.

            Joe Mifsud - I believe you can get information you are after from api/json?depth=2 endpoint.

            In order to improve how we do it, I recently switched to approach based on asking jenkins API for information and parsing it.

            Here is a snippet that might become useful. It is not state of the art groovy programming but it works

            I also left the part that shows how to get failed stage name from wfapi/describe API endpoint.

            Hopefully someone will find my post useful.

            def fetchBuildDetails() {
                url1 = env.BUILD_URL + 'wfapi/describe'
                url2 = env.BUILD_URL + 'api/json?depth=2'
                jsonResponse = sh(script: 'echo "{\\"wfapi\\":`curl --silent --netrc-file /home/ubuntu/.curl-netrc '+url1+'`,\\"api\\":`curl --silent --netrc-file /home/ubuntu/.curl-netrc '+url2+'`}"', returnStdout: true).trim()
                echo "Fetched build details:"
                echo prettyPrint(jsonResponse)
                return new JsonSlurperClassic().parseText(jsonResponse)
            }
             
            
            def getBuildDetails() {
                def buildDetails = fetchBuildDetails()
                buildDetails.shouldNotificationGetSentToInfrastructureTeam = false
                buildDetails.abortedByUser = false
            
                // checking if user aborted the build
                for(int i=0; i < buildDetails.api.actions.size(); i++) {
                    if (buildDetails.api.actions[i]._class == 'jenkins.model.InterruptedBuildAction') {
                        if (buildDetails.api.actions[i].causes[0]._class == 'jenkins.model.CauseOfInterruption$UserInterruption') {
                            buildDetails.abortedByUser = true
                        }
                    }
                }
            
                // checking name of failed stage and deciding if in-team should get mentioned in slack message
                for(int i=0; i < buildDetails.wfapi.stages.size(); i++) {
                    if (buildDetails.wfapi.stages[i].status == 'FAILED') {
                        def failedStageName = buildDetails.wfapi.stages[i].name.toLowerCase()
                        if (
                            failedStageName.indexOf("checkout") >= 0
                            || failedStageName.indexOf("build") >= 0
                            || failedStageName.indexOf("deploy") >= 0
                        ) {
                            buildDetails.shouldNotificationGetSentToInfrastructureTeam = true
                        }
                    }
                }
            
                echo "Build details enchanced:"
                println prettyPrint(toJson(buildDetails))
            
                return buildDetails
            }
            
            Show
            lukaswoj Łukasz Wojciechowski added a comment - Lately, on a project I work with, we replaced try/catch/catch/... blocks that was checking exception "class" type and return code to guess if the build was aborted by Jenkins user. Such approach has its flaws as you can not be sure about consistency of exit code if your pipeline depends on many `sh` step calls, unless those are all your processes (you control them) which I believe is not a very often case. Joe Mifsud - I believe you can get information you are after from api/json?depth=2 endpoint. In order to improve how we do it, I recently switched to approach based on asking jenkins API for information and parsing it. Here is a snippet that might become useful. It is not state of the art groovy programming but it works I also left the part that shows how to get failed stage name from wfapi/describe  API endpoint. Hopefully someone will find my post useful. def fetchBuildDetails() { url1 = env.BUILD_URL + 'wfapi/describe' url2 = env.BUILD_URL + 'api/json?depth=2' jsonResponse = sh(script: 'echo "{\\" wfapi\\ ":`curl --silent --netrc-file /home/ubuntu/.curl-netrc ' +url1+ '`,\\" api\\ ":`curl --silent --netrc-file /home/ubuntu/.curl-netrc ' +url2+ '`}" ' , returnStdout: true ).trim() echo "Fetched build details:" echo prettyPrint(jsonResponse) return new JsonSlurperClassic().parseText(jsonResponse) }   def getBuildDetails() { def buildDetails = fetchBuildDetails() buildDetails.shouldNotificationGetSentToInfrastructureTeam = false buildDetails.abortedByUser = false // checking if user aborted the build for ( int i=0; i < buildDetails.api.actions.size(); i++) { if (buildDetails.api.actions[i]._class == 'jenkins.model.InterruptedBuildAction' ) { if (buildDetails.api.actions[i].causes[0]._class == 'jenkins.model.CauseOfInterruption$UserInterruption' ) { buildDetails.abortedByUser = true } } } // checking name of failed stage and deciding if in-team should get mentioned in slack message for ( int i=0; i < buildDetails.wfapi.stages.size(); i++) { if (buildDetails.wfapi.stages[i].status == 'FAILED' ) { def failedStageName = buildDetails.wfapi.stages[i].name.toLowerCase() if ( failedStageName.indexOf( "checkout" ) >= 0 || failedStageName.indexOf( "build" ) >= 0 || failedStageName.indexOf( "deploy" ) >= 0 ) { buildDetails.shouldNotificationGetSentToInfrastructureTeam = true } } } echo "Build details enchanced:" println prettyPrint(toJson(buildDetails)) return buildDetails }
            Hide
            adelcast Alejandro del Castillo added a comment -

            Thanks!!!! this indeed workarounds the current limitation. I'll try your code in our pipeline

            Show
            adelcast Alejandro del Castillo added a comment - Thanks!!!! this indeed workarounds the current limitation. I'll try your code in our pipeline
            Hide
            jglick Jesse Glick added a comment -

            Probably if DurableTaskStep.Execution.stop is called, the exception should be saved, and if the process subsequently exits with a nonzero code (particularly 143, in the case of sh, though perhaps it does not matter), the status code should be printed and the original exception should be used as the failure rather than a fresh AbortException.

            Not sure about returnStatus; would be a bigger behavioral change to make the step fail with the original FlowInterruptedException rather than returning 143 or whatever, but this is probably what a user would expect to happen.

            Show
            jglick Jesse Glick added a comment - Probably if DurableTaskStep.Execution.stop is called, the exception should be saved, and if the process subsequently exits with a nonzero code (particularly 143, in the case of sh , though perhaps it does not matter), the status code should be printed and the original exception should be used as the failure rather than a fresh AbortException . Not sure about returnStatus ; would be a bigger behavioral change to make the step fail with the original FlowInterruptedException rather than returning 143 or whatever, but this is probably what a user would expect to happen.
            Hide
            mgu Martins Gulbis added a comment -

            I stumbled upon this issue while working on scripted pipeline jobs. My problem was how to reliably identify that pipeline job was aborted by user from job it self for correct email template selection. My workaround was to look for InterruptedBuildAction within running builds actions if it's present then build was aborted. I'm not part of Jenkins development team so I'm probably wrong but my guess is that internally Jenkins uses similar concept for aborted build identification.

            Scripted pipeline code example to be placed in exception handling code

            if (manager.build.getAction(InterruptedBuildAction.class) ){
              currentBuild.result = 'ABORTED'
            }
            

            Logical chain: hudson.model.AbstractBuild -> (hudson.model.Actionable) getAction if something is returned then build was aborted. If needed you can look more deeply in getAction result vai getCauses() method and afterwards vai getShortDescription() you even can get userid who aborted the build.

            Show
            mgu Martins Gulbis added a comment - I stumbled upon this issue while working on scripted pipeline jobs. My problem was how to reliably identify that pipeline job was aborted by user from job it self for correct email template selection. My workaround was to look for InterruptedBuildAction within running builds actions if it's present then build was aborted. I'm not part of Jenkins development team so I'm probably wrong but my guess is that internally Jenkins uses similar concept for aborted build identification. Scripted pipeline code example to be placed in exception handling code if (manager.build.getAction(InterruptedBuildAction.class) ){ currentBuild.result = 'ABORTED' } Logical chain: hudson.model.AbstractBuild -> (hudson.model.Actionable) getAction if something is returned then build was aborted. If needed you can look more deeply in getAction result vai getCauses() method and afterwards vai getShortDescription() you even can get userid who aborted the build.
            Hide
            kutzi kutzi added a comment - - edited

            I guess this is also the reason for JENKINS-43339, isn't it?

            This is IMO a big issue as it makes the post { aborted {} } section in declarative pipelines largely unusable. In fact, I've not seen a single occasion when aborted worked as expected - when manually aborting a build. Then the failure section is triggered instead.

            Show
            kutzi kutzi added a comment - - edited I guess this is also the reason for JENKINS-43339 , isn't it? This is IMO a big issue as it makes the post { aborted {} } section in declarative pipelines largely unusable. In fact, I've not seen a single occasion when aborted worked as expected - when manually aborting a build. Then the failure section is triggered instead.
            Hide
            crussell52 Chris Russell added a comment - - edited

            Piecing together clues from comments here and in similar issues (such as JENKINS-41604), I've added this to our shared lib with pretty good results, so far.

            This version has some debug output and logic which isn't needed for our purposes (such as distinguishing between user and system rejections for inputs), but I left that logic in for the purposes of this comment in case it is useful for others.

            Pipeline

            try {
              // work
            } catch(err) {
              (new ExecutionHelper()).fixBuildResult(error)
              
              // Don't hide the error from jenkins
              throw err
            } finally {
              // React to currentBuild.result
            }

            ExcecutionHelper.groovy

            import jenkins.model.CauseOfInterruption
            import jenkins.model.InterruptedBuildAction
            import org.jenkinsci.plugins.workflow.steps.FlowInterruptedException
            import org.jenkinsci.plugins.workflow.support.steps.input.Rejection
            
            
            def fixBuildResult(error) {
            
            
                def iba = currentBuild.rawBuild.getAction(InterruptedBuildAction.class)
                if (iba && iba.causes.size() && iba.causes.any{ it instanceof CauseOfInterruption.UserInterruption }) {
                    // probably aborted the execution from outside pipeline
                    echo "run cancelled"
                    currentBuild.result = "ABORTED"
                } else if (error instanceof FlowInterruptedException && error.getCauses().any { it instanceof Rejection }) {
            
                    // Dig out the rejection and see if it is associated to a user.
                    def user = null
                    error.getCauses().each {
                        if (it instanceof Rejection) {
                            user = it.user
                        }
                    }
            
                    // System user probably means rejected because of timeout
                    echo (user && "${user}" != "SYSTEM" ? "rejected by user: ${user}" : "rejected by... not user")
            
                    currentBuild.result = "ABORTED"
                } else {
                    currentBuild.result = "FAILURE"
                }
            } 
            Show
            crussell52 Chris Russell added a comment - - edited Piecing together clues from comments here and in similar issues (such as JENKINS-41604 ), I've added this to our shared lib with pretty good results, so far. This version has some debug output and logic which isn't needed for our purposes (such as distinguishing between user and system rejections for inputs), but I left that logic in for the purposes of this comment in case it is useful for others. Pipeline try { // work } catch (err) { ( new ExecutionHelper()).fixBuildResult(error) // Don't hide the error from jenkins throw err } finally { // React to currentBuild.result } ExcecutionHelper.groovy import jenkins.model.CauseOfInterruption import jenkins.model.InterruptedBuildAction import org.jenkinsci.plugins.workflow.steps.FlowInterruptedException import org.jenkinsci.plugins.workflow.support.steps.input.Rejection def fixBuildResult(error) { def iba = currentBuild.rawBuild.getAction(InterruptedBuildAction.class) if (iba && iba.causes.size() && iba.causes.any{ it instanceof CauseOfInterruption.UserInterruption }) { // probably aborted the execution from outside pipeline echo "run cancelled" currentBuild.result = "ABORTED" } else if (error instanceof FlowInterruptedException && error.getCauses().any { it instanceof Rejection }) { // Dig out the rejection and see if it is associated to a user. def user = null error.getCauses().each { if (it instanceof Rejection) { user = it.user } } // System user probably means rejected because of timeout echo (user && "${user}" != "SYSTEM" ? "rejected by user: ${user}" : "rejected by... not user" ) currentBuild.result = "ABORTED" } else { currentBuild.result = "FAILURE" } }
            Hide
            macdrega Joerg Schwaerzler added a comment - - edited

            I wonder whether my issue is related to this one:

            I get a hudson.AbortException() sometimes if some parallel steps are waiting for some free node when a timeout aborts the build. I would expect to get a FlowInterruptedException, though.

            I get Encountered exception: hudson.AbortException: Queue task was cancelled.

            Show
            macdrega Joerg Schwaerzler added a comment - - edited I wonder whether my issue is related to this one: I get a hudson.AbortException() sometimes if some parallel steps are waiting for some free node when a timeout aborts the build. I would expect to get a FlowInterruptedException , though. I get Encountered exception: hudson.AbortException: Queue task was cancelled.
            Hide
            jglick Jesse Glick added a comment -

            Joerg Schwaerzler that is indeed related, but should have been fixed by this in 2.21 AFAIK.

            Show
            jglick Jesse Glick added a comment - Joerg Schwaerzler that is indeed related, but should have been fixed by this in 2.21 AFAIK.
            Hide
            macdrega Joerg Schwaerzler added a comment -

            Thanks for the info. After updating to 2.21 I can confirm that the issue we faced is fixed.

            Show
            macdrega Joerg Schwaerzler added a comment - Thanks for the info. After updating to 2.21 I can confirm that the issue we faced is fixed.
            Hide
            dnusbaum Devin Nusbaum added a comment -

            I am working on this in https://github.com/jenkinsci/workflow-durable-task-step-plugin/pull/75. My main concern is whether fixing it now will break all of the workarounds the people are currently using, so I want to do a few tests with some of the workarounds posted here to get an idea of the impact.

            Show
            dnusbaum Devin Nusbaum added a comment - I am working on this in https://github.com/jenkinsci/workflow-durable-task-step-plugin/pull/75 . My main concern is whether fixing it now will break all of the workarounds the people are currently using, so I want to do a few tests with some of the workarounds posted here to get an idea of the impact.
            Hide
            dnusbaum Devin Nusbaum added a comment -

            There are 2 main differences that my change would cause:

            1. If a sh step is manually aborted, the exception thrown will be a FlowInterruptedException rather than an AbortException (and FlowInterruptedException will be thrown even if returnStatus is true). The script will exit with RESULT.ABORTED both before and after my change. The behavior of Chris Russell's workaround is unaffected by my change in this case.
            2. If a sh step is automatically aborted by a timeout step, the exception thrown will be a FlowInterruptedException rather than an AbortException (and FlowInterruptedException will be thrown even if returnStatus is true), and as a result the script will exit with Result.ABORTED instead of RESULT.FAILURE. Chris Russell's workaround does not handle this case, so the behavior would change in the same way that it would for people not using a workaround. This change also makes timeout}}s of {{sh steps consistent with other steps such as sleep. (ABORTED instead of FAILURE)

            It looks like the workaround from Kai Howelmeyer does not work, because FlowInterruptedException is thrown in the sleeping parallel branch when the pipeline is manually aborted, when the sh step is aborted by a timeout step, or when the sh step fails because of an issue in the script itself, so wasAborted is set to true in all cases and we always rethrow an AbortedException and never execute the block where the sh step's script failed.

            Based on that info, I think it is ok to move forward with the change, but if anyone knows of other common workarounds that may be broken by this change, feel free leave a comment.

            Show
            dnusbaum Devin Nusbaum added a comment - There are 2 main differences that my change would cause: If a sh step is manually aborted, the exception thrown will be a FlowInterruptedException rather than an AbortException (and FlowInterruptedException will be thrown even if returnStatus is true). The script will exit with RESULT.ABORTED both before and after my change. The behavior of Chris Russell 's workaround is unaffected by my change in this case. If a sh step is automatically aborted by a timeout step, the exception thrown will be a FlowInterruptedException rather than an AbortException (and FlowInterruptedException will be thrown even if returnStatus is true), and as a result the script will exit with Result.ABORTED instead of RESULT.FAILURE. Chris Russell 's workaround does not handle this case, so the behavior would change in the same way that it would for people not using a workaround. This change also makes timeout}}s of {{sh steps consistent with other steps such as sleep . (ABORTED instead of FAILURE) It looks like the workaround from Kai Howelmeyer does not work, because FlowInterruptedException is thrown in the sleeping parallel branch when the pipeline is manually aborted, when the sh step is aborted by a timeout step, or when the sh step fails because of an issue in the script itself, so wasAborted is set to true in all cases and we always rethrow an AbortedException and never execute the block where the sh step's script failed. Based on that info, I think it is ok to move forward with the change, but if anyone knows of other common workarounds that may be broken by this change, feel free leave a comment.
            Hide
            haridsv Hari Dara added a comment -

            Chris Russell: I see that your workaround uses instanceof checks, but these are not permitted in pipeline sandbox. Using instanceof generates the below sort of error:

            org.jenkinsci.plugins.scriptsecurity.sandbox.RejectedAccessException: Scripts not permitted to use method java.lang.Class isInstance java.lang.Object 

            I just submitted this PR that whitelists isinstance check: https://github.com/jenkinsci/script-security-plugin/pull/226

            Show
            haridsv Hari Dara added a comment - Chris Russell : I see that your workaround uses instanceof checks, but these are not permitted in pipeline sandbox. Using instanceof generates the below sort of error: org.jenkinsci.plugins.scriptsecurity.sandbox.RejectedAccessException: Scripts not permitted to use method java.lang.Class isInstance java.lang.Object I just submitted this PR that whitelists isinstance  check:  https://github.com/jenkinsci/script-security-plugin/pull/226
            Hide
            dnusbaum Devin Nusbaum added a comment - - edited

            Fixed in Pipeline Nodes and Processes 2.24. See my comment here for an overview of the changes, but in short, externally aborted tasks should throw FlowInterruptedException, while tasks that fail in the script itself should throw AbortException.

            Show
            dnusbaum Devin Nusbaum added a comment - - edited Fixed in Pipeline Nodes and Processes 2.24. See my comment here for an overview of the changes, but in short, externally aborted tasks should throw FlowInterruptedException, while tasks that fail in the script itself should throw AbortException.

              People

              • Assignee:
                dnusbaum Devin Nusbaum
                Reporter:
                jmif Joe Mifsud
              • Votes:
                20 Vote for this issue
                Watchers:
                43 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: