Jenkins / JENKINS-44838

Cancellation of caller fails to abort Pipeline job

      Description

      I have a Pipeline job that invokes other Pipeline jobs in parallel with failFast enabled. failFast generally works as expected, but I consistently see a handful of the downstream builds fail to abort properly.

      The builds that fail to abort are always in the very early stages of execution, and almost always in the middle of a checkout() operation (with GitSCM as the underlying SCM class).

      The Pipeline definition for the downstream jobs (the ones that fail to abort) looks like this:

      node ("slave") {
        try {
          currentBuild.displayName = '[' + params.TARGET_NAME + ' #' + currentBuild.id + ']'
          stage ("Checkout") {
            checkout(
              poll: false,
              scm: [
                $class: 'GitSCM',
                branches: [[name: "${GERRIT_BRANCH}"]],
                doGenerateSubmoduleConfigurations: false,
                extensions: [
                  [
                    $class: 'CloneOption',
                    depth: 100,
                    honorRefspec: true,
                    noTags: true,
                    reference: '/path/to/reference.git',
                    shallow: true
                  ],
                  [$class: 'WipeWorkspace'],
                  [$class: 'BuildChooserSetting', buildChooser: [$class: 'GerritTriggerBuildChooser']],
                  [
                    $class: 'SubmoduleOption',
                    disableSubmodules: false,
                    parentCredentials: true,
                    recursiveSubmodules: false,
                    reference: '/path/to/other/reference.git',
                    trackingSubmodules: false
                  ]
                ],
                submoduleCfg: [],
                userRemoteConfigs: [[refspec: "${GERRIT_REFSPEC}", url: 'ssh://url/repo.git']]
              ]
            )
            /* rest of build flow */
          }
        } catch (err) {
          echo "Caught: ${err}"
          if( special_condition ) {
             currentBuild.result = 'NOT_BUILT'
          }
        }
      }

      Console output from such a case does indeed show that the calling pipeline was cancelled mid-checkout:

      [Pipeline] node
      Still waiting to schedule task
      Waiting for next available executor on slave        
      Running on slave-host-01 in /jenkins/workspace/target-builder
      [Pipeline] {
      [Pipeline] stage
      [Pipeline] { (Checkout)
      [Pipeline] checkout
      Wiping out workspace first.
      Cloning the remote Git repository
      Using shallow clone
      shallow clone depth 100
      Avoid fetching tags
      Honoring refspec on initial clone
      Cloning repository ssh://url/repo.git
       > git init /jenkins/workspace/target-builder # timeout=10
      Fetching upstream changes from ssh://url/repo.git
       > git --version # timeout=10
       > git fetch --no-tags --progress ssh://url/repo.git refs/changes/79/79/21 --depth=100
       > git config remote.origin.url ssh://url/repo.git # timeout=10
       > git config --add remote.origin.fetch refs/changes/79/79/21 # timeout=10
       > git config remote.origin.url ssh://url/repo.git # timeout=10
      Fetching upstream changes from ssh://url/repo.git
       > git fetch --no-tags --progress ssh://url/repo.git refs/changes/79/79/21 --depth=100
      Calling Pipeline was cancelled
       > git rev-parse FETCH_HEAD^{commit} # timeout=10
      Checking out Revision 76af08cb44e09ce0408eb15324c4397dcb7a8768 (master)
       > git config core.sparsecheckout # timeout=10
       > git checkout -f 76af08cb44e09ce0408eb15324c4397dcb7a8768
      Calling Pipeline was cancelled
      Click here to forcibly terminate running steps
       > git rev-parse FETCH_HEAD^{commit} # timeout=10
       > git rev-list 172b9158257970f1631e070148db701ac4f5a587 # timeout=10
       > git remote # timeout=10
       > git submodule init # timeout=10
       > git submodule sync # timeout=10
      Click here to forcibly terminate running steps
       > git config --get remote.origin.url # timeout=10
       > git submodule init # timeout=10
       > git config -f .gitmodules --get-regexp ^submodule\.(.*)\.url # timeout=10
       > git config --get submodule.bar.url # timeout=10
       > git remote # timeout=10
       > git config --get remote.origin.url # timeout=10
       > git config -f .gitmodules --get submodule.bar.path # timeout=10
       > git submodule update --reference /path/to/other/reference.git bar
      
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] stage
      [Pipeline] { (Build)
      [Pipeline] sh
      [...]

      However, the 'sh' step still runs to completion, even though the calling Pipeline was cancelled mid-checkout, and the build eventually finishes with the correct result, ABORTED:

      [Pipeline] End of Pipeline
      [BFA] Scanning build for known causes...
      [BFA] No failure causes found
      [BFA] Done. 0s
      Finished: ABORTED


          Activity

          jglick Jesse Glick added a comment -

          Seems like SCMStep.StepExecutionImpl.stop is not working, perhaps because GitSCM.checkout did not respond to thread interruption. Possibly SynchronousNonBlockingStepExecution.stop needs to record the fact that an interrupt was thrown, and prevent the step from returning normally. Would be helpful if it were known how to reproduce from scratch, since then I could see in a debugger where the thread interrupt was being swallowed.
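The idea in the comment above (record that a stop was requested, and refuse to let the step return normally afterwards) can be illustrated outside Jenkins. The following is only a minimal sketch in plain Java, not the workflow-step-api implementation: StopAwareDemo, StopAwareExecution and uninterruptibleCheckout are hypothetical names, and the "checkout" is simulated by a loop that deliberately swallows thread interruption, which is the behaviour suspected of GitSCM.checkout.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

public class StopAwareDemo {

    /** Hypothetical stand-in for a checkout that ignores thread interruption. */
    static String uninterruptibleCheckout(CountDownLatch started) {
        started.countDown(); // signal that the "checkout" is now running
        long deadline = System.nanoTime() + 300_000_000L; // roughly 300 ms of work
        while (System.nanoTime() < deadline) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException swallowed) {
                // The behaviour being modelled: the interrupt is caught and dropped.
            }
        }
        return "checked out";
    }

    /** Hypothetical wrapper that remembers a stop request (assumption, not Jenkins API). */
    static final class StopAwareExecution {
        private final boolean enforceStop;
        private final AtomicBoolean stopRequested = new AtomicBoolean();
        private volatile Thread worker;

        StopAwareExecution(boolean enforceStop) {
            this.enforceStop = enforceStop;
        }

        String run(Callable<String> body) throws Exception {
            worker = Thread.currentThread();
            String result = body.call();
            // Even if the body swallowed the interrupt, do not return normally.
            if (enforceStop && stopRequested.get()) {
                throw new CancellationException("stop was requested while the step was running");
            }
            return result;
        }

        void stop() {
            stopRequested.set(true); // record the request before interrupting
            Thread t = worker;
            if (t != null) {
                t.interrupt();
            }
        }
    }

    /** Runs the simulated checkout, stops it mid-flight, and reports what happened. */
    static String outcome(boolean enforceStop) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CountDownLatch started = new CountDownLatch(1);
            StopAwareExecution exec = new StopAwareExecution(enforceStop);
            Callable<String> task = () -> exec.run(() -> uninterruptibleCheckout(started));
            Future<String> future = pool.submit(task);
            started.await(); // make sure the body is running before stopping it
            exec.stop();
            try {
                return future.get();
            } catch (ExecutionException e) {
                return e.getCause() instanceof CancellationException ? "aborted" : "failed";
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + outcome(false)); // checked out
        System.out.println("with guard: " + outcome(true));     // aborted
    }
}
```

Without the guard, the simulated step returns "checked out" despite the stop request, which mirrors the console log where the build proceeds into the next stage. With the guard, the swallowed interrupt no longer matters, because the recorded stop request turns the normal return into a failure. The sketch does not make the work itself stop sooner; it only prevents later steps from running.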

            People

            • Assignee:
              Unassigned
              Reporter:
              tskrainar Tom Skrainar