Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-45107

NullPointerException in lockable-resources when resuming pipeline after restart, with a lock statement which was opened pre-restart

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Not A Defect
    • Labels:
      None
    • Environment:
      Jenkins LTS 2.46.3, lockable-resources 2.0
    • Similar Issues:

      Description

      I have to caveat this upfront that I have two Jenkins masters, both on the same LTS version, with the same version of lockable-resources and the same versions of all the pipeline plugins (latest as available to 2.46.3 today, please ask if you need versions), and this bug only occurs on one of them, but I can't see why.

      Script used to repro:

      def builders = [:]
      
      def stages = [
      	[name:'first', lockName:'lock-first'],
      	[name:'second', lockName:'lock-second'],
      	[name:'first-two', lockName:'lock-first'],
      	[name:'second-two', lockName:'lock-second']
      ]
      
      stages.each { stage ->
      	builders[stage.name] = {
      		lock(stage.lockName) {
                  node {
      				commonBuildCommand()
      			}
      		}
      	}
      }
      
      parallel builders
      
      def commonBuildCommand() {
      	sleep 10
      	sleep 10
      }
      

      When I run this script (and restart during it, as per the log), I will see the following output:

      [Pipeline] parallel
      [Pipeline] [first] { (Branch: first)
      [Pipeline] [second] { (Branch: second)
      [Pipeline] [first-two] { (Branch: first-two)
      [Pipeline] [second-two] { (Branch: second-two)
      [Pipeline] [first] lock
      [first] Trying to acquire lock on [lock-first]
      [first] Lock acquired on [lock-first]
      [Pipeline] [first] {
      [Pipeline] [second] lock
      [second] Trying to acquire lock on [lock-second]
      [second] Lock acquired on [lock-second]
      [Pipeline] [second] {
      [Pipeline] [first-two] lock
      [first-two] Trying to acquire lock on [lock-first]
      [first-two] Found 0 available resource(s). Waiting for correct amount: 1.
      [first-two] [lock-first] is locked, waiting...
      [Pipeline] [second-two] lock
      [second-two] Trying to acquire lock on [lock-second]
      [second-two] Found 0 available resource(s). Waiting for correct amount: 1.
      [second-two] [lock-second] is locked, waiting...
      [Pipeline] [first] node
      [first] Running on <node> in <workspace>/Test_Restart
      [Pipeline] [second] node
      [second] Running on <node> in <workspace>/Test_Restart@2
      [Pipeline] [first] {
      [Pipeline] [second] {
      [Pipeline] [first] sleep
      [first] Sleeping for 10 sec
      [Pipeline] [second] sleep
      [second] Sleeping for 10 sec
      Resuming build at Fri Jun 23 15:15:52 BST 2017 after Jenkins restart
      Waiting to resume part of Experiments » Test_Restart #24: ???
      Ready to run at Fri Jun 23 15:15:53 BST 2017
      [Pipeline] [first] sleep
      [first] Sleeping for 10 sec
      [Pipeline] [second] sleep
      [second] Sleeping for 10 sec
      [Pipeline] [first] }
      [Pipeline] [first] // node
      [Pipeline] [first] }
      [first] Lock released on resource [lock-first]
      [Pipeline] [second] }
      [Pipeline] [first] // lock
      [Pipeline] [second] // node
      [Pipeline] [first] }
      [Pipeline] [second] }
      [second] Lock released on resource [lock-second]
      [Pipeline] [second] // lock
      [Pipeline] [second] }
      

      The script will now stall at this point. The following will be output to the Jenkins log at this time (when FINE logging is available for lockable-resources):

      Jun 23, 2017 3:16:03 PM WARNING org.jenkins.plugins.lockableresources.LockableResourcesManager unlockNames
      Skipping queued context for lock. Can not get the Run object from the context to proceed with lock, this could be a legitimate status if the build waiting for the lock was deleted or hard killed. More information at Level.FINE if debug is needed.
      Jun 23, 2017 3:16:03 PM FINE org.jenkins.plugins.lockableresources.LockableResourcesManager
      Can not get the Run object from the context to proceed with lock
      java.lang.NullPointerException
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getFlowExecution(CpsStepContext.java:472)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadGroupSynchronously(CpsStepContext.java:243)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadSynchronously(CpsStepContext.java:237)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.doGet(CpsStepContext.java:294)
      	at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:61)
      	at org.jenkins.plugins.lockableresources.LockableResourcesManager.unlockNames(LockableResourcesManager.java:387)
      	at org.jenkins.plugins.lockableresources.LockStepExecution$Callback.finished(LockStepExecution.java:84)
      	at org.jenkinsci.plugins.workflow.steps.BodyExecutionCallback$TailCall.onSuccess(BodyExecutionCallback.java:114)
      	at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$SuccessAdapter.receive(CpsBodyExecution.java:362)
      	at com.cloudbees.groovy.cps.Outcome.resumeFrom(Outcome.java:73)
      	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:165)
      	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:162)
      	at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:122)
      	at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:261)
      	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:162)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:19)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:35)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:32)
      	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:108)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:32)
      	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:174)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:330)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:82)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:242)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:230)
      	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      
      Jun 23, 2017 3:16:03 PM WARNING org.jenkins.plugins.lockableresources.LockableResourcesManager unlockNames
      Skipping queued context for lock. Can not get the Run object from the context to proceed with lock, this could be a legitimate status if the build waiting for the lock was deleted or hard killed. More information at Level.FINE if debug is needed.
      Jun 23, 2017 3:16:03 PM FINE org.jenkins.plugins.lockableresources.LockableResourcesManager
      Can not get the Run object from the context to proceed with lock
      java.lang.NullPointerException
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getFlowExecution(CpsStepContext.java:472)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadGroupSynchronously(CpsStepContext.java:243)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadSynchronously(CpsStepContext.java:237)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.doGet(CpsStepContext.java:294)
      	at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:61)
      	at org.jenkins.plugins.lockableresources.LockableResourcesManager.unlockNames(LockableResourcesManager.java:387)
      	at org.jenkins.plugins.lockableresources.LockStepExecution$Callback.finished(LockStepExecution.java:84)
      	at org.jenkinsci.plugins.workflow.steps.BodyExecutionCallback$TailCall.onSuccess(BodyExecutionCallback.java:114)
      	at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$SuccessAdapter.receive(CpsBodyExecution.java:362)
      	at com.cloudbees.groovy.cps.Outcome.resumeFrom(Outcome.java:73)
      	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:165)
      	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:162)
      	at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:122)
      	at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:261)
      	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:162)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:19)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:35)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:32)
      	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:108)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:32)
      	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:174)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:330)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:82)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:242)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:230)
      	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      

      When I then abort the job I'll get this output in the Jenkins log:

      Jun 23, 2017 3:16:25 PM WARNING org.jenkins.plugins.lockableresources.LockStepExecution stop
      Cannot remove context from lockable resource witing list. The context is not in the waiting list.
      Jun 23, 2017 3:16:25 PM WARNING org.jenkins.plugins.lockableresources.LockStepExecution stop
      Cannot remove context from lockable resource witing list. The context is not in the waiting list.
      

      And this in the script:

      Aborted by Phil McArdle (phil.mcardle)
      [Pipeline] [second-two] // lock
      [Pipeline] [first-two] // lock
      [Pipeline] [first-two] }
      [first-two] Failed in branch first-two
      [Pipeline] [second-two] }
      [second-two] Failed in branch second-two
      [Pipeline] // parallel
      [Pipeline] End of Pipeline
      Exception: null
      org.jenkinsci.plugins.workflow.steps.FlowInterruptedException
      	at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution.cancel(CpsBodyExecution.java:242)
      	at org.jenkinsci.plugins.workflow.steps.BodyExecution.cancel(BodyExecution.java:76)
      	at org.jenkinsci.plugins.workflow.cps.steps.ParallelStepExecution.stop(ParallelStepExecution.java:66)
      	at org.jenkinsci.plugins.workflow.cps.CpsThread.stop(CpsThread.java:296)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6.onSuccess(CpsFlowExecution.java:966)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6.onSuccess(CpsFlowExecution.java:955)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:744)
      	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      Finished: ABORTED
      

      There are no other warnings or errors in the Jenkins log throughout this, but immediately after the above, it will echo:

      Jun 23, 2017 3:22:41 PM INFO org.jenkinsci.plugins.workflow.job.WorkflowRun finish
      Experiments/Test_Restart #26 completed: ABORTED
      Jun 23, 2017 3:22:41 PM WARNING org.jenkinsci.plugins.workflow.flow.FlowExecutionList unregister
      Owner[Experiments/Test_Restart/26:Experiments/Test_Restart #26] was not in the list to begin with: []
      

      I'm not sure if there's anything relevant there, but including just in case.

        Attachments

          Activity

          Hide
          philmcardlecg Phil McArdle added a comment - - edited

          I'm aware that the repro in my script might seem a little odd - I'm not sure if anyone else uses this exact form - I'm using this in another script to stack up multiple parallel branches that will all queue on shared lockable resources.

          I suspect that I could trigger this issue in other ways, e.g. having the locks reserved by other jobs, but I haven't hit them yet.

          Show
          philmcardlecg Phil McArdle added a comment - - edited I'm aware that the repro in my script might seem a little odd - I'm not sure if anyone else uses this exact form - I'm using this in another script to stack up multiple parallel branches that will all queue on shared lockable resources. I suspect that I could trigger this issue in other ways, e.g. having the locks reserved by other jobs, but I haven't hit them yet.
          Hide
          philmcardlecg Phil McArdle added a comment -

          Okay, I pulled the latest LTS from Docker and tried the same script, and it doesn't error, so I'll assume it's my instance. Will upgrade to latest LTS first, and then rebuild from scratch if necessary.

          Show
          philmcardlecg Phil McArdle added a comment - Okay, I pulled the latest LTS from Docker and tried the same script, and it doesn't error, so I'll assume it's my instance. Will upgrade to latest LTS first, and then rebuild from scratch if necessary.
          Hide
          philmcardlecg Phil McArdle added a comment -

          Closing this. I've completely rebuilt this instance from scratch and can't reproduce.

          Show
          philmcardlecg Phil McArdle added a comment - Closing this. I've completely rebuilt this instance from scratch and can't reproduce.

            People

            • Assignee:
              Unassigned
              Reporter:
              philmcardlecg Phil McArdle
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: