JENKINS-36479

Locked resources not freed up by Pipeline job hard kill

      Description

      Since LockStepExecution.Callback.finished(context) never gets called in the case of a hard kill, resources can be locked forever when a build is hard killed. It's possible to manually unlock those resources from the UI, but it'd be preferable to have some behavior that detects this scenario and is able to unlock resources locked by defunct builds.

            Activity

            abayer Andrew Bayer added a comment -

            Also this can happen if the build is deleted while running - a non-ideal usage pattern, sure, but since you can do it, people will end up doing it. So we probably also want to check if the build locking a resource actually even exists in the first place and unlock if the build doesn't exist.
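A minimal sketch of the check described in this comment, assuming a simplified in-memory model. `DefunctLockCheck`, `buildStarted`, `buildVanished`, and `tryLock` are hypothetical names for illustration only and are not the plugin's real classes or API. It shows a lock request treating a lock as defunct when the build that holds it no longer exists.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model, not the lockable-resources plugin's real classes. */
class DefunctLockCheck {
    /** Resource name -> id of the build holding it (absent means free). */
    private final Map<String, String> lockedBy = new HashMap<>();
    /** Ids of builds that still exist. */
    private final Set<String> liveBuilds = new HashSet<>();

    void buildStarted(String buildId) {
        liveBuilds.add(buildId);
    }

    /** Simulates a hard kill or deletion: the build disappears without releasing its locks. */
    void buildVanished(String buildId) {
        liveBuilds.remove(buildId);
    }

    /** Returns true if the requesting build holds the lock after this call. */
    boolean tryLock(String resource, String buildId) {
        String holder = lockedBy.get(resource);
        if (holder != null && !liveBuilds.contains(holder)) {
            // The holder no longer exists, so the lock is defunct: drop it.
            lockedBy.remove(resource);
            holder = null;
        }
        if (holder == null) {
            lockedBy.put(resource, buildId);
            return true;
        }
        return holder.equals(buildId);
    }
}
```

In this model the stale lock is only noticed when another build asks for the same resource.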

            abayer Andrew Bayer added a comment -

            Very first thoughts on this up at https://github.com/abayer/lockable-resources-plugin/commit/9a0ef2cae5176cef4d5f8439c53b2aad4b6facc0

            abayer Andrew Bayer added a comment -

            Continued further with https://github.com/jenkinsci/lockable-resources-plugin/compare/master...abayer:jenkins-36479

            abayer Andrew Bayer added a comment -

            Note that this approach doesn't do anything for a scenario where there are already builds queued up for a resource and the build holding that lock gets deleted/hard killed - it only clears that defunct lock when something requests a lock on the resource, though in the X-builds-queued-up scenario, when the X+1th build tries to get a lock, the result will be that the first build in the queue ends up getting a new lock. Still thinking about how to deal with the existing queue without a new lock request...
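The queue behavior described in this comment can be sketched as follows, assuming a single resource and a simple FIFO waiting list. `QueuedLockSketch` and its methods are hypothetical names, not the plugin's implementation. The sketch shows that a defunct lock stays in place until some new request arrives, and that the longest-waiting build, not the new requester, then receives the lock.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical single-resource model of "clear the defunct lock only on a new request". */
class QueuedLockSketch {
    private String holder;                                   // build holding the resource, or null
    private final Deque<String> waiting = new ArrayDeque<>(); // builds waiting, oldest first
    private final Set<String> liveBuilds = new HashSet<>();

    void buildStarted(String id) {
        liveBuilds.add(id);
    }

    /** Hard kill or deletion: no release callback runs, so the holder field is left untouched. */
    void buildVanished(String id) {
        liveBuilds.remove(id);
    }

    /** A lock request is the only trigger for the defunct-lock check. Returns the holder afterwards. */
    String requestLock(String id) {
        if (holder != null && !liveBuilds.contains(holder)) {
            holder = null; // clear the defunct lock
        }
        waiting.addLast(id);
        if (holder == null) {
            holder = waiting.pollFirst(); // the oldest waiter wins, not necessarily the requester
        }
        return holder;
    }

    String holder() {
        return holder;
    }
}
```

With builds b and c already waiting behind a vanished holder, nothing changes until a fourth build calls `requestLock`, at which point b gets the resource.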

            abayer Andrew Bayer added a comment -

            Interesting - turns out LockStepExecution.stop still gets called in the case of a hard kill, and it's not pleased about it -

            INFO: p #1 completed: ABORTED
            [p #1] Hard kill!
            Jul 06, 2016 11:27:29 AM org.jenkins.plugins.lockableresources.LockStepExecution stop
            WARNING: Cannot remove context from lockable resource witing list. The context is not in the waiting list.
            Jul 06, 2016 11:27:29 AM org.jenkinsci.plugins.workflow.cps.CpsStepContext onFailure
            WARNING: already completed CpsStepContext[3]:Owner[p/1:p #1]
            java.lang.IllegalStateException: org.jenkinsci.plugins.workflow.steps.FlowInterruptedException
            	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.onFailure(CpsStepContext.java:320)
            	at org.jenkins.plugins.lockableresources.LockStepExecution.stop(LockStepExecution.java:90)
            	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6.onSuccess(CpsFlowExecution.java:760)
            	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6.onSuccess(CpsFlowExecution.java:755)
            	at org.jenkinsci.plugins.workflow.support.concurrent.Futures$1.run(Futures.java:150)
            	at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:253)
            	at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:149)
            	at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:134)
            	at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:170)
            	at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:53)
            	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5.onSuccess(CpsFlowExecution.java:644)
            	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5.onSuccess(CpsFlowExecution.java:631)
            	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:568)
            	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:32)
            	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
            	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
            	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            	at java.lang.Thread.run(Thread.java:744)
            
            abayer Andrew Bayer added a comment -

            Ok, that's just noise - not ideal noise, but just noise. I've pushed a commit with a test that passes now and hangs forever without my changes to LockableResourceManager - that needs to be fixed to actually fail in that scenario.

            More tests still needed for the doDelete scenario and for the builds-already-in-queue scenario, but I just wanted to make sure this actually worked in the base case. =)

            abayer Andrew Bayer added a comment -

            PR up - https://github.com/jenkinsci/lockable-resources-plugin/pull/34

            abayer Andrew Bayer added a comment -

            I can't see a way to clear the lock without either waiting for a new lock request to come in (as my PR does) or having an async recurring task running periodically checking every locked resource for defunct locks. LockRunListener doesn't seem to fire on hard kill or while-running build deletion, so far as I can tell from my experiments...
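For comparison, the second option mentioned here (a recurring task that checks every locked resource) might look roughly like the sketch below. It uses only the JDK; `DefunctLockSweeper`, `sweep`, and `startPeriodic` are hypothetical names, and how such a task would be wired into Jenkins is not shown and is left as an assumption.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical periodic cleanup of locks held by builds that no longer exist. */
class DefunctLockSweeper {
    /** Resource name -> id of the build holding it. */
    final Map<String, String> lockedBy = new ConcurrentHashMap<>();
    /** Ids of builds that still exist. */
    final Set<String> liveBuilds = ConcurrentHashMap.newKeySet();

    /** One pass over all locks: frees those whose holder is gone. Returns the number freed. */
    int sweep() {
        int freed = 0;
        for (Map.Entry<String, String> e : lockedBy.entrySet()) {
            boolean holderGone = !liveBuilds.contains(e.getValue());
            // remove(key, value) only removes if the holder has not changed in the meantime.
            if (holderGone && lockedBy.remove(e.getKey(), e.getValue())) {
                freed++;
            }
        }
        return freed;
    }

    /** Runs sweep() at a fixed rate; the caller is responsible for shutting the executor down. */
    ScheduledExecutorService startPeriodic(long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(this::sweep, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```

The trade-off is latency versus overhead: a defunct lock can survive for up to one period, and every pass touches every locked resource.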

            abayer Andrew Bayer added a comment -

            New PR (https://github.com/jenkinsci/lockable-resources-plugin/pull/35) probably supersedes #34 - updating LockRunListener to listen on Run rather than AbstractBuild seems to, well, fix everything!

            abayer Andrew Bayer added a comment -

            Fixed as of next release (presumably 1.10), which should be coming shortly, I think.

            The fix makes LockRunListener fire correctly on Run not just AbstractBuild. That alone did the trick for both hard killed builds and deleted-while-in-progress builds, and doesn't require queuing up a new lock request to clear the defunct lock. Woo.
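Why the listener's target type matters can be illustrated with a simplified model. The nested `Run`, `AbstractBuild`, and `WorkflowRun` classes below are stand-ins that only mirror the inheritance relationship of the Jenkins types with those names, and `Listener` and `fireCompleted` are hypothetical. This is a sketch of type-filtered dispatch under those assumptions, not the plugin's code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a listener that is filtered by the run type it was registered for. */
class ListenerTypeSketch {
    // Stand-ins for the Jenkins types of the same names, heavily simplified.
    static class Run {
        final String name;
        Run(String name) { this.name = name; }
    }
    static class AbstractBuild extends Run {          // freestyle-style builds
        AbstractBuild(String name) { super(name); }
    }
    static class WorkflowRun extends Run {            // Pipeline builds: a Run, but not an AbstractBuild
        WorkflowRun(String name) { super(name); }
    }

    /** Listener that is only notified for runs that are instances of its target type. */
    static class Listener<R extends Run> {
        final Class<R> targetType;
        final List<String> released = new ArrayList<>();
        Listener(Class<R> targetType) { this.targetType = targetType; }
        void onCompleted(R run) {
            released.add(run.name); // stands in for "free the resources locked by this run"
        }
    }

    /** Dispatches a completion event, skipping listeners whose target type does not match. */
    static <R extends Run> void fireCompleted(Listener<R> listener, Run run) {
        if (listener.targetType.isInstance(run)) {
            listener.onCompleted(listener.targetType.cast(run));
        }
    }
}
```

In this model, a listener registered for `AbstractBuild` never hears about a `WorkflowRun`, while one registered for `Run` does, which matches the behavior change described in the comment.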

            aheritier Arnaud Héritier added a comment -

            Cool Andrew Bayer
            Could we also add some documentation/samples about it? It's still not clear to me how this feature should be used and what its benefit is.

            Thx

            abayer Andrew Bayer added a comment -

            Talk to Antonio Muñiz in re docs/samples - I honestly don't know much about how to use the plugin, I just decided to fix the bug. =)


              People

              • Assignee:
                abayer Andrew Bayer
                Reporter:
                abayer Andrew Bayer
              • Votes:
                1
                Watchers:
                5
