Jenkins / JENKINS-39552

After restart, interrupted pipeline deadlocks waiting for executor


    Details


      Description

      I had a pipeline build running, and then restarted Jenkins. After it came back up, I saw this in the log for one of the parallel steps in the build:

      Resuming build at Mon Nov 07 13:11:05 CET 2016 after Jenkins restart
      Waiting to resume part of Atlassian Bitbucket » honey » master #4: ???
      Waiting to resume part of Atlassian Bitbucket » honey » master #4: Waiting for next available executor on bcubuntu32

      The last message repeats every few minutes. The slave bcubuntu32 has only one executor, and it seems like this executor was "used up" by the task of waiting for an available executor...

      After I went into the configuration and changed the number of executors to 2, the build continued as normal.
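
      For reference, here is a script-console sketch of the equivalent change for the case where the starved node is the master (as in the reproduction steps below); for a standalone agent like bcubuntu32 I made the change through the node's configuration page:

      // Script console sketch (assumption: the master/built-in node is the one out of executors).
      // Raising the executor count lets both the resumed pipeline and the task it is waiting on
      // get an executor; it works around the deadlock but does not fix the underlying bug.
      import jenkins.model.Jenkins

      Jenkins.instance.setNumExecutors(2)
      Jenkins.instance.save()   // persist the configuration change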

      A possibly related issue: before the restart, I put Jenkins in quiet mode, but the same build agent hung at the end of the pipeline part that was running and never finished the build. In the end I restarted without waiting for that part to finish.

      How to reproduce

      • In a fresh Jenkins instance, set the master's number of executors to 1
      • Create job-1 and job-2 as follows
        // job-1
        node {
            parallel "parallel-1": {
                sh "true"
            }, "parallel-2": {
                sh "true"
            }
        }
        build 'job-2'

        // job-2
        node {
            sh "sleep 300"
        }

      Start a build of job-1, wait for job-2's node block to start, then restart Jenkins.

      When it comes back online, you'll see a deadlock.

      It seems job-1 is trying to come back on the node it used before the restart, even though its current state doesn't require any node.
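
      To confirm what the resumed build is waiting on, a script-console snippet like this (just a diagnostic sketch using the core Queue API, nothing specific to this bug) lists the queued tasks and their blockage reasons; in the deadlocked state it should show the "part of" placeholder task(s) and the executor they are waiting for:

        // Script console sketch: list queued tasks and why Jenkins thinks they are blocked.
        // Diagnostic only; it does not resolve the deadlock.
        import jenkins.model.Jenkins

        for (item in Jenkins.instance.queue.items) {
            println "${item.task.displayName} -> ${item.why}"
        }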


            Activity

            Mike Kobit added a comment -

            We see this frequently, and are very concerned about the survivability of pipeline jobs. Jenkins is rendered unusable for some reason (possibly due to nodes disappearing underneath the builds?). We see builds in the queue with the ??? and have no idea how to resolve these issues.

            Mike Kobit added a comment -

            In our build logs in multiple places:

            java.io.IOException: bitbucket_projects/dp/read-api/PR-235 #1 did not yet start
                    at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.get(WorkflowRun.java:884)
                    at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$1.computeNext(FlowExecutionList.java:65)
                    at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$1.computeNext(FlowExecutionList.java:57)
                    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
                    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
                    at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ItemListenerImpl.onLoaded(FlowExecutionList.java:178)
                    at jenkins.model.Jenkins.<init>(Jenkins.java:997)
                    at hudson.model.Hudson.<init>(Hudson.java:86)
                    at hudson.model.Hudson.<init>(Hudson.java:82)
                    at hudson.WebAppMain$3.run(WebAppMain.java:235)
            
            Mike Kobit added a comment -

            From a thread dump:

            "AtmostOneTaskExecutor[Periodic Jenkins queue maintenance] [#24]" Id=647 Group=main WAITING on com.google.common.util.concurrent.AbstractFuture$Sync@47eda1e1
            	at sun.misc.Unsafe.park(Native Method)
            	-  waiting on com.google.common.util.concurrent.AbstractFuture$Sync@47eda1e1
            	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
            	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
            	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
            	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
            	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:275)
            	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:111)
            	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadGroupSynchronously(CpsStepContext.java:248)
            	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadSynchronously(CpsStepContext.java:237)
            	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.doGet(CpsStepContext.java:294)
            	at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:61)
            	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getNode(ExecutorStepExecution.java:259)
            	at hudson.plugins.throttleconcurrents.ThrottleQueueTaskDispatcher.categoriesForPipeline(ThrottleQueueTaskDispatcher.java:411)
            	at hudson.plugins.throttleconcurrents.ThrottleQueueTaskDispatcher.canRun(ThrottleQueueTaskDispatcher.java:168)
            	at hudson.model.Queue.isBuildBlocked(Queue.java:1184)
            	at hudson.model.Queue.maintain(Queue.java:1505)
            	at hudson.model.Queue$1.call(Queue.java:320)
            	at hudson.model.Queue$1.call(Queue.java:317)
            	at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:108)
            	at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:98)
            	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
            	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            	at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:110)
            	at java.lang.Thread.run(Thread.java:745)
            
            	Number of locked synchronizers = 1
            	- java.util.concurrent.locks.ReentrantLock$NonfairSync@5613fb44
            
            Andrew Bayer added a comment -

            Mike Kobit - that sounds like JENKINS-44747, FYI. The issue here predates the change in Throttle Concurrent Builds, so it is probably caused by something else.

            Mike Kobit added a comment -

            Thanks Andrew Bayer - I'll follow that issue.

            I'm starting to think that my issue may be a different one. We saw a lot of weirdness with Jenkins restarts and lots of LinkageErrors from a few user pipelines; those pipelines added a bunch of load statements (some nested) and reloaded the same resources, which may have caused our issue. Still unsure, but we haven't seen it happen again since we fixed that in the last day.


              People

              • Assignee:
                Unassigned
              • Reporter:
                Emil Styrke
              • Votes:
                10
              • Watchers:
                19
