JENKINS-46248

Deadlock in queue maintenance + node removal

      Description

      Got a deadlock while developing a cloud plugin. I was attempting to have the plugin delete nodes as soon as they finish a task, in taskCompleted. Specifically, in this case, the following happened:

      1. Cloud plugin started provisioning, added new nodes to Jenkins, and returned a callable PlannedNode
      2. Node connected immediately after being added
      3. Job was scheduled and started running
      4. Job was extremely fast (just a pipeline node with a hello world echo)
      5. taskCompleted was called
      6. Node was removed in taskCompleted

      My hunch here is that this has little to do with the cloud plugin, and more to do with simply having a job that executes extremely quickly with the addition of node removal in taskCompleted.

      WARNING: Some health checks are reporting as unhealthy: [thread-deadlock : [AtmostOneTaskExecutor[Periodic Jenkins queue maintenance] [#64] locked on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@545ee67e (owned by Executor #0 for Windows.10.Jenkins.Amd64-0816090454259-0 : executing PlaceholderExecutable:ExecutorStepExecution.PlaceholderTask{runId=helix-agents-test#23,label=Windows.10.Jenkins.Amd64,context=CpsStepContext[11:node]:Owner[helix-agents-test/23:helix-agents-test #23],cookie=null}):
      at sun.misc.Unsafe.park(Native Method)
      at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
      at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
      at hudson.model.Executor.isParking(Executor.java:640)
      at hudson.model.Queue.maintain(Queue.java:1442)
      at hudson.model.Queue$1.call(Queue.java:321)
      at hudson.model.Queue$1.call(Queue.java:318)
      at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:108)
      at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:98)
      at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:110)
      at java.lang.Thread.run(Thread.java:748)
      , Executor #0 for Windows.10.Jenkins.Amd64-0816090454259-0 : executing PlaceholderExecutable:ExecutorStepExecution.PlaceholderTask{runId=helix-agents-test#23,label=Windows.10.Jenkins.Amd64,context=CpsStepContext[11:node]:Owner[helix-agents-test/23:helix-agents-test #23],cookie=null} locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@5a990285 (owned by AtmostOneTaskExecutor[Periodic Jenkins queue maintenance] [#64]):
      at sun.misc.Unsafe.park(Native Method)
      at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
      at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
      at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
      at hudson.model.Queue._withLock(Queue.java:1340)
      at hudson.model.Queue.withLock(Queue.java:1219)
      at jenkins.model.Nodes.removeNode(Nodes.java:237)
      at jenkins.model.Jenkins.removeNode(Jenkins.java:2123)
      at com.microsoft.helix.helixagents.HelixComputer.taskCompleted(HelixComputer.java:69)
      at hudson.model.queue.WorkUnitContext.synchronizeEnd(WorkUnitContext.java:140)
      at hudson.model.Executor.finish1(Executor.java:451)
      at hudson.model.Executor.completedAsynchronous(Executor.java:473)
      at jenkins.model.queue.AsynchronousExecution.setExecutor(AsynchronousExecution.java:115)
      at hudson.model.Executor.run(Executor.java:409)
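Reading the two traces together, this looks like a lock-order inversion: the queue maintenance thread holds the Queue's ReentrantLock and waits for the executor's read lock in Executor.isParking, while the executor thread owns that ReentrantReadWriteLock and waits for the Queue lock via Queue.withLock in Nodes.removeNode. The following is a minimal standalone sketch of that pattern, not Jenkins code. The class name, field names, and the helper holdThenTry are hypothetical, and it assumes the executor thread holds the write lock (the trace only says it "owns" the lock). A timed tryLock is used purely so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderInversionDemo {
    // Hypothetical stand-in for the Queue lock (a ReentrantLock in the trace).
    static final ReentrantLock queueLock = new ReentrantLock();
    // Hypothetical stand-in for the per-Executor ReentrantReadWriteLock in the trace.
    static final ReentrantReadWriteLock executorLock = new ReentrantReadWriteLock();

    /**
     * Returns true if both threads failed to obtain their second lock,
     * i.e. with plain lock() calls they would have deadlocked.
     */
    static boolean wouldDeadlock() throws InterruptedException {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicBoolean maintenanceBlocked = new AtomicBoolean(false);
        AtomicBoolean executorBlocked = new AtomicBoolean(false);

        // Queue maintenance: Queue lock first, then the executor's read lock.
        Thread maintenance = new Thread(() -> holdThenTry(
                queueLock, executorLock.readLock(), bothHoldFirst, bothTried, maintenanceBlocked));
        // Executor finishing a task: executor lock first, then the Queue lock.
        Thread executor = new Thread(() -> holdThenTry(
                executorLock.writeLock(), queueLock, bothHoldFirst, bothTried, executorBlocked));

        maintenance.start();
        executor.start();
        maintenance.join();
        executor.join();
        return maintenanceBlocked.get() && executorBlocked.get();
    }

    // Takes 'first', waits until the other thread also holds its first lock,
    // then makes a timed attempt on 'second' and records whether it failed.
    static void holdThenTry(Lock first, Lock second, CountDownLatch bothHoldFirst,
                            CountDownLatch bothTried, AtomicBoolean blocked) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            boolean acquired = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                second.unlock();
            }
            blocked.set(!acquired);
            // Keep holding 'first' until the other thread has finished its attempt.
            bothTried.countDown();
            bothTried.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("would deadlock: " + wouldDeadlock());
    }
}
```

Each thread keeps its first lock until the other has finished its attempt, so neither timed tryLock can succeed; with the blocking lock() calls seen in the trace, both threads would park indefinitely.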
      

       

            Activity

            pavgust Pavel Avgustinov added a comment -

            I believe PR#3354 might fix this – at least the discussion of what goes wrong is relevant.

            grzzie Grzegorz Zieba added a comment -

            Got the same on docker-plugin

            May 07, 2018 10:55:54 AM jenkins.metrics.api.Metrics$HealthChecker execute 
            WARNING: Some health checks are reporting as unhealthy: [thread-deadlock : [WorkflowRun.copyLogs [#3] (maxwell-patch-build #2889) locked on org.jenkinsci.plugins.workflow.cps.CpsFlowExecution@70545873 (owned by Running CpsFlowExecution[Owner[maxwell-patch-build/2889:maxwell-patch-build #2889]]): 
                    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$ConverterImpl.marshal(CpsFlowExecution.java:1602) 
                    at hudson.util.XStream2$AssociatedConverterImpl.marshal(XStream2.java:370) 
                    at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:69) 
                    at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58) 
                    at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:84) 
                    at hudson.util.RobustReflectionConverter.marshallField(RobustReflectionConverter.java:265) 
                    at hudson.util.RobustReflectionConverter$2.writeField(RobustReflectionConverter.java:252) 
                    at hudson.util.RobustReflectionConverter$2.visit(RobustReflectionConverter.java:224) 
                    at com.thoughtworks.xstream.converters.reflection.PureJavaReflectionProvider.visitSerializableFields(PureJavaReflectionProvider.java:138) 
                    at hudson.util.RobustReflectionConverter.doMarshal(RobustReflectionConverter.java:209) 
                    at hudson.util.RobustReflectionConverter.marshal(RobustReflectionConverter.java:150) 
                    at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:69) 
                    at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58) 
                    at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:43) 
                    at com.thoughtworks.xstream.core.TreeMarshaller.start(TreeMarshaller.java:82) 
                    at com.thoughtworks.xstream.core.AbstractTreeMarshallingStrategy.marshal(AbstractTreeMarshallingStrategy.java:37) 
                    at com.thoughtworks.xstream.XStream.marshal(XStream.java:1026) 
                    at com.thoughtworks.xstream.XStream.marshal(XStream.java:1015) 
                    at com.thoughtworks.xstream.XStream.toXML(XStream.java:988) 
                    at hudson.XmlFile.write(XmlFile.java:181) 
                    at org.jenkinsci.plugins.workflow.support.PipelineIOUtils.writeByXStream(PipelineIOUtils.java:30) 
                    at org.jenkinsci.plugins.workflow.job.WorkflowRun.save(WorkflowRun.java:1256) 
                    at org.jenkinsci.plugins.workflow.job.WorkflowRun.saveWithoutFailing(WorkflowRun.java:1236) 
                    at org.jenkinsci.plugins.workflow.job.WorkflowRun.copyLogs(WorkflowRun.java:612) 
                    at org.jenkinsci.plugins.workflow.job.WorkflowRun.access$500(WorkflowRun.java:144) 
                    at org.jenkinsci.plugins.workflow.job.WorkflowRun$3.run(WorkflowRun.java:410) 
                    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
                    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
                    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) 
                    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) 
                    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
                    at java.lang.Thread.run(Thread.java:748)
            
            
            grzzie Grzegorz Zieba added a comment -

            Looks like this problem occurs on:

            • Jenkins - 2.89.4
            • Pipeline: Groovy - 2.52
            • Pipeline: Job - 2.21

            Downgrading to:

            • Pipeline: Groovy - 2.49
            • Pipeline: Job - 2.20

            fixes the problem.

            iceiceice Alexey Grigorov added a comment -

            I have the same problem; deadlocks have started to appear daily.

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Pavel Avgustinov
            Path:
            core/src/main/java/hudson/model/Executor.java
            core/src/main/java/jenkins/model/queue/AsynchronousExecution.java
            http://jenkins-ci.org/commit/jenkins/01b1f1ddbcad7ed8ea781e41ba4bd8d890f13c67
            Log:
            Fix potential deadlock between queue maintenance and asynchronous execution (#3354)

            JENKINS-46248 - Fix potential deadlock between queue maintenance and asynchronous execution


            oleg_nenashev Oleg Nenashev added a comment -

            Fixed in Jenkins 2.127. The fix changes some Restricted API, but we should double-check that the API was not previously unrestricted.


              People

              • Assignee: Pavel Avgustinov (pavgust)
              • Reporter: Matthew Mitchell (mmitche)
              • Votes: 6
              • Watchers: 11
