Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-22549

JCloudsRetentionStrategy.check deadlocks with JCloudsCloud.provision

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Duplicate
    • Icon: Blocker Blocker
    • jclouds-plugin
    • jclouds working with an openstack

      We observe a deadlock in JClouds plugin that happens in a following scenarion:

      1. There is a thread (thread A) that performs deployment of a new slave
      2. Another thread (thread B) is working at the same time and decides take down one of the existing nodes
      3. Thread A performs Hudson.getInstance().addNode() here: https://github.com/jenkinsci/jclouds-plugin/blob/master/jclouds-plugin/src/main/java/jenkins/plugins/jclouds/compute/JCloudsCloud.java#L205. At th
      4. Thread B performs setTemporaryOffline on a computer of the slave it wants to take down here: https://github.com/jenkinsci/jclouds-plugin/blob/master/jclouds-plugin/src/main/java/jenkins/plugins/jclouds/compute/JCloudsRetentionStrategy.java#L30 . At that moment it already holds the lock for the JCloudsRetentionStarategy instance because it is synchronized
      5. addNode is synchronized on Hudson instance and locks it
      6. addNode subsequently calls hudson.model.AbstractCIBase.updateComputerList, which calls retentionStrategy.check on a node being deleted in thread B and locks on it
      7. setTemporarilyOffline calls Jenkins.save that locks on the Jenkins (i.e. Hudson) instance, which is already locked by thread A in step 5.

      The stack trace of thead A is therefore as follows:

      Name: Computer.threadPoolForRemoting [#135]
      State: BLOCKED on jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy@4ccee114 owned by: jenkins.util.Timer [#8]
      Total blocked: 8  Total waited: 1
      
      Stack trace: 
      jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:22)
      jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:15)
      hudson.slaves.SlaveComputer.setNode(SlaveComputer.java:663)
      hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:120)
      hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:180)
         - locked java.lang.Object@5a4034b5
      jenkins.model.Jenkins.updateComputerList(Jenkins.java:1214)
      jenkins.model.Jenkins.setNodes(Jenkins.java:1711)
      jenkins.model.Jenkins.addNode(Jenkins.java:1693)
         - locked hudson.model.Hudson@32da12ad
      jenkins.plugins.jclouds.compute.JCloudsCloud$2.call(JCloudsCloud.java:205)
      jenkins.plugins.jclouds.compute.JCloudsCloud$2.call(JCloudsCloud.java:201)
      java.util.concurrent.FutureTask.run(FutureTask.java:262)
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      java.lang.Thread.run(Thread.java:744)
      

      The stack trace of thread B is as follows:

      Name: jenkins.util.Timer [#8]
      State: BLOCKED on hudson.model.Hudson@32da12ad owned by: Computer.threadPoolForRemoting [#135]
      Total blocked: 2  Total waited: 274
      
      Stack trace: 
      jenkins.model.Jenkins.save(Jenkins.java:2672)
      hudson.model.Node.setTemporaryOfflineCause(Node.java:221)
      hudson.model.Computer.setTemporarilyOffline(Computer.java:590)
      jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:30)
         - locked jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy@4ccee114
      jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:15)
      hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:66)
      hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:54)
      java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      java.lang.Thread.run(Thread.java:744)
      

      This actually causes deadly problems during normal operation of JClouds under somewhat high load.

            abayer Andrew Bayer
            pupssman Ivan Kalinin
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: