JENKINS-56986

Deadlock on EC2 resources and build queue

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Component/s: ec2-plugin
    • Labels:
    • Environment:
    • Similar Issues:

      Description

      This is either the same as or is related to JENKINS-53858. Feel free to close this as a duplicate and re-open JENKINS-53858 if it is the same.

      Environment

      I'm using a slightly modified version of the EC2 plugin, specifically https://github.com/sgleske-ias/ec2-plugin/tree/ias-internal-2. It was built from EC2 plugin 1.42-SNAPSHOT, before 1.42 was released but after https://github.com/jenkinsci/ec2-plugin/commit/2f3a04a2d3ce0e51a755792b9d03b4fff4ebe9b3 was merged. So I believed my custom version included the deadlock fix for JENKINS-53858. (CORRECTION: my version did not include the fix from JENKINS-53858.)

      Deadlock behavior

      During the deadlock the web UI remained responsive. The deadlock blocked:

      • Queuing of new items (such as build events submitted through webhooks).
      • Autoscaling provisioning of new EC2 agents.
      • Deletion of the one EC2 agent that was provisioned but marked as offline; I could not delete it because it was deadlocked.

      Most actions in my Jenkins instance were blocked by the "jenkins.util.Timer [#8]" thread. This thread was doing EC2 cloud work: per its stack trace below, it was running the EC2 retention check while holding the Queue lock. "jenkins.util.Timer [#8]" was in turn blocked by "jenkins.util.Timer [#5]", and "jenkins.util.Timer [#5]" was blocked waiting on "jenkins.util.Timer [#8]".

      My hypothesis

      I believe they were blocked by the combination of the Queue lock and the EC2Cloud lock. Each needed both and was waiting on the other.
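
      The lock-order inversion described in this hypothesis can be reproduced in isolation. The following is only a minimal sketch under assumptions, not code from Jenkins or the EC2 plugin: `queueLock` and `cloudMonitor` are hypothetical stand-ins for the Queue lock and the EC2Cloud lock, and the assumption that the EC2Cloud lock is an object monitor is mine. One thread takes the queue lock and then wants the cloud lock; the other takes them in the opposite order. The JDK's `ThreadMXBean.findDeadlockedThreads()` then reports the cycle.

      ```java
      import java.lang.management.ManagementFactory;
      import java.lang.management.ThreadMXBean;
      import java.util.concurrent.CountDownLatch;
      import java.util.concurrent.locks.ReentrantLock;

      public class LockOrderDemo {

          /**
           * Starts two daemon threads that acquire two locks in opposite order,
           * then returns how many of those two threads the JVM reports as deadlocked.
           */
          static int countDeadlockedThreads() {
              // Hypothetical stand-ins, NOT the real Jenkins Queue lock or EC2Cloud lock.
              ReentrantLock queueLock = new ReentrantLock();
              Object cloudMonitor = new Object();
              // Ensures each thread holds its first lock before either requests its second.
              CountDownLatch bothHoldFirst = new CountDownLatch(2);

              // Order: queue lock first, then cloud lock.
              Thread retention = new Thread(() -> {
                  queueLock.lock();
                  try {
                      bothHoldFirst.countDown();
                      awaitQuietly(bothHoldFirst);
                      synchronized (cloudMonitor) {
                          // Never reached: the other thread holds cloudMonitor.
                      }
                  } finally {
                      queueLock.unlock();
                  }
              }, "retention-check");

              // Order: cloud lock first, then queue lock.
              Thread provisioner = new Thread(() -> {
                  synchronized (cloudMonitor) {
                      bothHoldFirst.countDown();
                      awaitQuietly(bothHoldFirst);
                      queueLock.lock(); // Blocks forever: the other thread holds queueLock.
                      queueLock.unlock();
                  }
              }, "provisioner");

              // Daemon threads so the JVM can still exit while they stay deadlocked.
              retention.setDaemon(true);
              provisioner.setDaemon(true);
              retention.start();
              provisioner.start();

              // Detects cycles over monitors and ownable synchronizers (such as ReentrantLock).
              // May throw UnsupportedOperationException on a JVM without synchronizer monitoring.
              ThreadMXBean threads = ManagementFactory.getThreadMXBean();
              for (int attempt = 0; attempt < 100; attempt++) {
                  long[] ids = threads.findDeadlockedThreads();
                  int count = 0;
                  if (ids != null) {
                      for (long id : ids) {
                          if (id == retention.getId() || id == provisioner.getId()) {
                              count++;
                          }
                      }
                  }
                  if (count == 2) {
                      return count;
                  }
                  sleepQuietly(50);
              }
              return 0; // No deadlock observed within roughly five seconds.
          }

          private static void awaitQuietly(CountDownLatch latch) {
              try {
                  latch.await();
              } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
              }
          }

          private static void sleepQuietly(long millis) {
              try {
                  Thread.sleep(millis);
              } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
              }
          }

          public static void main(String[] args) {
              System.out.println("deadlocked threads: " + countDeadlockedThreads());
          }
      }
      ```

      The usual remedy for this pattern is to make every code path acquire the two locks in the same order, or to avoid holding one lock while calling code that takes the other. Whether that matches the actual fix for JENKINS-53858 is not something this sketch claims.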

      jenkins.util.Timer [#8] Thread Dump

      jenkins.util.Timer [#8]
        at hudson.plugins.ec2.EC2Cloud.connect()Lcom/amazonaws/services/ec2/AmazonEC2; (EC2Cloud.java:748)
        at hudson.plugins.ec2.CloudHelper.getInstance(Ljava/lang/String;Lhudson/plugins/ec2/EC2Cloud;)Lcom/amazonaws/services/ec2/model/Instance; (CloudHelper.java:47)
        at hudson.plugins.ec2.CloudHelper.getInstanceWithRetry(Ljava/lang/String;Lhudson/plugins/ec2/EC2Cloud;)Lcom/amazonaws/services/ec2/model/Instance; (CloudHelper.java:25)
        at hudson.plugins.ec2.EC2Computer.getState()Lhudson/plugins/ec2/InstanceState; (EC2Computer.java:127)
        at hudson.plugins.ec2.EC2RetentionStrategy.internalCheck(Lhudson/plugins/ec2/EC2Computer;)J (EC2RetentionStrategy.java:112)
        at hudson.plugins.ec2.EC2RetentionStrategy.check(Lhudson/plugins/ec2/EC2Computer;)J (EC2RetentionStrategy.java:90)
        at hudson.plugins.ec2.EC2RetentionStrategy.check(Lhudson/model/Computer;)J (EC2RetentionStrategy.java:48)
        at hudson.slaves.ComputerRetentionWork$1.run()V (ComputerRetentionWork.java:72)
        at hudson.model.Queue._withLock(Ljava/lang/Runnable;)V (Queue.java:1381)
        at hudson.model.Queue.withLock(Ljava/lang/Runnable;)V (Queue.java:1258)
        at hudson.slaves.ComputerRetentionWork.doRun()V (ComputerRetentionWork.java:63)
        at hudson.triggers.SafeTimerTask.run()V (SafeTimerTask.java:72)
        at jenkins.security.ImpersonatingScheduledExecutorService$1.run()V (ImpersonatingScheduledExecutorService.java:58)
        at java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object; (Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset()Z (FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask;)Z (ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V (ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run()V (ThreadPoolExecutor.java:624)
        at java.lang.Thread.run()V (Thread.java:748)
      

      jenkins.util.Timer [#5] Thread Dump

      jenkins.util.Timer [#5]
        at sun.misc.Unsafe.park(ZJ)V (Native Method)
        at java.util.concurrent.locks.LockSupport.park(Ljava/lang/Object;)V (LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt()Z (AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(Ljava/util/concurrent/locks/AbstractQueuedSynchronizer$Node;I)Z (AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(I)V (AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock()V (ReentrantLock.java:209)
        at java.util.concurrent.locks.ReentrantLock.lock()V (ReentrantLock.java:285)
        at hudson.model.Queue._withLock(Ljava/util/concurrent/Callable;)Ljava/lang/Object; (Queue.java:1438)
        at hudson.model.Queue.withLock(Ljava/util/concurrent/Callable;)Ljava/lang/Object; (Queue.java:1301)
        at jenkins.model.Nodes.updateNode(Lhudson/model/Node;)Z (Nodes.java:193)
        at jenkins.model.Jenkins.updateNode(Lhudson/model/Node;)Z (Jenkins.java:2095)
        at hudson.model.Node.save()V (Node.java:140)
        at hudson.util.PersistedList.onModified()V (PersistedList.java:173)
        at hudson.util.PersistedList.replaceBy(Ljava/util/Collection;)V (PersistedList.java:85)
        at hudson.model.Slave.<init>(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;ILhudson/model/Node$Mode;Ljava/lang/String;Lhudson/slaves/ComputerLauncher;Lhudson/slaves/RetentionStrategy;Ljava/util/List;)V (Slave.java:198)
        at hudson.plugins.ec2.EC2AbstractSlave.<init>(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;ILhudson/model/Node$Mode;Ljava/lang/String;Lhudson/slaves/ComputerLauncher;Lhudson/slaves/RetentionStrategy;Ljava/lang/String;Ljava/lang/String;Ljava/util/List;Ljava/lang/String;Ljava/lang/String;ZLjava/lang/String;Ljava/util/List;Ljava/lang/String;ZZILhudson/plugins/ec2/AMITypeData;)V (EC2AbstractSlave.java:138)
        at hudson.plugins.ec2.EC2OndemandSlave.<init>(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;ILjava/lang/String;Lhudson/model/Node$Mode;Ljava/lang/String;Ljava/lang/String;Ljava/util/List;Ljava/lang/String;Ljava/lang/String;ZLjava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/util/List;Ljava/lang/String;ZZILhudson/plugins/ec2/AMITypeData;)V (EC2OndemandSlave.java:49)
        at hudson.plugins.ec2.EC2OndemandSlave.<init>(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;ILjava/lang/String;Lhudson/model/Node$Mode;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;ZLjava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/util/List;Ljava/lang/String;ZZILhudson/plugins/ec2/AMITypeData;)V (EC2OndemandSlave.java:42)
        at hudson.plugins.ec2.SlaveTemplate.newOndemandSlave(Lcom/amazonaws/services/ec2/model/Instance;)Lhudson/plugins/ec2/EC2OndemandSlave; (SlaveTemplate.java:963)
        at hudson.plugins.ec2.SlaveTemplate.toSlaves(Ljava/util/List;)Ljava/util/List; (SlaveTemplate.java:660)
        at hudson.plugins.ec2.SlaveTemplate.provisionOndemand(ILjava/util/EnumSet;)Ljava/util/List; (SlaveTemplate.java:632)
        at hudson.plugins.ec2.SlaveTemplate.provision(ILjava/util/EnumSet;)Ljava/util/List; (SlaveTemplate.java:463)
        at hudson.plugins.ec2.EC2Cloud.getNewOrExistingAvailableSlave(Lhudson/plugins/ec2/SlaveTemplate;IZ)Ljava/util/List; (EC2Cloud.java:587)
        at hudson.plugins.ec2.EC2Cloud.provision(Lhudson/model/Label;I)Ljava/util/Collection; (EC2Cloud.java:602)
        at hudson.slaves.NodeProvisioner$StandardStrategyImpl.apply(Lhudson/slaves/NodeProvisioner$StrategyState;)Lhudson/slaves/NodeProvisioner$StrategyDecision; (NodeProvisioner.java:715)
        at hudson.slaves.NodeProvisioner.update()V (NodeProvisioner.java:320)
        at hudson.slaves.NodeProvisioner.access$000(Lhudson/slaves/NodeProvisioner;)V (NodeProvisioner.java:61)
        at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun()V (NodeProvisioner.java:809)
        at hudson.triggers.SafeTimerTask.run()V (SafeTimerTask.java:72)
        at jenkins.security.ImpersonatingScheduledExecutorService$1.run()V (ImpersonatingScheduledExecutorService.java:58)
        at java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object; (Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset()Z (FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask;)Z (ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V (ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run()V (ThreadPoolExecutor.java:624)
        at java.lang.Thread.run()V (Thread.java:748)
      

            Activity

            sag47 Sam Gleske added a comment -

            I misspoke. My compiled plugin does not include the deadlock fix.


              People

              • Assignee: thoulen FABRIZIO MANFREDI
              • Reporter: sag47 Sam Gleske
              • Votes: 0
              • Watchers: 1

                Dates

                • Created:
                • Updated:
                • Resolved: