JENKINS-48161

Deadlock caused by synchronized methods in EC2Cloud

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • ec2-plugin
    • None

      This bug occurs with version 1.37 of the plugin.

      EC2Cloud.java has several synchronized methods that can be called from various timers. Our installation heavily utilizes the spot market and we have a high number of nodes in our fleet.

      Under load you can easily get into a situation where one thread is terminating an instance and at the same time another is trying to provision a new one.

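      In this case the deadlock occurs when one thread is provisioning (provisioning also tries to remove inactive slaves) while another thread is trying to reconnect dead slaves. In the thread dump below, "jenkins.util.Timer 2" holds the AmazonEC2Cloud monitor in EC2Cloud.provision and waits for the Queue lock in Nodes.removeNode, while "jenkins.util.Timer 6" holds the Queue lock in ComputerRetentionWork and waits for the same monitor in EC2Cloud.connect.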

      It seems that this deadlock happens when the price of some spot instance type is higher than the price we have set, and in the AWS console we see spot requests in the open state with status price-too-low.
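The lock-order inversion visible in the thread dump that follows can be reproduced in isolation. This is a minimal sketch, not plugin code: `cloudMonitor` and `queueLock` are hypothetical stand-ins for the AmazonEC2Cloud monitor and the Jenkins Queue lock, and the provisioning side uses `tryLock` with a timeout only so that the demonstration terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderInversionSketch {
    // Hypothetical stand-in for the monitor taken by the synchronized EC2Cloud methods.
    private final Object cloudMonitor = new Object();
    // Hypothetical stand-in for the lock taken by Queue.withLock.
    private final ReentrantLock queueLock = new ReentrantLock();

    /** Returns true when the provisioning thread could not get the queue lock, i.e. where the real code would hang. */
    boolean wouldDeadlock() {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        boolean[] timedOut = new boolean[1];

        // Order: monitor first, then queue lock (like provision -> removeNode).
        Thread provisioner = new Thread(() -> {
            synchronized (cloudMonitor) {
                bothHoldFirstLock.countDown();
                try {
                    bothHoldFirstLock.await();
                    // A plain lock() here would block forever; the timeout ends the demo.
                    if (queueLock.tryLock(200, TimeUnit.MILLISECONDS)) {
                        queueLock.unlock();
                    } else {
                        timedOut[0] = true;
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        // Order: queue lock first, then monitor (like the retention check -> connect).
        Thread retention = new Thread(() -> {
            queueLock.lock();
            try {
                bothHoldFirstLock.countDown();
                bothHoldFirstLock.await();
                synchronized (cloudMonitor) {
                    // Reached only after the provisioner gives up and leaves its synchronized block.
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                queueLock.unlock();
            }
        });

        provisioner.start();
        retention.start();
        try {
            provisioner.join();
            retention.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the demo threads", e);
        }
        return timedOut[0];
    }
}
```

Because each thread takes its first lock before either asks for its second, the timed attempt on the queue lock cannot succeed while the other thread is blocked on the monitor. That is the same cycle the two timer threads are stuck in.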

      "jenkins.util.Timer 6" #73 daemon prio=5 os_prio=0 tid=0x00007ffaf0216800 nid=0x46fa waiting for monitor entry [0x00007ffa74aad000]
        java.lang.Thread.State: BLOCKED (on object monitor)
          at hudson.plugins.ec2.EC2Cloud.connect(EC2Cloud.java:640)
          - waiting to lock <0x0000000727baa970> (a hudson.plugins.ec2.AmazonEC2Cloud)
          at hudson.plugins.ec2.EC2AbstractSlave.getInstance(EC2AbstractSlave.java:279)
          at hudson.plugins.ec2.EC2AbstractSlave.fetchLiveInstanceData(EC2AbstractSlave.java:438)
          at hudson.plugins.ec2.EC2AbstractSlave.isAlive(EC2AbstractSlave.java:406)
          at hudson.plugins.ec2.EC2SpotSlave.terminate(EC2SpotSlave.java:73)
          at hudson.plugins.ec2.EC2AbstractSlave.idleTimeout(EC2AbstractSlave.java:346)
          at hudson.plugins.ec2.EC2RetentionStrategy.internalCheck(EC2RetentionStrategy.java:123)
          at hudson.plugins.ec2.EC2RetentionStrategy.check(EC2RetentionStrategy.java:85)
          at hudson.plugins.ec2.EC2RetentionStrategy.check(EC2RetentionStrategy.java:43)
          at hudson.slaves.ComputerRetentionWork$1.run(ComputerRetentionWork.java:72)
          at hudson.model.Queue._withLock(Queue.java:1334)
          at hudson.model.Queue.withLock(Queue.java:1211)
          at hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:63)
          at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:50)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:748)

        Locked ownable synchronizers:
          - <0x00000006800423d0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
          - <0x0000000682713fc8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
          - <0x0000000725d120b8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

      "jenkins.util.Timer 2" #68 daemon prio=5 os_prio=0 tid=0x00007ffaf8003000 nid=0x46f5 waiting on condition [0x00007ffb24da9000]
        java.lang.Thread.State: WAITING (parking)
          at sun.misc.Unsafe.park(Native Method)
          - parking to wait for <0x0000000682713fc8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
          at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
          at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
          at hudson.model.Queue._withLock(Queue.java:1332)
          at hudson.model.Queue.withLock(Queue.java:1211)
          at jenkins.model.Nodes.removeNode(Nodes.java:237)
          at jenkins.model.Jenkins.removeNode(Jenkins.java:2089)
          at hudson.plugins.ec2.EC2Cloud.countCurrentEC2Slaves(EC2Cloud.java:422)
          at hudson.plugins.ec2.EC2Cloud.getPossibleNewSlavesCount(EC2Cloud.java:502)
          at hudson.plugins.ec2.EC2Cloud.getNewOrExistingAvailableSlave(EC2Cloud.java:522)
          - locked <0x0000000727baa970> (a hudson.plugins.ec2.AmazonEC2Cloud)
          at hudson.plugins.ec2.EC2Cloud.provision(EC2Cloud.java:551)
          at hudson.slaves.NodeProvisioner$StandardStrategyImpl.apply(NodeProvisioner.java:714)
          at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:320)
          at hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:61)
          at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:809)
          at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:50)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:748)

        Locked ownable synchronizers:
          - <0x0000000680033dd0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
          - <0x0000000683447ff8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
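One possible way to break the cycle shown in these two traces is to never call into the queue-locked removal while the cloud monitor is held. The following is only a sketch under that assumption and is not the plugin's actual code or fix: `NarrowedMonitorSketch`, `countLiveAndPrune`, and `queueLock` are hypothetical names, and node names stand in for slave objects.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

class NarrowedMonitorSketch {
    // Hypothetical stand-in for the Jenkins Queue lock used by node removal.
    private final ReentrantLock queueLock = new ReentrantLock();
    // Guarded by this object's monitor.
    private final List<String> nodes = new ArrayList<>();

    synchronized void addNode(String name) {
        nodes.add(name);
    }

    synchronized List<String> snapshot() {
        return new ArrayList<>(nodes);
    }

    /** Counts nodes that are still live and removes the rest, without nesting the queue lock inside the monitor. */
    int countLiveAndPrune(Collection<String> liveInstanceIds) {
        List<String> stale = new ArrayList<>();
        int live;
        synchronized (this) {
            // Under the monitor: only inspect state and remember what to remove.
            for (String node : nodes) {
                if (!liveInstanceIds.contains(node)) {
                    stale.add(node);
                }
            }
            live = nodes.size() - stale.size();
        }
        // Monitor released: removal may now take the queue lock safely.
        for (String node : stale) {
            removeNode(node);
        }
        return live;
    }

    private void removeNode(String name) {
        if (Thread.holdsLock(this)) {
            throw new IllegalStateException("removeNode must not be called while holding the cloud monitor");
        }
        queueLock.lock();
        try {
            // Queue lock first, then monitor: the same order the retention thread uses.
            synchronized (this) {
                nodes.remove(name);
            }
        } finally {
            queueLock.unlock();
        }
    }
}
```

With this shape every path acquires the queue lock before the monitor, so the two locks are always taken in one order. The trade-off is that the count is computed from a snapshot and another thread may change the node list before the stale entries are removed.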

      I have attached the complete jstack log.
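For anyone trying to confirm the same condition on their own instance, the JVM can report deadlocked threads programmatically as well as through jstack. The sketch below uses the standard `java.lang.management` API; the class name `DeadlockProbeSketch` is hypothetical, and it assumes the JVM supports monitoring of ownable synchronizers, which is needed to see cycles that involve a ReentrantLock such as the Queue lock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class DeadlockProbeSketch {
    /** Returns a description of all deadlocked threads, or an empty string when none are found. */
    static String report() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Covers cycles over object monitors and ownable synchronizers; null means no deadlock.
        long[] ids = threads.findDeadlockedThreads();
        if (ids == null) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        // Request locked monitors and locked synchronizers so lock owners are included.
        for (ThreadInfo info : threads.getThreadInfo(ids, true, true)) {
            if (info != null) { // a thread may have exited between the two calls
                out.append(info);
            }
        }
        return out.toString();
    }
}
```

`ThreadInfo.toString()` prints only the first few stack frames, so a full jstack dump is still the better artifact to attach to a report.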
       

       

            francisu Francis Upton
            andrea_vavassori Andrea Vavassori
            Votes: 2
            Watchers: 3