JENKINS-27670: Deadlock when calling supervise()


    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Component: mesos-plugin
    • Labels: None
    • Environment: mesos-plugin 0.6.0 (slightly modified)

      It seems that when JenkinsScheduler.statusUpdate() tries to stop the scheduler while a slave's retention timer is simultaneously trying to stop that slave, the two threads can end up in a deadlock.

      This happens because the timer thread first takes the lock on the MesosImpl instance, while statusUpdate() first takes the SUPERVISOR_LOCK. The timer thread then tries to terminate the slave and waits for the SUPERVISOR_LOCK to be released by the statusUpdate() thread. However, statusUpdate() in turn needs the lock on MesosImpl in order to stop the scheduler, so neither thread can make progress.
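
      The situation boils down to the following minimal, self-contained sketch (the class and field names are simplified stand-ins, not the actual plugin code): one thread takes SUPERVISOR_LOCK first and then wants the MesosImpl monitor, the other takes the monitor first and then wants SUPERVISOR_LOCK.

      import java.util.concurrent.locks.ReentrantLock;

      // Illustrative reproduction of the reported lock-order inversion.
      public class DeadlockSketch {
          // Stand-in for JenkinsScheduler's SUPERVISOR_LOCK.
          static final ReentrantLock SUPERVISOR_LOCK = new ReentrantLock();
          // Stand-in for the Mesos$MesosImpl instance whose monitor is taken
          // by its synchronized stopScheduler()/stopJenkinsSlave() methods.
          static final Object mesosImplMonitor = new Object();

          public static void main(String[] args) {
              // Models statusUpdate() -> supervise() -> stopScheduler().
              Thread statusUpdate = new Thread(() -> {
                  SUPERVISOR_LOCK.lock();                 // first: SUPERVISOR_LOCK
                  try {
                      sleep(100);                         // widen the race window
                      synchronized (mesosImplMonitor) {   // second: MesosImpl monitor
                          System.out.println("statusUpdate stopped the scheduler");
                      }
                  } finally {
                      SUPERVISOR_LOCK.unlock();
                  }
              }, "statusUpdate-thread");

              // Models the retention timer -> stopJenkinsSlave() -> supervise().
              Thread retentionTimer = new Thread(() -> {
                  synchronized (mesosImplMonitor) {       // first: MesosImpl monitor
                      sleep(100);
                      SUPERVISOR_LOCK.lock();             // second: SUPERVISOR_LOCK -> deadlock
                      try {
                          System.out.println("timer terminated the slave");
                      } finally {
                          SUPERVISOR_LOCK.unlock();
                      }
                  }
              }, "retention-timer-thread");

              statusUpdate.start();
              retentionTimer.start();
          }

          private static void sleep(long millis) {
              try {
                  Thread.sleep(millis);
              } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
              }
          }
      }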

      This is the thread dump (I use a slightly modified version of the Mesos plugin 0.6.0, so the line numbers are probably not 100% accurate):

      "Thread-2516073" - Thread t@2898790
         java.lang.Thread.State: BLOCKED
          at org.jenkinsci.plugins.mesos.Mesos$MesosImpl.stopScheduler(Mesos.java:141)
          - waiting to lock <62132b60> (a org.jenkinsci.plugins.mesos.Mesos$MesosImpl) owned by "jenkins.util.Timer [#9]" t@66
          at org.jenkinsci.plugins.mesos.JenkinsScheduler.supervise(JenkinsScheduler.java:749)
          at org.jenkinsci.plugins.mesos.JenkinsScheduler.statusUpdate(JenkinsScheduler.java:634)
      
         Locked ownable synchronizers:
          - locked <3af5466a> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
      
      "jenkins.util.Timer [#9]" - Thread t@66
         java.lang.Thread.State: WAITING
          at sun.misc.Unsafe.park(Native Method)
          - waiting to lock <3af5466a> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) owned by "Thread-2516073" t@2898790
          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
          at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
          at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
          at org.jenkinsci.plugins.mesos.JenkinsScheduler.supervise(JenkinsScheduler.java:725)
          at org.jenkinsci.plugins.mesos.JenkinsScheduler.terminateJenkinsSlave(JenkinsScheduler.java:220)
          - locked <55398768> (a org.jenkinsci.plugins.mesos.JenkinsScheduler)
          at org.jenkinsci.plugins.mesos.Mesos$MesosImpl.stopJenkinsSlave(Mesos.java:157)
          - locked <62132b60> (a org.jenkinsci.plugins.mesos.Mesos$MesosImpl)
          at org.jenkinsci.plugins.mesos.MesosComputerLauncher.terminate(MesosComputerLauncher.java:122)
          at org.jenkinsci.plugins.mesos.MesosSlave.terminate(MesosSlave.java:91)
          at org.jenkinsci.plugins.mesos.MesosRetentionStrategy.check(MesosRetentionStrategy.java:70)
          - locked <75b63404> (a org.jenkinsci.plugins.mesos.MesosRetentionStrategy)
          at org.jenkinsci.plugins.mesos.MesosRetentionStrategy.check(MesosRetentionStrategy.java:26)
          at hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:66)
          at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:54)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
          at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          at java.lang.Thread.run(Thread.java:722)
      
         Locked ownable synchronizers:
          - locked <703c7665> (a java.util.concurrent.ThreadPoolExecutor$Worker)
      

      I tried to solve the problem myself, but all the synchronized calls tied my brain in a knot. My only guess is that the many synchronized cross-calls between MesosImpl and JenkinsScheduler are the root of the problem.
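
      To make that a bit more concrete, the only direction I can think of is to stop blocking indefinitely on SUPERVISOR_LOCK while already holding the MesosImpl monitor, e.g. by acquiring it with a timeout and backing off. This is just a rough sketch of the idea, not a tested patch, and the class below is hypothetical:

      import java.util.concurrent.TimeUnit;
      import java.util.concurrent.locks.ReentrantLock;

      // Hypothetical variant of supervise(): back off when SUPERVISOR_LOCK is
      // contended instead of waiting forever while other locks are held.
      class SuperviseWithTimeout {
          private final ReentrantLock supervisorLock = new ReentrantLock();

          void supervise() throws InterruptedException {
              if (!supervisorLock.tryLock(5, TimeUnit.SECONDS)) {
                  // Lock is held by another thread (e.g. statusUpdate());
                  // skip this supervision cycle instead of deadlocking.
                  return;
              }
              try {
                  // ... check running/pending tasks, stop the scheduler if idle ...
              } finally {
                  supervisorLock.unlock();
              }
          }
      }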

      Maybe some Java whiz can solve the problem there.

      PS: I also posted this on the GitHub issues page, because it seems to be more active (https://github.com/jenkinsci/mesos-plugin/issues/97).

            Assignee: Vinod Kone (vinodkone)
            Reporter: Stefan Prietl (seder)