  1. Jenkins
  2. JENKINS-37034

Deadlock in hudson.model.Executor

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Component/s: core
    • Labels:
      None
    • Environment:
      Jenkins 1.609
      Description

      We caught a deadlock in hudson.model.Executor with the following stack trace (XXX replaces sensitive data):

      "Executor #-1 for XXX : executing XXX #160" daemon prio=10 tid=2249607168 nid=6359
         java.lang.Thread.State: WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for <0x5ba959038> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
              at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
              at hudson.model.Executor.interrupt(Executor.java:183)
              at hudson.model.Executor.interrupt(Executor.java:164)
              at hudson.model.Executor.interrupt(Executor.java:158)
              at hudson.model.Executor.interrupt(Executor.java:145)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.selfInterrupt(AbstractQueuedSynchronizer.java:802)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:937)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
              at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
              at hudson.model.Executor.getCurrentExecutable(Executor.java:475)
              at hudson.model.Executor.of(Executor.java:931)
              at hudson.model.Run.getExecutor(Run.java:517)
              at hudson.matrix.MatrixBuild$MatrixBuildExecution.doRun(MatrixBuild.java:376)
              - locked <0x402e23f78> (a hudson.model.Queue)
              at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:536)
              at hudson.model.Run.execute(Run.java:1738)
              at hudson.matrix.MatrixBuild.run(MatrixBuild.java:301)
              at hudson.model.ResourceController.execute(ResourceController.java:98)
              at hudson.model.Executor.run(Executor.java:381)
      

      Because the lock on the Queue is obtained first and the WriteLock then waits indefinitely for the ReadLock to be released, Jenkins doesn't respond to anything (the Queue stays locked).

      I found a similar issue, JENKINS-28690, for Executor.abortResult(). If I understand Stephen's fix correctly, it can't be applied here because we don't know where the ReadLock was taken from. Upgrading the ReadLock to a WriteLock is not possible either.
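
The point about upgrading can be illustrated with a minimal sketch. It is not the Jenkins code: the class and method names (UpgradeDemo, tryUpgrade, tryDowngrade) are hypothetical, and a timed tryLock is used instead of lock() so the demo returns instead of hanging. It only shows that a thread already holding the ReadLock of a java.util.concurrent.locks.ReentrantReadWriteLock cannot obtain the WriteLock, which is the state the stack trace above ends in.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class, not part of Jenkins.
final class UpgradeDemo {

    /** Tries to take the write lock while this thread already holds the read lock. */
    static boolean tryUpgrade(long timeoutMillis) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        try {
            // With lock() instead of a timed tryLock() this call would park forever.
            boolean upgraded = lock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (upgraded) {
                lock.writeLock().unlock();
            }
            return upgraded;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** The opposite direction (write lock held, then read lock) is supported. */
    static boolean tryDowngrade() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.writeLock().lock();
        try {
            boolean gotRead = lock.readLock().tryLock();
            if (gotRead) {
                lock.readLock().unlock();
            }
            return gotRead;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("read -> write upgrade succeeded: " + tryUpgrade(200));
        System.out.println("write -> read downgrade succeeded: " + tryDowngrade());
    }
}
```

In the trace above the second acquisition is an untimed WriteLock.lock() reached through Executor.interrupt(), so the thread parks while still holding the Queue monitor.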

            Activity

            pajasoft Pavel Janoušek added a comment -

            I believe the second fix from Stephen (merged into Jenkins 1.625) can help in this situation. I'm not sure whether that fix covers all instances of this deadlock, so I'm leaving this issue open.

            I was thinking about a reproducer, but got stuck on how to simulate the necessary situation, which is handled by the JVM, or more precisely by java.util.concurrent.locks.AbstractQueuedSynchronizer.

            oleg_nenashev Oleg Nenashev added a comment -

            Pavel Janoušek, please provide a full thread dump. It's not possible to analyze a deadlock from a single thread trace. It would also be useful to know your Jenkins version. If the issue was observed on something below 1.625, it may be worth trying to reproduce it on a higher version.

            pajasoft Pavel Janoušek added a comment -

            Oleg Nenashev, this issue occurred on a Jenkins instance based on 1.609, which we are still using in our production environment. Fortunately the issue hasn't troubled us since, so it doesn't seem to be a common race condition. If it occurs again, I'll post the full stack trace here.

            oleg_nenashev Oleg Nenashev added a comment -

            Closing as Cannot Reproduce since the executor and queue logic has changed significantly. Please reopen if the issue happens again.


              People

              • Assignee:
                Unassigned
                Reporter:
                pajasoft Pavel Janoušek
              • Votes:
                0
                Watchers:
                2
