Jenkins / JENKINS-28690

Deadlock in hudson.model.Executor


      Description

      In a very specific scenario, when a build is running on a slave and PingThread detects the slave as unavailable, a deadlock occurs in the Executor thread of that slave.

      Stack trace:

      "Executor #0 for xxxx : executing xxxx #9" daemon prio=10 tid=0x00007f444248b800 nid=0x66e0 waiting on condition [0x00007f448a92f000]
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x000000045e3eea00> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
      	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
      	at hudson.model.Executor.interrupt(Executor.java:183)
      	at hudson.model.Executor.interrupt(Executor.java:164)
      	at hudson.model.Executor.interrupt(Executor.java:158)
      	at hudson.model.Executor.interrupt(Executor.java:145)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.selfInterrupt(AbstractQueuedSynchronizer.java:825)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:959)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
      	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
      	at hudson.model.Executor.abortResult(Executor.java:208)
      	at hudson.model.Build$BuildExecution.doRun(Build.java:165)
      	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:537)
      	at hudson.model.Run.execute(Run.java:1744)
      	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
      	at hudson.model.ResourceController.execute(ResourceController.java:98)
      	at hudson.model.Executor.run(Executor.java:374)
      

      This alone is not very bad, but then the Queue maintenance task kicks in, blocks on the Executor's lock and leads to a deadlock on the Queue lock.
      Stack trace:

      "AtmostOneTaskExecutor[hudson.model.Queue$1@6a9812a3] [#6684]" daemon prio=10 tid=0x00007f44bf7af000 nid=0x74ec waiting on condition [0x00007f44c827b000]
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x000000045e3eea00> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
      	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
      	at hudson.model.Executor.isParking(Executor.java:609)
      	at hudson.model.Queue.maintain(Queue.java:1282)
      	at hudson.model.Queue$1.call(Queue.java:334)
      	at hudson.model.Queue$1.call(Queue.java:331)
      	at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:101)
      	at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:91)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:110)
      	at java.lang.Thread.run(Thread.java:745)
      

      This blocks all actions on Jenkins: no new builds can be scheduled and the Jenkins main page cannot be accessed.
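Reading the two stack traces together suggests why everything hangs. Judging by the frames, `selfInterrupt()` is called from `doAcquireShared()` after the executor thread has obtained the read lock in `abortResult()`, and the overridden `interrupt()` then asks for the write lock. `ReentrantReadWriteLock` does not support upgrading a read lock to the write lock, so that request can never be granted. With a writer at the head of the lock's queue, the OpenJDK non-fair lock also makes new readers wait, which would explain why `Queue.maintain()` (via `isParking()`) blocks while holding the Queue lock. The Java sketch below models this interpretation with plain JDK locks; the class and field names are hypothetical, not Jenkins code, and timed `tryLock` calls stand in for `lock()` so the demo terminates.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the hang (not Jenkins code). The "executor" thread holds
// the read lock (as in abortResult()) and then asks for the write lock (as the
// overridden interrupt() does). The main thread plays Queue.maintain() ->
// isParking(), which wants the read lock. Timed tryLock replaces lock().
class UpgradeDeadlockDemo {
    static final class Outcome {
        final boolean upgradeSucceeded;     // did the read -> write upgrade work?
        final boolean queueReaderSucceeded; // did the late reader get the read lock?

        Outcome(boolean upgradeSucceeded, boolean queueReaderSucceeded) {
            this.upgradeSucceeded = upgradeSucceeded;
            this.queueReaderSucceeded = queueReaderSucceeded;
        }
    }

    static Outcome run() {
        // The default constructor gives a non-fair lock, matching NonfairSync in the traces.
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        boolean[] upgraded = new boolean[1];

        Thread executor = new Thread(() -> {
            lock.readLock().lock();
            try {
                // Upgrades are unsupported: this waits for a read lock that only
                // this same thread could release, so it can only time out.
                upgraded[0] = lock.writeLock().tryLock(1500, TimeUnit.MILLISECONDS);
                if (upgraded[0]) lock.writeLock().unlock();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        });
        executor.start();

        boolean queueReader = false;
        try {
            // Wait until the executor's write request is queued, then let the
            // queue links settle briefly.
            while (!lock.hasQueuedThreads()) Thread.sleep(1);
            Thread.sleep(50);
            // In OpenJDK a non-fair lock still makes a new reader wait when a
            // writer heads the queue, so this request is stuck behind the executor.
            queueReader = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
            if (queueReader) lock.readLock().unlock();
            executor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new Outcome(upgraded[0], queueReader);
    }

    public static void main(String[] args) {
        Outcome o = run();
        System.out.println("upgrade succeeded: " + o.upgradeSucceeded);
        System.out.println("queue reader succeeded: " + o.queueReaderSucceeded);
    }
}
```

With plain `lock()` calls instead of the timed `tryLock` calls, both threads would wait forever, which matches the two WAITING (parking) threads shown above.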

      After downgrading to version 1.606, from before https://issues.jenkins-ci.org/browse/JENKINS-27565, everything works fine.

            Activity

            danielbeck Daniel Beck added a comment -

            Stephen Connolly, could you please take a look at this?

            stephenconnolly Stephen Connolly added a comment -

            I believe https://github.com/jenkinsci/jenkins/pull/1730 should resolve this issue

            olivergondza Oliver Gondža added a comment - - edited

            I have just hit the same thing running tests for the matrix plugin. Log attached. EDIT: It turns out to be a different issue: JENKINS-28840

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Stephen Connolly
            Path:
            core/src/main/java/hudson/model/Executor.java
            http://jenkins-ci.org/commit/jenkins/ddb0a472ad44fcbce31fb565d477901575e3c581
            Log:
            [FIXED JENKINS-28690] Deadlock in hudson.model.Executor

            • Rather fun one here. The Lock code relies on assuming that Thread.interrupted() is clear on entry
            • If it then sees Thread.interrupted() set, it will interrupt the current thread in order to set the
              flag again.
            • Executor is a thread that does funky things with an overridden interrupt method
            • Executor.abortResult() is used to track a build being interrupted or aborted in some other way
            • As a result, abortResult() can cause a deadlock if there is a genuine interruption
            • This fix clears the interrupt flag in abortResult() and uses the write lock in order to ensure:
                • The same lock as used in interrupt() is held
                • The interrupt flag is clear
            • Clearing the interrupt flag should be safe, as the only time it is called is immediately after
              an interruption, and the resulting exception is caught and rethrown/logged anyway
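The first two bullets describe a JDK behaviour that is easy to see in isolation: when a thread is interrupted while parked inside `Lock.lock()`, the lock code records the interrupt, and once it gets the lock it restores the flag by calling `interrupt()` on the current thread. That call is dispatched to any override on a `Thread` subclass, such as Executor's. The sketch below uses a plain `ReentrantReadWriteLock` and a hypothetical `CountingThread` whose override only counts self-interrupts and takes no lock, so it shows the call without reproducing the deadlock.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class SelfInterruptDemo {
    // Hypothetical Thread subclass with an overridden interrupt(), loosely modelled
    // on hudson.model.Executor. It only counts self-interrupts and takes no lock.
    static final class CountingThread extends Thread {
        final AtomicInteger selfInterrupts = new AtomicInteger();

        CountingThread(Runnable body) { super(body); }

        @Override
        public void interrupt() {
            if (Thread.currentThread() == this) {
                selfInterrupts.incrementAndGet(); // the thread is interrupting itself
            }
            super.interrupt();
        }
    }

    static final class Result {
        final int selfInterrupts;
        final boolean flagSetAfterLock;

        Result(int selfInterrupts, boolean flagSetAfterLock) {
            this.selfInterrupts = selfInterrupts;
            this.flagSetAfterLock = flagSetAfterLock;
        }
    }

    static Result run() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        boolean[] flagAfterLock = new boolean[1];

        lock.writeLock().lock(); // main holds the write lock, so the reader must park
        CountingThread worker = new CountingThread(() -> {
            lock.readLock().lock(); // parks, like abortResult() in the first trace
            try {
                flagAfterLock[0] = Thread.currentThread().isInterrupted();
            } finally {
                lock.readLock().unlock();
            }
        });
        worker.start();
        try {
            // Interrupt only once the worker is queued and parked inside lock().
            while (!lock.hasQueuedThreads() || worker.getState() != Thread.State.WAITING) {
                Thread.sleep(1);
            }
            worker.interrupt(); // a genuine interruption from another thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            lock.writeLock().unlock(); // now the worker can take the read lock
        }
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new Result(worker.selfInterrupts.get(), flagAfterLock[0]);
    }

    public static void main(String[] args) {
        Result r = run();
        System.out.println("self-interrupts: " + r.selfInterrupts
                + ", flag set after lock(): " + r.flagSetAfterLock);
    }
}
```

In Executor's case, per the first stack trace, the override goes on to take the write lock while the read lock is still held, which is where the thread gets stuck.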
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Stephen Connolly
            Path:
            core/src/main/java/hudson/model/Executor.java
            http://jenkins-ci.org/commit/jenkins/4540ba71b5fac5eede500878993eb3e5b165979d
            Log:
            Merge pull request #1730 from stephenc/jenkins-28690

            [FIXED JENKINS-28690] Deadlock in hudson.model.Executor

            Compare: https://github.com/jenkinsci/jenkins/compare/f628e3992842...4540ba71b5fa

            stephenconnolly Stephen Connolly added a comment -

            Oliver Gondža, I can confirm that this appears to be a different deadlock. Could you capture it in a separate issue (and assign it to me)?

            olivergondza Oliver Gondža added a comment -

            Stephen Connolly, JENKINS-28840 created.

            dogfood dogfood added a comment -

            Integrated in jenkins_main_trunk #4172

            Result = SUCCESS

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Stephen Connolly
            Path:
            core/src/main/java/hudson/model/Executor.java
            http://jenkins-ci.org/commit/jenkins/c24c3236917cfac2ae7c536b5fd6ad737fa2253c
            Log:
            [FIXED JENKINS-28690] Deadlock in hudson.model.Executor

            • Rather fun one here. The Lock code relies on assuming that Thread.interrupted() is clear on entry
            • If it then sees Thread.interrupted() set, it will interrupt the current thread in order to set the
              flag again.
            • Executor is a thread that does funky things with an overridden interrupt method
            • Executor.abortResult() is used to track a build being interrupted or aborted in some other way
            • As a result, abortResult() can cause a deadlock if there is a genuine interruption
            • This fix clears the interrupt flag in abortResult() and uses the write lock in order to ensure:
                • The same lock as used in interrupt() is held
                • The interrupt flag is clear
            • Clearing the interrupt flag should be safe, as the only time it is called is immediately after
              an interruption, and the resulting exception is caught and rethrown/logged anyway

            (cherry picked from commit ddb0a472ad44fcbce31fb565d477901575e3c581)
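A minimal sketch of the approach this commit log describes, assuming a much-simplified executor: `abortResult()` clears the flag with `Thread.interrupted()` and then takes the write lock instead of the read lock. Because the write lock of `ReentrantReadWriteLock` is reentrant, an `interrupt()` override that also takes the write lock can run inside it without blocking. `ToyExecutor` and its fields are hypothetical, not the real `hudson.model.Executor`, and the self-interrupt that the JDK lock code would perform is simulated with an explicit `interrupt()` call.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, heavily simplified executor (NOT hudson.model.Executor). Its
// interrupt() override always takes the write lock, like the pre-fix override.
class ToyExecutor extends Thread {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    volatile boolean flagWasSetOnEntry;
    volatile String result;

    @Override
    public void interrupt() {
        lock.writeLock().lock();
        try {
            super.interrupt();
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Sketch of the fix: clear the flag on entry, then hold the write lock,
    // which is the same lock interrupt() takes.
    String abortResult() {
        flagWasSetOnEntry = Thread.interrupted(); // returns and clears the flag
        lock.writeLock().lock();
        try {
            // Simulates the lock code restoring the flag. The write lock is
            // reentrant, so the override re-acquires it and returns at once.
            interrupt();
            return "ABORTED";
        } finally {
            lock.writeLock().unlock();
        }
    }

    @Override
    public void run() {
        super.interrupt(); // simulate a genuine interruption of the running build
        result = abortResult();
    }

    static ToyExecutor runOnce() {
        ToyExecutor ex = new ToyExecutor();
        ex.setDaemon(true); // a deadlocked sketch would not keep the JVM alive
        ex.start();
        try {
            ex.join(5000); // a deadlock would leave the thread alive after this
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ex;
    }

    public static void main(String[] args) {
        ToyExecutor ex = runOnce();
        System.out.println("finished: " + !ex.isAlive() + ", result: " + ex.result
                + ", flag set on entry: " + ex.flagWasSetOnEntry);
    }
}
```

Had `abortResult()` taken the read lock here, the same `interrupt()` call would wait for the write lock forever, since a read lock cannot be upgraded.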

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Stephen Connolly
            Path:
            src/main/java/com/cloudbees/jenkins/plugins/mtslavescloud/MansionComputer.java
            src/main/java/com/cloudbees/jenkins/plugins/mtslavescloud/MansionRetentionStrategy.java
            http://jenkins-ci.org/commit/mansion-cloud-plugin/3346565a0faed45718424d5481cb4d8bf1d58401
            Log:
            Need to hold a lock when disconnecting or you can trigger JENKINS-28690 style deadlocks

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Ivan Meredith
            Path:
            src/main/java/com/cloudbees/jenkins/plugins/mtslavescloud/MansionComputer.java
            src/main/java/com/cloudbees/jenkins/plugins/mtslavescloud/MansionRetentionStrategy.java
            http://jenkins-ci.org/commit/mansion-cloud-plugin/cf798b87dc339c91da4b5fb26ceb4ab1bcae4259
            Log:
            Merge pull request #2 from jenkinsci/jenkins-28690-related

            Need to hold a lock when disconnecting or you can trigger JENKINS-28690 style deadlocks

            Compare: https://github.com/jenkinsci/mansion-cloud-plugin/compare/e265cac825ba...cf798b87dc33

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Stephen Connolly
            Path:
            core/src/main/java/hudson/model/Executor.java
            http://jenkins-ci.org/commit/jenkins/0ba505b60ca86d6b103b070a690a98ae6fef8c5d
            Log:
            JENKINS-28690 Aha! So I believe this will fully resolve any of these kinds of deadlocks

            • Without this, it becomes a question of find, catch and release for each potential code path that
              might end up restoring the interrupt flag on the current thread.
            • Since standard Lock support is kind enough to restore the interrupt flag on the current thread
              when it was interrupted while blocked waiting for the lock, that would be a hiding to nothing
            • I welcome others to review my logic, detailed in the code comment
            • I am leaving the code comment in place, as this is IMHO too important to assume that somebody will
              check the git commit history
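The branch name (`jenkins-28690-correct-interrupt-override`) suggests this follow-up changes the `interrupt()` override itself rather than each caller. One way to do that, sketched here under that assumption (this is not the actual diff), is to let a thread that interrupts itself simply set its flag via `super.interrupt()` and take no lock, so the flag restore performed by the lock code can never block. `SelfAwareExecutor` is a hypothetical class, not Jenkins code.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical executor (NOT hudson.model.Executor) whose interrupt() override
// distinguishes a self-interrupt from an interrupt sent by another thread.
class SelfAwareExecutor extends Thread {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    volatile boolean flagRestored;
    volatile boolean finished;

    @Override
    public void interrupt() {
        if (Thread.currentThread() == this) {
            // The thread is restoring its own flag (e.g. from inside the JDK
            // lock code): just set the flag and take no lock.
            super.interrupt();
            return;
        }
        // Interrupt from another thread: keep doing it under the write lock.
        lock.writeLock().lock();
        try {
            super.interrupt();
        } finally {
            lock.writeLock().unlock();
        }
    }

    @Override
    public void run() {
        lock.readLock().lock(); // e.g. code that still holds the read lock
        try {
            interrupt();        // what the lock code does to restore the flag
            flagRestored = Thread.currentThread().isInterrupted();
        } finally {
            lock.readLock().unlock();
        }
        finished = true;
    }

    static SelfAwareExecutor runOnce() {
        SelfAwareExecutor ex = new SelfAwareExecutor();
        ex.setDaemon(true); // a deadlocked sketch would not keep the JVM alive
        ex.start();
        try {
            ex.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ex;
    }

    public static void main(String[] args) {
        SelfAwareExecutor ex = runOnce();
        System.out.println("finished: " + ex.finished + ", flag restored: " + ex.flagRestored);
    }
}
```

The design choice is that the override no longer depends on which lock the caller holds: a self-interrupt never takes a lock, while interrupts from other threads still go through the write lock.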
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Stephen Connolly
            Path:
            core/src/main/java/hudson/model/Executor.java
            http://jenkins-ci.org/commit/jenkins/a80972307e03a7f67e97dee700720cb80f7f65d8
            Log:
            Merge pull request #1786 from stephenc/jenkins-28690-correct-interrupt-override

            JENKINS-28690 Aha! So I believe this will fully resolve any of these kinds of deadlocks

            Compare: https://github.com/jenkinsci/jenkins/compare/7ab816878b16...a80972307e03

            dogfood dogfood added a comment -

            Integrated in jenkins_main_trunk #4246
            JENKINS-28690 Aha! So I believe this will fully resolve any of these kinds of deadlocks (Revision 0ba505b60ca86d6b103b070a690a98ae6fef8c5d)

            Result = SUCCESS
            stephen connolly : 0ba505b60ca86d6b103b070a690a98ae6fef8c5d
            Files :

            • core/src/main/java/hudson/model/Executor.java
            dogfood dogfood added a comment -

            Integrated in jenkins_main_trunk #4292
            [FIXED JENKINS-28690] Deadlock in hudson.model.Executor (Revision c24c3236917cfac2ae7c536b5fd6ad737fa2253c)

            Result = UNSTABLE
            ogondza : c24c3236917cfac2ae7c536b5fd6ad737fa2253c
            Files :

            • core/src/main/java/hudson/model/Executor.java

              People

              • Assignee:
                stephenconnolly Stephen Connolly
                Reporter:
                szubster Tomasz Szuba
              • Votes:
                0
                Watchers:
                6
