Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: core, maven-plugin
    • Labels:
      None
    • Environment:
      core 1.564-SNAPSHOT, remoting 2.41

      Description

      On a number of the slaves at builds.apache.org, both Linux and Windows, we're seeing slaves hang after a while. The common thread seems to be Maven jobs that run on them and eventually hang, causing everything else on the slave to hang (including, in some cases, attempts to get the thread dump from within Jenkins). The original Maven build hangs indefinitely, and any subsequent build that tries to run on the same slave gets as far as starting the git clone/svn checkout/etc. and then just hangs. The Linux slaves are running Java 1.8.0_05, and the Windows slaves are running some Java 7 version; not sure which.

      The thread dump for Linux is at https://gist.github.com/abayer/3d567b56776e1ce78ad7 (one job hanging for over a day, another that started an hour or so ago but is now hanging). The thread dump for Windows is at https://gist.github.com/abayer/c99f72ca1232e4d8acfa (only one job running at all on there, hanging for 17 hours or so).

            Activity

            jglick Jesse Glick added a comment -

            Looks like the fix of JENKINS-22354, in 2.2, may have introduced this bug.

            kohsuke Kohsuke Kawaguchi added a comment -

            The thread dump from abayer shows that something weird is happening with SplittableBuildListener.

            Below is my analysis of the issue from one of our customers (ZD-19531), which turns out to be the same problem:

            3 threads appear to be blocked on SplittableBuildListener.synchronizeOnMark of the same object, which is odd, as the execution of this is supposed to be sequential.

            • Computer.threadPoolForRemoting [#1099] is waiting to enter SplittableBuildListener.synchronizeOnMark.
            • Computer.threadPoolForRemoting [#1108] is inside synchronizeOnMark and on markCountLock.wait.
            • Computer.threadPoolForRemoting [#1113] has found the mark and is trying to report it, but is blocked trying to get in.
            • Computer.threadPoolForRemoting [#1104] is inside synchronizeOnMark waiting for Future.get()

            I think there's incorrect use of synchronization here. When wait() is called, the lock is released, which allows another thread to enter synchronizeOnMark. We need to use another lock to ensure synchronizeOnMark is not invoked concurrently.
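The point about wait() releasing the lock can be reproduced in isolation. The following is a minimal sketch, not the plugin's actual code: the class name, the `markSeen` flag, and the `inside`/`maxInside` counters are hypothetical names introduced for illustration, and only `markCountLock` is borrowed from the thread dump. It counts how many threads are inside the same synchronized block at once.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class WaitReleasesMonitor {
    private final Object markCountLock = new Object();
    private boolean markSeen = false; // hypothetical stand-in for "the mark was found"
    private int inside = 0;           // threads currently inside the synchronized block
    private int maxInside = 0;        // highest value of 'inside' ever observed

    // Hypothetical stand-in for a method that waits for a mark under a single lock.
    void synchronizeOnMarkSketch(CountDownLatch entered) throws InterruptedException {
        synchronized (markCountLock) {
            inside++;
            maxInside = Math.max(maxInside, inside);
            entered.countDown();
            while (!markSeen) {
                markCountLock.wait(); // releases the monitor while waiting
            }
            inside--;
        }
    }

    void reportMark() {
        synchronized (markCountLock) {
            markSeen = true;
            markCountLock.notifyAll();
        }
    }

    static int maxThreadsInside() {
        WaitReleasesMonitor demo = new WaitReleasesMonitor();
        CountDownLatch entered = new CountDownLatch(2);
        Runnable call = () -> {
            try {
                demo.synchronizeOnMarkSketch(entered);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread a = new Thread(call);
        Thread b = new Thread(call);
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();
        try {
            // Both threads can only count down if the first one gave up the monitor in wait().
            if (!entered.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("second thread never entered the block");
            }
            demo.reportMark();
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return demo.maxInside;
    }

    public static void main(String[] args) {
        System.out.println("max threads inside: " + maxThreadsInside());
    }
}
```

In this sketch two threads end up inside the block together, so `synchronized` on the lock being waited on does not by itself make the method sequential.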

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            src/main/java/hudson/maven/SplittableBuildListener.java
            http://jenkins-ci.org/commit/maven-plugin/b145d5925ddeae2d697743920da204e6991375ac
            Log:
            [FIXED JENKINS-23098]

            Reference: ZD-19531

            Looking at [4], one notices that three threads are in an effective deadlock state around synchronizeOnMark. I extracted the relevant part into [5].

            Thread #1661 is trying to report a discovered mark, but is blocked [1]. Thread #1665 is inside synchronizeOnMark, on markCountLock.wait() [2]. Thread #1667 is stuck on Future.get() and hasn't returned [3]; it holds the lock that prevents [1] from unblocking [2].

            The root problem is that the synchronizeOnMark method was never meant to be executed concurrently. But given the way the lock is used, if one thread reaches wait(), another thread can come along and enter this method.

            In this change, I'm preventing that by introducing another lock to serialize the execution of the entire synchronizeOnMark() call. I'm not using the "this" object for locking because it's already used for another purpose (see the lock() method).
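The idea of a separate lock that serializes the whole call can be sketched as follows. This is an illustration under assumptions, not the code from the commit: `synchronizeLock`, `marksSeen`, and the counters are hypothetical names, and the real method does more than wait for a mark. The outer lock is held across wait(), so a second caller cannot enter, while the reporting side only needs the inner lock and is never blocked by the outer one.

```java
public class SerializedSynchronize {
    private final Object synchronizeLock = new Object(); // hypothetical outer lock, deliberately not "this"
    private final Object markCountLock = new Object();
    private int marksSeen = 0;
    private int inside = 0;    // threads currently past the outer lock
    private int maxInside = 0; // highest value of 'inside' ever observed

    // Hypothetical stand-in for the serialized method.
    void synchronizeOnMarkSketch() throws InterruptedException {
        synchronized (synchronizeLock) {       // stays held while waiting below
            synchronized (markCountLock) {
                inside++;
                maxInside = Math.max(maxInside, inside);
                int target = marksSeen + 1;    // wait for one mark reported after entry
                while (marksSeen < target) {
                    markCountLock.wait();      // releases markCountLock only
                }
                inside--;
            }
        }
    }

    // The reporter takes only the inner lock, so it can always deliver the mark.
    void reportMark() {
        synchronized (markCountLock) {
            marksSeen++;
            markCountLock.notifyAll();
        }
    }

    static int maxThreadsInside() {
        SerializedSynchronize demo = new SerializedSynchronize();
        Runnable call = () -> {
            try {
                demo.synchronizeOnMarkSketch();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread a = new Thread(call);
        Thread b = new Thread(call);
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();
        try {
            // Keep reporting marks until both callers have gone through in turn.
            while (a.isAlive() || b.isAlive()) {
                demo.reportMark();
                Thread.sleep(10);
            }
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return demo.maxInside;
    }

    public static void main(String[] args) {
        System.out.println("max threads inside: " + maxThreadsInside());
    }
}
```

With the outer lock in place, at most one thread is ever inside the waiting section in this sketch. Using a dedicated lock object rather than `this` keeps it independent of any other code that synchronizes on the instance.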

            I'm not yet clear on why the synchronizeOnMark() method is called concurrently to begin with. The interaction with the -T option of Maven is suspected.

            [1] https://gist.github.com/kohsuke/374c22e737a77c9b0421#file-gistfile1-txt-L2
            [2] https://gist.github.com/kohsuke/374c22e737a77c9b0421#file-gistfile1-txt-L34
            [3] https://gist.github.com/kohsuke/374c22e737a77c9b0421#file-gistfile1-txt-L71
            [4] https://gist.github.com/abayer/7ff4de807c6373eec40d
            [5] https://gist.github.com/kohsuke/374c22e737a77c9b0421

            kohsuke Kohsuke Kawaguchi added a comment -

            If you see this problem, can you please try out this build and report back if that fixes the problem?

            kohsuke Kohsuke Kawaguchi added a comment -

            Released Maven plugin 2.5 with this fix.


              People

              • Assignee:
                kohsuke Kohsuke Kawaguchi
                Reporter:
                abayer Andrew Bayer
              • Votes:
                1
                Watchers:
                8

                Dates

                • Created:
                  Updated:
                  Resolved: