Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-41832

master uses 100% cpu in jenkinsci.plugins.pubsub.listeners.SyncQueueListener()

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Blocker
    • Resolution: Fixed
    • Component/s: blueocean-plugin
    • Labels:
      None
    • Environment:
      jenkins 52.0
      blueocean 1.0.0-b22
      pubsub-light 1.6

      OS: ubuntu 16.04 xenial
      JAVA: openjdk 1.8.0
    • Epic Link:
    • Sprint:
      iapetus
    • Similar Issues:

      Description

      During a build with a heavy log the master pubsub publisher thread has one core pegged at 100% in:
      org.jenkinsci.plugins.pubsub.listeners.SyncQueueListener$1.run(SyncQueueListener.java:103)

      Note that no browser is connected to jenkins over HTTP at the time.

      This warrants investigation cc Tom FENNELLY

        Attachments

          Issue Links

            Activity

            szym Szymon J created issue -
            michaelneale Michael Neale made changes -
            Field Original Value New Value
            Assignee Tom FENNELLY [ tfennelly ]
            michaelneale Michael Neale made changes -
            Assignee Tom FENNELLY [ tfennelly ] Vivek Pandey [ vivek ]
            michaelneale Michael Neale made changes -
            Epic Link JENKINS-35737 [ 171801 ]
            michaelneale Michael Neale made changes -
            Sprint panthalassa [ 221 ]
            michaelneale Michael Neale made changes -
            Description During a build with a heavy log the master pubsub publisher thread has one core pegged at 100% in:
            org.jenkinsci.plugins.pubsub.listeners.SyncQueueListener$1.run(SyncQueueListener.java:103)

            Note that no browser is connected to jenkins over HTTP at the time.
            During a build with a heavy log the master pubsub publisher thread has one core pegged at 100% in:
            org.jenkinsci.plugins.pubsub.listeners.SyncQueueListener$1.run(SyncQueueListener.java:103)

            Note that no browser is connected to jenkins over HTTP at the time.

            This warrants investigation
            Hide
            jamesdumay James Dumay added a comment -

            Szymon J thanks for filing this ticket. Would we be able to get a full thread dump of the Jenkins server under the condition that you have listed?

            Show
            jamesdumay James Dumay added a comment - Szymon J thanks for filing this ticket. Would we be able to get a full thread dump of the Jenkins server under the condition that you have listed?
            Hide
            michaelneale Michael Neale added a comment -

            Szymon J can you provide more detail on the job run? How many steps - one step, lots of parallel?

            Show
            michaelneale Michael Neale added a comment - Szymon J can you provide more detail on the job run? How many steps - one step, lots of parallel?
            Hide
            szym Szymon J added a comment - - edited

            This is using pipeline script, one node, three stages. The step in progress is sh("bazel build ...")

            This happens in every build.

            Show
            szym Szymon J added a comment - - edited This is using pipeline script, one node, three stages. The step in progress is sh("bazel build ...") This happens in every build.
            Hide
            szym Szymon J added a comment -

            I should explain that the log is heavy in size (10k lines), but the output rate is not that big (few lines per sec).

            Show
            szym Szymon J added a comment - I should explain that the log is heavy in size (10k lines), but the output rate is not that big (few lines per sec).
            michaelneale Michael Neale made changes -
            Description During a build with a heavy log the master pubsub publisher thread has one core pegged at 100% in:
            org.jenkinsci.plugins.pubsub.listeners.SyncQueueListener$1.run(SyncQueueListener.java:103)

            Note that no browser is connected to jenkins over HTTP at the time.

            This warrants investigation
            During a build with a heavy log the master pubsub publisher thread has one core pegged at 100% in:
            org.jenkinsci.plugins.pubsub.listeners.SyncQueueListener$1.run(SyncQueueListener.java:103)

            Note that no browser is connected to jenkins over HTTP at the time.

            This warrants investigation cc [~tfennelly]
            Hide
            tfennelly Tom FENNELLY added a comment - - edited

            Ah bumber ... this is the thread we added for checking the MBP indexing. Must be getting into a tight retry loop.

            Show
            tfennelly Tom FENNELLY added a comment - - edited Ah bumber ... this is the thread we added for checking the MBP indexing. Must be getting into a tight retry loop.
            Hide
            jamesdumay James Dumay added a comment -

            Tom FENNELLY sorry to but in but could you use a producer/consumer queue and avoid the creation that single thread for processing? You could use a pull on that queue to pump the queues and avoids having to keep that thread looping.

            Show
            jamesdumay James Dumay added a comment - Tom FENNELLY sorry to but in but could you use a producer/consumer queue and avoid the creation that single thread for processing? You could use a pull on that queue to pump the queues and avoids having to keep that thread looping.
            Hide
            tfennelly Tom FENNELLY added a comment -

            James Dumay And how would you know when the queue tasks are complete? There's only one event we can use in all of this, which is the event that feeds the tasks into queue. After that we have nothing to tell us when those tasks are done/complete. Hence the need to periodically check their state. In effect, that queue is really just a list of tasks that we are tracking the state of i.e. not really a "queue" in the producer/consumer sense that I think you are thinking of. Obviously it needs to be reworked though.

            Show
            tfennelly Tom FENNELLY added a comment - James Dumay And how would you know when the queue tasks are complete? There's only one event we can use in all of this, which is the event that feeds the tasks into queue. After that we have nothing to tell us when those tasks are done/complete. Hence the need to periodically check their state. In effect, that queue is really just a list of tasks that we are tracking the state of i.e. not really a "queue" in the producer/consumer sense that I think you are thinking of. Obviously it needs to be reworked though.
            Hide
            szym Szymon J added a comment - - edited

            If I understand the intention of SyncQueueListener, it just has a small bug.

            leftItem = queueTaskLeftPublishQueue.poll(...)
            if (leftItem!= null) {
              // FIXME: This is a bug. Don't drain tryLater unless the future is done.
              tryLaterQueueTaskLeftQueue.drainTo(queueTaskLeftPublishQueue);
              if (leftItem.getFuture().isDone()) {
                // TODO: FIXME: This is a good place to drain the tryLater queue
                publish(...);
              } else {
                tryLaterQueueTaskLeftQueue.put(leftItem);
              }
            } else {
              tryLaterQueueTaskLeftQueue.drainTo(queueTaskLeftPublishQueue);
            }
            
            Show
            szym Szymon J added a comment - - edited If I understand the intention of SyncQueueListener, it just has a small bug. leftItem = queueTaskLeftPublishQueue.poll(...) if (leftItem!= null ) { // FIXME: This is a bug. Don't drain tryLater unless the future is done. tryLaterQueueTaskLeftQueue.drainTo(queueTaskLeftPublishQueue); if (leftItem.getFuture().isDone()) { // TODO: FIXME: This is a good place to drain the tryLater queue publish(...); } else { tryLaterQueueTaskLeftQueue.put(leftItem); } } else { tryLaterQueueTaskLeftQueue.drainTo(queueTaskLeftPublishQueue); }
            Hide
            szym Szymon J added a comment - - edited

            The bug happens when there is more than one item in the queue and none of the futures is done.

            1. // queue has 2 items, tryQueue has 0 items
            2. leftItem = queue.poll() // queue has 1, tryQueue has 0
            3. tryQueue.drainTo(queue) // queue has 1, tryQueue has 0
            4. !isDone => tryQueue.put(leftItem) // queue has 1, tryQueue has 1
            5. leftItem = queue.poll() // queue has 0, tryQueue has 1
            6. tryQueue.drainTo(queue) // queue has 1, tryQueue has 0
            7. !isDone =>tryQueue.put(leftItem) // queue has 1, tryQueue has 1
            ... we're back in 4. spinning.

            Show
            szym Szymon J added a comment - - edited The bug happens when there is more than one item in the queue and none of the futures is done. 1. // queue has 2 items, tryQueue has 0 items 2. leftItem = queue.poll() // queue has 1, tryQueue has 0 3. tryQueue.drainTo(queue) // queue has 1, tryQueue has 0 4. !isDone => tryQueue.put(leftItem) // queue has 1, tryQueue has 1 5. leftItem = queue.poll() // queue has 0, tryQueue has 1 6. tryQueue.drainTo(queue) // queue has 1, tryQueue has 0 7. !isDone =>tryQueue.put(leftItem) // queue has 1, tryQueue has 1 ... we're back in 4. spinning.
            Show
            szym Szymon J added a comment - Created https://github.com/jenkinsci/pubsub-light-module/pull/6
            Hide
            tfennelly Tom FENNELLY added a comment -

            Szymon J awesome thanks a lot.

            Show
            tfennelly Tom FENNELLY added a comment - Szymon J awesome thanks a lot.
            Hide
            tfennelly Tom FENNELLY added a comment -

            I'm also wondering would it make things a bit easier/clearer if we just used the blocking queue (with no timeouts) to drive the adding/checking/removing of a separate "tracking list". The blocking queue could get populated with "command" objects. One would be an "AddTask" command that gets put in for new tasks. Another would be "CheckTasks" command that gets put in by a timer of some sort. Just thinking out loud.

            Show
            tfennelly Tom FENNELLY added a comment - I'm also wondering would it make things a bit easier/clearer if we just used the blocking queue (with no timeouts) to drive the adding/checking/removing of a separate "tracking list". The blocking queue could get populated with "command" objects. One would be an "AddTask" command that gets put in for new tasks. Another would be "CheckTasks" command that gets put in by a timer of some sort. Just thinking out loud.
            Hide
            szym Szymon J added a comment -

            It seems that the problem you are addressing with SyncQueueListener is that of the POSIX `select` or `poll`, which is to monitor a number of items for their readiness.

            So a blocking queue would not help here. As far as I can tell the only way to wait on a number of java.concurrent.Future is to launch a number of threads each waiting on one Future and updating a synchronized object. http://stackoverflow.com/questions/117690/wait-until-any-of-futuret-is-done

            Show
            szym Szymon J added a comment - It seems that the problem you are addressing with SyncQueueListener is that of the POSIX `select` or `poll`, which is to monitor a number of items for their readiness. So a blocking queue would not help here. As far as I can tell the only way to wait on a number of java.concurrent.Future is to launch a number of threads each waiting on one Future and updating a synchronized object. http://stackoverflow.com/questions/117690/wait-until-any-of-futuret-is-done
            Hide
            tfennelly Tom FENNELLY added a comment -

            Well ... the changes you made in your PR should fix this issue I think. Launching a thread for each just isn't an option.

            The BlockingQueue doesn't help in the classic sense of how it might normally be used (wait for something put into it .. remove and process it ... wait for the next), but it could be used as a mechanism to drive the processing thread. It is messy though, I agree. Hopefully your PR sorts it anyway

            Show
            tfennelly Tom FENNELLY added a comment - Well ... the changes you made in your PR should fix this issue I think. Launching a thread for each just isn't an option. The BlockingQueue doesn't help in the classic sense of how it might normally be used (wait for something put into it .. remove and process it ... wait for the next), but it could be used as a mechanism to drive the processing thread. It is messy though, I agree. Hopefully your PR sorts it anyway
            jamesdumay James Dumay made changes -
            Priority Minor [ 4 ] Blocker [ 1 ]
            Sprint panthalassa [ 221 ] iapetus [ 216 ]
            Assignee Vivek Pandey [ vivek ] Tom FENNELLY [ tfennelly ]
            jamesdumay James Dumay made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            Show
            jamesdumay James Dumay added a comment - PR https://github.com/jenkinsci/pubsub-light-module/pull/6
            jamesdumay James Dumay made changes -
            Rank Ranked higher
            Hide
            jamesdumay James Dumay added a comment -

            This ticket should not be closed until master depends on the new release.

            Show
            jamesdumay James Dumay added a comment - This ticket should not be closed until master depends on the new release.
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Szymon Jakubczak
            Path:
            src/main/java/org/jenkinsci/plugins/pubsub/listeners/SyncQueueListener.java
            http://jenkins-ci.org/commit/pubsub-light-module/71fa32f717d2dc1feee65478043baa507a1a4dfc
            Log:
            JENKINS-41832 Fix spinning in SyncQueueListener (#6)

            Fixes spinning at 100% when none of the tasks are done.

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Szymon Jakubczak Path: src/main/java/org/jenkinsci/plugins/pubsub/listeners/SyncQueueListener.java http://jenkins-ci.org/commit/pubsub-light-module/71fa32f717d2dc1feee65478043baa507a1a4dfc Log: JENKINS-41832 Fix spinning in SyncQueueListener (#6) Fixes spinning at 100% when none of the tasks are done.
            Hide
            tfennelly Tom FENNELLY added a comment -

            pubsub released and PR on blueocean: https://github.com/jenkinsci/blueocean-plugin/pull/799

            People will be able to update this plugin without updating everything else.

            Thanks for the fix Szymon J.

            Show
            tfennelly Tom FENNELLY added a comment - pubsub released and PR on blueocean: https://github.com/jenkinsci/blueocean-plugin/pull/799 People will be able to update this plugin without updating everything else. Thanks for the fix Szymon J .
            Hide
            michaelneale Michael Neale added a comment -

            Thanks for PR and release quickly all!

            Show
            michaelneale Michael Neale added a comment - Thanks for PR and release quickly all!
            Hide
            tfennelly Tom FENNELLY added a comment -

            merged upstream

            Show
            tfennelly Tom FENNELLY added a comment - merged upstream
            tfennelly Tom FENNELLY made changes -
            Status In Progress [ 3 ] Resolved [ 5 ]
            Resolution Fixed [ 1 ]
            Hide
            michaelneale Michael Neale added a comment -

            Update center is currently a bit broken, so this isn't avialable to everyone yet but will be.

            Show
            michaelneale Michael Neale added a comment - Update center is currently a bit broken, so this isn't avialable to everyone yet but will be.
            jamesdumay James Dumay made changes -
            Remote Link This issue links to "CloudBees Internal UX-617 (Web Link)" [ 18194 ]
            jbriden Jenn Briden made changes -
            Status Resolved [ 5 ] Closed [ 6 ]

              People

              • Assignee:
                tfennelly Tom FENNELLY
                Reporter:
                szym Szymon J
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: