JENKINS-27636: thread starvation in hudson.diagnosis.OldDataMonitor

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Blocker
    • Component/s: core
    • Labels: None
    • Environment: Debian Wheezy
      OpenJDK Runtime Environment (IcedTea 2.5.4) (7u75-2.5.4-1~deb7u1)
      Jenkins 1.606 running behind nginx

      /systemInfo page attached

      Since upgrading from Jenkins 1.596 to 1.605 and 1.606, we have been seeing a thread starvation issue. Jenkins runs normally for a while (anywhere from 15 minutes to several hours) and then stops responding to HTTP requests for most pages (e.g. /, /github-webhook, or the build page of a particular job).

      Some pages DO still load, such as /threadDump, which means we could at least get stack traces of what is going on.
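      The "BLOCKED on ... owned by ..." lines in the dumps below come from the JVM's own lock bookkeeping. As a rough sketch (not how the /threadDump page is actually implemented), the same information can be read through the standard java.lang.management API; the class name BlockedThreadReport is hypothetical.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

class BlockedThreadReport {
    /** One line per BLOCKED thread: the lock it waits for and the thread that owns it. */
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // (true, true) also collects locked monitors and ownable synchronizers,
        // the kind of data shown as "- locked ..." in a full dump.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            if (info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            LockInfo lock = info.getLockInfo(); // printed as ClassName@identityHash
            lines.add(String.format("\"%s\" Id=%d BLOCKED on %s owned by \"%s\" Id=%d",
                    info.getThreadName(), info.getThreadId(), lock,
                    info.getLockOwnerName(), info.getLockOwnerId()));
        }
        return lines;
    }

    public static void main(String[] args) {
        blockedThreads().forEach(System.out::println);
    }
}
```

      In a healthy JVM this usually prints little or nothing; in the hung state described here it would list the same blocked executor and timer threads shown below.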

      In this state, in-progress Jenkins jobs never seem to complete, new ones cannot be queued, external notifications stop working, and so on. Stopping the service does not work; Jenkins has to be forcefully killed. After a restart, Jenkins works again for a while and then the same thing happens.

      After reverting to Jenkins 1.596, the problem does not occur.

      Here is a snippet of the stack dump (the full dump is attached) that I believe points to the problem:

      "Executor #0 for Slave (i-17fd89de) : executing <JOBNAME> #1" Id=46 Group=main BLOCKED on com.github.mjdetullio.jenkins.plugins.multibranch.FreeStyleMultiBranchProject@2983312f owned by "jenkins.util.Timer [#10]" Id=33
      	at com.github.mjdetullio.jenkins.plugins.multibranch.FreeStyleMultiBranchProject.getSubProjects(FreeStyleMultiBranchProject.java:291)
      	-  blocked on com.github.mjdetullio.jenkins.plugins.multibranch.FreeStyleMultiBranchProject@2983312f
      	at com.github.mjdetullio.jenkins.plugins.multibranch.FreeStyleMultiBranchProject.getItem(FreeStyleMultiBranchProject.java:335)
      	at com.github.mjdetullio.jenkins.plugins.multibranch.FreeStyleMultiBranchProject.getItem(FreeStyleMultiBranchProject.java:122)
      	at jenkins.model.Jenkins.getItemByFullName(Jenkins.java:2446)
      	at jenkins.model.Jenkins.getItemByFullName(Jenkins.java:2465)
      	at hudson.diagnosis.OldDataMonitor.referTo(OldDataMonitor.java:359)
      	at hudson.diagnosis.OldDataMonitor.remove(OldDataMonitor.java:107)
      	-  locked hudson.diagnosis.OldDataMonitor@62ea7c46
      	at hudson.diagnosis.OldDataMonitor.access$000(OldDataMonitor.java:67)
      	at hudson.diagnosis.OldDataMonitor$1.onChange(OldDataMonitor.java:120)
      	at hudson.model.listeners.SaveableListener.fireOnChange(SaveableListener.java:80)
      	at hudson.model.Run.save(Run.java:1912)
      	-  locked com.github.mjdetullio.jenkins.plugins.multibranch.FreeStyleBranchBuild@3ccdf95f
      	at hudson.model.Run.execute(Run.java:1808)
      	at com.github.mjdetullio.jenkins.plugins.multibranch.FreeStyleBranchBuild.run(FreeStyleBranchBuild.java:89)
      	at hudson.model.ResourceController.execute(ResourceController.java:89)
      	at hudson.model.Executor.run(Executor.java:240)
      

      and the timer thread that owns that FreeStyleMultiBranchProject lock:

      "jenkins.util.Timer [#10]" Id=33 Group=main BLOCKED on hudson.diagnosis.OldDataMonitor@62ea7c46 owned by "Executor #0 for Slave (i-17fd89de) : executing <JOBNAME>utbms-j-codes #1" Id=46
      	at hudson.diagnosis.OldDataMonitor.remove(OldDataMonitor.java:107)
      	-  blocked on hudson.diagnosis.OldDataMonitor@62ea7c46
      	at hudson.diagnosis.OldDataMonitor.access$000(OldDataMonitor.java:67)
      	at hudson.diagnosis.OldDataMonitor$1.onChange(OldDataMonitor.java:120)
      	at hudson.model.listeners.SaveableListener.fireOnChange(SaveableListener.java:80)
      	at hudson.model.AbstractItem.save(AbstractItem.java:514)
      	-  locked com.github.mjdetullio.jenkins.plugins.multibranch.FreeStyleBranchProject@3f9a1a05
      	at hudson.model.Job.save(Job.java:177)
      	-  locked com.github.mjdetullio.jenkins.plugins.multibranch.FreeStyleBranchProject@3f9a1a05
      	at hudson.model.AbstractProject.save(AbstractProject.java:303)
      	-  locked com.github.mjdetullio.jenkins.plugins.multibranch.FreeStyleBranchProject@3f9a1a05
      	at com.github.mjdetullio.jenkins.plugins.multibranch.FreeStyleBranchProject.setIsTemplate(FreeStyleBranchProject.java:109)
      	at com.github.mjdetullio.jenkins.plugins.multibranch.FreeStyleMultiBranchProject._syncBranches(FreeStyleMultiBranchProject.java:1105)
      	-  locked com.github.mjdetullio.jenkins.plugins.multibranch.FreeStyleMultiBranchProject@2983312f
      	at com.github.mjdetullio.jenkins.plugins.multibranch.FreeStyleMultiBranchProject.syncBranches(FreeStyleMultiBranchProject.java:1018)
      	-  locked com.github.mjdetullio.jenkins.plugins.multibranch.FreeStyleMultiBranchProject@2983312f
      	at com.github.mjdetullio.jenkins.plugins.multibranch.SyncBranchesTrigger.run(SyncBranchesTrigger.java:98)
      	at hudson.triggers.Trigger.checkTriggers(Trigger.java:265)
      	at hudson.triggers.Trigger$Cron.doRun(Trigger.java:214)
      	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:51)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
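      The two traces above take the same two monitors in opposite order. Below is a minimal, self-contained Java reproduction of that pattern, with two plain objects standing in for the OldDataMonitor and FreeStyleMultiBranchProject monitors. The class, method, and thread names are hypothetical; this is not Jenkins or plugin code. It uses ThreadMXBean.findDeadlockedThreads(), the JDK's built-in monitor-cycle detector, to confirm the deadlock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class LockOrderInversion {
    /**
     * Starts two daemon threads that acquire the same two monitors in opposite
     * order and returns the ids of deadlocked threads, or null if none were found.
     */
    static long[] reproduce() {
        // Hypothetical stand-ins for the two monitors in the traces.
        final Object monitorLock = new Object(); // plays hudson.diagnosis.OldDataMonitor
        final Object projectLock = new Object(); // plays FreeStyleMultiBranchProject
        // Ensures both threads hold their first lock before trying the second one.
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread executor = new Thread(() -> {
            synchronized (monitorLock) {          // like OldDataMonitor.remove()
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (projectLock) { }    // like getSubProjects(): never acquired
            }
        }, "executor");
        Thread timer = new Thread(() -> {
            synchronized (projectLock) {          // like syncBranches()
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (monitorLock) { }    // like OldDataMonitor.remove(): never acquired
            }
        }, "timer");
        executor.setDaemon(true); // daemon threads so the JVM can still exit
        timer.setDaemon(true);
        executor.start();
        timer.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 200; attempt++) { // poll for roughly 2 seconds
            long[] ids = mx.findDeadlockedThreads();
            if (ids != null) {
                return ids;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return null;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long[] ids = reproduce();
        System.out.println(ids == null ? "no deadlock detected" : ids.length + " deadlocked threads");
    }
}
```

      The latch makes the cycle deterministic; in Jenkins the same cycle presumably only forms when a build save and a branch sync happen to overlap, which would fit the "15 minutes to several hours" timing.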
      

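      Based on these traces, the two threads appear to be deadlocked: the executor thread (Id=46) holds the hudson.diagnosis.OldDataMonitor lock, taken in OldDataMonitor.remove, and is waiting for the FreeStyleMultiBranchProject lock, while the timer thread (Id=33) holds that FreeStyleMultiBranchProject lock, taken in syncBranches, and is waiting for the OldDataMonitor lock. Since both saves in these traces go through SaveableListener.fireOnChange into OldDataMonitor.remove, presumably any other thread that saves a job or build then blocks on the same lock, which would explain why Jenkins as a whole stops responding.

      I am assuming the problem is in hudson.diagnosis.OldDataMonitor, though it could also be a problem with the multi-branch project plugin.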
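      One general way to avoid this class of deadlock is the "open call" discipline: never invoke foreign code (such as an item lookup that may take other locks) while holding your own monitor. The sketch below illustrates that pattern with a hypothetical OpenCallRegistry class and a caller-supplied resolver; it is not OldDataMonitor's actual code and not a fix adopted by Jenkins, only an assumption-laden illustration of the technique.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/**
 * Hypothetical registry showing the "open call" discipline: foreign code
 * (here, the resolver) is never invoked while this object's lock is held.
 */
class OpenCallRegistry<K, V> {
    private final Map<K, String> entries = new HashMap<>();

    synchronized void put(K key, String note) {
        entries.put(key, note);
    }

    synchronized int size() {
        return entries.size();
    }

    /**
     * Removes every entry whose key resolves to target. The resolver may end up
     * taking other locks (as Jenkins.getItemByFullName does in the first trace),
     * so it runs on a snapshot with no lock held.
     */
    int removeMatching(Function<K, V> resolver, V target) {
        List<K> snapshot;
        synchronized (this) { // short critical section: copy keys only
            snapshot = new ArrayList<>(entries.keySet());
        }
        List<K> doomed = new ArrayList<>();
        for (K key : snapshot) {
            if (Objects.equals(resolver.apply(key), target)) { // foreign call, no lock held
                doomed.add(key);
            }
        }
        int removed = 0;
        synchronized (this) { // re-acquire only to mutate
            for (K key : doomed) {
                if (entries.containsKey(key)) { // entry may have vanished since the snapshot
                    entries.remove(key);
                    removed++;
                }
            }
        }
        return removed;
    }
}
```

      The trade-off is that the map can change between the snapshot and the removal, which is why the second critical section re-checks each key before counting it.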

    • Assignee: Unassigned
    • Reporter: Nicholas Paufler (npaufler)
    • Votes: 0
    • Watchers: 2