Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-24915

Extremely slow startup with template-project in combination with jobConfigHistory

    XMLWordPrintable

    Details

    • Similar Issues:

      Description

      Startup takes 30 min for a Jenkins installation with ~1000 jobs where most of them is using template builders or publishers of template-project plugin when you have jobConfigHistory plugins.

      All jobs are loaded by Jenkins at start up. OK so far. But the template-project loops on all jobs again, and updates (twice!) every job that has template builders or publishers. Combined with a slow and questionable sleep implementation in jobConfigHistory this is disastrous for start up time.

      There may be a lot of other plugins listening to job config save events, so there could be other combinations that is poisonous.

      Note that load, disk, and CPU usage is low, we're mostly just waiting.

      I think the main problem is in template-project, but the jobConfigHistory should probably change algorithm as well. 500 ms is a very long time in computer world. 30 ms is probably more reasonable number.


      "Jenkins initialization thread" prio=10 tid=0x00007f6708883000 nid=0x5fa9 sleeping[0x00007f66fdcc4000]
         java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at hudson.plugins.jobConfigHistory.FileHistoryDao.createNewHistoryDir(FileHistoryDao.java:217)
        at hudson.plugins.jobConfigHistory.FileHistoryDao.getRootDir(FileHistoryDao.java:118)
        at hudson.plugins.jobConfigHistory.FileHistoryDao.createNewHistoryEntry(FileHistoryDao.java:406)
        at hudson.plugins.jobConfigHistory.FileHistoryDao.createNewHistoryEntryAndCopyConfig(FileHistoryDao.java:248)
        at hudson.plugins.jobConfigHistory.FileHistoryDao.saveItem(FileHistoryDao.java:264)
        at hudson.plugins.jobConfigHistory.JobConfigHistorySaveableListener.onChange(JobConfigHistorySaveableListener.java:29)
        at hudson.model.listeners.SaveableListener.fireOnChange(SaveableListener.java:81)
        at hudson.model.AbstractItem.save(AbstractItem.java:462)
        - locked <0x00000007a2cee670> (a hudson.model.FreeStyleProject)
        at hudson.model.Job.save(Job.java:166)
        - locked <0x00000007a2cee670> (a hudson.model.FreeStyleProject)
        at hudson.model.AbstractProject.save(AbstractProject.java:303)
        - locked <0x00000007a2cee670> (a hudson.model.FreeStyleProject)
        at hudson.model.Job.removeProperty(Job.java:498)
        at hudson.model.Job.removeProperty(Job.java:511)
        at hudson.plugins.templateproject.ItemListenerImpl.onLoaded(ItemListenerImpl.java:42)
        at jenkins.model.Jenkins.<init>(Jenkins.java:863)
      

        Attachments

          Activity

          Hide
          brantone Brantone added a comment -

          Well ..
          for (AbstractProject<?,?> project : Hudson.getInstance().getAllItems(AbstractProject.class)) {
          is a pretty good start, in https://github.com/jenkinsci/template-project-plugin/blob/master/src/main/java/hudson/plugins/templateproject/ItemListenerImpl.java#L37

          Show
          brantone Brantone added a comment - Well .. for (AbstractProject<?,?> project : Hudson.getInstance().getAllItems(AbstractProject.class)) { is a pretty good start, in https://github.com/jenkinsci/template-project-plugin/blob/master/src/main/java/hudson/plugins/templateproject/ItemListenerImpl.java#L37
          Hide
          tsniatowski Tomasz Śniatowski added a comment -

          Any updates?

          Show
          tsniatowski Tomasz Śniatowski added a comment - Any updates?
          Hide
          brantone Brantone added a comment -

          Sadly not yet .... that 500ms timeout is a bit ridiculous, even with 500ms with around 1,000 jobs, that accounts for 8 minutes.
          That functionality on the Listener was added per https://github.com/jenkinsci/template-project-plugin/pull/5 -> https://issues.jenkins-ci.org/browse/JENKINS-16800 such that publishers behave as if they were there all along (ex: not part of template).

          Are you seeing this added functionality? Or even make use of it?

          Show
          brantone Brantone added a comment - Sadly not yet .... that 500ms timeout is a bit ridiculous, even with 500ms with around 1,000 jobs, that accounts for 8 minutes. That functionality on the Listener was added per https://github.com/jenkinsci/template-project-plugin/pull/5 -> https://issues.jenkins-ci.org/browse/JENKINS-16800 such that publishers behave as if they were there all along (ex: not part of template). Are you seeing this added functionality? Or even make use of it?
          Hide
          tsniatowski Tomasz Śniatowski added a comment -

          We were mostly seeing startup delays severe enough that made maintenance almost impossible, and ended up disabling the job config history plugin...

          I don't think we care about report links in particular that much, but having publishers be more reliable in general is something we might be relying on without realizing. As it is though, the situation is pretty bad. It seems worse in practice than 500ms/job, we're seeing numbers more in line with the original report which stated 30 minutes for 1000 jobs, so almost 2s/job.

          Show
          tsniatowski Tomasz Śniatowski added a comment - We were mostly seeing startup delays severe enough that made maintenance almost impossible, and ended up disabling the job config history plugin... I don't think we care about report links in particular that much, but having publishers be more reliable in general is something we might be relying on without realizing. As it is though, the situation is pretty bad. It seems worse in practice than 500ms/job, we're seeing numbers more in line with the original report which stated 30 minutes for 1000 jobs, so almost 2s/job.
          Hide
          brantone Brantone added a comment -

          The original code came in from William Bernardet over 2 years ago ... wonder if he's still around to speak into this??

          Show
          brantone Brantone added a comment - The original code came in from William Bernardet over 2 years ago ... wonder if he's still around to speak into this??
          Hide
          tsniatowski Tomasz Śniatowski added a comment -

          Seems the answer is "no". What now?

          Show
          tsniatowski Tomasz Śniatowski added a comment - Seems the answer is "no". What now?

            People

            • Assignee:
              mfriedenhagen Mirko Friedenhagen
              Reporter:
              marcus_phi Marcus Philip
            • Votes:
              6 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated: