  1. Jenkins
  2. JENKINS-12271

Jenkins DoS's itself with ajax checks from job/*/configure with large workspace

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Component: core
    • Labels: None

      Summary:

      A job with greedy glob expressions for archived artifacts and a workspace with lots of directories can bring Jenkins to its knees, because the background AJAX checks on the configure page recursively scan the filesystem for matching files. The only resolution is to kill Jenkins and ask users to write less greedy expressions.

      Details:

      We had a Jenkins instance become unavailable due to 100% CPU usage. There were several dozen request threads that looked like this:

      "Handling GET /job/torquebox-2x-incremental/descriptorByName/hudson.tasks.ArtifactArchiver/checkArtifacts : http-8097-21" daemon prio=10 tid=0x00007ff3f001d800 nid=0x39ae runnable [0x00007ff3e2ae5000]
         java.lang.Thread.State: RUNNABLE
              at org.apache.tools.ant.util.VectorSet.doAdd(VectorSet.java:64)
              - locked <0x00007ff4415240b8> (a org.apache.tools.ant.util.VectorSet)
              at org.apache.tools.ant.util.VectorSet.addElement(VectorSet.java:75)
              - locked <0x00007ff4415240b8> (a org.apache.tools.ant.util.VectorSet)
              at org.apache.tools.ant.DirectoryScanner.scandir(DirectoryScanner.java:1236)
              at org.apache.tools.ant.DirectoryScanner.scandir(DirectoryScanner.java:1259)
              at org.apache.tools.ant.DirectoryScanner.scandir(DirectoryScanner.java:1259)
              ...

      The user was configuring a job which archived artifacts using the following glob expressions:

      integration-tests/target/rubygems, integration-tests/target/integ-dist/jboss/standalone/log/.log, integration-tests/apps//log/development.log, **/target/surefire-reports/*.txt, **/target/rspec-report.html

      The workspace for this job has 37k subdirectories, each of which was being checked for a match against **/target/surefire-reports/*.txt and **/target/rspec-report.html.

      The real problem is that these AJAX threads pile up if you navigate in and out of the field multiple times, because each visit starts a new scan. Eventually the threads consume all available CPU and the instance becomes unavailable.

      It seems like ArtifactArchiver.doCheckArtifacts() and JUnitResultArchiver.doCheckTestResults() should run the check in a Callable and wait on it with Future.get(30, TimeUnit.SECONDS). Maybe it should be 15 seconds; I can't imagine someone waiting longer for a validation. Perhaps the thread could also run at a lower priority?
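
      A minimal sketch of that suggestion, using only java.util.concurrent. The class BoundedCheck and its call() method are hypothetical names made up for illustration; they are not Jenkins APIs, and this is not how Jenkins implements form validation. The sketch assumes the check can be wrapped in a Callable. The pool size of 2 is an arbitrary choice.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of Jenkins): runs a validation check with a deadline. */
public class BoundedCheck {

    // A small fixed pool: repeated checks queue up instead of each getting its own thread.
    // Workers are daemon threads at minimum priority so they compete less with request handling.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "bounded-check");
        t.setDaemon(true);
        t.setPriority(Thread.MIN_PRIORITY);
        return t;
    });

    /** Returns the check's result, or onTimeout if it does not finish within the deadline. */
    public static <T> T call(Callable<T> check, long timeout, TimeUnit unit, T onTimeout)
            throws ExecutionException, InterruptedException {
        Future<T> future = POOL.submit(check);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Requests interruption of the worker. This only stops the work if the
            // check itself reacts to interruption; otherwise it keeps running in the pool.
            future.cancel(true);
            return onTimeout;
        }
    }

    public static void main(String[] args) throws Exception {
        String fast = call(() -> "ok", 1, TimeUnit.SECONDS, "timed out");
        String slow = call(() -> { Thread.sleep(5_000); return "ok"; },
                100, TimeUnit.MILLISECONDS, "timed out");
        System.out.println(fast + " / " + slow); // prints "ok / timed out"
    }
}
```

      Note that the timeout only frees the HTTP request thread. Whether the directory scan itself stops depends on the scanning code checking for interruption, which this report does not establish; the bounded pool at least caps how many scans can run at once.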

            Assignee: Unassigned
            Reporter: Ryan Campbell (recampbell)