JENKINS-49931

Heap Histogram Collection Destabilizes Masters

      Description

      When we added heap stats collection (https://issues.jenkins-ci.org/browse/JENKINS-22791) in v2.42, we appear to have inadvertently introduced a major performance and stability regression whenever the histogram is collected regularly.

      How? Well, this gathers a live heap histogram, which appears to trigger a Full GC. This is visible in GC logs because they show the following cause:

      > [Full GC (Heap Inspection Initiated GC).
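
      One way to confirm this on a given master is to scan its GC log for that cause string. The following is a minimal sketch, assuming the log has already been read into a list of lines; the class and method names are hypothetical, and the sample lines in `main` are illustrative rather than real log output.

```java
import java.util.Arrays;
import java.util.List;

public class HeapInspectionGcScan {
    // Cause string as quoted in the issue description.
    static final String CAUSE = "Full GC (Heap Inspection Initiated GC)";

    /** Counts log lines that report a Full GC triggered by heap inspection. */
    static long countHeapInspectionFullGcs(List<String> gcLogLines) {
        return gcLogLines.stream()
                .filter(line -> line.contains(CAUSE))
                .count();
    }

    public static void main(String[] args) {
        // Illustrative lines only; a real GC log carries timestamps and sizes too.
        List<String> sample = Arrays.asList(
                "[GC pause (G1 Evacuation Pause) (young)",
                "[Full GC (Heap Inspection Initiated GC)",
                "[Full GC (Heap Inspection Initiated GC)");
        System.out.println("Heap-inspection Full GCs: "
                + countHeapInspectionFullGcs(sample));
    }
}
```

      A non-zero count that lines up with the histogram collection interval would point at this issue rather than at ordinary heap pressure.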

      Because this is a Full GC rather than a concurrent or young-generation GC, and we generally run G1, the slow serial collector is used for it. That mode is non-concurrent, so the application is fully paused until it completes, and single-threaded, so instead of roughly 1 GB/s of GC throughput per CPU we get less than 1 GB/s in total. It also cleans and compacts the entire heap rather than just part of it, as other modes do.

      So, with 15 GB of used heap, that means a pause of up to ~15s. This matches behavior observed in the wild.
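
      The arithmetic behind that estimate can be sketched as follows. This is a back-of-the-envelope model only: it takes the 1 GB/s figure from the description as a round number, and the CPU count used for comparison is a hypothetical value, not a measurement.

```java
public class FullGcPauseEstimate {
    /** Pause estimate in seconds: heap to process divided by collector throughput. */
    static double estimatePauseSeconds(double usedHeapGb, double throughputGbPerSec) {
        if (throughputGbPerSec <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return usedHeapGb / throughputGbPerSec;
    }

    public static void main(String[] args) {
        double usedHeapGb = 15.0;        // used heap from the description
        double perThreadGbPerSec = 1.0;  // rule-of-thumb throughput from the description
        int cpus = 8;                    // hypothetical CPU count, for comparison only

        // Single-threaded Full GC: one thread handles the whole heap.
        double serial = estimatePauseSeconds(usedHeapGb, perThreadGbPerSec);
        // Idealized multi-threaded collection scaling linearly with CPUs.
        double parallel = estimatePauseSeconds(usedHeapGb, perThreadGbPerSec * cpus);

        System.out.printf("single-threaded: ~%.1f s, %d threads: ~%.1f s%n",
                serial, cpus, parallel);
    }
}
```

      Since the single-threaded rate is described as below 1 GB/s, the real pause could exceed this figure; the model is only meant to show why the pause scales with used heap.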

      I am rating this as critical because on larger-scale production masters a hang that long can cause job failures, visible UI hangs, HTTP request timeouts, and other issues; for example, it should result in Durable Task failures for Pipelines.

      Proposed solution: only gather the live heap histogram when a user is explicitly requesting a support bundle (disable it by default).
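
      The proposed gating could look roughly like the sketch below. It is not the plugin's actual code: the property name, class, and method are hypothetical, and the histogram source is passed in as a placeholder so that the expensive call is only made when collection is allowed.

```java
import java.util.Optional;
import java.util.function.Supplier;

public class GatedHeapHistogram {
    // Hypothetical property name, for illustration only; not the plugin's real setting.
    static final String ENABLE_PROPERTY = "example.heapHistogram.collectPeriodically";

    /**
     * Invokes the (expensive, Full-GC-triggering) histogram source only when a
     * user explicitly asked for a bundle or periodic collection was opted into.
     */
    static Optional<String> maybeCollect(boolean userRequestedBundle,
                                         Supplier<String> histogramSource) {
        // Boolean.getBoolean returns false when the property is unset,
        // so periodic collection is disabled by default.
        boolean periodicEnabled = Boolean.getBoolean(ENABLE_PROPERTY);
        if (userRequestedBundle || periodicEnabled) {
            return Optional.of(histogramSource.get());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Supplier<String> placeholder = () -> "histogram placeholder";
        System.out.println("periodic run: " + maybeCollect(false, placeholder).isPresent());
        System.out.println("user request: " + maybeCollect(true, placeholder).isPresent());
    }
}
```

      Passing the source as a `Supplier` keeps the skipped path free of any heap inspection, which is the point of disabling it by default.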

            Activity

            svanoort Sam Van Oort added a comment -

            Assigned to Emilio Escobar because I think he has a fix already almost done for it.

            escoem Emilio Escobar added a comment -

            https://github.com/jenkinsci/support-core-plugin/pull/134 already available for review!

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Emilio Escobar
            Path:
            src/main/java/com/cloudbees/jenkins/support/impl/HeapUsageHistogram.java
            http://jenkins-ci.org/commit/support-core-plugin/a3ff0c7e985b0ecb9a4460bab94932bc55a17b6a
            Log:
            [FIXED JENKINS-49931] DISABLED by default

            escoem Emilio Escobar added a comment -

            new https://github.com/jenkinsci/support-core-plugin/pull/135

              People

              • Assignee:
                escoem Emilio Escobar
                Reporter:
                svanoort Sam Van Oort