Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-50223

Pipeline Jenkins master ran out of Java Metaspace

    XMLWordPrintable

    Details

    • Similar Issues:
    • Released As:
      workflow-cps 2.69

      Description

      Setup

      Jenkins v2.89.3 / v2.89.4 LTS
      Pipeline API Plugin: 2.26
      Pipeline Nodes and Processes Plugin: 2.19
      Durable Task Plugin: 1.18
      Pipeline Job Plugin: 2.17
      Pipeline Shared Groovy Libraries Plugin: 2.9
      Pipeline Supporting APIs Plugin: 2.18
      Pipeline Groovy Plugin: 2.45
      Script Security Plugin: 1.41

      Pipeline Default Speed/Durability Level: Performance-Optimized
      "Do not allow the pipeline to resume if the master restarts": Enabled on all jobs

      Problem

      Shortly after upgrading from Jenkins 2.60.3 to Jenkins 2.89.3 LTS, we experienced a java.lang.OutOfMemoryError: Metaspace error. I'm unsure if this a bug or if increased Java Metaspace is to be expected in the newer version of Jenkins and pipeline plugins.

      Details

      Jenkins v2.60.3

      In Jenkins 2.60.3 we used the memory parameters below and the Java nonheap space (Metaspace) stayed between 350mb and 550mb. We ran for months without restarting Jenkins.

      -XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:MaxMetaspaceSize=1g -XX:MetaspaceSize=256M -Xms4g -Xmx8g -Dgroovy.use.classvalue=true

      Jenkins v2.89.3

      After upgrading to Jenkins 2.89.3 and upgrading all of the plugins, we used the Java memory settings below. We removed the "groovy.use.classvalue=true" parameter based on the recommendation here: https://support.cloudbees.com/hc/en-us/articles/115001505172-Jenkins-hangs-due-to-Groovy-ClassInfo-deadlock. We saw the Java nonheap memory fluctuate between 250mb and 750mb slowing trending upwards. After a few weeks, nonheap memory was at 1GB and Jenkins no longer responded to requests. The log file was full of java.lang.OutOfMemoryError: Metaspace errors.

      -XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:MaxMetaspaceSize=1g -XX:MetaspaceSize=256M -Xms4g -Xmx8g

      Debugging

      Our Jenkins masters perform a large amount of Jenkins pipeline jobs and I noticed that all of our Jenkins 2.89.4 masters show higher Java nonheap memory usage than before the Jenkins upgrade. We disabled pipeline durability but still see high memory usage. Standard Java heap memory usage looks fine.

      Example 1:

      Jenkins Master 1 was completely idle and I performed a System.gc() to force a full garbage collection. Afterwards I saw that the nonheap memory still was over 700mb. Below are some of the Java details I collected on that installation just after the full GC.

      VM.native_memory summary

      Native Memory Tracking:

      Total: reserved=11014208KB, committed=5551020KB
      Java Heap (reserved=8388608KB, committed=4194304KB)
      (mmap: reserved=8388608KB, committed=4194304KB)

      Class (reserved=1686933KB, committed=707989KB)
      (classes #48326)
      (malloc=13717KB #208203)
      (mmap: reserved=1673216KB, committed=694272KB)

      Thread (reserved=93951KB, committed=93951KB)
      (thread #92)
      (stack: reserved=93548KB, committed=93548KB)
      (malloc=296KB #463)
      (arena=107KB #182)

      Code (reserved=273528KB, committed=155620KB)
      (malloc=23928KB #24978)
      (mmap: reserved=249600KB, committed=131692KB)

      GC (reserved=412351KB, committed=256703KB)
      (malloc=68287KB #693569)
      (mmap: reserved=344064KB, committed=188416KB)

      Compiler (reserved=406KB, committed=406KB)
      (malloc=276KB #2009)
      (arena=131KB #3)

      Internal (reserved=88791KB, committed=88791KB)
      (malloc=88759KB #184270)
      (mmap: reserved=32KB, committed=32KB)

      Symbol (reserved=30516KB, committed=30516KB)
      (malloc=27279KB #301740)
      (arena=3236KB #1)

      Native Memory Tracking (reserved=22549KB, committed=22549KB)
      (malloc=348KB #5361)
      (tracking overhead=22201KB)

      Arena Chunk (reserved=190KB, committed=190KB)
      (malloc=190KB)

      Unknown (reserved=16384KB, committed=0KB)
      (mmap: reserved=16384KB, committed=0KB)

      GC.class_histogram

      num #instances #bytes class name
      ----------------------------------------------
      1: 490462 84832616 [C
      2: 2552835 40845360 com.cloudbees.groovy.cps.impl.ConstantBlock
      3: 930699 37227960 com.cloudbees.groovy.cps.impl.FunctionCallBlock
      4: 1493734 35849616 com.cloudbees.groovy.cps.impl.SourceLocation
      5: 883507 33258176 [Ljava.lang.Object;
      6: 179552 30097544 [B
      7: 922229 29511328 java.util.HashMap$Node
      8: 1151159 27386104 [Lcom.cloudbees.groovy.cps.Block;
      9: 947492 22739808 java.lang.String
      10: 790957 18982968 com.cloudbees.groovy.cps.impl.LocalVariableBlock
      11: 213822 13097984 [Ljava.util.HashMap$Node;
      12: 519301 12463224 com.cloudbees.groovy.cps.impl.SequenceBlock
      13: 452808 10867392 java.util.ArrayList
      14: 320616 10259712 com.cloudbees.groovy.cps.impl.PropertyAccessBlock
      15: 250810 10032400 com.google.common.cache.LocalCache$WeakEntry
      16: 168578 9440368 org.codehaus.groovy.runtime.metaclass.MetaMethodIndex$Entry
      17: 260734 8343488 java.util.concurrent.locks.ReentrantLock$NonfairSync
      18: 250147 7394208 [Lhudson.model.Action;
      19: 142590 6844320 java.util.HashMap
      20: 139363 6689424 org.jenkinsci.plugins.workflow.cps.nodes.StepAtomNode
      21: 264178 6340272 com.google.common.collect.SingletonImmutableList
      22: 195095 6243040 com.cloudbees.groovy.cps.impl.AssignmentBlock
      23: 49477 6237064 java.lang.Class
      24: 253041 6072984 java.util.concurrent.CopyOnWriteArrayList
      25: 250681 6016344 org.jenkinsci.plugins.workflow.support.storage.BulkFlowNodeStorage$Tag

      GC.class_stats

      We use the Pipeline Shared Groovy Libraries Plugin and noticed a large number of class files with the names similar to below. Each of our shared library classes (org.xxxxxxxx.scripts.xxxxxxx) had about 400 references.

      script15202683643871464209649
      sun.reflect.GeneratedMethodAccessor2903
      org.xxxxxxxx.scripts.xxxxxxx

      Example 2

      I ran a Java heap dump through an analyzer program and this is what it stated was the top consumers.

      Class Name Objects Shallow Heap Retained Heap
      org.jenkinsci.plugins.workflow.support.actions.LogActionImpl 976,720 23,441,280 >= 343,262,256
      org.jenkinsci.plugins.workflow.cps.actions.ArgumentsActionImpl 735,961 17,663,064 >= 305,348,632

      Example 3

      I ran a different Java heap dump through the analyzer and below are the results. Again this was after a full GC.

      Classes = 35k
      Objects = 47M
      Class Load = 4.6k
      GC Roots = 4.3k

      952,652 instances of "org.jenkinsci.plugins.workflow.support.actions.LogActionImpl", loaded by "hudson.ClassicPluginStrategy$AntClassLoader2 @ 0x5c1746598" occupy 342,684,616 (20.83%) bytes. These instances are referenced from one instance of "org.codehaus.groovy.util.AbstractConcurrentMapBase$Segment[]", loaded by "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x5c0000000"

      717,629 instances of "org.jenkinsci.plugins.workflow.cps.actions.ArgumentsActionImpl", loaded by "hudson.ClassicPluginStrategy$AntClassLoader2 @ 0x5c17d9770" occupy 296,727,528 (18.04%) bytes. These instances are referenced from one instance of "org.codehaus.groovy.util.AbstractConcurrentMapBase$Segment[]", loaded by "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x5c0000000"

      293 instances of "org.jenkinsci.plugins.workflow.cps.CpsFlowExecution", loaded by "hudson.ClassicPluginStrategy$AntClassLoader2 @ 0x5c17d9770" occupy 210,909,800 (12.82%) bytes. These instances are referenced from one instance of "org.codehaus.groovy.util.AbstractConcurrentMapBase$Segment[]", loaded by "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x5c0000000"

      Package Retained Heap Retained Heap, % Top Dominators
      workflow 1,340,511,168 81.50% 5,043,559

      Workaround

      We once again added "-Dgroovy.use.classvalue=true" to Java startup parameters and have noticed a reduction in Java nonheap memory usage. With the parameter we see the memory increase to around 750mb but when the server becomes idle the Java nonheap memory is reduced to under 300mb. Without the parameter we don't see the same reduction. Currently one of our Jenkins masters that doesn't have the parameter started on 3/2/18 at .2 GB and is currently (3/16/18) at .76GB. The graph looks like slow climb without much fluctuation. According to the Cloudbees article, this parameter is not recommended for Jenkins 2.89 however it does seem to help lower Java nonheap memory usage. Since it increases by over 200 mb a week, I should know shortly if some type of garbage collection reduces it on this second Jenkins master that is nearing the 1GB max. Java nonheap memory usage only reduced by 1mb when I manually performed a System.gc().

        Attachments

          Activity

          Hide
          jglick Jesse Glick added a comment -

          Finally tracked it down, to ClassHelperCache, and managed to fully clear all classes immediately upon end of the build.

          Show
          jglick Jesse Glick added a comment - Finally tracked it down, to ClassHelperCache , and managed to fully clear all classes immediately upon end of the build.
          Hide
          jglick Jesse Glick added a comment -

          For future reference, my monitoring script for the live version:

          while :; do jmap -histo:live $(jps -lm | fgrep war | cut -f1 -d' ') | fgrep GroovyClassLoader > /tmp/x; clear; cat /tmp/x; sleep 5; done
          
          Show
          jglick Jesse Glick added a comment - For future reference, my monitoring script for the live version: while :; do jmap -histo:live $(jps -lm | fgrep war | cut -f1 -d ' ' ) | fgrep GroovyClassLoader > /tmp/x; clear; cat /tmp/x; sleep 5; done
          Hide
          dnusbaum Devin Nusbaum added a comment -

          A fix for this issue was just released in version 2.69 of Pipeline Groovy Plugin.

          Show
          dnusbaum Devin Nusbaum added a comment - A fix for this issue was just released in version 2.69 of Pipeline Groovy Plugin.
          Hide
          tatuatindeed Tatu Saloranta added a comment -

          Excellent! I think we managed to trigger this problem as well, via some seemingly unrelated refactoring, resulting in max uptime of 4-5 days for some masters.

          For what it is worth, Jenkins Metrics plug-in sends 2 useful metrics – `vm.class.loaded`, `vm.class.unloaded` – difference of which is number of classes currently retained. So we are graphing that to see class retention over time, which showed steady increase without pronunced sawtooth pattern which normally occurs.

          Show
          tatuatindeed Tatu Saloranta added a comment - Excellent! I think we managed to trigger this problem as well, via some seemingly unrelated refactoring, resulting in max uptime of 4-5 days for some masters. For what it is worth, Jenkins Metrics plug-in sends 2 useful metrics – `vm.class.loaded`, `vm.class.unloaded` – difference of which is number of classes currently retained. So we are graphing that to see class retention over time, which showed steady increase without pronunced sawtooth pattern which normally occurs.
          Hide
          tatuatindeed Tatu Saloranta added a comment - - edited

          Alas, it is actually not sure version 2.69 fixes this problem for good. Problem came back (we had initially reverted code we deemed that must have triggered retention), and upgrade of Pipeline Groovy plug-in to 2.70 does not seem to have solved steady increase (although due to time it takes for classes to accumulate, over 1 week, it is difficult to be sure after 2 days since restart). If confirmed, it's back to Ye Olde Drawing Board to solve the problem.

          Show
          tatuatindeed Tatu Saloranta added a comment - - edited Alas, it is actually not sure version 2.69 fixes this problem for good. Problem came back (we had initially reverted code we deemed that must have triggered retention), and upgrade of Pipeline Groovy plug-in to 2.70 does not seem to have solved steady increase (although due to time it takes for classes to accumulate, over 1 week, it is difficult to be sure after 2 days since restart). If confirmed, it's back to Ye Olde Drawing Board to solve the problem.

            People

            • Assignee:
              jglick Jesse Glick
              Reporter:
              mkozell Mike Kozell
            • Votes:
              1 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: