  1. Jenkins
  2. JENKINS-20620

Memory Leak on Jenkins LTS 1.509.4/1.532.1

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Incomplete
    • Component/s: core
    • Environment:
      Linux Server with 24 CPUs and 64GB RAM
      Jenkins version LTS 1.509.4/1.532.1 on Jetty
      Memory allocated for Jenkins/Jetty process: 42GB
      Environment: Jenkins working with 600 jobs with high activities + 40 slave machines (Linux and Windows)

      Description

      After upgrading Jenkins from LTS 1.509.3 to LTS 1.509.4, I noticed that over time (about 24 hours) Jenkins becomes very slow.
      It turns out that Jenkins (running under the Jetty service) slowly "eats" the server's memory. It takes about 24 hours to consume all the memory allocated to Jenkins (42GB). See the attached snapshots for examples...

      TEST #1 on LTS 1.509.4:
      1. Machine with Jetty up after restart
      2. Jenkins Used - After one Hour: 22GB
      3. Jenkins Used - After 12 Hours: 27GB
      4. Jenkins Used - After 20 Hours: 35GB -> Memory leaks between 10:00 and 10:20: as you can see, even after GC the JVM still treats the memory as in use and fails to clean it all up as it should.
      5. Jenkins Used - After 23 Hours: 39GB -> Very slow response, and the heap is almost 100% full

      TEST #2 on LTS 1.509.4:
      I tried running a manual GC; it doesn't help!
      (see attached file: "Monitor_Memory_Over_Time_Manual_GC")
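
For reference, a minimal Java sketch of the kind of check described in the tests above: read the used heap, request a GC, and read it again. It only measures the JVM it runs in, so to say anything about Jenkins it would have to execute inside the Jenkins/Jetty process; that, and the names `HeapAfterGc`, `usedHeapBytes`, and `toGb`, are assumptions made for illustration and are not part of the original report.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.Locale;

// Hypothetical helper, not taken from the report: compares heap usage
// before and after an explicitly requested garbage collection.
public class HeapAfterGc {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    /** Bytes of heap currently in use by this JVM. */
    public static long usedHeapBytes() {
        return MEMORY.getHeapMemoryUsage().getUsed();
    }

    /** Formats a byte count as gigabytes with two decimals. */
    public static String toGb(long bytes) {
        return String.format(Locale.ROOT, "%.2f GB", bytes / (1024.0 * 1024.0 * 1024.0));
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        // Effectively equivalent to System.gc(); the JVM may be configured to ignore it.
        MEMORY.gc();
        long after = usedHeapBytes();

        System.out.println("Used before GC: " + toGb(before));
        System.out.println("Used after GC:  " + toGb(after));
        // getMax() returns -1 when the maximum heap size is undefined.
        long max = MEMORY.getHeapMemoryUsage().getMax();
        if (max > 0) {
            System.out.println("Configured max: " + toGb(max));
        }
    }
}
```

If the "after GC" figure keeps climbing across samples taken hours apart, that is consistent with objects being retained, although it does not by itself show which objects are responsible.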

      TEST #3 on LTS 1.509.3:
      Unfortunately, I had to downgrade to LTS 1.509.3 because of the memory leak; for me it's a blocker issue!

      Please note that on LTS 1.509.3 Jenkins is stable even in a high-load environment, without any memory leak (see attached files: "Good_GC_A1.509.3" and "Good_GC_B1.509.3"). Unfortunately, there is a BIG unsolved bug in that version: I can't rename jobs (deadlock!). It is fixed in the next versions, LTS 1.509.4/1.532.1, which I can't use because of the memory leak.

      TEST #4 with LTS 1.532.1:
      Same issue! Jenkins gets stuck at 100% memory usage after only 12 hours!

      Thank You,
      Ronen.

        Attachments

        1. gc.log
          15 kB
        2. Good_GC_A1.509.3.JPG
          268 kB
        3. Good_GC_B1.509.3.jpg
          322 kB
        4. Monitor_After_1_Hour.jpg
          238 kB
        5. Monitor_After_12_Hours.jpg
          224 kB
        6. Monitor_After_20_Hours.jpg
          321 kB
        7. Monitor_After_23_Hours.JPG
          253 kB
        8. Monitor_Memory_Over_Time_Manual_GC.jpg
          572 kB

          Activity

          ronenpg Ronen Peleg added a comment -

          Hi Nickolay, yes, we have Groovy scripts in our jobs.

          ronenpg Ronen Peleg added a comment -

          Update:
          The solution was to delete some slave machines from the Jenkins nodes.
          It turns out that Jenkins can't handle more than 100 slave machines.
          Currently we have (after cleanup) 70 slave machines and no memory leak!

          BTW: The memory leak occurs only on a Jenkins master running on Linux with more than 100 slave machines; on Windows this issue doesn't exist!

          oleg_nenashev Oleg Nenashev added a comment -

          I've tested Jenkins 1.509.4 (patched with remoting-2.36) on RHEL 6.4 with about 150 slaves.
          There's no memory leak after 1 week. The test installation just builds several Jenkins plugins, so there's no extreme load.

          The error is probably in the communication layer. I'll try the remoting version from 1.532.1 with a bigger workload.

          kohsuke Kohsuke Kawaguchi added a comment -

          We need more information to be able to solve problems like this. Please see https://wiki.jenkins-ci.org/display/JENKINS/I%27m+getting+OutOfMemoryError for how to get the details we need to be able to work on problems like this.

          I'm not doubting that you are seeing the problem, and for that I am sorry. Please get us the details we need so that we can fix the problem.

          If you cannot post a heap dump, please get at least the histogram summary.
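
As an illustration of what a histogram summary is, here is a minimal Java sketch that asks the running JVM for a per-class histogram of heap objects through the HotSpot `DiagnosticCommand` MBean. This is an in-process alternative to the `jmap -histo` tool and is an assumption of this sketch, not the procedure from the linked wiki page. The MBean is HotSpot-specific and was introduced in Java 8, so it may not be available on the JVM used in this report. The names `ClassHistogram`, `classHistogram`, and `head` are hypothetical.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper: prints the top of the class histogram for the JVM
// it runs in. Assumes a HotSpot JVM exposing the DiagnosticCommand MBean.
public class ClassHistogram {

    /** Returns the class histogram text for this JVM. May pause the JVM while it runs. */
    public static String classHistogram() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName diagnostics = new ObjectName("com.sun.management:type=DiagnosticCommand");
            // The operation takes one String[] argument holding command options; none are passed here.
            return (String) server.invoke(
                    diagnostics,
                    "gcClassHistogram",
                    new Object[] { new String[0] },
                    new String[] { String[].class.getName() });
        } catch (Exception e) {
            throw new IllegalStateException("Class histogram is not available on this JVM", e);
        }
    }

    /** Keeps only the first maxLines lines of a multi-line string. */
    public static String head(String text, int maxLines) {
        String[] lines = text.split("\n");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < Math.min(maxLines, lines.length); i++) {
            if (i > 0) {
                out.append('\n');
            }
            out.append(lines[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The largest consumers are listed first, so the top rows are the most useful.
        System.out.println(head(classHistogram(), 25));
    }
}
```

The output is a short text table of instance counts and bytes per class, so it stays small even when the heap itself is tens of gigabytes.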

          ronenpg Ronen Peleg added a comment - - edited

          @Oleg Nenashev, did you try it with 1200 active jobs? Anyway, this is what solved my problem. I guess there is an issue with 100+ slave machines connected to a heavily loaded Jenkins.

          @Kohsuke Kawaguchi, because I have 64GB of RAM, I can't do it: I can't save 64GB of RAM to my HDD. Anyway, my problem is already solved, so it's safe to close this issue.


            People

            • Assignee:
              oleg_nenashev Oleg Nenashev
              Reporter:
              ronenpg Ronen Peleg
            • Votes:
              21
              Watchers:
              15
