JENKINS-54603

Memory leak in remoting causes Jenkins to crash

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Defect
    • Component/s: remoting
    • Labels:
      None

      Description

      Some of our jobs rely on an external slave. This had been working for a while without any issues. Recently, the number of jobs that run daily was increased, and that is when our problems started. After a while, Jenkins consumes all the memory available to the VM and locks up as a result.

      The log is full of this exception:

      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: SEVERE: This command is created here
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: Nov 12, 2018 5:03:53 PM hudson.remoting.Channel$1 handle
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: SEVERE: Failed to execute command Pipe.Flush(-1) (channel PLTSTSRV001)
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: java.util.concurrent.ExecutionException: Invalid object ID -1 iota=1723
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: at hudson.remoting.ExportTable.diagnoseInvalidObjectId(ExportTable.java:478)
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: at hudson.remoting.ExportTable.get(ExportTable.java:397)
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: at hudson.remoting.Channel.getExportedObject(Channel.java:780)
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: at hudson.remoting.ProxyOutputStream$Flush.execute(ProxyOutputStream.java:307)
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: at hudson.remoting.Channel$1.handle(Channel.java:565)
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:85)
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: Caused by: java.lang.Exception: Object appears to be deallocated at lease before Mon Nov 12 16:39:51 CET 2018
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: at hudson.remoting.ExportTable.diagnoseInvalidObjectId(ExportTable.java:474)
      Nov 12 17:03:53 mfa-sp.microflown.loc jenkins[46451]: ... 5 more

      I think this is what's causing Jenkins to leak.

        Attachments

        1. heap-histogram.txt
          15 kB
        2. Jenkins Job Output
          27 kB
        3. Pipeline.txt
          7 kB

            Activity

            smokeythebandit Benjamin Martens created issue -
            jthompson Jeff Thompson added a comment -

            It's going to take more information than provided for someone else to diagnose this issue.

            It looks like there may be some communication issues between the external agent and the master. That may be the cause of these errors in the log. There is no evidence provided that these error messages are connected to a memory leak.

            If there were a memory leak associated with these operations, we would expect it to have manifested previously. Running jobs more frequently shouldn't create leaks that didn't exist before; it would only make you hit existing ones more quickly. Perhaps the memory load has just increased as a result of increased activity or configuration changes.

            I recommend you investigate what is consuming your memory. See if you can isolate any characteristics as to what is occurring, the time frames involved, and any plugins, jobs, or configurations that might be contributing. Is it the master or the agent that is running out of memory?

            Here is some information about hunting down OutOfMemory errors: https://wiki.jenkins.io/display/JENKINS/I%27m+getting+OutOfMemoryError. CloudBees provides some further information on configuring memory: https://support.cloudbees.com/hc/en-us/articles/204859670

            Good luck on your investigation.

            jthompson Jeff Thompson added a comment -

            Jesse Glick, the error message here looks the same as in JENKINS-54566 that you've been looking at. I don't think the out-of-memory error is related to the log message, but I'm not certain.

            jglick Jesse Glick made changes -
            Field Original Value New Value
            Link This issue relates to JENKINS-54566 [ JENKINS-54566 ]
            jglick Jesse Glick added a comment -

            Yeah, the flush error would be JENKINS-54566. I see no reason to think that would have any relationship to a memory leak.

            jthompson Jeff Thompson added a comment -

            Benjamin Martens, we don't believe the flush error messages you're seeing are related to the failures you're experiencing. There's a PR to clean up the log messages a little bit: https://github.com/jenkinsci/remoting/pull/308/files. Can you provide more information about your out-of-memory issues, or should we close this report out?

            jglick Jesse Glick added a comment -

            Jeff Thompson the remoting PR was just a side fix. The main fix for the Pipe.Flush error is in the workflow-api plugin, under review, as linked from JENKINS-54566.

            jthompson Jeff Thompson added a comment -

            Oh, I missed that separation, Jesse Glick. Thanks for clarifying. Does that workflow-api plugin issue have anything to do with this reported memory leak?

            jglick Jesse Glick added a comment -

            I cannot speculate about any relationship to a memory leak, since we have no diagnostics for that. The workflow-api plugin patch fixes (or purports to fix) the Failed to execute command Pipe.Flush error.

            smokeythebandit Benjamin Martens made changes -
            Attachment Jenkins Job Output [ 45217 ]
            smokeythebandit Benjamin Martens made changes -
            Attachment Jenkins Job Output [ 45218 ]
            smokeythebandit Benjamin Martens made changes -
            Attachment Pipeline.txt [ 45219 ]
            smokeythebandit Benjamin Martens added a comment - edited

            Hey guys, thank you for your response! I've upgraded the 'Pipeline: API' plugin from version 2.32 to 2.33. Usually the server crashes within 24 hours of its last restart, so I will keep monitoring it to see whether the issue is resolved.

            Jeff Thompson, it is the master that runs out of memory. I cannot really pin down an event that causes it to run out of memory. I did notice that when I increased the memory allocated to the VM from 1024MB to 2048MB, it took longer for the server to crash, confirming that it's probably a memory leak.

            I've attached the log of one of the jobs. I had to remove some of the output because it contains sensitive information. The information I removed is generated by a python script that runs automated tests for our software.

            Included are the pipeline script and the output it generated for the job that I suspect is causing this issue.

            Jenkins Job Output

            Pipeline.txt
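
The observation that more memory delays the crash is what steady growth would predict, although a heavier but stable load could look similar at first. One way to make the observation more concrete is to fit a trend to periodic memory readings. The following is a minimal sketch under stated assumptions: the samples are hypothetical (hours since restart, used heap in MB, ideally taken right after a garbage collection), and the function names are illustrative rather than part of Jenkins or any plugin.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (hours since restart, used heap in MB); hypothetical data


def heap_growth_rate(samples: List[Sample]) -> float:
    """Least-squares slope of used heap (MB) against time (hours)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


def hours_until_full(samples: List[Sample], limit_mb: float) -> Optional[float]:
    """Extrapolate when usage reaches limit_mb; None if usage is not growing."""
    rate = heap_growth_rate(samples)
    if rate <= 0:
        return None
    _, last_m = samples[-1]
    return max(0.0, (limit_mb - last_m) / rate)
```

A persistently positive slope across many hours and several restarts points toward a leak, while a slope that flattens out suggests the load simply needs more memory than was allocated.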

            smokeythebandit Benjamin Martens made changes -
            Attachment Jenkins Job Output [ 45218 ]
            smokeythebandit Benjamin Martens made changes -
            Attachment Jenkins Job Output [ 45217 ]
            smokeythebandit Benjamin Martens made changes -
            Attachment Jenkins Job Output [ 45220 ]
            jglick Jesse Glick added a comment -

            Benjamin Martens, the build log is unlikely to be useful in diagnosing a memory leak. The bare minimum would be a heap histogram. This is most easily collected by installing the Support Core plugin, then selecting the Master Heap Histogram diagnostic when generating a support bundle. You can attach the individual nodes/master/heap-histogram.txt file, or send the bundle to one of us privately, or select the system option to anonymize support bundles and then attach the whole bundle here (it is always best to review the contents manually for sensitive information).
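
For readers unsure how to read such a file, a short script can rank the classes by retained bytes and compare two histograms taken some hours apart. This is a minimal sketch that assumes heap-histogram.txt follows the layout printed by the JDK's `jmap -histo` (columns: num, #instances, #bytes, class name); that layout is an assumption about the attachment, and the function names are hypothetical.

```python
import re
from typing import Dict, List, NamedTuple, Tuple


class HistogramRow(NamedTuple):
    instances: int
    size_bytes: int
    class_name: str


# Matches data rows such as "   1:        123456       7890123  [C".
# Header, separator, and "Total" lines do not match and are skipped.
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S.*?)\s*$")


def parse_histogram(text: str) -> List[HistogramRow]:
    """Parse class-histogram text into rows, ignoring non-data lines."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append(HistogramRow(int(match.group(1)), int(match.group(2)), match.group(3)))
    return rows


def top_by_bytes(rows: List[HistogramRow], n: int = 10) -> List[HistogramRow]:
    """Return the n classes that occupy the most bytes."""
    return sorted(rows, key=lambda r: r.size_bytes, reverse=True)[:n]


def growth(before: List[HistogramRow], after: List[HistogramRow], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n classes whose byte totals grew the most between two histograms."""
    base: Dict[str, int] = {r.class_name: r.size_bytes for r in before}
    deltas = [(r.class_name, r.size_bytes - base.get(r.class_name, 0)) for r in after]
    return sorted(deltas, key=lambda d: d[1], reverse=True)[:n]
```

A single histogram only shows what is large at one moment; classes that keep growing between two snapshots, especially ones in a plugin's package, are the more useful lead.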

            smokeythebandit Benjamin Martens made changes -
            Attachment heap-histogram.txt [ 45225 ]
            smokeythebandit Benjamin Martens added a comment - edited

            I installed this plugin and ran a couple of jobs. I'm not sure how to interpret the heap histogram, but I'm starting to suspect this plugin: https://wiki.jenkins.io/display/JENKINS/Test+Results+Analyzer+Plugin

            Anyway, here is the heap histogram.

            heap-histogram.txt

            jglick Jesse Glick added a comment -

            Indeed. If this plugin is not critical for your workflow, try disabling it for a while.

            smokeythebandit Benjamin Martens made changes -
            Assignee: Jeff Thompson [ jthompson ] -> Benjamin Martens [ smokeythebandit ]
            smokeythebandit Benjamin Martens added a comment -

            It is important for some developers in our company. I think it's best that they open a new ticket and figure it out with the maintainer of the plugin.

            Jeff Thompson, Jesse Glick: thank you for your help!

            smokeythebandit Benjamin Martens made changes -
            Status: Open [ 1 ] -> Fixed but Unreleased [ 10203 ]
            Resolution: Fixed [ 1 ]
            smokeythebandit Benjamin Martens made changes -
            Status: Fixed but Unreleased [ 10203 ] -> Closed [ 6 ]
            Resolution: Fixed [ 1 ] -> Not A Defect [ 7 ]

              People

              • Assignee:
                smokeythebandit Benjamin Martens
              • Reporter:
                smokeythebandit Benjamin Martens
              • Votes:
                0
              • Watchers:
                3
