Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-58018

High CPU load at slave with SSH-agent running freestyle project

    Details

    • Type: Improvement
    • Status: Open (View Workflow)
    • Priority: Minor
    • Resolution: Unresolved
    • Component/s: remoting
    • Labels:
      None
    • Environment:
      Jenkins ver. 2.164.3,
      Slave: OS: Ubuntu 18.04 LTS, JDK: OpenJDK 1.8.0_212-8u212-b03-0ubuntu1.18.04.1-b03
    • Similar Issues:

      Description

      When running "freestyle project" (gmake) at slave, java takes high CPU percentage: from 40% to even 70%, 50% maybe considered average.

      The load does not depend on build-process output.

      Easy way to reproduce:

      1) Create repository with single Makefile that just waits:

      all:
          sleep 300

      2) Create jenkins "freestyle project" with single build step:

      make

      3) Execute the build at node

      What you would see at slave is java process taking about 50% CPU waiting for process that does nothing.

      I tried various options for agent including http://-dhudson.remoting.remoteinvocationhandler/$Unexporter.retainOrigin=false, but it makes no difference.

       I've run remoting with profiler. Unfortunately I don't have results saved and it's hard for me to repeat this investigation right now, but I remember that most CPU was consumed for I/O read attempts. Child process output stream polling? Anyway - one can easily reproduce it as described above.

      This affects make-style (freestyle builds) only. E.g. for maven builds, remoting process behaves correctly, I mean it takes few % of CPU.

      I create it as "Improvement" as it allows builds to run SOMEHOW. But, frankly speaking, for me heating CPU for nothing is a bug. Besides, we have some builds which involve timing-tests - for these ones this is KO as tests cannot complete in such heavily-loaded environment.

       

       

       

        Attachments

          Activity

          Hide
          macmac Maciej Maczynski added a comment - - edited

          It looks like I was completely wrong regarding the source of this CPU load. I noticed that sshd on slave also gives noticeable CPU load during build process.
          I've made tcpdump session and found out that there is constant date exchange between jenkins master and slave. I counted TCP data frames during "sleep 60" execution: there were about 37 MB sent from slave to master and about 18 MB from master to slave. This gives me back when I started, trying to set retainOrigin agent option, but it does not help. Maybe I'm setting it wrong? This is my agent process:

          /usr/lib/jvm/java-1.11.0-openjdk-amd64/bin/java -Dhudson.remoting.RemoteInvocationHandler.Unexporter.retainOrigin=false -jar remoting.jar -workDir /home/jenkins

          Is retainOrigin passed correctly? Or maybe there are another options that can be used to limit sending (probably some error) information between agent and slave? Or some logging options to see what actually happens there? All I can see in remoting.log are messages like that:

           

          INFO: rate(1min) = 333.0±32.1/sec; rate(5min) = 224.4±167.3/sec; rate(15min) = 102.7±160.2/sec; rate(total) = 0.9±18.0/sec; N = 24,002
          Jun 25, 2019 7:12:26 PM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats
          INFO: The 1 minute average rate is 333.0±32.1/sec. At the 95% confidence level this is above 100.0/sec. If this message is repeated often in the logs then consider setting system property '
          hudson.remoting.RemoteInvocationHandler.Unexporter.retainOrigin' to 'false' to trade debug diagnostics for reduced memory pressure.

           

          Any hints wellcome

           

           

           

          Show
          macmac Maciej Maczynski added a comment - - edited It looks like I was completely wrong regarding the source of this CPU load. I noticed that sshd on slave also gives noticeable CPU load during build process. I've made tcpdump session and found out that there is constant date exchange between jenkins master and slave. I counted TCP data frames during "sleep 60" execution: there were about 37 MB sent from slave to master and about 18 MB from master to slave. This gives me back when I started, trying to set retainOrigin agent option, but it does not help. Maybe I'm setting it wrong? This is my agent process: /usr/lib/jvm/java-1.11.0-openjdk-amd64/bin/java -Dhudson.remoting.RemoteInvocationHandler.Unexporter.retainOrigin=false -jar remoting.jar -workDir /home/jenkins Is retainOrigin passed correctly? Or maybe there are another options that can be used to limit sending (probably some error) information between agent and slave? Or some logging options to see what actually happens there? All I can see in remoting.log are messages like that:   INFO: rate(1min) = 333.0±32.1/sec; rate(5min) = 224.4±167.3/sec; rate(15min) = 102.7±160.2/sec; rate(total) = 0.9±18.0/sec; N = 24,002 Jun 25, 2019 7:12:26 PM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats INFO: The 1 minute average rate is 333.0±32.1/sec. At the 95% confidence level this is above 100.0/sec. If this message is repeated often in the logs then consider setting system property ' hudson.remoting.RemoteInvocationHandler.Unexporter.retainOrigin' to 'false' to trade debug diagnostics for reduced memory pressure.   Any hints wellcome      
          Hide
          macmac Maciej Maczynski added a comment -

          One more thing: this happens also with OpenJDK 11, which is default at Ubuntu 18.04 (apt-get install default-jdk).

          Show
          macmac Maciej Maczynski added a comment - One more thing: this happens also with OpenJDK 11, which is default at Ubuntu 18.04 (apt-get install default-jdk).
          Hide
          jthompson Jeff Thompson added a comment -

          Hmmm ... 

          So, it's not really a CPU load problem. sshd also shows heavy CPU load. That's good to know.

          That's a lot of data transfer. I didn't have anything like that with my jobs trying to duplicate the problem. My console output shows practically nothing. Do you get a large amount of stuff in your console output?

          A lot of times things that appear to be Remoting problems turn out to be caused by plugins. Sometimes it involves weird interactions between plugins. That could be what's going on here. Unfortunately, it can be complicated to isolate the problem and track it down.

          I don't have experience with the retainOrigin setting. Reading the code, it appears to reduce some of the resource consumption. However, if you're getting to the point where that makes a difference, there is probably something going on that is still keeping the consumption too high. Sometimes it's high server load. Network issues. Agent load. VM host issues or load.

          These links have some related info: JENKINS-49736, JENKINS-48810, and https://groups.google.com/forum/#!topic/jenkinsci-dev/zAkbD_NmnrM . Most of these point to resource issues.

          These can be really challenging to track down.

          Show
          jthompson Jeff Thompson added a comment - Hmmm ...  So, it's not really a CPU load problem. sshd also shows heavy CPU load. That's good to know. That's a lot of data transfer. I didn't have anything like that with my jobs trying to duplicate the problem. My console output shows practically nothing. Do you get a large amount of stuff in your console output? A lot of times things that appear to be Remoting problems turn out to be caused by plugins. Sometimes it involves weird interactions between plugins. That could be what's going on here. Unfortunately, it can be complicated to isolate the problem and track it down. I don't have experience with the retainOrigin setting. Reading the code, it appears to reduce some of the resource consumption. However, if you're getting to the point where that makes a difference, there is probably something going on that is still keeping the consumption too high. Sometimes it's high server load. Network issues. Agent load. VM host issues or load. These links have some related info: JENKINS-49736 , JENKINS-48810 , and  https://groups.google.com/forum/#!topic/jenkinsci-dev/zAkbD_NmnrM  . Most of these point to resource issues. These can be really challenging to track down.
          Hide
          macmac Maciej Maczynski added a comment -

          There is no console output from build-process at all.  Checking this option is the reason why I ended-up in "sleep project" - it just runs "sleep" command and produces no output. So build process output is not the reason of these tons of data transferred.

          Issues you've quoted seems to be related rather to network connectivity - not the case here. My connection between node and server is rock-solid. Just very busy sending nobody-knows-what.

          VM load, agent load, server load - also not the case here. My server is happily serving several nodes on various OSes. This surely not happens for Windows slaves. And rather not for older linux slaves (I'm not 100%, must verify it). But try any Ubuntu 18.04 LTS - and you got it.

          These tons of traffic between master and slave must be Jenkins-originated. Only remoting java process can send to server TCP data I've tcpdump-ed.  And as child process is silent - it must be remoting that generates it. I tried all those options for reporting, stats - it has no effect (unless I specify it incorrectly).

          What I really need now is to see what data are transferred so vigorously between node and server. Is there any low-level logging option that lets you see the traffic between node and server? Sniffing the network is useless as data stream is encrypted.

           

          Show
          macmac Maciej Maczynski added a comment - There is no console output from build-process at all.  Checking this option is the reason why I ended-up in "sleep project" - it just runs "sleep" command and produces no output. So build process output is not the reason of these tons of data transferred. Issues you've quoted seems to be related rather to network connectivity - not the case here. My connection between node and server is rock-solid. Just very busy sending nobody-knows-what. VM load, agent load, server load - also not the case here. My server is happily serving several nodes on various OSes. This surely not happens for Windows slaves. And rather not for older linux slaves (I'm not 100%, must verify it). But try any Ubuntu 18.04 LTS - and you got it. These tons of traffic between master and slave must be Jenkins-originated. Only remoting java process can send to server TCP data I've tcpdump-ed.  And as child process is silent - it must be remoting that generates it. I tried all those options for reporting, stats - it has no effect (unless I specify it incorrectly). What I really need now is to see what data are transferred so vigorously between node and server. Is there any low-level logging option that lets you see the traffic between node and server? Sniffing the network is useless as data stream is encrypted.  
          Hide
          lifeofguenter Gunter Grodotzki added a comment -
          Show
          lifeofguenter Gunter Grodotzki added a comment - Potentially related: https://issues.jenkins-ci.org/browse/JENKINS-56581

            People

            • Assignee:
              jthompson Jeff Thompson
              Reporter:
              macmac Maciej Maczynski
            • Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated: