JENKINS-22320: High CPU consumption because of SSH communication

      Description

      Our Jenkins instance has been running 1.553 for two weeks (before that, it ran a three-year-old version) on Linux with 10 slaves: 6 Linux slaves connected through SSH and 4 Windows slaves running the WebStart agent as services.

      When no job is running, CPU usage is almost 0%, as expected.

      When only two or three jobs are running, Jenkins CPU usage rises to between 70% and 160%. atop reports 96% of CPU time spent in IRQ (even with "no" disk access), and most of the CPU consumption is accounted as system time. On average since boot, the master node has consumed 100% of one CPU. Even though it has 4 CPUs, jobs take 2x to 3x longer to run than with the older version.

      I configured JMX and ran a quick CPU profile. The top consuming threads are unnamed and all related to SSH communication:

      "Thread-13" - Thread t@90
         java.lang.Thread.State: RUNNABLE
      	at java.net.SocketInputStream.socketRead0(Native Method)
      	at java.net.SocketInputStream.read(SocketInputStream.java:152)
      	at java.net.SocketInputStream.read(SocketInputStream.java:122)
      	at com.trilead.ssh2.crypto.cipher.CipherInputStream.fill_buffer(CipherInputStream.java:41)
      	at com.trilead.ssh2.crypto.cipher.CipherInputStream.internal_read(CipherInputStream.java:52)
      	at com.trilead.ssh2.crypto.cipher.CipherInputStream.getBlock(CipherInputStream.java:79)
      	at com.trilead.ssh2.crypto.cipher.CipherInputStream.read(CipherInputStream.java:108)
      	at com.trilead.ssh2.transport.TransportConnection.receiveMessage(TransportConnection.java:232)
      	at com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:682)
      	at com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:480)
      	at java.lang.Thread.run(Thread.java:744)
      

      So the "SSH agent plugin" may be involved.

      I am ready to run a deeper analysis on my system if required, and of course to test patches.
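The JVM reports a thread blocked in a native socket read (like Thread-13 in SocketInputStream.socketRead0 above) as RUNNABLE, so a stack sample alone does not prove that the thread is burning CPU. One way to check is to compare per-thread CPU time over a short window using the standard java.lang.management.ThreadMXBean. This is a minimal sketch, assuming it runs inside the JVM being inspected and on Java 7 or later; ThreadCpuSampler and sample are hypothetical names, not part of Jenkins. For a remote JVM, the same ThreadMXBean can be obtained over a JMX connection with ManagementFactory.newPlatformMXBeanProxy.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: ranks live threads by CPU time consumed during a sampling window.
public class ThreadCpuSampler {

    /** CPU time consumed by one thread during the sampling window. */
    public static final class Sample {
        public final long threadId;
        public final String threadName;
        public final Thread.State state;
        public final long cpuNanos;

        Sample(long threadId, String threadName, Thread.State state, long cpuNanos) {
            this.threadId = threadId;
            this.threadName = threadName;
            this.state = state;
            this.cpuNanos = cpuNanos;
        }
    }

    public static List<Sample> sample(long windowMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("this JVM cannot measure per-thread CPU time");
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }

        // First snapshot: cumulative CPU time per live thread.
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id); // -1 if the thread has already died
            if (cpu >= 0) {
                before.put(id, cpu);
            }
        }

        try {
            Thread.sleep(windowMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag; report a shorter window
        }

        // Second snapshot: the difference is the CPU actually used during the window.
        List<Sample> samples = new ArrayList<>();
        for (long id : mx.getAllThreadIds()) {
            Long start = before.get(id);
            long end = mx.getThreadCpuTime(id);
            ThreadInfo info = mx.getThreadInfo(id);
            if (start == null || end < 0 || info == null) {
                continue; // thread started or died during the window
            }
            samples.add(new Sample(id, info.getThreadName(), info.getThreadState(), end - start));
        }

        // Heaviest CPU consumers first.
        Collections.sort(samples, new Comparator<Sample>() {
            @Override
            public int compare(Sample a, Sample b) {
                return Long.compare(b.cpuNanos, a.cpuNanos);
            }
        });
        return samples;
    }

    public static void main(String[] args) {
        List<Sample> samples = sample(1000);
        for (int i = 0; i < Math.min(10, samples.size()); i++) {
            Sample s = samples.get(i);
            System.out.printf("%-30s %-14s %8.1f ms CPU%n", s.threadName, s.state, s.cpuNanos / 1e6);
        }
    }
}
```

Threads that rank high in CPU time while sitting in socketRead0 would suggest real work in the SSH transport; threads that are RUNNABLE but accumulate almost no CPU are most likely just waiting for data.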

            Activity

            Thomas Herrlin added a comment

            Looks like I may be chasing a red herring.

            The slowdowns I see may not actually be related to this ticket, even though I get high CPU usage in fill_buffer -> socketRead0.

            Joshua K added a comment

            I ran strace, and it looks like a lot of the traffic comes from exceptions being sent repeatedly over the remoting channel:

            write(8, "q\0~\0\5\0\2\16`\0\0\0\0sr\0 java.lang.ClassNotFoundException\177Z\315f>\324 \216\2\0\1L\0\2exq\0~\0\1xq\0~\0\10pt\0\23java.nio.file.Filesuq\0~\0\r\0\0\0\21sq\0~\0\17\0\0\5_t\0\33jenkins.util.AntClassLoadert\0\23AntClassLoader.javat\0\25findClassInComponentssq\0~\0\17\0\0\5-q\0~\0004q\0~\0005t\0\tfindClasssq\0~\0\17", 233) = 233

            This is plausible, as our slaves and master run on Java 1.6, and java.nio.file.Files was only introduced in Java 1.7.

            I will try to bump our slaves to use JDK 1.7 and see if that helps.
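To confirm which slave JVMs cannot load java.nio.file.Files, a small stand-alone check like the following could be compiled and run with each slave's own java binary. It is a sketch: ClassAvailability and isAvailable are hypothetical names, not Jenkins APIs, and the code sticks to Java 6 syntax so it compiles on the older JDK too. Class.forName with initialize=false only checks that the class can be found; a missing class surfaces as the same ClassNotFoundException visible in the strace output.

```java
// Hypothetical diagnostic: reports whether a class is visible to a given class loader.
public class ClassAvailability {

    /** Returns true if the named class can be found by the given class loader. */
    public static boolean isAvailable(String className, ClassLoader loader) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            // A Java 6 JVM ends up here for "java.nio.file.Files", which exists only from Java 7 on
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassAvailability.class.getClassLoader();
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.nio.file.Files available: "
                + isAvailable("java.nio.file.Files", loader));
    }
}
```

On a Java 6 JVM this should print false for java.nio.file.Files; on Java 7 or later, true.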

            Yves Martin added a comment

            After upgrading to 1.625.1, SSH slave communication is as efficient as the WebStart agent.
            As a result, I consider this issue fixed.
            Thank you for the work. Yves

            Roy Arnon added a comment

            Hi,
            Using Jenkins 1.625.3 and ssh-slave plugin 1.10, this issue seems to occur intermittently for us. Lately, it happens at least once a day.
            Most of the threads are burning CPU time in exactly the same stack as the example above.

            I've attached a call tree from YourKit.

            I can provide more data if required.

            Roy Arnon added a comment

            Sorry, I attached the image incorrectly.


              People

              • Assignee: Kohsuke Kawaguchi
              • Reporter: Yves Martin
              • Votes: 3
              • Watchers: 10
