Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: cli
    • Labels:
      None
      Description

      Users report that a CLI client can hang at the following point:

      java.lang.Thread.State: RUNNABLE 
      at java.net.SocketInputStream.socketRead0(Native Method) 
      at java.net.SocketInputStream.read(Unknown Source) 
      at java.io.FilterInputStream.read(Unknown Source) 
      at java.io.FilterInputStream.read(Unknown Source) 
      at javax.crypto.CipherInputStream.a(DashoA13*..) 
      at javax.crypto.CipherInputStream.read(DashoA13*..) 
      at java.io.DataInputStream.readFully(Unknown Source) 
      at java.io.DataInputStream.readFully(Unknown Source) 
      at hudson.cli.Connection.readByteArray(Connection.java:132) 
      at hudson.cli.CLI.connectViaCliPort(CLI.java:243) 
      at hudson.cli.CLI.<init>(CLI.java:134) 
      at hudson.cli.CLIConnectionFactory.connect(CLIConnectionFactory.java:72) 
      at hudson.cli.CLI._main(CLI.java:469) 
      at hudson.cli.CLI.main(CLI.java:384)
      

      At the same time the server has already progressed to the following state:

         java.lang.Thread.State: RUNNABLE
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.read(SocketInputStream.java:150)
          at java.net.SocketInputStream.read(SocketInputStream.java:121)
          at java.io.FilterInputStream.read(FilterInputStream.java:133)
          at java.io.FilterInputStream.read(FilterInputStream.java:107)
          at javax.crypto.CipherInputStream.getMoreData(CipherInputStream.java:103)
          at javax.crypto.CipherInputStream.read(CipherInputStream.java:224)
          at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
          at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
          - locked <0x000000041fa9e560> (a java.io.BufferedInputStream)
          at hudson.remoting.ClassicCommandTransport.create(ClassicCommandTransport.java:98)
          at hudson.remoting.Channel.<init>(Channel.java:392)
          at hudson.remoting.Channel.<init>(Channel.java:388)
          at hudson.cli.CliProtocol$Handler.runCli(CliProtocol.java:48)
          at hudson.cli.CliProtocol2$Handler2.run(CliProtocol2.java:73)
          at hudson.cli.CliProtocol2.handle(CliProtocol2.java:32)
          at hudson.TcpSlaveAgentListener$ConnectionHandler.run(TcpSlaveAgentListener.java:150)
      

      The issue appears to be that the server believes it has already sent its identity (CliProtocol2.Handler2.run() line 62) and has moved on to setting up the channel, while the client is still waiting to read that identity. Both sides are therefore blocked in a socket read. The problem is reported to be only sporadically reproducible.

      I have been unable to reproduce this problem locally. If you see it as well, please report it here.

            Activity

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            core/src/main/java/hudson/cli/CliProtocol2.java
            http://jenkins-ci.org/commit/jenkins/2fc99b337af5774c4028e0735de309cfaca78c0a
            Log:
            suspecting bytes not getting flushed see JENKINS-20709
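
The commit log above suspects unflushed bytes. As a minimal sketch of that failure mode (not the actual CliProtocol2 code): when a writer sits on top of a buffering stream, a write can return successfully while nothing has reached the wire yet. Here a ByteArrayOutputStream stands in for the socket, and the class and method names (FlushSketch, writeIdentity) are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class FlushSketch {
    /**
     * Writes a length-prefixed byte array through a buffered stream and
     * returns {bytes on the "wire" before flush, bytes on the "wire" after flush}.
     */
    static int[] writeIdentity(byte[] identity) throws IOException {
        // Stand-in for the socket's output stream (an assumption for this sketch).
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(new BufferedOutputStream(wire));

        out.writeInt(identity.length); // 4-byte length prefix
        out.write(identity);           // payload; still held in the buffer
        int before = wire.size();      // what a peer could have read so far

        out.flush();                   // pushes the buffered bytes to the wire
        int after = wire.size();
        return new int[] { before, after };
    }

    public static void main(String[] args) throws IOException {
        int[] r = writeIdentity(new byte[16]);
        System.out.println("before flush: " + r[0] + ", after flush: " + r[1]);
    }
}
```

With a 16-byte payload this prints "before flush: 0, after flush: 20": the writer has "sent" everything from its own point of view, yet a reader on the other end would still be blocked until the flush happens.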

            oldelvet Richard Mortimer added a comment -

            Possibly related to JENKINS-18058

            That has the opposite problem of spurious output corrupting the stream.

            marcomiller Marco Miller added a comment -

            Kohsuke,
            I also work on this issue on our side (Ericsson). Thanks for your help so far, by the way =)
            Since we disabled the pingers with the settings below, the hang seems to no longer happen.
            Of course I cannot guarantee that 100%, but I wanted you to be aware of it:

            -Dhudson.remoting.Launcher.pingIntervalSec=0
            -Dhudson.remoting.Launcher.pingTimeoutSec=0
            -Dhudson.slaves.ChannelPinger.pingInterval=0

            We have not yet tried disabling only one of the two pingers; so far we have only disabled both together.
            Your thoughts on this hypothesis are welcome! =)
            Thanks again!
            PS: If disabling the pingers turns out to be a "solution", what would the likely consequences be of doing so in production?
            PPS: Another hypothesis: the master waits forever for the slave to send a response, while the master's channel is artificially kept alive/open by the pinger(s).

            marcomiller Marco Miller added a comment -

            (Please note that the pinger-disabling hypothesis above turned out to be inconclusive when we tried it.)

            vanharen Jeremy Van Haren added a comment - - edited

            We are still seeing this issue where CLI commands can hang. We are getting this on 1.580.2. We kick off about 100-200 CLI commands nightly, and one of them hangs about 1-2 times a week. That is roughly 1 hang per 1,000 invocations.
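
Hangs like this sit in a blocking socket read with no deadline. As a general illustration (plain java.net, not the Jenkins CLI code, and not a fix for the root cause), a read timeout set with Socket.setSoTimeout turns an indefinite wait into a SocketTimeoutException that the caller can handle or retry. The class and method names below (ReadTimeoutSketch, readTimesOut) are hypothetical, and the silent local server only simulates a peer that never sends the expected bytes.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutSketch {
    /**
     * Connects to a local server that never writes anything and tries to read
     * 4 bytes. Returns true if the read gave up with a timeout rather than
     * blocking forever.
     */
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {   // peer is connected but stays silent
            client.setSoTimeout(timeoutMillis);     // bound every blocking read on this socket
            DataInputStream in = new DataInputStream(client.getInputStream());
            try {
                in.readFully(new byte[4]);          // would block indefinitely without the timeout
                return false;
            } catch (SocketTimeoutException e) {
                return true;                        // the read was abandoned after timeoutMillis
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```

A timeout of this kind only limits how long a caller waits; whether it is appropriate for long-running CLI commands would depend on how the connection is used after the handshake.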


              People

              • Assignee:
                kohsuke Kohsuke Kawaguchi
                Reporter:
                kohsuke Kohsuke Kawaguchi
              • Votes:
                9
                Watchers:
                8
