Jenkins / JENKINS-12037

CLI - I/O error in channel Chunked connection/Unexpected termination of the channel - still occurring in Jenkins 1.449

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Component/s: cli
    • Labels:
      None
    • Environment:
      * Running on a SLES9 Linux server with 4 CPUs and plenty of disk space.
      * Tomcat 7.0.22
      * JDK 1.6.0_14
      * Only ONE Master configuration - no slaves are configured
      * 3 Executors - (one less than the max number of CPUs)

      Description

      We reported an issue some time back that was also listed as fixed in Jenkins 1.441:
      Log:
      [FIXED JENKINS-11130] SEVERE: I/O error in channel Chunked connection when using jenkins-cli.jar

      Perhaps another bug should NOT be submitted, so I have added the following comments (below the line) to the comments section of the original defect JENKINS-11130 in case it can be reviewed/re-opened.

      We did NOT try making any adjustments to the Tomcat configuration setting:

      Tomcat Connector connectionUploadTimeout

      but we are also now seeing the same problem with Winstone at this same 1.441 level. We did revert to the 1.438 version of the CLI (leaving the WAR at 1.441 running in Winstone), and that is serving as the current workaround.
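
      For reference, the connector attribute in question lives in Tomcat's conf/server.xml. A minimal sketch with illustrative values (connectionUploadTimeout is in milliseconds and only takes effect when disableUploadTimeout is set to false):

      <Connector port="8082" protocol="HTTP/1.1"
                 connectionTimeout="20000"
                 disableUploadTimeout="false"
                 connectionUploadTimeout="60000" />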

      ================================================================================================

      We have downloaded and installed the LATEST 1.441 release that lists the fix for this problem. Until now we were running 1.438 on Winstone only (with Tomcat 6 or 7 we had experienced the error, but under Winstone it worked OK, so that was our workaround while running 1.438).

      Now with Jenkins 1.441 we are getting the ERROR again, and NOW WITH BOTH the Winstone and Tomcat configurations. We have left the Jenkins 1.441 WAR file in place running on Winstone, and reverted the CLI jar file back to the 1.438 version for now; that appears to work again with Winstone.

      Checked Manifest of CLI jar downloaded with the 1.441 WAR installation:

      Manifest-Version: 1.0
      Archiver-Version: Plexus Archiver
      Created-By: Apache Maven
      Built-By: kohsuke
      Build-Jdk: 1.6.0_26
      Main-Class: hudson.cli.CLI
      Jenkins-CLI-Version: 1.441

      Under Tomcat 7, we get this stacktrace:

      Started by command line
      [workspace] $ /bin/bash -xe /opt/apache-tomcat-7.0.22_jenkins/temp/hudson32817888834817830.sh
      + /opt/Sun/jdk1.6.0_14/bin/java -jar /opt/Sun/jdk1.6.0_14/lib/jenkins-cli.jar -s http://11.22.33.44:8082/jenkins/ build XYZ_Project-SharedLibs -s -p SVN_PATH=trunk
      Dec 5, 2011 12:59:11 PM hudson.remoting.Channel$ReaderThread run
      SEVERE: I/O error in channel Chunked connection to http://11.22.33.44:8082/jenkins/cli
      java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1115)
      Caused by: java.io.EOFException
      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1109)
      Exception in thread "main" hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Request.call(Request.java:149)
      at hudson.remoting.Channel.call(Channel.java:681)
      at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
      at $Proxy2.main(Unknown Source)
      at hudson.cli.CLI.execute(CLI.java:200)
      at hudson.cli.CLI._main(CLI.java:330)
      at hudson.cli.CLI.main(CLI.java:245)
      Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Request.abort(Request.java:273)
      at hudson.remoting.Channel.terminate(Channel.java:732)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1139)
      Caused by: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1115)
      Caused by: java.io.EOFException
      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1109)
      Build step 'Execute shell' marked build as failure
      Notifying upstream projects of job completion
      Finished: FAILURE

      Under Winstone, we get this stacktrace - it's somewhat different:

      Started by command line
      [workspace] $ /bin/bash -xe /tmp/hudson10791816374444704.sh
      + /opt/Sun/jdk1.6.0_14/bin/java -jar /opt/Sun/jdk1.6.0_14/lib/jenkins-cli.jar -s http://11.22.33.44:8082/jenkins/ build XYZ_Project-SharedLibs -s -p SVN_PATH=trunk
      Dec 5, 2011 1:18:22 PM hudson.remoting.Channel$ReaderThread run
      SEVERE: I/O error in channel Chunked connection to http://11.22.33.44:8082/jenkins/cli
      java.io.IOException: Premature EOF
      at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:538)
      at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:582)
      at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:669)
      at java.io.FilterInputStream.read(FilterInputStream.java:116)
      at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2504)
      at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2499)
      at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2488)
      at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
      at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1109)
      Exception in thread "main" hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Premature EOF
      at hudson.remoting.Request.call(Request.java:149)
      at hudson.remoting.Channel.call(Channel.java:681)
      at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
      at $Proxy2.main(Unknown Source)
      at hudson.cli.CLI.execute(CLI.java:200)
      at hudson.cli.CLI._main(CLI.java:330)
      at hudson.cli.CLI.main(CLI.java:245)
      Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Premature EOF
      at hudson.remoting.Request.abort(Request.java:273)
      at hudson.remoting.Channel.terminate(Channel.java:732)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1139)
      Caused by: java.io.IOException: Premature EOF
      at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:538)
      at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:582)
      at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:669)
      at java.io.FilterInputStream.read(FilterInputStream.java:116)
      at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2504)
      at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2499)
      at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2488)
      at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
      at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1109)
      Build step 'Execute shell' marked build as failure
      Notifying upstream projects of job completion
      Finished: FAILURE

      Attachments

      1. jenkins-timeout (39 kB) by George Panayotov

        Activity

        oldelvet Richard Mortimer added a comment -

        For completeness, can you get the associated logs from the Jenkins server side? That should confirm that the server side is closing the downstream socket because it has detected inactivity/timeout on the upstream side.

        How long does the cli command take to fail? A rough (to the nearest second or so) timing is all that is needed. The issue was due to a webserver connection timeout before, so it is likely to be something like 60, 30, 20 or 15 seconds.

        Are you able to get a packet trace of the network traffic between the CLI and Jenkins? Capturing from the CLI side with Wireshark or tcpdump should be fine. If you are able to get that and can share it, then drop me an email at oldelvet@java.net so that we can arrange to transfer it in private.
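
        For example, something along these lines run on the machine where the CLI is invoked should capture the relevant traffic (the interface is illustrative; host and port match the masked server address used above):

        tcpdump -i any -s 0 -w jenkins-cli.pcap host 11.22.33.44 and port 8082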

        I don't have a lot of time at the moment so if anyone else wants to take a look instead of me please feel free to do so!

        mcs13099 mark streit added a comment -

        I will get logs from the Tomcat server. It also failed with Winstone this time, whereas before it worked with Winstone when using the 1.441 CLI. I will check on those logs, assuming they are somewhere in the location where we put the WAR. The failure hit within 10 seconds each time.

        Thanks

        mcs13099 mark streit added a comment -

        Have attached logs from the Tomcat server where we are running this. We have downloaded the latest 1.449 WAR and the corresponding CLI jar for 1.449. We repeated the test and are seeing this I/O error in chunked connection again. (Right now only 1.441 running in Winstone will work without this error.)

        Console output from the Jenkins web page is also included, as is the shell command text that occupies the Execute Shell section of the build job configuration page.

        This all works with 1.441 as long as the container is Winstone.

        oldelvet Richard Mortimer added a comment -

        I'm still maxed out with $dayjob and have not had time to look at this in much detail.

        I just did a quick test on my test system with 1.451 using tomcat 6 and do not see any timeouts there with the tomcat http data upload timeout set to 20 seconds. That rules out any basic brokenness (I hope!).

        Looking at your log files (see below) there is one instance where an error occurs 12 seconds after the previous timeout. That suggests that Tomcat 7 is timing out in around 10 seconds. The Jenkins channel keepalive ping is set for 15 seconds, so it could be that the ping is still not aggressive enough.

        We probably just need to test with a 5-second ping time and see if that fixes things.

        First error reported in ConsoleOutput.log

        Feb 6, 2012 2:47:38 PM hudson.remoting.Channel$ReaderThread run
        SEVERE: I/O error in channel Chunked connection to http://99.99.99.94:8082/jenkins/cli
        java.io.IOException: Unexpected termination of the channel
        	at hudson.remoting.Channel$ReaderThread.run(Channel.java:1133)
        Caused by: java.io.EOFException
        

        This corresponds to the following two connections in localhost_access_log.2012-02-06.txt

        10.25.50.94 - - [06/Feb/2012:14:47:38 -0500] "POST /jenkins/cli HTTP/1.1" 200 13935
        10.25.50.94 - - [06/Feb/2012:14:47:38 -0500] "POST /jenkins/cli HTTP/1.1" 200 -
        

        The former is the downstream link and the latter is the upstream (sending from cli to jenkins) hence there being no output in that direction.

        catalina.out (equivalent to the standalone jenkins.log) has the other side of the error.
        This confirms that the upstream reader sees the input data closed by Tomcat.

        Feb 6, 2012 2:47:38 PM hudson.remoting.Channel$ReaderThread run
        SEVERE: I/O error in channel HTTP full-duplex channel b625b58a-d9d7-4a42-856c-287e953edb47
        java.net.SocketTimeoutException: Read timed out
        	at java.net.SocketInputStream.socketRead0(Native Method)
        

        The next error in the console output occurs 12 seconds later.

        Feb 6, 2012 2:48:00 PM hudson.remoting.Channel$ReaderThread run
        SEVERE: I/O error in channel Chunked connection to http://99.99.99.94:8082/jenkins/cli
        java.io.IOException: Unexpected termination of the channel
        	at hudson.remoting.Channel$ReaderThread.run(Channel.java:1133)
        
        	
        10.25.50.94 - - [06/Feb/2012:14:48:00 -0500] "POST /jenkins/cli HTTP/1.1" 200 13935
        10.25.50.94 - - [06/Feb/2012:14:48:00 -0500] "POST /jenkins/cli HTTP/1.1" 200 -
        

        The access log has a couple of extra cli commands in the meantime. It isn't clear whether these are for the same job or for something else. Assuming they are for something else that means that the 2nd command timed out roughly 10 seconds after the previous cli command.

        10.25.50.94 - - [06/Feb/2012:14:47:39 -0500] "POST /jenkins/cli HTTP/1.1" 200 10156
        10.25.50.94 - - [06/Feb/2012:14:47:39 -0500] "POST /jenkins/cli HTTP/1.1" 200 -
        
        10.25.50.94 - - [06/Feb/2012:14:47:39 -0500] "POST /jenkins/cli HTTP/1.1" 200 10156
        10.25.50.94 - - [06/Feb/2012:14:47:39 -0500] "POST /jenkins/cli HTTP/1.1" 200 -
        

        I wonder if the http data upload inactivity timeout is 10 seconds in tomcat7.

        oldelvet Richard Mortimer added a comment - edited

        Oops comment added to wrong issue!

        mcs13099 mark streit added a comment -

        with regard to your question about the cli commands showing in the logs -

        "The access log has a couple of extra cli commands in the meantime. It isn't clear whether these are for the same job or for something else. Assuming they are for something else that means that the 2nd command timed out roughly 10 seconds after the previous cli command."

        This test was done with no other jobs running - only the command shell that runs the jobs with the -s parameter. It "appears" to us that the -s parameter gets ignored and other jobs start running before we expect them to.

        oldelvet Richard Mortimer added a comment -

        If I build a test version for you with a 5 second heartbeat timeout are you able to test it? If so is there any particular Jenkins version that you would be testing with so that I can build a matching cli jar file.

        Regarding the -s parameter being ignored, I vaguely remember a bug reported in JIRA where it does not wait if there is already a queued instance of the same job. From what you describe this may not be applicable to your circumstances, but it could be.

        brianharris brianharris added a comment -

        Stack trace looks much like the popular JENKINS-6817.

        mcs130 mcs130 added a comment -

        Richard, we obtained your 1.450 version of the JAR with the heartbeat interval change. We installed the 1.450 WAR into Tomcat 7.0.22 and now run everything from jdk1.6.0_30 - both the Tomcat startup and the JDK from which we run the CLI commands. We even raised the connectionTimeout in the Tomcat server.xml from 20 seconds to as high as 60 seconds. Nothing is working - we still get the I/O chunked connection/Unexpected termination of the channel error. For now we have had to revert back to 1.441 running on Winstone only... and the original 1.441 CLI jar that came with it. When using this combination it works, but it is not ideal. We can't get above 1.441 because even using Winstone with a later version fails in the same manner. And to your point, the -s parameter appears to get ignored when used. I will get the same kind of logs from the last experiment and send them as before.

        Thanks

        Mark

        oldelvet Richard Mortimer added a comment -

        Well I think that pretty much rules out the ping interval directly. Obviously there may be something that is blocking the channel/pings. Alternatively, there may be something external that is breaking the connection. The logs will definitely be useful to confirm that it is still the upstream (CLI to Jenkins) connection that is being closed.

        I'm not sure there is much more I can do without a reproducible test case that I can run here, or unless someone who can reproduce this can get a network packet dump (that they are willing to share) that would let us piece together what is going on at the network level.

        oldelvet Richard Mortimer added a comment -

        @brianharris agree that the stacktrace looks similar. The one difference that I'm aware of is that the connections in JENKINS-12037 are made over the CLI channel using two HTTP connections, rather than to the JNLP port. In this case the HTTP container can break connections if it thinks a connection is "bad", and we were able to reproduce this particular failure and trace it back to that.

        wild_oscar Miguel Almeida added a comment -

        Hi,

        I've just posted on the mailing list (http://groups.google.com/group/jenkinsci-users/browse_frm/thread/7f96cdf57ffed6cf#) - perhaps this is the same issue.

        I have a multi-module project where I'm running the Maven goal "clean surefire-report:report". While the tests pass on our local machines, we're randomly having failures in one of the modules in Jenkins. The console shows:

         
        mavenExecutionResult exceptions not empty
        message : Failed to execute goal
        org.apache.maven.plugins:maven-surefire-plugin:2.12:test
        (default-test) on project cm-web: Error occurred in starting fork,
        check output in log
        cause : Error occurred in starting fork, check output in log
        Stack trace :
        org.apache.maven.lifecycle.LifecycleExecutionException: Failed to
        execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12:test
        (default-test) on project cm-web: Error occurred in starting fork,
        check output in log
        

        We have Jenkins running in Tomcat, and by running the build and inspecting Tomcat's log I see:

         
        SEVERE: I/O error in channel Channel to Maven [java, -cp,
        /usr/share/tomcat6/.jenkins/plugins/maven-plugin/WEB-INF/lib/maven3-agent-1.2.jar:/usr/share/tomcat6/.jenkins/tools/Jenkins_maven/boot/plexus-classworlds-2.4.jar,
        org.jvnet.hudson.maven3.agent.Maven3Main,
        /usr/share/tomcat6/.jenkins/tools/Jenkins_maven,
        /usr/share/tomcat6/webapps/jenkins/WEB-INF/lib/remoting-2.12.jar,
        /usr/share/tomcat6/.jenkins/plugins/maven-plugin/WEB-INF/lib/maven3-interceptor-1.2.jar,
        40284]
        java.io.IOException: Unexpected termination of the channel
            at hudson.remoting.Channel$ReaderThread.run(Channel.java:1133)
        Caused by: java.io.EOFException
            at
        java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2553)
            at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
            at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
            at hudson.remoting.Channel$ReaderThread.run(Channel.java:1127) 
        

        Restarting Tomcat makes the problem go away (until it "randomly" appears again after a few days).

        gpanayotov George Panayotov added a comment -

        I'm running version 1.454 on OS X Lion via the .pkg install. I believe that means I have it running via Winstone. I have a similar issue to this (the same one, I think). I call build -s JOB_NAME via the CLI. The channel times out after exactly 1 minute.

        Server log:

        Mar 13, 2012 6:53:59 PM hudson.remoting.Channel$ReaderThread run
        SEVERE: I/O error in channel HTTP full-duplex channel fe769a28-6f68-4595-b207-be5ac016c678
        java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.net.SocketInputStream.read(SocketInputStream.java:182)
        at winstone.WinstoneInputStream.read(WinstoneInputStream.java:49)
        at hudson.util.ChunkedInputStream.getChunkSizeFromInputStream(ChunkedInputStream.java:223)
        at hudson.util.ChunkedInputStream.nextChunk(ChunkedInputStream.java:197)
        at hudson.util.ChunkedInputStream.read(ChunkedInputStream.java:120)
        at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2248)
        at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2541)
        at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2551)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
        at hudson.remoting.Channel$ReaderThread.run(Channel.java:1127)

        Client Log:

        Mar 13, 2012 6:53:59 PM hudson.remoting.Channel$ReaderThread run
        SEVERE: I/O error in channel Chunked connection to http://10.1.11.19:8080/cli
        java.io.IOException: Premature EOF
        at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:538)
        at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:582)
        at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:669)
        at java.io.FilterInputStream.read(FilterInputStream.java:116)
        at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2672)
        at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2667)
        at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2656)
        at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2248)
        at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2541)
        at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2551)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
        at hudson.remoting.Channel$ReaderThread.run(Channel.java:1127)
        Exception in thread "main" hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Premature EOF
        at hudson.remoting.Request.call(Request.java:149)
        at hudson.remoting.Channel.call(Channel.java:681)
        at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
        at $Proxy2.main(Unknown Source)
        at hudson.cli.CLI.execute(CLI.java:271)
        at hudson.cli.CLI._main(CLI.java:417)
        at hudson.cli.CLI.main(CLI.java:322)
        Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Premature EOF
        at hudson.remoting.Request.abort(Request.java:273)
        at hudson.remoting.Channel.terminate(Channel.java:732)
        at hudson.remoting.Channel$ReaderThread.run(Channel.java:1157)
        Caused by: java.io.IOException: Premature EOF
        at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:538)
        at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:582)
        at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:669)
        at java.io.FilterInputStream.read(FilterInputStream.java:116)
        at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2672)
        at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2667)
        at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2656)
        at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2248)
        at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2541)
        at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2551)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
        at hudson.remoting.Channel$ReaderThread.run(Channel.java:1127)

        I also attached Wireshark to see if I could get any useful info from there, and I noticed one thing among the mess of serialized data:

        "Execution of hudson.remoting.PingThread$Ping@2be548d is prohibited because the channel is restricted"

        I'm pretty sure my issue is that the server times out at 60 seconds, and the line above suggests why: the timeout is set to 60 seconds and the PingThread is not actually running. Hope this helps. Also, has anyone found a workaround for this yet?

        -George

        oldelvet Richard Mortimer added a comment -

        @Miguel Almeida thanks for the information. At the moment that seems sufficiently different from the situation/symptoms reported in this JIRA that I think it would be best if you could report it as a separate JIRA, to avoid confusion over symptoms. It may turn out to be related once the root cause is known, as @brianharris points out.

        oldelvet Richard Mortimer added a comment -

        @George Panayotov thanks for the data. This does indeed look like similar symptoms to JENKINS-12037.

        Thank you for taking a look using Wireshark; that might be a big clue as to what is going on. Looking at the source, FullDuplexHttpChannel.java sets isRestricted based on user authentication, so this could explain why it occurs for some people and not others.

        That said, if you still have the Wireshark output around, can you take a look and see if you can answer the following:

        • Does the "is prohibited because the channel is restricted" message get repeated or does it just occur once at the start of the CLI session?
        • Do you see any other activity on either of the two http streams (upstream and downstream) during the 60 seconds after the initial activity?

        Richard
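
        (A rough sketch of the mechanism described above - illustrative only, not the actual FullDuplexHttpChannel source, and it assumes the remoting Channel API of that era exposes setRestricted; `authenticated` is a stand-in flag:)

        // The server side keeps the channel restricted until the user has
        // authenticated; a restricted channel refuses to execute arbitrary
        // callables, including hudson.remoting.PingThread$Ping, which produces
        // the "prohibited because the channel is restricted" message above.
        channel.setRestricted(!authenticated);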

        gpanayotov George Panayotov added a comment -

        This is the Wireshark capture of the failing transaction. Hope this helps.

        gpanayotov George Panayotov added a comment -

        I think the "Channel is restricted" message only happened once. I don't think there was any other activity during the 60 seconds. You should check the attached jenkins-timeout capture to see if you can find anything.

        Thanks,
        George

        oldelvet Richard Mortimer added a comment -

        @George Panayotov many thanks, that is really good data.

        I've only had time for a very quick look at this, but to answer my questions:

        • The ping thread only attempts to ping once at the start of the CLI session.
        • There is no activity on the thread at all after the ping thread fails until the 60 second timeout.

        What authentication schemes do you have enabled on your Jenkins instance? That might be key to why I don't see this issue.

        I'm not going to have time to look at this further this week due to other commitments. But it's clear that the ping thread is definitely failing in this case, so hopefully fixing that will make things work.

        For the record I noticed the following in the packet capture file.

        Stream 1 62335 -> 8080

        • Starts at offset 0.0000
        • Download stream request started at packet 17 0.157159
        • Finish of stacktrace at packet 83 4.177468
        • No activity until 60.751270 when the server side closes the stream

        Stream 2 62336 -> 8080

        • Upload stream starts at packet 21 0.198207
        • Last upload activity at packet 73 0.591000
        • No activity until packet 86 60.751422
        • Winstone cleanly closes the upload stream.
        gpanayotov George Panayotov added a comment -

        I'm using internal Jenkins users with matrix-based security (global, not project-based). I have 2 users: one is a "superuser" that has all rights, and the other is a least-privileged user who can basically just create builds and read info. I get this issue with either user. I'm not running through any sort of Apache or Tomcat proxy, so I don't have SSL running. I plan to add this eventually, but for now I wanted to get this working in the basic case.

        mcs13099 mark streit added a comment -

        Based on the number of comments and information about the ping thread, is it safe to assume that it will still be some time before a fix is available? I understand that folks are buried with other work. We are trying to figure out what to do with a planned Jenkins upgrade in our environment because we are stuck with:

        Jenkins App WAR 1.441
        Jenkins CLI JAR 1.438
        Container: Winstone only / Tomcat 7 will not work

        Any attempt to use a combination involving later versions (and we ultimately know the CLI JAR and WAR versions need to align) does not seem to work - this I/O error in channel chunked connection exception shows up.

        Thanks

        oldelvet Richard Mortimer added a comment -

        @mark streit I think that I have enough clues now to say what the root cause is and am pretty sure that I have a workaround (although hack may be a better description).

        Progress since my last report.

        • I can reproduce this issue by forcing a restricted channel in the constructor of FullDuplexHttpChannel.java, mentioned in an earlier comment.
        • From this I can see that the PingThread is getting stuck calling f.get, which fails each time with an ExecutionException.
          • Note: I cannot see why the code does not terminate when remaining <= 0, but it does not seem to.
        • I have been able to make the PingThread work by adding an additional return after the onDead check, and this makes things work without the timeout (see the sketch below).
        • It could be argued that the extra return just papers over the problem: although the timeout is avoided, the PingThread does not actually perform a successful ping because of the restricted channel.

        From the evidence above it seems that the problem was introduced by the fixes made for JENKINS-11097 which were delivered in 1.433.

        The fix that I have only requires changes in the cli-with-dependencies.jar that is used on the client side. If anyone can test in the next 24 hours I can build a version to match your Jenkins version. Please let me know which version would be useful.
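
        (A sketch of the busy-wait failure mode described above, modeled on remoting's PingThread wait loop - simplified and illustrative, not the verbatim source; names and structure are assumptions:)

        import java.util.concurrent.ExecutionException;
        import java.util.concurrent.Future;
        import java.util.concurrent.TimeUnit;
        import java.util.concurrent.TimeoutException;

        // Simplified model of the PingThread wait loop; `pingFuture` stands in
        // for channel.callAsync(new Ping()).
        abstract class PingLoopSketch {
            long timeout = 4 * 60 * 1000L; // default ping timeout was 4 minutes

            abstract void onDead(); // invoked when the channel is considered dead

            void awaitPing(Future<?> pingFuture) throws InterruptedException {
                long end = System.currentTimeMillis() + timeout;
                long remaining;
                do {
                    remaining = end - System.currentTimeMillis();
                    try {
                        pingFuture.get(Math.max(1, remaining), TimeUnit.MILLISECONDS);
                        return; // ping answered; the channel is alive
                    } catch (ExecutionException e) {
                        // On a restricted channel the Ping fails immediately. Without
                        // the return below, the loop calls get() again on the
                        // already-failed Future and busy waits until `remaining`
                        // runs out (the 4-minute timeout).
                        onDead();
                        return; // the additional return that stops the busy wait
                    } catch (TimeoutException e) {
                        // wait timed out; loop and try again
                    }
                } while (remaining > 0);
                onDead();
            }
        }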

        scm_issue_link SCM/JIRA link daemon added a comment -

        Code changed in jenkins
        User: Richard Mortimer
        Path:
        changelog.html
        cli/src/main/java/hudson/cli/CLI.java
        http://jenkins-ci.org/commit/jenkins/ea1b80aebe85a9414d5befd58e976d85818a258d
        Log:
        [FIXED JENKINS-12037] CLI - I/O error in channel Chunked connection

        Use a ping timeout of 3/4 of the ping interval instead of the default
        4 minute timeout.
        This happens to workaround an issue in the remoting PingThread
        implementation which causes the ping to busy wait for the timeout
        period instead of keeping the channel alive.
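
        (In code terms the change plausibly looks like this - a sketch that assumes remoting's PingThread(Channel channel, long timeout, long interval) constructor; the interval value is illustrative:)

        import hudson.remoting.Channel;
        import hudson.remoting.PingThread;

        // Sketch of the CLI-side workaround (illustrative, not the verbatim commit).
        class CliKeepAliveSketch {
            // Capping the ping timeout at 3/4 of the interval means a failed ping
            // is noticed quickly, instead of leaving the channel idle for the
            // default 4 minutes - long enough for the container to drop the link.
            static void startKeepAlive(Channel channel) {
                final long interval = 15 * 1000L; // keepalive ping interval (illustrative)
                new PingThread(channel, interval * 3 / 4, interval) {
                    @Override
                    protected void onDead() {
                        // the channel itself surfaces the failure to the caller
                    }
                }.start();
            }
        }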

        oldelvet Richard Mortimer added a comment -

        I've committed a change to the CLI code that brings the execution timeout in line with the ping period. This has the side effect of fixing the problem in my test environment (however, the CLI ping thread will busy-wait roughly 75% of the time).

        In addition I've submitted pull request https://github.com/jenkinsci/remoting/pull/3 which will get rid of the busy waiting.

        dogfood dogfood added a comment -

        Integrated in jenkins_main_trunk #1624
        [FIXED JENKINS-12037] CLI - I/O error in channel Chunked connection (Revision ea1b80aebe85a9414d5befd58e976d85818a258d)

        Result = SUCCESS
        richm : ea1b80aebe85a9414d5befd58e976d85818a258d
        Files :

        • cli/src/main/java/hudson/cli/CLI.java
        • changelog.html
        oldelvet Richard Mortimer added a comment -

        A snapshot build of 1.459 with the fix can be downloaded from
        cli-1.459-SNAPSHOT-jar-with-dependencies.jar; this should allow testing against recent versions of Jenkins.

        mcs13099 mark streit added a comment -

        Richard

        THANK YOU for this... it appears to be WORKING NOW. We are using your snapshot build below with the GA Jenkins 1.457 version (WAR), running on Tomcat 7.0.22 (not Winstone anymore). The connection is not dropped and the job sequence runs to completion.

        Is it safe to assume the fixed version will be part of the official 1.459 release?

        We really appreciate your time and help with getting this fixed.

        oldelvet Richard Mortimer added a comment -

        Yes, it will be part of 1.459. The snapshot build that you tested comes directly from the Jenkins-on-Jenkins builds.

        I mentioned it earlier, but the current fix actually busy-waits during the ping cycle, so you may see high CPU usage from the cli command. The busy-wait is fixed by the change that I have submitted to the underlying "remoting" library. That might take a release or two to filter through.

        Finally, we should really say thank you to George, who managed to get a packet trace with the failure so that we could work out what was going on.

        dogfood dogfood added a comment -

        Integrated in jenkins_ui-changes_branch #21
        [FIXED JENKINS-12037] CLI - I/O error in channel Chunked connection (Revision ea1b80aebe85a9414d5befd58e976d85818a258d)

        Result = SUCCESS
        richm : ea1b80aebe85a9414d5befd58e976d85818a258d
        Files :

        • cli/src/main/java/hudson/cli/CLI.java
        • changelog.html
        moshe_sucaz Moshe Sucaz added a comment -

        Hi,
        We have just downloaded Jenkins 1.461 (because we had this problem with 1.443), and I still see such failures in the catalina logs. We are using Tomcat 6.0.29; will upgrading to Tomcat 7.0.22 help?

        mcs13099 mark streit added a comment -

        A couple of things to check:

        1) Make sure you download and install the latest jenkins-cli.jar that comes with the 1.460 or higher version of the WAR download, and have it on your classpath wherever you are utilizing the CLI jar (see the example below).
        2) It is the CLI jar code that was fixed as part of this; simply updating the WAR file on your Tomcat instance is only part of the upgrade.
        3) I checked our catalina.out logs on our Tomcat 7.0.22 installation, and we don't see these errors anymore since the fix was delivered in 1.460, which is what we now run. We made sure that wherever we use CLI commands, the 1.460 version of jenkins-cli.jar is on the classpath.
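
        For example (a hypothetical server URL; a Jenkins instance serves its own matching CLI jar under /jnlpJars/):

        # Fetch the CLI jar that matches the running server, then invoke it
        wget http://ci.example.com:8082/jenkins/jnlpJars/jenkins-cli.jar
        java -jar jenkins-cli.jar -s http://ci.example.com:8082/jenkins/ build MyJob -s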

        Hope this helps.

        scm_issue_link SCM/JIRA link daemon added a comment -

        Code changed in jenkins
        User: Richard Mortimer
        Path:
        src/main/java/hudson/remoting/PingThread.java
        http://jenkins-ci.org/commit/remoting/463b5427d272f5d0c2c97eb6c41ef95757eb6c15
        Log:
        Do not attempt to retry a channel get after an ExecutionException.
        Without this the pinger busy waits retrying the same operation for the
        timeout period. The timeout/retry mechanism was only intended to retry
        pings that were interrupted.

        This is part of the failure seen in JENKINS-12037 where the PingThread
        busy waits for 4 minutes when the Ping class fails to execute on a restricted
        channel.


          People

          • Assignee:
            Unassigned
            Reporter:
            mcs13099 mark streit
          • Votes:
            0
            Watchers:
            4

            Dates

            • Created:
              Updated:
              Resolved: