Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-18781

Configurable channel timeout for slaves

    Details

    • Similar Issues:

      Description

      This issue is related to JENKINS-6817.

      I am running Jenkins slaves inside virtual machines. Sometimes these machines are overloaded and I get the following exception:

      FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      	at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
      	at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
      	at hudson.remoting.Request.call(Request.java:174)
      	at hudson.remoting.Channel.call(Channel.java:713)
      	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:167)
      	at $Proxy38.join(Unknown Source)
      	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:925)
      	at hudson.Launcher$ProcStarter.join(Launcher.java:360)
      	at hudson.tasks.Maven.perform(Maven.java:327)
      	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
      	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
      	at hudson.model.Build$BuildExecution.build(Build.java:199)
      	at hudson.model.Build$BuildExecution.doRun(Build.java:160)
      	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586)
      	at hudson.model.Run.execute(Run.java:1593)
      	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      	at hudson.model.ResourceController.execute(ResourceController.java:88)
      	at hudson.model.Executor.run(Executor.java:247)
      Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      	at hudson.remoting.Request.abort(Request.java:299)
      	at hudson.remoting.Channel.terminate(Channel.java:773)
      	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
      Caused by: java.io.IOException: Unexpected termination of the channel
      	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
      Caused by: java.io.EOFException
      	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
      	at java.io.ObjectInputStream.readObject0(Unknown Source)
      	at java.io.ObjectInputStream.readObject(Unknown Source)
      	at hudson.remoting.Command.readFrom(Command.java:92)
      	at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:72)
      	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
      

      Is it possible to make the channel timeout configurable? I'd like to increase the value from, say 5 seconds, to 30 seconds or a minute.

        Attachments

          Issue Links

            Activity

            Hide
            maq Maciej Kusz added a comment -

            We;ve got similar problem when our master was on VMware. After migration to Hyper-V from Microsoft problem has been solved. I think that this is some problem with VMware configuration or it's network switch virtualization.

            Show
            maq Maciej Kusz added a comment - We;ve got similar problem when our master was on VMware. After migration to Hyper-V from Microsoft problem has been solved. I think that this is some problem with VMware configuration or it's network switch virtualization.
            Hide
            muppy_ Markus Niklasson added a comment - - edited

            Hi,

            We have recently also encountered disconnection issues. Slave is a Windows 7 (x64) PC with enough of RAM and CPU to run heavy applications. The Jenkins master is a Enterprise Redhat 7 (3.10.0-327.18.2.el7.x86_64) running Jenkins 2.23 also with enough memory and so on to run Jenkins. Both running Java 8 update 102. The slave are connected through JNLP. Network can be a bit unstable at times.

            The following intermittent error occurs very frequently during builds:

            Agent went offline during the build
            ERROR: Connection was broken: java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@69c08f2a[name=Buildserver]
            at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208)
            at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:629)
            at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
            at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
            at java.util.concurrent.FutureTask.run(Unknown Source)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
            at java.lang.Thread.run(Unknown Source)
            Caused by: java.io.IOException: Connection timed out
            at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
            at sun.nio.ch.SocketDispatcher.read(Unknown Source)
            at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
            at sun.nio.ch.IOUtil.read(Unknown Source)
            at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
            at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:137)
            at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:310)
            at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
            ... 6 more

            I have unticked "Response Time" from "Preventive Node Monitoring" and Slaves has -Dhudson.slaves.ChannelPinger.pingInterval=1 set.

            Any other workaround available?

            Show
            muppy_ Markus Niklasson added a comment - - edited Hi, We have recently also encountered disconnection issues. Slave is a Windows 7 (x64) PC with enough of RAM and CPU to run heavy applications. The Jenkins master is a Enterprise Redhat 7 (3.10.0-327.18.2.el7.x86_64) running Jenkins 2.23 also with enough memory and so on to run Jenkins. Both running Java 8 update 102. The slave are connected through JNLP. Network can be a bit unstable at times. The following intermittent error occurs very frequently during builds: Agent went offline during the build ERROR: Connection was broken: java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@69c08f2a [name=Buildserver] at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208) at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:629) at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused by: java.io.IOException: Connection timed out at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(Unknown Source) at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source) at sun.nio.ch.IOUtil.read(Unknown Source) at sun.nio.ch.SocketChannelImpl.read(Unknown Source) at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:137) at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:310) at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561) ... 6 more I have unticked "Response Time" from "Preventive Node Monitoring" and Slaves has -Dhudson.slaves.ChannelPinger.pingInterval=1 set. Any other workaround available?
            Hide
            externl Joe George added a comment - - edited

            According to the documentation (https://wiki.jenkins-ci.org/display/JENKINS/Ping+Thread) -Dhudson.slaves.ChannelPinger.pingInterval=1 should be set on Master. You should also try setting -Dhudson.remoting.Launcher.pingIntervalSec=-1 on the Slave.

            I haven't experience any issues since disabling pinging this way. Next is to start testing different timeout values.

            Show
            externl Joe George added a comment - - edited According to the documentation ( https://wiki.jenkins-ci.org/display/JENKINS/Ping+Thread ) -Dhudson.slaves.ChannelPinger.pingInterval=1 should be set on Master . You should also try setting -Dhudson.remoting.Launcher.pingIntervalSec=-1 on the Slave. I haven't experience any issues since disabling pinging this way. Next is to start testing different timeout values.
            Hide
            muppy_ Markus Niklasson added a comment -

            Thanks for the tip!

            By disabling the ping completely it made it more stable. However, I still experience intermittent connectivity problems. During an execution, the slave computer went offline for a couple of seconds and then reconnects to Jenkins Master as seen in the system log:

            Accepted connection #7 from /10.31.43.49:52692

            Sep 21, 2016 8:14:21 AM INFO jenkins.slaves.DefaultJnlpSlaveReceiver handle

            Disconnecting Buildserver as we are reconnected from the current peer

            Sep 21, 2016 8:29:49 AM WARNING org.jenkinsci.remoting.nio.NioChannelHub run

            Communication problem
            java.io.IOException: Connection timed out
            at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
            at sun.nio.ch.SocketDispatcher.read(Unknown Source)
            at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
            at sun.nio.ch.IOUtil.read(Unknown Source)
            at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
            at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:137)
            at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:310)
            at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
            at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
            at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
            at java.util.concurrent.FutureTask.run(Unknown Source)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
            at java.lang.Thread.run(Unknown Source)

            Sep 21, 2016 8:29:49 AM WARNING jenkins.slaves.JnlpSlaveAgentProtocol$Handler$1 onClosed

            NioChannelHub keys=3 gen=842933: Computer.threadPoolForRemoting 2 for Buildserver terminated
            java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@28b1969e[name=Buildserver]
            at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208)
            at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:629)
            at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
            at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
            at java.util.concurrent.FutureTask.run(Unknown Source)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
            at java.lang.Thread.run(Unknown Source)
            Caused by: java.io.IOException: Connection timed out
            at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
            at sun.nio.ch.SocketDispatcher.read(Unknown Source)
            at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
            at sun.nio.ch.IOUtil.read(Unknown Source)
            at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
            at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:137)
            at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:310)
            at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
            ... 6 more

            Any ideas how I can prevent the Master from disconnecting the slave (use the reconnected session instead)?

            Show
            muppy_ Markus Niklasson added a comment - Thanks for the tip! By disabling the ping completely it made it more stable. However, I still experience intermittent connectivity problems. During an execution, the slave computer went offline for a couple of seconds and then reconnects to Jenkins Master as seen in the system log: — Accepted connection #7 from /10.31.43.49:52692 Sep 21, 2016 8:14:21 AM INFO jenkins.slaves.DefaultJnlpSlaveReceiver handle Disconnecting Buildserver as we are reconnected from the current peer Sep 21, 2016 8:29:49 AM WARNING org.jenkinsci.remoting.nio.NioChannelHub run Communication problem java.io.IOException: Connection timed out at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(Unknown Source) at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source) at sun.nio.ch.IOUtil.read(Unknown Source) at sun.nio.ch.SocketChannelImpl.read(Unknown Source) at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:137) at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:310) at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561) at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Sep 21, 2016 8:29:49 AM WARNING jenkins.slaves.JnlpSlaveAgentProtocol$Handler$1 onClosed NioChannelHub keys=3 gen=842933: Computer.threadPoolForRemoting 2 for Buildserver terminated java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@28b1969e [name=Buildserver] at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208) at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:629) at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused by: java.io.IOException: Connection timed out at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(Unknown Source) at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source) at sun.nio.ch.IOUtil.read(Unknown Source) at sun.nio.ch.SocketChannelImpl.read(Unknown Source) at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:137) at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:310) at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561) ... 6 more Any ideas how I can prevent the Master from disconnecting the slave (use the reconnected session instead)?
            Hide
            oleg_nenashev Oleg Nenashev added a comment -

            JENKINS-44785 likely addresses this issue in general. There is a pull request to remoting: https://github.com/jenkinsci/remoting/pull/174 , but I have never finished it due to the review feedback.

             

            I will remove the assignee from the ticket for now, see https://groups.google.com/d/msg/jenkinsci-dev/uc6NsMoCFQI/AIO4WG1UCwAJ for the context

            Show
            oleg_nenashev Oleg Nenashev added a comment - JENKINS-44785 likely addresses this issue in general. There is a pull request to remoting: https://github.com/jenkinsci/remoting/pull/174 , but I have never finished it due to the review feedback.   I will remove the assignee from the ticket for now, see https://groups.google.com/d/msg/jenkinsci-dev/uc6NsMoCFQI/AIO4WG1UCwAJ for the context

              People

              • Assignee:
                Unassigned
                Reporter:
                cowwoc cowwoc
              • Votes:
                58 Vote for this issue
                Watchers:
                69 Start watching this issue

                Dates

                • Created:
                  Updated: