Jenkins / JENKINS-30587

All agents get terminated without reconnection possibility.


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Component/s: core, remoting
    • Labels: None
    • Environment: Windows 7 64-bit
      Java JRE 1.8.0_60 64-bit
      Jenkins 1.629

      Almost daily, my Jenkins instance ends up taking ALL agents offline. The reason is unknown to me, and it looks like a severe bug. Could you please help investigate?

      From what I can observe, new agent connections seem to fail with an SSL exception:


      Sep 22, 2015 8:08:42 AM org.eclipse.jetty.util.log.JavaUtilLog warn
      WARNING:
      java.nio.channels.ClosedChannelException
      at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source)
      at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
      at org.eclipse.jetty.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:293)
      at org.eclipse.jetty.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:402)
      at org.eclipse.jetty.io.nio.SslConnection.process(SslConnection.java:337)
      at org.eclipse.jetty.io.nio.SslConnection.access$900(SslConnection.java:48)
      at org.eclipse.jetty.io.nio.SslConnection$SslEndPoint.flush(SslConnection.java:738)
      at org.eclipse.jetty.io.nio.SslConnection$SslEndPoint.shutdownOutput(SslConnection.java:641)
      at org.eclipse.jetty.io.nio.SslConnection.onIdleExpired(SslConnection.java:260)
      at org.eclipse.jetty.io.nio.SelectChannelEndPoint.onIdleExpired(SelectChannelEndPoint.java:349)
      at org.eclipse.jetty.io.nio.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:326)
      at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.lang.Thread.run(Unknown Source)

      Sep 22, 2015 8:08:48 AM org.eclipse.jetty.util.log.JavaUtilLog warn
      WARNING: handle failed
      java.lang.IllegalStateException: Internal error
      at sun.security.ssl.SSLEngineImpl.initHandshaker(Unknown Source)
      at sun.security.ssl.SSLEngineImpl.readRecord(Unknown Source)
      at sun.security.ssl.SSLEngineImpl.readNetRecord(Unknown Source)
      at sun.security.ssl.SSLEngineImpl.unwrap(Unknown Source)
      at javax.net.ssl.SSLEngine.unwrap(Unknown Source)
      at org.eclipse.jetty.io.nio.SslConnection.unwrap(SslConnection.java:536)
      at org.eclipse.jetty.io.nio.SslConnection.process(SslConnection.java:401)
      at org.eclipse.jetty.io.nio.SslConnection.handle(SslConnection.java:193)
      at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:668)
      at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
      at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.lang.Thread.run(Unknown Source)


      Shortly afterwards, I can see that Jenkins takes ALL agents offline:


      Sep 22, 2015 8:20:54 AM hudson.slaves.ChannelPinger$1 onDead
      INFO: Ping failed. Terminating the channel SLAVE-101051.
      java.util.concurrent.TimeoutException: Ping started at 1442902614156 hasn't completed by 1442902854206
      at hudson.remoting.PingThread.ping(PingThread.java:126)
      at hudson.remoting.PingThread.run(PingThread.java:85)
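
The two numbers in the timeout message above are epoch timestamps in milliseconds, so the time the ping was left unanswered can be read off directly. A minimal Python sketch, assuming the message format shown in this log (the helper name `ping_wait_seconds` is made up for illustration):

```python
import re

# Message text copied from the log excerpt above.
message = "Ping started at 1442902614156 hasn't completed by 1442902854206"


def ping_wait_seconds(text):
    """Return the seconds between the two epoch-millisecond timestamps
    found in a PingThread timeout message."""
    match = re.search(r"Ping started at (\d+) hasn't completed by (\d+)", text)
    if match is None:
        raise ValueError("not a ping timeout message")
    started_ms, deadline_ms = (int(group) for group in match.groups())
    return (deadline_ms - started_ms) / 1000.0


print(ping_wait_seconds(message))  # 240.05
```

The ping was therefore outstanding for about 240 seconds (four minutes) before the channel was terminated.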


      Afterwards, ALL agents try to reconnect to Jenkins, but Jenkins rejects them with:


      INFO: Accepted connection #288 from /10.0.209.109:64213
      Sep 22, 2015 8:47:00 AM jenkins.slaves.JnlpSlaveHandshake error
      WARNING: TCP slave agent connection handler #288 with /10.0.209.109:64213 is aborted: SLAVE-719161 is already connected to this master. Rejecting this connection.
      Sep 22, 2015 8:47:00 AM hudson.TcpSlaveAgentListener$ConnectionHandler run
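
To see how many agents are affected, the rejection warnings can be pulled out of the full log. The following Python sketch assumes the warning format shown above; the regular expression and the helper name `rejected_agents` are illustrative and not part of Jenkins:

```python
import re

# Sample line copied from the log excerpt above.
log_lines = [
    "WARNING: TCP slave agent connection handler #288 with /10.0.209.109:64213 "
    "is aborted: SLAVE-719161 is already connected to this master. "
    "Rejecting this connection.",
]

# Pattern derived from the single warning in this report (an assumption).
REJECTED = re.compile(
    r"connection handler #(?P<handler>\d+) with /(?P<ip>[\d.]+):(?P<port>\d+) "
    r"is aborted: (?P<agent>\S+) is already connected to this master"
)


def rejected_agents(lines):
    """Return (agent name, remote IP, handler number) for each rejection line."""
    found = []
    for line in lines:
        match = REJECTED.search(line)
        if match:
            found.append(
                (match.group("agent"), match.group("ip"), int(match.group("handler")))
            )
    return found


print(rejected_agents(log_lines))  # [('SLAVE-719161', '10.0.209.109', 288)]
```

Running this over the attached log would give one entry per rejected reconnection attempt.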


      If Jenkins kicks out all agents, I would expect it to accept them again automatically when they reconnect, instead of rejecting them because of an already existing connection. In any case, taking all agents offline at once because of a ping failure looks like a bug.
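
The expected behavior can be sketched as a toy model in Python. This is purely hypothetical and not how Jenkins is implemented; the class `AgentRegistry` and its methods are invented here to illustrate the idea that a connection whose ping has failed should not block the same agent from reconnecting:

```python
class AgentRegistry:
    """Toy model of a master's agent table (hypothetical, not Jenkins code)."""

    def __init__(self):
        # agent name -> {"conn_id": int, "alive": bool}
        self._channels = {}

    def mark_ping_failed(self, agent):
        """Record that the agent's current channel no longer answers pings."""
        if agent in self._channels:
            self._channels[agent]["alive"] = False

    def connect(self, agent, conn_id):
        """Accept the connection unless a live channel for the agent exists."""
        existing = self._channels.get(agent)
        if existing is not None and existing["alive"]:
            return "rejected"
        # No channel yet, or the old one is dead: replace it.
        self._channels[agent] = {"conn_id": conn_id, "alive": True}
        return "accepted"


registry = AgentRegistry()
print(registry.connect("SLAVE-719161", 1))    # accepted
print(registry.connect("SLAVE-719161", 2))    # rejected, channel 1 is alive
registry.mark_ping_failed("SLAVE-719161")
print(registry.connect("SLAVE-719161", 288))  # accepted, stale channel replaced
```

In this model, the "already connected" rejection only applies while the existing channel is still healthy.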

      Please find full logs attached as well!

            Assignee: Unassigned
            Reporter: Hans Baer (maedula)
            Votes: 0
            Watchers: 3
