Jenkins / JENKINS-30587

All agents are terminated at once and cannot reconnect


    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Component/s: core, remoting
    • Labels:
      None
    • Environment:
      Windows 7 (64-bit)
      Java JRE 1.8.0_60 (64-bit)
      Jenkins 1.629

      Description

      Almost daily, my Jenkins instance takes ALL agents offline at once. The cause is unknown to me, and it looks like a severe bug. Could you please look into this?

      From what I have observed, connecting new agents appears to fail with an SSL exception:


      Sep 22, 2015 8:08:42 AM org.eclipse.jetty.util.log.JavaUtilLog warn
      WARNING:
      java.nio.channels.ClosedChannelException
      at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source)
      at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
      at org.eclipse.jetty.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:293)
      at org.eclipse.jetty.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:402)
      at org.eclipse.jetty.io.nio.SslConnection.process(SslConnection.java:337)
      at org.eclipse.jetty.io.nio.SslConnection.access$900(SslConnection.java:48)
      at org.eclipse.jetty.io.nio.SslConnection$SslEndPoint.flush(SslConnection.java:738)
      at org.eclipse.jetty.io.nio.SslConnection$SslEndPoint.shutdownOutput(SslConnection.java:641)
      at org.eclipse.jetty.io.nio.SslConnection.onIdleExpired(SslConnection.java:260)
      at org.eclipse.jetty.io.nio.SelectChannelEndPoint.onIdleExpired(SelectChannelEndPoint.java:349)
      at org.eclipse.jetty.io.nio.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:326)
      at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.lang.Thread.run(Unknown Source)

      Sep 22, 2015 8:08:48 AM org.eclipse.jetty.util.log.JavaUtilLog warn
      WARNING: handle failed
      java.lang.IllegalStateException: Internal error
      at sun.security.ssl.SSLEngineImpl.initHandshaker(Unknown Source)
      at sun.security.ssl.SSLEngineImpl.readRecord(Unknown Source)
      at sun.security.ssl.SSLEngineImpl.readNetRecord(Unknown Source)
      at sun.security.ssl.SSLEngineImpl.unwrap(Unknown Source)
      at javax.net.ssl.SSLEngine.unwrap(Unknown Source)
      at org.eclipse.jetty.io.nio.SslConnection.unwrap(SslConnection.java:536)
      at org.eclipse.jetty.io.nio.SslConnection.process(SslConnection.java:401)
      at org.eclipse.jetty.io.nio.SslConnection.handle(SslConnection.java:193)
      at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:668)
      at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
      at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.lang.Thread.run(Unknown Source)


      Shortly afterwards, Jenkins takes ALL agents offline:


      Sep 22, 2015 8:20:54 AM hudson.slaves.ChannelPinger$1 onDead
      INFO: Ping failed. Terminating the channel SLAVE-101051.
      java.util.concurrent.TimeoutException: Ping started at 1442902614156 hasn't completed by 1442902854206
      at hudson.remoting.PingThread.ping(PingThread.java:126)
      at hudson.remoting.PingThread.run(PingThread.java:85)
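      The two numbers in the ping failure message are epoch timestamps in milliseconds. The Java sketch below extracts them from the message text and prints the interval. The class and method names (`PingLogDecoder`, `parse`, `elapsedMillis`) are illustrative and are not part of Jenkins. For the line above, the difference is 1442902854206 minus 1442902614156 = 240,050 ms, about four minutes. The end time of 06:20:54 UTC matches the 8:20:54 AM log entry if the master's clock is at UTC+2. This suggests the channel was declared dead after a ping timeout of roughly four minutes, not immediately.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Illustrative helper (not Jenkins code): decodes the epoch-millisecond values in a
 * "Ping started at X hasn't completed by Y" TimeoutException message.
 */
public class PingLogDecoder {

    // Captures the start and end timestamps printed by the ping timeout message.
    private static final Pattern PING_TIMEOUT =
            Pattern.compile("Ping started at (\\d+) hasn't completed by (\\d+)");

    /** Returns {start, end} in epoch milliseconds; throws if the text does not match. */
    static long[] parse(String message) {
        Matcher m = PING_TIMEOUT.matcher(message);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a ping timeout message: " + message);
        }
        return new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(2))};
    }

    /** Milliseconds the ping was outstanding before the channel was declared dead. */
    static long elapsedMillis(String message) {
        long[] t = parse(message);
        return t[1] - t[0];
    }

    public static void main(String[] args) {
        String msg = "java.util.concurrent.TimeoutException: Ping started at 1442902614156"
                + " hasn't completed by 1442902854206";
        long[] t = parse(msg);
        // Print both instants in UTC so they can be compared with the local log time.
        System.out.println("start (UTC): " + Instant.ofEpochMilli(t[0]).atOffset(ZoneOffset.UTC));
        System.out.println("end   (UTC): " + Instant.ofEpochMilli(t[1]).atOffset(ZoneOffset.UTC));
        System.out.println("waited: " + Duration.ofMillis(elapsedMillis(msg)));
    }
}
```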


      Afterwards, ALL agents try to reconnect to Jenkins, but Jenkins rejects them with:


      INFO: Accepted connection #288 from /10.0.209.109:64213
      Sep 22, 2015 8:47:00 AM jenkins.slaves.JnlpSlaveHandshake error
      WARNING: TCP slave agent connection handler #288 with /10.0.209.109:64213 is aborted: SLAVE-719161 is already connected to this master. Rejecting this connection.
      Sep 22, 2015 8:47:00 AM hudson.TcpSlaveAgentListener$ConnectionHandler run


      If Jenkins disconnects all agents, I would expect it to accept their reconnection attempts automatically instead of rejecting them because of a supposedly existing connection. In any case, taking all agents offline at once because of a ping failure looks like a bug.
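      One plausible reading of the rejection message is that the master still holds a registration for each agent's old channel after the ping failure, so it refuses a fresh connection under the same name. The Java sketch below is a hypothetical model of that situation and is not Jenkins code. `AgentRegistry`, `Channel`, `acceptStrict`, and `acceptReplacingDead` are invented names. The strict policy rejects any name that is already registered, whether its channel is alive or dead. The lenient policy is closer to the behavior expected here: it replaces an entry whose channel is known to be dead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical model of a master-side registry of connected agents (NOT Jenkins code).
 * It only illustrates how a stale entry leads to an "already connected" rejection.
 */
public class AgentRegistry {

    /** Minimal stand-in for an agent channel; 'alive' is set to false when it is terminated. */
    static final class Channel {
        volatile boolean alive = true;
    }

    private final Map<String, Channel> connected = new ConcurrentHashMap<>();

    /** Strict policy: reject while any entry exists for the name, even if its channel is dead. */
    boolean acceptStrict(String agentName, Channel incoming) {
        return connected.putIfAbsent(agentName, incoming) == null;
    }

    /** Lenient policy: register the new channel if there is no entry or the old channel is dead. */
    boolean acceptReplacingDead(String agentName, Channel incoming) {
        Channel result = connected.compute(agentName, (name, existing) ->
                (existing == null || !existing.alive) ? incoming : existing);
        return result == incoming;
    }

    public static void main(String[] args) {
        AgentRegistry registry = new AgentRegistry();
        Channel old = new Channel();
        registry.acceptReplacingDead("SLAVE-719161", old);
        old.alive = false; // simulate the channel being terminated after a ping failure
        System.out.println(registry.acceptStrict("SLAVE-719161", new Channel()));        // prints false: stale entry blocks it
        System.out.println(registry.acceptReplacingDead("SLAVE-719161", new Channel())); // prints true: dead entry replaced
    }
}
```

      `ConcurrentHashMap.compute` runs atomically for each key. As a result, the dead-channel check and the replacement cannot interleave with a second connection attempt for the same agent name.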

      Please find the full logs attached.

        Attachments

          Activity

          maedula Hans Baer added a comment -

          One small addition: the web GUI stays responsive, but the issue cannot be recovered from without restarting the Jenkins process.

          maedula Hans Baer added a comment -

          Any update on this? I have downgraded to an older Jenkins version (1.624), and the issue no longer occurs.


            People

            • Assignee: Unassigned
            • Reporter: maedula Hans Baer
            • Votes: 0
            • Watchers: 3
