Hi,
I've seen this problem for the last 5-6 releases. When I saw that a few slave-related bugs had been fixed in 2.09, I hoped this one was among them, but no such luck.
Typically, any time I reboot the Jenkins server, some of the nodes (also Windows machines, launched from the command line with slave.jar) fail to come back online. They are all online prior to the server reboot.
Killing them and reissuing the command line works... sometimes. Sometimes I need to do this 5-6 times and fiddle with the "mark node offline" button before it eventually works.
To make sure I was running the latest versions for this most recent attempt, I:
- Upgraded server to 2.10.
- Rebooted server
- Went to each node, killed the slave process, deleted existing slave.jar, and replaced with the new one that comes with 2.10.
- Reissued the command line for the slave process (shown below)
- This seemed to work
- Next, I went to the Jenkins server and upgraded a few plugins that needed upgrading, since I had only just upgraded to 2.10
- On plugin download screen, I also checked "reboot Jenkins if no jobs are pending" (or whatever it says there).
- Jenkins rebooted
- As usual, some of the nodes didn't come back online. When I kill the process and reissue the command, it works.
For one of the nodes in this state (not reconnected after rebooting the server), this is what I find:
On the "manage node" -> "Log" screen:
JNLP agent connected from /<my IP address>
(that's it, no other messages).
The command-line output on that machine is shown below. The beginning of the output is from when I last launched the slave process, which was after I upgraded to 2.10 (and downloaded the latest slave.jar from the 2.10 server).
PS C:\jenkins> java -jar slave.jar -jnlpUrl http://<my IP and node name>/slave-agent.jnlp -secret <my secret>
Jun 22, 2016 2:05:19 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: <my node name>
Jun 22, 2016 2:05:19 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Jun 22, 2016 2:05:19 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among http://<my IP>/
Jun 22, 2016 2:05:19 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Jun 22, 2016 2:05:19 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to <my server IP>:52465
Jun 22, 2016 2:05:19 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP3-connect
Jun 22, 2016 2:05:19 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
<--- Things were good here. Now I reboot the server after installing plugins.
Jun 22, 2016 2:07:33 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel channel
<--- I assume this is where the Jenkins server rebooted after I told it to, after installing the plugins as described above.
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at java.io.FilterInputStream.read(Unknown Source)
at javax.crypto.CipherInputStream.getMoreData(CipherInputStream.java:114)
at javax.crypto.CipherInputStream.read(CipherInputStream.java:192)
at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:86)
at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)
Jun 22, 2016 2:07:33 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Jun 22, 2016 2:07:44 PM hudson.remoting.Engine waitForServerToBack
INFO: Master isn't ready to talk to us. Will retry again: response code=503
Jun 22, 2016 2:07:54 PM hudson.remoting.Engine waitForServerToBack
INFO: Master isn't ready to talk to us. Will retry again: response code=503
Jun 22, 2016 2:08:04 PM hudson.remoting.Engine waitForServerToBack
INFO: Master isn't ready to talk to us. Will retry again: response code=503
Jun 22, 2016 2:08:19 PM hudson.remoting.Engine waitForServerToBack
INFO: Failed to connect to the master. Will retry again
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at java.net.HttpURLConnection.getResponseCode(Unknown Source)
at hudson.remoting.Engine.waitForServerToBack(Engine.java:434)
at hudson.remoting.Engine.run(Engine.java:325)
Jun 22, 2016 2:08:34 PM hudson.remoting.Engine waitForServerToBack
INFO: Failed to connect to the master. Will retry again
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at java.net.HttpURLConnection.getResponseCode(Unknown Source)
at hudson.remoting.Engine.waitForServerToBack(Engine.java:434)
at hudson.remoting.Engine.run(Engine.java:325)
Jun 22, 2016 2:08:44 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among http://<my server IP>/
Jun 22, 2016 2:08:44 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Jun 22, 2016 2:08:44 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to <my server IP>:52672
Jun 22, 2016 2:08:44 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP3-connect
It will stay here forever.
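For anyone who wants to detect this state automatically, here is a minimal Python sketch that scans the agent's console output for a "Trying protocol" message that was never followed by "Connected". It assumes the log format shown above; the function name handshake_stalled and the 60-second threshold are my own choices for illustration, not anything provided by Jenkins.

```python
import re
from datetime import datetime, timedelta

# Matches the timestamp at the start of a log header line, e.g.
# "Jun 22, 2016 2:08:44 PM hudson.remoting.jnlp.Main$CuiListener status".
HEADER = re.compile(r"^([A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) ")


def handshake_stalled(lines, now, timeout=timedelta(seconds=60)):
    """Return True if the most recent 'Trying protocol' message was never
    followed by 'Connected' and is older than `timeout`.

    `timeout` is a hypothetical threshold; pick whatever suits your network.
    Month and AM/PM parsing assumes an English locale.
    """
    last_ts = None        # timestamp of the most recent header line
    pending_since = None  # set while a handshake is waiting for 'Connected'
    for line in lines:
        match = HEADER.match(line)
        if match:
            last_ts = datetime.strptime(match.group(1), "%b %d, %Y %I:%M:%S %p")
        elif line.startswith("INFO: Trying protocol:"):
            pending_since = last_ts
        elif line.startswith("INFO: Connected"):
            pending_since = None
    return pending_since is not None and now - pending_since > timeout
```

Fed the last few lines of the output above and the current time, this reports a stall once the handshake has been pending longer than the threshold.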
Now I'll Ctrl-C this and try again:
PS C:\jenkins> java -jar slave.jar -jnlpUrl http://<my IP and node name>/slave-agent.jnlp -secret <my secret>
Jun 22, 2016 2:37:10 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave:<my node name>
Jun 22, 2016 2:37:10 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Jun 22, 2016 2:37:10 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among http://<my server IP>/
Jun 22, 2016 2:37:10 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Jun 22, 2016 2:37:10 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to <my server IP>:52672
Jun 22, 2016 2:37:10 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP3-connect
Jun 22, 2016 2:37:11 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
... all is good. But until I hit Ctrl-C and reissued the command, it was stuck. This happens 50% of the time, and on different nodes each time.
When I hit Ctrl-C as described above, the "manage nodes" -> "log" screen added the following lines, which seem to imply the node was connected all along (but it wasn't).
<===[JENKINS REMOTING CAPACITY]===>ERROR: Connection terminated
java.io.IOException: An existing connection was forcibly closed by the remote host
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(Unknown Source)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
at sun.nio.ch.IOUtil.read(Unknown Source)
at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
at hudson.remoting.SocketChannelStream$1.read(SocketChannelStream.java:35)
at sun.nio.ch.ChannelInputStream.read(Unknown Source)
at sun.nio.ch.ChannelInputStream.read(Unknown Source)
at sun.nio.ch.ChannelInputStream.read(Unknown Source)
at java.io.InputStream.read(Unknown Source)
at javax.crypto.CipherInputStream.getMoreData(CipherInputStream.java:114)
at javax.crypto.CipherInputStream.read(CipherInputStream.java:192)
at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:86)
at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)
JNLP agent connected from /<my node IP>
<===[JENKINS REMOTING CAPACITY]===>Slave.jar version: 2.60
This is a Windows agent
Agent successfully connected and online
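As a stopgap, the manual workaround described above (kill the agent and reissue the command until "Connected" appears) could be scripted roughly like this. This is only a sketch: run_until_connected, the timeout, and the attempt limit are hypothetical names and values of my own, and it only covers the initial connection, not a reconnect that hangs after a later server reboot. It relies on the "INFO: Connected" line seen in the console output above.

```python
import subprocess
import sys
import threading


def run_until_connected(cmd, timeout=60.0, max_attempts=5):
    """Launch `cmd` and wait for an 'INFO: Connected' line in its output.

    If the line does not appear within `timeout` seconds, kill the process
    and try again, up to `max_attempts` times. Returns the running process
    on success, or None if every attempt timed out.
    """
    for _ in range(max_attempts):
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr, where Java console logging normally goes
            text=True,
        )
        connected = threading.Event()

        def pump(stream=proc.stdout, flag=connected):
            # Forward the agent's output and watch for the success message.
            for line in stream:
                sys.stdout.write(line)
                if line.startswith("INFO: Connected"):
                    flag.set()

        threading.Thread(target=pump, daemon=True).start()
        if connected.wait(timeout):
            return proc
        # Same as the manual workaround: kill the stuck agent and relaunch.
        proc.kill()
        proc.wait()
    return None
```

I would call it with the same arguments as the command line above, passed as a list (java, -jar, slave.jar, -jnlpUrl, the agent URL, -secret, the secret), and keep the returned process around for as long as the agent should run.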