Jenkins / JENKINS-5973

Slaves reconnecting after restarting are rejected because Hudson thinks the slave already connected

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Component/s: core
    • Labels:
      None
    • Environment:
      Windows 2003 SE SP2, 2GB VMware VM

      Description

      [20:02] <roxspring> grrr - just noticed that the hudson has put the slaves offline
      [20:02] <roxspring> am getting this sort of thing an awful lot lately
      [20:07] <roxspring> typical slave logs go like this: (see below)
      [20:08] <roxspring> then the slave services won't start up "Error 1067: The process terminated unexpectedly"
      [20:09] <roxspring> seemingly another rejected connection
      [20:11] <@kohsuke> I know this issue
      [20:11] <@kohsuke> It's because Hudson thinks the slave is still connected even though it's not any more
      [20:11] <@kohsuke> We need to fix this
      [20:11] <@kohsuke> roxspring: if you can file a ticket, that would be great.

      15-Mar-2010 14:35:01 hudson.remoting.Channel$ReaderThread run
      SEVERE: I/O error in channel channel
      java.net.SocketException: Connection reset
      	at java.net.SocketInputStream.read(Unknown Source)
      	at java.io.BufferedInputStream.fill(Unknown Source)
      	at java.io.BufferedInputStream.read(Unknown Source)
      	at java.io.ObjectInputStream$PeekInputStream.peek(Unknown Source)
      	at java.io.ObjectInputStream$BlockDataInputStream.peek(Unknown Source)
      	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
      	at java.io.ObjectInputStream.readObject0(Unknown Source)
      	at java.io.ObjectInputStream.readObject(Unknown Source)
      	at hudson.remoting.Channel$ReaderThread.run(Channel.java:856)
      15-Mar-2010 14:35:01 hudson.remoting.Request$2 run
      SEVERE: Failed to send back a reply
      java.net.SocketException: Connection reset by peer: socket write error
      	at java.net.SocketOutputStream.socketWrite0(Native Method)
      	at java.net.SocketOutputStream.socketWrite(Unknown Source)
      	at java.net.SocketOutputStream.write(Unknown Source)
      	at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
      	at java.io.BufferedOutputStream.write(Unknown Source)
      	at java.io.ObjectOutputStream$BlockDataOutputStream.drain(Unknown Source)
      	at java.io.ObjectOutputStream$BlockDataOutputStream.writeByte(Unknown Source)
      	at java.io.ObjectOutputStream.writeFatalException(Unknown Source)
      	at java.io.ObjectOutputStream.writeObject(Unknown Source)
      	at hudson.remoting.Channel.send(Channel.java:417)
      	at hudson.remoting.Request$2.run(Request.java:282)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
      	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
      	at java.util.concurrent.FutureTask.run(Unknown Source)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      	at hudson.remoting.Engine$1$1.run(Engine.java:58)
      	at java.lang.Thread.run(Unknown Source)
      15-Mar-2010 14:35:01 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Terminated
      

      then the server starts up again

      15-Mar-2010 14:35:51 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      15-Mar-2010 14:35:51 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected
      15-Mar-2010 14:35:52 com.youdevise.hudson.slavestatus.SlaveListener call
      INFO: Slave-status listener starting
      15-Mar-2010 14:35:52 com.youdevise.hudson.slavestatus.SlaveListener$1 run
      SEVERE: Could not listen on port
      java.net.BindException: Address already in use: JVM_Bind
      	at java.net.PlainSocketImpl.socketBind(Native Method)
      	at java.net.PlainSocketImpl.bind(Unknown Source)
      	at java.net.ServerSocket.bind(Unknown Source)
      	at java.net.ServerSocket.<init>(Unknown Source)
      	at java.net.ServerSocket.<init>(Unknown Source)
      	at com.youdevise.hudson.slavestatus.SocketHTTPListener.waitForConnection(SlaveListener.java:129)
      	at com.youdevise.hudson.slavestatus.SlaveListener$1.run(SlaveListener.java:63)
      	at com.youdevise.hudson.slavestatus.Daemon.go(Daemon.java:16)
      	at com.youdevise.hudson.slavestatus.SlaveListener.call(SlaveListener.java:83)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:114)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:48)
      	at hudson.remoting.Request$2.run(Request.java:270)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
      	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
      	at java.util.concurrent.FutureTask.run(Unknown Source)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      	at hudson.remoting.Engine$1$1.run(Engine.java:58)
      	at java.lang.Thread.run(Unknown Source)
      17-Mar-2010 16:14:42 hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Hudson agent is running in headless mode.
      

      Then when connecting...

      17-Mar-2010 16:14:42 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      17-Mar-2010 16:14:42 hudson.remoting.jnlp.Main$CuiListener error
      SEVERE: The server rejected the connection: hb-slave-trunk1 is already connected to this master. Rejecting this connection.
      java.lang.Exception: The server rejected the connection: hb-slave-trunk1 is already connected to this master. Rejecting this connection.
      	at hudson.remoting.Engine.run(Engine.java:191)
      
      
      

            Activity

            kohsuke Kohsuke Kawaguchi added a comment -

            As a reminder to myself, the problem is that under some circumstances, a TCP connection can be broken in such a way that one peer doesn't notice it right away.

            So the slave should send in some unique identifier so that the master can verify that the same slave is reconnecting (and thus that the existing connection is no good).

            This in turn involves adding some extensibility to the bootstrap protocol of the slave connection.
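
            The idea in the comment above can be sketched roughly as follows. This is a minimal illustration and not Hudson's actual code: the class ReconnectRegistry, the Connection stand-in, and the handshake method are all hypothetical names, and the sketch assumes the slave presents the same identifier every time it reconnects. If a live entry exists for the slave name and the identifiers match, the master treats the old connection as stale and replaces it. If they differ, it rejects the newcomer, as happens today.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from Hudson itself.
public class ReconnectRegistry {

    /** Stand-in for the master's side of one slave connection. */
    static final class Connection {
        final String identifier;   // unique identifier sent by the slave (assumed)
        private boolean closed;

        Connection(String identifier) { this.identifier = identifier; }
        void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    private final Map<String, Connection> bySlaveName = new HashMap<>();

    /**
     * Returns the accepted connection, or null if the handshake is rejected.
     */
    synchronized Connection handshake(String slaveName, String identifier) {
        Connection existing = bySlaveName.get(slaveName);
        if (existing != null && !existing.isClosed()) {
            if (!existing.identifier.equals(identifier)) {
                // A different party is using this name: keep the old connection.
                return null;
            }
            // Same slave is back, so the old connection must be dead even if
            // the master's TCP stack has not noticed yet. Drop it.
            existing.close();
        }
        Connection fresh = new Connection(identifier);
        bySlaveName.put(slaveName, fresh);
        return fresh;
    }

    public static void main(String[] args) {
        ReconnectRegistry master = new ReconnectRegistry();
        Connection first = master.handshake("hb-slave-trunk1", "id-A");
        Connection second = master.handshake("hb-slave-trunk1", "id-A");
        Connection other = master.handshake("hb-slave-trunk1", "id-B");
        System.out.println("old connection closed: " + first.isClosed());
        System.out.println("reconnect accepted: " + (second != null));
        System.out.println("other identifier rejected: " + (other == null));
    }
}
```

            How the slave obtains and keeps its identifier across restarts is left open here; that is the part that would need the bootstrap protocol change mentioned above.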

            bsrinath bsrinath added a comment - - edited

            Until this is permanently fixed in Hudson, here is a possible workaround for anyone still struggling with it:

            During the Slave startup (we use JNLP), instead of starting it up as below:

             
            javaws http://your.hudson.com:8080/computer/nameofslave/slave-agent.jnlp 
            

            the slave can be started from a startup script as follows (Windows):

             
            java -jar hudson-cli.jar -s http://your.hudson.com:8080/ delete-node nameofslave
            
            if %errorlevel%==0 ( 
               javaws http://your.hudson.com:8080/computer/nameofslave/slave-agent.jnlp  
            ) 
            
            track track added a comment -

            java -jar hudson-cli.jar -s http://your.hudson.com:8080/ delete-node nameofslave

            still results in

            No argument is allowed: nameofslave
            java -jar hudson-cli.jar delete-node args...
            Deletes a node

            I have replaced nameofslave with my slave name. Another issue?

            kohsuke Kohsuke Kawaguchi added a comment -

            Merging with JENKINS-5055.


              People

              • Assignee:
                Unassigned
                Reporter:
                roxspring roxspring
              • Votes:
                7
                Watchers:
                11
