Jenkins / JENKINS-25582

After installing the Jenkins service, the slave cannot re-connect after the first system restart

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Component/s: windows-slaves-plugin
    • Labels:
    • Environment:
      Jenkins master (1.580.1) on Ubuntu 14.04 and Jenkins slave on Windows 7 64bit.
      Description

      I tried to use the Windows service, which can be installed via a JNLP connection, to let the slave nodes connect to the master automatically. The installation seems to have a problem shutting down the initial connection to the master. After a system restart, the slave can no longer re-connect because the master still marks it as connected. It looks like the slave did not properly disconnect when the initial JNLP connection was killed. Further restarts do not show this behavior. For now, you have to restart Jenkins on the master machine to let the slave connect again.

      Steps:
      1. Set up Jenkins and a slave node via Java Web Start
      2. Connect a Windows slave
      3. Select 'File | Install as a service'
      4. Wait for the service to be installed
      5. Restart the slave

      Once the slave has been restarted, the service will try to re-connect to the master, but the connection is not allowed:

      Nov 12, 2014 11:11:05 PM hudson.TcpSlaveAgentListener$ConnectionHandler run
      INFO: Accepted connection #618 from /192.168.123.3:60703
      Nov 12, 2014 11:11:05 PM jenkins.slaves.JnlpSlaveHandshake error
      WARNING: TCP slave agent connection handler #618 with /192.168.123.3:60703 is aborted: dummy-windows is already connected to this master. Rejecting this connection.
      Nov 12, 2014 11:11:05 PM jenkins.slaves.JnlpSlaveHandshake error
      WARNING: TCP slave agent connection handler #618 with /192.168.123.3:60703 is aborted: Unrecognized name: dummy-windows

      I see hundreds of those messages in a really quick sequence.
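To quantify how often the master rejects the slave, the warnings can be counted per slave name. The following is a minimal sketch that assumes the log lines have the same shape as the excerpt above. The function name `count_rejections` and the regular expression are illustrative and not part of Jenkins.

```python
import re
from collections import Counter

# Matches the "already connected" warning from the log excerpt.
# The pattern is an assumption based on those lines only.
REJECTED = re.compile(
    r"connection handler #(?P<conn>\d+) with /(?P<addr>[\d.]+):(?P<port>\d+) "
    r"is aborted: (?P<agent>\S+) is already connected to this master"
)


def count_rejections(log_lines):
    """Return a Counter mapping slave name -> number of rejected connections."""
    counts = Counter()
    for line in log_lines:
        match = REJECTED.search(line)
        if match:
            counts[match.group("agent")] += 1
    return counts


# Message lines taken from the excerpt (timestamps omitted).
sample = [
    "INFO: Accepted connection #618 from /192.168.123.3:60703",
    "WARNING: TCP slave agent connection handler #618 with /192.168.123.3:60703 "
    "is aborted: dummy-windows is already connected to this master. "
    "Rejecting this connection.",
    "WARNING: TCP slave agent connection handler #618 with /192.168.123.3:60703 "
    "is aborted: Unrecognized name: dummy-windows",
]
rejections = count_rejections(sample)
print(rejections)
```

Only the "already connected" warning is counted. The follow-up "Unrecognized name" warning belongs to the same connection attempt and is skipped on purpose.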

          Activity

          oleg_nenashev Oleg Nenashev added a comment -

          This happens because of the TCP timeout and the PingThread on the master.
          The Jenkins master considers a slave connected as long as:

          • There are no failed requests from master to slave
          • PingThread has not failed with its 4-minute timeout (currently hardcoded)

          To fix the issue, reconfigure the reconnect interval in the slave service configuration.
          This decreases the frequency of connection attempts.
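The effect of the reconnect interval can be estimated with a small model. This is a rough sketch, not Jenkins code. It assumes the master drops the stale connection exactly when the 4-minute PingThread timeout expires, and that the restarted slave makes its first attempt the moment the old connection goes stale. The function name `rejected_attempts` is hypothetical.

```python
def rejected_attempts(stale_seconds, reconnect_interval_seconds):
    """Count reconnect attempts rejected while the master still holds the stale connection.

    Attempts happen at t = 0, interval, 2 * interval, ...
    An attempt at time t is rejected if t < stale_seconds.
    """
    if stale_seconds < 0 or reconnect_interval_seconds <= 0:
        raise ValueError("stale_seconds must be >= 0 and the interval must be > 0")
    # Integer ceiling division: number of attempt times that fall before the timeout.
    return (stale_seconds + reconnect_interval_seconds - 1) // reconnect_interval_seconds


PING_TIMEOUT_SECONDS = 4 * 60  # the 4-minute timeout mentioned in the comment

for interval in (1, 10, 60):
    print(interval, rejected_attempts(PING_TIMEOUT_SECONDS, interval))
```

Under these assumptions, a longer interval does not make the slave reconnect sooner. It only reduces how many rejected attempts and log warnings pile up before the master gives up on the old connection.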

          danielbeck Daniel Beck added a comment -

          Oleg: Did you investigate this? Shouldn't a proper slave restart disconnect the previous connection?

          I'd check in Task Manager whether two slave processes are somehow running after service install + start.
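The same check can be scripted instead of using Task Manager. The sketch below runs the Windows `tasklist` command and counts processes by image name. It assumes the slave runs as `java.exe`, which may not hold on every setup (Java Web Start may use a different launcher process). The helper names and the sample output are illustrative only.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(output, image_name="java.exe"):
    """Return PIDs of matching processes from `tasklist /FO CSV /NH` output."""
    pids = []
    for row in csv.reader(io.StringIO(output)):
        # Columns: Image Name, PID, Session Name, Session#, Mem Usage.
        # The "INFO: No tasks are running..." line has one column and is skipped.
        if len(row) >= 2 and row[0].lower() == image_name.lower():
            pids.append(int(row[1]))
    return pids


def running_pids(image_name="java.exe"):
    """Run tasklist (Windows only) and return PIDs for the given image name."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return parse_tasklist_csv(result.stdout, image_name)


# Hypothetical tasklist output: one process started by the service and one
# left over in the interactive session.
sample_output = (
    '"java.exe","4312","Services","0","85,120 K"\n'
    '"java.exe","5120","Console","1","79,004 K"\n'
)
pids = parse_tasklist_csv(sample_output)
print(len(pids), pids)
```

More than one PID right after installing and starting the service would suggest that the original Java Web Start slave is still running next to the service.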

          oleg_nenashev Oleg Nenashev added a comment -

          Daniel Beck, I see this behavior on my installations (remoting-2.36). We had to increase the TCP timeout to minutes because of extremely unreliable VPN connections between sites, so slaves effectively have the 4-minute timeout. In that case the issue appears easily even if the slave behaves correctly.

          BTW, my previous comment applies to the situation where both WinSW and remoting behave correctly. There could still be an issue there, so it definitely makes sense to check for runaway processes as Daniel proposed.

          oleg_nenashev Oleg Nenashev added a comment -

          Took the issue to my backlog


            People

            • Assignee: Unassigned
            • Reporter: whimboo Henrik Skupin
            • Votes: 3
            • Watchers: 5
