JENKINS-24272

jnlp slaves fail to reconnect when master is restarted

      Description

      I have noticed that whenever I restart my Jenkins master my jnlp slaves are not reconnecting and require a manual slave restart to bring them back online.

      I've traced this back to the changes made to fix JENKINS-19055. Specifically, those changes cause the slave JVM to be restarted when the master disconnects. Prior to this change, the remoting engine would wait for the master to come back up before attempting to reconnect. With the change, the restarted slave immediately tries to connect to the master and gets a connection error because the master is still restarting. This causes the slave to terminate immediately.

      Jenkins 1.575 gives the following slave log output when restarting the master:

      Aug 12, 2014 3:55:15 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Terminated
      Aug 12, 2014 3:55:15 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onDisconnect
      INFO: Restarting slave via jenkins.slaves.restarter.UnixSlaveRestarter@32a9f661
      Aug 12, 2014 3:55:17 PM hudson.remoting.jnlp.Main createEngine
      INFO: Setting up slave: bishop
      Aug 12, 2014 3:55:17 PM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Aug 12, 2014 3:55:17 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server among [http://jenkins.example/]
      Aug 12, 2014 3:55:18 PM hudson.remoting.jnlp.Main$CuiListener error
      SEVERE: http://jenkins.example/tcpSlaveAgentListener/ is invalid: 503 Service Temporarily Unavailable
      java.lang.Exception: http://jenkins.example/tcpSlaveAgentListener/ is invalid: 503 Service Temporarily Unavailable
              at hudson.remoting.Engine.run(Engine.java:213)
      

      Notice the onDisconnect log message from jenkins.slaves.restarter.JnlpSlaveRestarterInstaller, which shows the slave being restarted.

      Prior to JENKINS-19055 being integrated, the slave retried in waitForServerToBack() until the master came back online. For example:

      25-Mar-2014 10:52:16 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Terminated
      25-Mar-2014 10:52:26 hudson.remoting.Engine waitForServerToBack
      INFO: Failed to connect to the master. Will retry again
      java.net.ConnectException: Connection refused
              at java.net.PlainSocketImpl.socketConnect(Native Method)
              at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
              at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
              at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
              at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:385)
              at java.net.Socket.connect(Socket.java:546)
              at sun.net.NetworkClient.doConnect(NetworkClient.java:173)
              at sun.net.www.http.HttpClient.openServer(HttpClient.java:409)
              at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
              at sun.net.www.http.HttpClient.<init>(HttpClient.java:240)
              at sun.net.www.http.HttpClient.New(HttpClient.java:321)
              at sun.net.www.http.HttpClient.New(HttpClient.java:338)
              at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:935)
              at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:876)
              at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:801)
              at hudson.remoting.Engine.waitForServerToBack(Engine.java:371)
              at hudson.remoting.Engine.run(Engine.java:278)
      ...
      25-Mar-2014 10:54:11 hudson.remoting.Engine waitForServerToBack
      INFO: Master isn't ready to talk to us. Will retry again: response code=503
      25-Mar-2014 10:54:21 hudson.remoting.Engine waitForServerToBack
      INFO: Master isn't ready to talk to us. Will retry again: response code=503
      25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server among [http://jenkins.example/]
      25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to jenkins.example:42715
      25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      25-Mar-2014 10:54:32 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected
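
The retry behaviour visible in the older log can be sketched roughly as follows. This is a minimal illustration, not the actual remoting code: the class name WaitForMasterSketch, the method waitForMaster, and the probe parameter (which stands in for an HTTP request to the master and returns -1 for a refused connection) are all assumptions made for this example. Only the two log messages are taken from the log above.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch of a wait-for-master loop; not the actual remoting code. */
class WaitForMasterSketch {

    /**
     * Polls the master until it answers HTTP 200 and returns the number of probes made.
     * The probe returns an HTTP status code, or -1 when the connection was refused.
     */
    static int waitForMaster(IntSupplier probe, long retryDelayMillis) {
        int attempts = 0;
        while (true) {
            attempts++;
            int code = probe.getAsInt();
            if (code == 200) {
                return attempts; // master is ready, so reconnecting is safe now
            }
            if (code < 0) {
                System.out.println("Failed to connect to the master. Will retry again");
            } else {
                System.out.println("Master isn't ready to talk to us. Will retry again: response code=" + code);
            }
            try {
                Thread.sleep(retryDelayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for the master", e);
            }
        }
    }

    public static void main(String[] args) {
        // Simulated restart: connection refused, two 503 responses, then ready.
        int[] responses = {-1, 503, 503, 200};
        int[] next = {0};
        // The log above shows roughly 10 second gaps; 10 ms keeps the demo fast.
        int attempts = waitForMaster(() -> responses[next[0]++], 10);
        System.out.println("Master is back after " + attempts + " probes");
    }
}
```

The point of the sketch is that a 503 or a refused connection is treated as "not ready yet" rather than as a fatal error, which is what kept the slave alive across a master restart.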
      

      The connection/retry logic is contained in remoting Engine.java
      https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/Engine.java

      When the slave connects to the master, any error causes the engine to give up and terminate (around line 232):

      if(firstError!=null) {
        events.error(firstError);
        return;
      }
      

      Prior to JENKINS-19055 hooking into onDisconnect(), a reconnection would not be attempted until waitForServerToBack() had ensured that the master had recovered.

      events.onDisconnect();
      // try to connect back to the server every 10 secs.
      waitForServerToBack();
      

      A quick and dirty fix would likely be to swap the onDisconnect and waitForServerToBack calls around.
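
The effect of that reordering can be sketched as below. This is only an illustration of the suggested ordering under stated assumptions, not the change that was eventually committed: ReconnectOrderSketch, DisconnectListener, handleDisconnect and the masterIsUp check are hypothetical names invented for this example, and the listener stands in for whatever restarts the slave on disconnect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of the proposed ordering; not the actual Engine code. */
class ReconnectOrderSketch {

    /** Stand-in for the listener whose onDisconnect() restarts the slave. */
    interface DisconnectListener {
        void onDisconnect();
    }

    /**
     * Waits until the master reports ready and only then fires onDisconnect().
     * Returns a trace of what happened, in order, so the sequence can be inspected.
     */
    static List<String> handleDisconnect(BooleanSupplier masterIsUp,
                                         DisconnectListener listener,
                                         long retryDelayMillis) {
        List<String> trace = new ArrayList<>();
        // Step 1: block here while the master is still restarting.
        while (!masterIsUp.getAsBoolean()) {
            trace.add("waiting");
            try {
                Thread.sleep(retryDelayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for the master", e);
            }
        }
        trace.add("master up");
        // Step 2: the restart hook now runs against a master that can accept connections.
        listener.onDisconnect();
        trace.add("onDisconnect");
        return trace;
    }

    public static void main(String[] args) {
        int[] probes = {0};
        // Simulated master that becomes ready on the third check.
        List<String> trace = handleDisconnect(
                () -> ++probes[0] >= 3,
                () -> System.out.println("restarting slave"),
                10);
        System.out.println(trace);
    }
}
```

With this ordering, a restarted slave would first contact the master only after the master has answered, so it should not hit the 503 that terminates it in the log above.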

            Activity

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Richard Mortimer
            Path:
            core/src/main/java/jenkins/slaves/restarter/JnlpSlaveRestarterInstaller.java
            http://jenkins-ci.org/commit/jenkins/ea6baf9a103b841ec99dd1c2a9aac85fe6d8d29f
            Log:
            [FIXES JENKINS-24272] jnlp slaves fail to reconnect when master is restarted

            During master restart only attempt to reconnect the slave after the master has
            finished restarting.

            (cherry picked from commit 48e19c58f9e2caa998d0942417d58679f5ce47f0)

            dogfood dogfood added a comment -

            Integrated in jenkins_main_trunk #4292
            [FIXES JENKINS-24272] jnlp slaves fail to reconnect when master is restarted (Revision ea6baf9a103b841ec99dd1c2a9aac85fe6d8d29f)

            Result = UNSTABLE
            ogondza : ea6baf9a103b841ec99dd1c2a9aac85fe6d8d29f
            Files :

            • core/src/main/java/jenkins/slaves/restarter/JnlpSlaveRestarterInstaller.java
            romanp Roman Pickl added a comment - - edited

            We still see this on Jenkins ver. 2.19.4 on a mac slave with java version "1.8.0_92"
            Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
            Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)

            Reconnecting does seem to work when the jar is launched from the command line, though.

            danielbeck Daniel Beck added a comment -

            Please file a new issue and provide more information about your environment. Two years later, it's likely to be an unrelated issue.

            maoshen1277 li mengmeng added a comment -

            I'm running Jenkins 2.164.3 on JDK 12, and this problem also occurs. After I switched the JDK back to 8, the problem was solved.


              People

              • Assignee:
                oldelvet Richard Mortimer
                Reporter:
                oldelvet Richard Mortimer