JENKINS-47453

Agent can abort reconnection if connection to master is flaky

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: remoting
    • None

      A JNLP agent started without the '-noReconnect' option can abort completely while trying to reconnect to a master that has restarted.

      In our architecture, there are several reverse proxy instances, fronted by a load balancer. Upon restart, a master can move to a different underlying machine, and the reverse proxy configuration is updated dynamically. However, there is a time window (a few seconds) during which some reverse proxies can route to the master while others cannot, because they have not yet processed the configuration update.

      In the agent logs, the sequence looks like this:

      Sep 26, 2017 9:51:39 AM hudson.remoting.jnlp.Main createEngine
      INFO: Setting up slave: fd843edf
      Sep 26, 2017 9:51:39 AM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Sep 26, 2017 9:51:39 AM hudson.remoting.Engine startEngine
      WARNING: No Working Directory. Using the legacy JAR Cache location: /root/.jenkins/cache/jars
      Sep 26, 2017 9:51:39 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server among [http://redactedurl/economic-influence/]
      Sep 26, 2017 9:51:39 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
      INFO: Remoting server accepts the following protocols: [JNLP4-connect, JNLP-connect, Ping, Diagnostic-Ping, JNLP2-connect, OperationsCenter2]
      Sep 26, 2017 9:51:39 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Agent discovery successful
        Agent address: ec2-34-224-67-174.compute-1.amazonaws.com
        Agent port:    31364
        Identity:      49:9b:2d:d8:21:41:5d:c6:2b:94:4b:be:08:4f:d5:61
      Sep 26, 2017 9:51:39 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      Sep 26, 2017 9:51:39 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to ec2-34-224-67-174.compute-1.amazonaws.com:31364
      Sep 26, 2017 9:51:39 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Trying protocol: JNLP4-connect
      Sep 26, 2017 9:51:39 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Remote identity confirmed: 49:9b:2d:d8:21:41:5d:c6:2b:94:4b:be:08:4f:d5:61
      Sep 26, 2017 9:51:40 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected
      Sep 26, 2017 9:51:49 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Terminated
      Sep 26, 2017 9:51:59 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
      INFO: Master isnt ready to talk to us on {0}. Will retry again: response code={1}
      Sep 26, 2017 9:52:09 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
      INFO: Master isnt ready to talk to us on {0}. Will retry again: response code={1}
      Sep 26, 2017 9:52:24 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
      INFO: Master isnt ready to talk to us on {0}. Will retry again: response code={1}
      Sep 26, 2017 9:52:34 AM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onReconnect
      INFO: Restarting agent via jenkins.slaves.restarter.UnixSlaveRestarter@b0a6197
      Sep 26, 2017 9:52:36 AM hudson.remoting.jnlp.Main createEngine
      INFO: Setting up slave: fd843edf
      Sep 26, 2017 9:52:36 AM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Sep 26, 2017 9:52:36 AM hudson.remoting.Engine startEngine
      WARNING: No Working Directory. Using the legacy JAR Cache location: /root/.jenkins/cache/jars
      Sep 26, 2017 9:52:36 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server among [http://redactedurl/economic-influence/]
      Sep 26, 2017 9:52:36 AM hudson.remoting.jnlp.Main$CuiListener error
      SEVERE: http://redactedurl/economic-influence/tcpSlaveAgentListener/ is invalid: 502 Bad Gateway
      java.io.IOException: http://redactedurl/economic-influence/tcpSlaveAgentListener/ is invalid: 502 Bad Gateway
      	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:168)
      	at hudson.remoting.Engine.innerRun(Engine.java:495)
      	at hudson.remoting.Engine.run(Engine.java:447)
      

      The problem is that once the health check has passed, a new connection is attempted, and if that attempt fails (here with a 502 Bad Gateway from a reverse proxy that has not yet been updated), the agent aborts completely instead of falling back to the retry loop.
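
      The expected behaviour can be sketched roughly as follows. This is only an illustration of "fall back to a retry loop", not the Remoting implementation: ReconnectSketch, ConnectAttempt and connectWithRetry are hypothetical names, and the attempt count and delay are arbitrary assumptions. A failed attempt after a passed health check is treated as retryable rather than fatal.

```java
import java.io.IOException;
import java.util.OptionalInt;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: these names are hypothetical, not part of Jenkins Remoting. */
final class ReconnectSketch {

    /** Stand-in for one "resolve endpoint, then connect" attempt. */
    @FunctionalInterface
    public interface ConnectAttempt {
        void run() throws IOException;
    }

    /**
     * Runs the attempt up to maxAttempts times, sleeping delayMillis between tries.
     * Returns the 1-based number of the attempt that succeeded, or an empty value
     * if every attempt failed or the thread was interrupted while waiting.
     */
    public static OptionalInt connectWithRetry(ConnectAttempt attempt, int maxAttempts, long delayMillis) {
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                attempt.run();
                return OptionalInt.of(i);
            } catch (IOException e) {
                // A transient failure (e.g. "502 Bad Gateway") is logged, not fatal.
                System.err.println("Attempt " + i + " failed: " + e.getMessage());
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return OptionalInt.empty();
                }
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Simulated flaky route: the first two attempts hit a stale proxy.
        AtomicInteger calls = new AtomicInteger();
        ConnectAttempt flaky = () -> {
            if (calls.incrementAndGet() < 3) {
                throw new IOException("502 Bad Gateway");
            }
        };
        OptionalInt result = connectWithRetry(flaky, 5, 100L);
        System.out.println(result.isPresent()
                ? "Connected after " + result.getAsInt() + " attempt(s)"
                : "Gave up");
    }
}
```

      In the scenario above, the few seconds during which some proxies still return 502 would be absorbed by the loop instead of terminating the agent process.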

            Assignee: Unassigned
            Reporter: Vincent Latombe (vlatombe)