  1. Jenkins
  2. JENKINS-59403

agent fails immediately when started in a docker-compose cluster

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Component/s: remoting
    • Labels:
      None

      Description

      I am trying to run a master plus some agents in a single Docker cluster orchestrated by docker-compose, using Jenkins ver. 2.176.3 under Windows 10, to experiment locally.

      Unfortunately the agents spin up much faster than the master, so they try to connect before the master is ready, which in a Docker cluster results in a ConnectException (the Docker daemon handles the connect to the socket, but the instance is not ready yet). From my reading of the source, the initial connect is not protected by a try-catch for this case, so the retry mechanism does not come into play.

       

      This means that in this case all the agents fail at startup and have to be started manually afterwards. For my immediate purposes a "--wait" flag (or similar) that waits X seconds when starting the agent would be fine, but perhaps the resilience mechanism needs to incorporate this use case too?
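Until something like that exists in the agent itself, one possible workaround is to delay the agent launch from a small wrapper script. The sketch below is only an illustration, not part of Jenkins remoting: `wait_for_port` is a hypothetical helper name, and the timeout and interval defaults are arbitrary assumptions. It polls a TCP port until a connect succeeds or a deadline passes.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll host:port until a TCP connect succeeds or `timeout` seconds elapse.

    Hypothetical helper for illustration; returns True once the port accepts
    a connection, False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A refused connection or a failed name lookup raises an OSError subclass.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the host name from the log in this report, a wrapper could call `wait_for_port("docker-images_csr-jenkins_1.local", 8080)` and start the agent only when it returns True. An open port does not guarantee that Jenkins has finished initializing, so this narrows the race rather than removing it.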

       

       

      ```

      Sep 17, 2019 9:32:17 AM hudson.remoting.jnlp.Main$CuiListener status
      csr-jenkins-agent-base3_1 | INFO: Locating server among http://docker-images_csr-jenkins_1.local:8080
      csr-jenkins-agent-base3_1 | Sep 17, 2019 9:32:17 AM hudson.remoting.jnlp.Main$CuiListener error
      csr-jenkins-agent-base3_1 | SEVERE: Failed to connect to http://docker-images_csr-jenkins_1.local:8080/tcpSlaveAgentListener/: Connection refused (Connection refused)
      csr-jenkins-agent-base3_1 | java.io.IOException: Failed to connect to http://docker-images_csr-jenkins_1.local:8080/tcpSlaveAgentListener/: Connection refused (Connection refused)
      csr-jenkins-agent-base3_1 |     at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:197)
      csr-jenkins-agent-base3_1 |     at hudson.remoting.Engine.innerRun(Engine.java:523)
      csr-jenkins-agent-base3_1 |     at hudson.remoting.Engine.run(Engine.java:474)
      csr-jenkins-agent-base3_1 | Caused by: java.net.ConnectException: Connection refused (Connection refused)
      csr-jenkins-agent-base3_1 |     at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
      csr-jenkins-agent-base3_1 |     at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
      csr-jenkins-agent-base3_1 |     at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
      csr-jenkins-agent-base3_1 |     at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
      csr-jenkins-agent-base3_1 |     at java.base/java.net.Socket.connect(Socket.java:591)
      csr-jenkins-agent-base3_1 |     at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:177)
      csr-jenkins-agent-base3_1 |     at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:474)
      csr-jenkins-agent-base3_1 |     at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:569)
      csr-jenkins-agent-base3_1 |     at java.base/sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
      csr-jenkins-agent-base3_1 |     at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:341)
      csr-jenkins-agent-base3_1 |     at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:362)
      csr-jenkins-agent-base3_1 |     at java.base/sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1242)
      csr-jenkins-agent-base3_1 |     at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1181)
      csr-jenkins-agent-base3_1 |     at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1075)
      csr-jenkins-agent-base3_1 |     at java.base/sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:1009)
      csr-jenkins-agent-base3_1 |     at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:194)
      csr-jenkins-agent-base3_1 |     ... 2 more

      ```

       


          Activity

          ravn ravn created issue -
          jthompson Jeff Thompson added a comment -

          We've had a couple of attempts to augment or change the startup process lately, some of which were eventually successful and some of which failed. The key to success has been a clearly defined use case and flow. If we can figure that out for this scenario, this could be a nice enhancement.

          I'm always a little concerned about approaches that just wait for some arbitrary amount of time, which may work but not always. Though, if the alternative is to just wait forever, a timeout may not be a bad idea.
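One way to reconcile the two concerns in this comment is to poll for actual readiness, with backoff, under a hard deadline instead of sleeping for a fixed time. The sketch below is illustrative only and is not how remoting is implemented: `listener_ready` and `wait_until` are hypothetical names, and all delay values are assumptions. It treats the endpoint as ready only when an HTTP GET succeeds, so an error status such as a "still starting" response counts as not ready.

```python
import http.client
import time
import urllib.request
from typing import Callable


def listener_ready(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET to `url` completes without an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError (4xx/5xx) are OSError subclasses.
        return False


def wait_until(probe: Callable[[], bool], deadline_s: float = 120.0,
               first_delay_s: float = 1.0, max_delay_s: float = 15.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Call `probe` with exponential backoff until it succeeds or the deadline passes."""
    give_up_at = clock() + deadline_s
    delay = first_delay_s
    while True:
        if probe():
            return True
        remaining = give_up_at - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline; double the delay up to the cap.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)
```

Using the URL from the log in the description, a launcher could run `wait_until(lambda: listener_ready("http://docker-images_csr-jenkins_1.local:8080/tcpSlaveAgentListener/"))` and fail with a clear error if it returns False. The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.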

          jvz Matt Sicker added a comment -

          How about using https://github.com/vishnubob/wait-for-it?

            People

            • Assignee:
              jthompson Jeff Thompson
              Reporter:
              ravn ravn
            • Votes:
              0
              Watchers:
              3
