Jenkins / JENKINS-54679

SSHLauncher doesn't continue retrying to connect to remote executor

    Details


      Description

      SSHLauncher{host='10.50.10.252', port=22, credentialsId='aaf2ee5e-32bd-4675-9793-0570922f9c66', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=5, maxNumRetries=120, retryWaitTime=2, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.ManuallyTrustedKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
      [11/16/18 20:19:40] [SSH] Opening SSH connection to 10.50.10.252:22.
      Connection refused (Connection refused)
      SSH Connection failed with IOException: "Connection refused (Connection refused)", retrying in 2 seconds. There are 120 more retries left.
      Connection refused (Connection refused)
      SSH Connection failed with IOException: "Connection refused (Connection refused)", retrying in 2 seconds. There are 119 more retries left.
      Connection refused (Connection refused)
      SSH Connection failed with IOException: "Connection refused (Connection refused)", retrying in 2 seconds. There are 118 more retries left.
      ERROR: null
      java.util.concurrent.CancellationException
      	at java.util.concurrent.FutureTask.report(FutureTask.java:121)
      	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
      	at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:904)
      	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
      	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      [11/16/18 20:19:45] Launch failed - cleaning up connection
      [11/16/18 20:19:45] [SSH] Connection closed.

       

      This happens whenever a new EC2 Fleet instance is brought online. At that point cloud-init is still working its magic to install Docker/OpenJDK and add the new Jenkins user (and its key). However, after the "Launch failed" error message there are no more retries and the slave is never contacted again, even though if we manually press the button to reconnect, the slave comes online without issues.

      Clearly there are more retries left, yet the agent is completely dead in the water.

      This used to work without issues on older versions of Jenkins; the problem only started recently.

       

      We are running Jenkins ver. 2.138.3 from the jenkinsci/blueocean docker image.
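The timestamps in the log above hint at why the retries stop early: the SSH connection is opened at 20:19:40 and the launch is cancelled at 20:19:45, which matches launchTimeoutSeconds=5, although 120 retries with retryWaitTime=2 would need about four minutes. Evan Borgstrom's log shows the same pattern (60 seconds from opening the connection to "Launch failed", with launchTimeoutSeconds=60). One possible explanation, assumed here and not confirmed in this issue, is that the launch timeout bounds the whole retry loop rather than each attempt, so the outer task is cancelled with a CancellationException while retries remain. The sketch below reproduces that pattern with plain java.util.concurrent. It is not the SSHLauncher code; the class RetryUnderTimeout, the method launch and the tryConnect stand-in are hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class RetryUnderTimeout {

    // Hypothetical stand-in for one SSH connection attempt. It always fails,
    // like a host whose sshd is not up yet ("Connection refused").
    static boolean tryConnect() {
        return false;
    }

    // Assumption being illustrated: the retry loop runs inside ONE task, and
    // the overall timeout applies to that whole task, not to each attempt.
    static String launch(long overallTimeoutMillis, int maxRetries, long retryWaitMillis)
            throws InterruptedException {
        Callable<Boolean> retryLoop = () -> {
            // First attempt plus maxRetries retries, waiting between attempts.
            for (int left = maxRetries; left >= 0; left--) {
                if (tryConnect()) {
                    return true;
                }
                if (left > 0) {
                    // Thread.sleep is interruptible, so cancel(true) stops the loop here.
                    Thread.sleep(retryWaitMillis);
                }
            }
            return false;
        };
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Timed invokeAll cancels (with interruption) any task still running
            // when the timeout expires.
            List<Future<Boolean>> results = pool.invokeAll(
                    Collections.singletonList(retryLoop),
                    overallTimeoutMillis, TimeUnit.MILLISECONDS);
            try {
                return results.get(0).get() ? "connected" : "retries exhausted";
            } catch (CancellationException e) {
                // Same exception type as in the logs: retries were left, but the
                // task was cancelled by the outer timeout.
                return "cancelled";
            } catch (ExecutionException e) {
                return "failed: " + e.getCause();
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Overall budget (100 ms) far below 120 retries * 20 ms of waiting.
        System.out.println(launch(100, 120, 20));   // cancelled
        // Budget comfortably above 3 retries * 10 ms of waiting.
        System.out.println(launch(5_000, 3, 10));   // retries exhausted
    }
}
```

With the short overall budget the task is cancelled long before its retries run out, which mirrors the "There are 118 more retries left" followed by "ERROR: null" in the log; with a generous budget the loop finishes on its own and reports that it exhausted its retries.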

          Activity

          Evan Borgstrom added a comment:

          We are also facing this same issue.

          SSHLauncher{host='10.200.130.209', port=22, credentialsId='slave-ssh', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=60, maxNumRetries=3, retryWaitTime=60, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.ManuallyTrustedKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
          [12/13/18 19:14:12] [SSH] Opening SSH connection to 10.200.130.209:22.
          ERROR: null
          java.util.concurrent.CancellationException
          	at java.util.concurrent.FutureTask.report(FutureTask.java:121)
          	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
          	at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:902)
          	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
          	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
          	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          	at java.lang.Thread.run(Thread.java:748)
          [12/13/18 19:15:12] Launch failed - cleaning up connection
          [12/13/18 19:15:12] [SSH] Connection closed.
          

          We recently upgraded from Jenkins 2.138.2 to 2.150.1, and from ssh-slaves 1.28.1 to 1.29.1. I'm going to roll back to 1.28.1 and see whether it solves our issue.

          As with Bert JW Regeer, this used to work without issue, and if we click the "Launch Agent" button manually, the host connects without issue.

          Evan Borgstrom added a comment:

          FWIW, we found out that this is a race that we're losing.

          We are still on 1.29.1 of ssh-slaves, but we changed our retryWaitTime from 60 to 120 seconds and we don't run into this issue anymore.

          Artem Stasiuk added a comment:

          Thanks for the info. Bert JW Regeer, did you fix it in the same way?

          shrinath mangalore added a comment:

          We recently updated Jenkins to 2.194 and the ssh-slaves plugin to 1.30.1.

          Since the upgrade, our Jenkins slave agents fail to launch and are unable to connect.

          The logs are below; please let me know of a workaround or fix for this issue.

          just before slave Senseai gets launched ...
          just before slave Senseai gets launched ...
          executing pre-launch scripts ...
          Connection timed out (Connection timed out)
          SSH Connection failed with IOException: "Connection timed out (Connection timed out)", retrying in 15 seconds.  There are 10 more retries left.
          ERROR: null
          java.util.concurrent.CancellationException
          	at java.util.concurrent.FutureTask.report(FutureTask.java:121)
          	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
          	at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:475)
          	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:297)
          	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
          	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          	at java.lang.Thread.run(Thread.java:748)
          [09/05/19 04:40:42] Launch failed - cleaning up connection

            People

            • Assignee:
              schmutze Chad Schmutzer
              Reporter:
              xistence Bert JW Regeer
            • Votes:
              0
              Watchers:
              4
