Jenkins / JENKINS-53810

Launch Agents fails with ERROR: null java.util.concurrent.CancellationException

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

      Description

      Launching the node/agent fails with:

      ERROR: null
      java.util.concurrent.CancellationException

      We have a large number of jobs in the queue, which get assigned to slaves created by the Docker plugin. Even if we create a slave and try to launch its agent, the launch fails.
      Note: The slave image meets all the requirements and works well when the queue is not large.

      Executor Status
      SSHLauncher{host='9.47.78.144', port=32870, credentialsId='slave-test', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.NonVerifyingKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
      [09/27/18 02:53:32] [SSH] Opening SSH connection to 9.47.78.144:32870.
      [09/27/18 02:53:32] [SSH] WARNING: SSH Host Keys are not being verified. Man-in-the-middle attacks may be possible against this connection.
      [09/27/18 02:53:32] [SSH] Authentication successful.
      [09/27/18 02:53:32] [SSH] The remote user's environment is:
      BASH=/usr/bin/bash
      BASHOPTS=cmdhist:expand_aliases:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
      BASH_ALIASES=()
      BASH_ARGC=()
      BASH_ARGV=()
      BASH_CMDS=()
      BASH_EXECUTION_STRING=set
      BASH_LINENO=()
      BASH_SOURCE=()
      BASH_VERSINFO=([0]="4" [1]="2" [2]="46" [3]="1" [4]="release" [5]="s390x-ibm-linux-gnu")
      BASH_VERSION='4.2.46(1)-release'
      DIRSTACK=()
      EUID=1000
      GROUPS=()
      HOME=/home/test
      HOSTNAME=01695f4aae73
      HOSTTYPE=s390x
      IFS=$' \t\n'
      LESSOPEN='||/usr/bin/lesspipe.sh %s'
      LOGNAME=test
      MACHTYPE=s390x-ibm-linux-gnu
      MAIL=/var/mail/test
      OPTERR=1
      OPTIND=1
      OSTYPE=linux-gnu
      PATH=/usr/local/bin:/usr/bin
      PIPESTATUS=([0]="0")
      PPID=13
      PS4='+ '
      PWD=/home/test
      SHELL=/bin/bash
      SHELLOPTS=braceexpand:hashall:interactive-comments
      SHLVL=1
      SSH_CLIENT='9.42.27.56 44378 22'
      SSH_CONNECTION='9.42.27.56 44378 172.17.0.2 22'
      TERM=dumb
      UID=1000
      USER=test
      _=sudo
      [09/27/18 02:53:32] [SSH] Checking java version of /home/test/jdk/bin/java
      Couldn't figure out the Java version of /home/test/jdk/bin/java
      bash: /home/test/jdk/bin/java: No such file or directory
      
      [09/27/18 02:53:33] [SSH] Checking java version of java
      [09/27/18 02:53:34] [SSH] java -version returned 1.8.0_151.
      [09/27/18 02:53:34] [SSH] Starting sftp client.
      [09/27/18 02:53:34] [SSH] Copying latest remoting.jar...
      [09/27/18 02:53:36] [SSH] Copied 776,265 bytes.
      Expanded the channel window size to 4MB
      [09/27/18 02:53:36] [SSH] Starting agent process: cd "/home/test" && java  -jar remoting.jar -workDir /home/test
      Sep 27, 2018 6:54:09 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using /home/test/remoting as a remoting work directory
      Both error and output logs will be printed to /home/test/remoting
      ERROR: null
      java.util.concurrent.CancellationException
      	at java.util.concurrent.FutureTask.report(FutureTask.java:121)
      	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
      	at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:904)
      	at hudson.slaves.DelegatingComputerLauncher.launch(DelegatingComputerLauncher.java:64)
      	at io.jenkins.docker.connector.DockerComputerConnector$1.launch(DockerComputerConnector.java:117)
      	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
      	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      [09/27/18 02:57:02] Launch failed - cleaning up connection
      Slave JVM has not reported exit code. Is it still running?
      [09/27/18 02:57:02] [SSH] Connection closed.
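The error text reported above, "ERROR: null" followed by "java.util.concurrent.CancellationException", matches what a cancelled FutureTask produces when the caller prints the exception's message: the stack trace shows FutureTask.get called from SSHLauncher.launch, and the JDK's FutureTask throws CancellationException without a detail message. A minimal sketch that reproduces only that JDK behaviour, assuming (the log does not confirm it) that the launcher prints e.getMessage(); the class, method, and task are hypothetical stand-ins, not plugin code:

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

public class CancelledLaunchDemo {

    // Cancels a stand-in "launch" task before it runs, then returns the line a
    // caller would print if it logged "ERROR: " + e.getMessage().
    static String cancelledLaunchMessage() throws InterruptedException, ExecutionException {
        // Hypothetical stand-in for the real agent launch work.
        FutureTask<String> launch = new FutureTask<>(() -> "agent connected");

        // Cancel the task, as a timeout or cleanup path might.
        launch.cancel(true);

        try {
            launch.get();
            return "launch completed";
        } catch (CancellationException e) {
            // FutureTask throws CancellationException with no detail message,
            // so getMessage() is null and string concatenation yields "null".
            return "ERROR: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(cancelledLaunchMessage()); // prints "ERROR: null"
    }
}
```

This only explains why the message reads "null"; it does not show what cancelled the real launch.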
      
      
      

            Activity

            Jeff Thompson (jthompson) added a comment:

            I can't see how ClassFilter could cause that, either. It may take a little time to process those patterns, though it still shouldn't take very long.

            Jesper Jensen (jesperjensen) added a comment:

            I just had the same problem as described. The workaround I found was to change the timeout to 60 seconds (default 10). The first time, it took very long, almost 60 seconds; the second time, only a few seconds. Hope this helps!

            Gerd Hoffmann (kraxel72) added a comment:

            Interesting. Raising the timeout helped in my case too.

            Ivan Fernandez Calvo (ifernandezcalvo) added a comment (edited):

            Do you mean "Connection Timeout in Seconds"? The default value is 210 seconds. This timeout applies to all retries combined, so it should be large enough to cover them all. For example, if you have 6 retries with a 10-second wait between them, this timeout should be 6*10 plus the timeout of each connection; so if you want to wait 30 seconds on each connection, you would set this setting to 6*10 + 30*6 = 240.
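The rule of thumb above can be written out as plain arithmetic. A sketch, assuming the timeout is shared exactly as described (one wait per retry plus one connection timeout per retry); the class and method names are hypothetical and this is not the plugin's code:

```java
public class LaunchTimeoutBudget {

    // Rule of thumb from the comment:
    // timeout = retries * waitBetweenRetries + retries * perConnectionTimeout.
    static int recommendedLaunchTimeout(int retries, int waitSeconds, int perConnectionSeconds) {
        return retries * waitSeconds + retries * perConnectionSeconds;
    }

    // Seconds left for the connection attempts themselves once the waits
    // between retries are subtracted from an overall launch timeout.
    static int connectionBudget(int launchTimeoutSeconds, int retries, int waitSeconds) {
        return launchTimeoutSeconds - retries * waitSeconds;
    }

    public static void main(String[] args) {
        // Example from the comment: 6 retries, 10 s wait, 30 s per connection.
        System.out.println(recommendedLaunchTimeout(6, 10, 30)); // 240

        // Values from this issue: launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15.
        System.out.println(connectionBudget(210, 10, 15)); // 60
    }
}
```

Under the same assumption, the defaults shown in this issue spend 10 * 15 = 150 seconds waiting between retries, leaving 60 seconds in total for the connection attempts themselves.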

            In this issue the settings are the defaults, so it is not related to the timeout; see the launchTimeoutSeconds value:

            SSHLauncher{host='9.47.78.144', port=32870, credentialsId='slave-test', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15, 
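The launchTimeoutSeconds=210 value can be compared with the description's log: the SSH connection is opened at [09/27/18 02:53:32] and "Launch failed - cleaning up connection" is logged at [09/27/18 02:57:02]. A minimal sketch of that arithmetic, assuming both timestamps come from the same clock and use the MM/dd/yy HH:mm:ss layout shown; the class name is hypothetical:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class LaunchElapsed {

    // Layout of the launcher's log timestamps, e.g. [09/27/18 02:53:32].
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("MM/dd/yy HH:mm:ss");

    // Whole seconds between two log timestamps.
    static long secondsBetween(String start, String end) {
        LocalDateTime from = LocalDateTime.parse(start, LOG_TIME);
        LocalDateTime to = LocalDateTime.parse(end, LOG_TIME);
        return Duration.between(from, to).getSeconds();
    }

    public static void main(String[] args) {
        // "Opening SSH connection" vs. "Launch failed - cleaning up connection".
        long elapsed = secondsBetween("09/27/18 02:53:32", "09/27/18 02:57:02");
        System.out.println(elapsed + " s"); // 210 s
    }
}
```

The two entries are 210 seconds apart, the same as launchTimeoutSeconds; the log alone does not show what cancelled the launch.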
            

            There are a couple of proposed improvements to change this behavior, JENKINS-48617 and JENKINS-48618; I find the current behavior really confusing.

            In any case, 10 seconds is too short a value for that timeout. It may be worth validating that the setting is not lower than 60 and raising any lower value to 60 (JENKINS-55858).
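The validation suggested here amounts to a floor on the configured value. A minimal sketch, assuming a hypothetical normalizeTimeout helper and the 60-second floor from the comment; it is not the plugin's code and says nothing about how JENKINS-55858 is actually implemented:

```java
public class TimeoutFloor {

    // Hypothetical minimum, taken from the suggestion in the comment.
    static final int MIN_TIMEOUT_SECONDS = 60;

    // Raises any configured value below the floor up to the floor;
    // values at or above the floor are kept unchanged.
    static int normalizeTimeout(int configuredSeconds) {
        return Math.max(configuredSeconds, MIN_TIMEOUT_SECONDS);
    }

    public static void main(String[] args) {
        System.out.println(normalizeTimeout(10));  // 60
        System.out.println(normalizeTimeout(210)); // 210
    }
}
```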

            https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/CONFIGURE.md#advanced-settings

            Gerd Hoffmann (kraxel72) added a comment:

            I actually changed two values. The first went from 210 to 600. My UI is in German, so I'm not fully sure what the English label is, but "Connection Timeout in Seconds" sounds right.

            The other one, two lines below the first, I switched from 10 to 60. According to the label, this is the interval between connection attempts.


              People

              • Assignee: Ivan Fernandez Calvo (ifernandezcalvo)
              • Reporter: Durgadas Kamath (durgadas)
              • Votes: 3
              • Watchers: 11
