  1. Jenkins
  2. JENKINS-53810

Launch Agents fails with ERROR: null java.util.concurrent.CancellationException


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Labels:
    • Environment:

      Description

      Launching node/agent fails with

      ERROR: null
      java.util.concurrent.CancellationException

      We have a large number of jobs in the queue, which get assigned to slaves created by the Docker plugin. Even if we try creating a slave and launching the agent, it fails.
      Note: The slave image meets all the requirements and works well when the queue is not huge.

      Executor Status
      SSHLauncher{host='9.47.78.144', port=32870, credentialsId='slave-test', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.NonVerifyingKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
      [09/27/18 02:53:32] [SSH] Opening SSH connection to 9.47.78.144:32870.
      [09/27/18 02:53:32] [SSH] WARNING: SSH Host Keys are not being verified. Man-in-the-middle attacks may be possible against this connection.
      [09/27/18 02:53:32] [SSH] Authentication successful.
      [09/27/18 02:53:32] [SSH] The remote user's environment is:
      BASH=/usr/bin/bash
      BASHOPTS=cmdhist:expand_aliases:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
      BASH_ALIASES=()
      BASH_ARGC=()
      BASH_ARGV=()
      BASH_CMDS=()
      BASH_EXECUTION_STRING=set
      BASH_LINENO=()
      BASH_SOURCE=()
      BASH_VERSINFO=([0]="4" [1]="2" [2]="46" [3]="1" [4]="release" [5]="s390x-ibm-linux-gnu")
      BASH_VERSION='4.2.46(1)-release'
      DIRSTACK=()
      EUID=1000
      GROUPS=()
      HOME=/home/test
      HOSTNAME=01695f4aae73
      HOSTTYPE=s390x
      IFS=$' \t\n'
      LESSOPEN='||/usr/bin/lesspipe.sh %s'
      LOGNAME=test
      MACHTYPE=s390x-ibm-linux-gnu
      MAIL=/var/mail/test
      OPTERR=1
      OPTIND=1
      OSTYPE=linux-gnu
      PATH=/usr/local/bin:/usr/bin
      PIPESTATUS=([0]="0")
      PPID=13
      PS4='+ '
      PWD=/home/test
      SHELL=/bin/bash
      SHELLOPTS=braceexpand:hashall:interactive-comments
      SHLVL=1
      SSH_CLIENT='9.42.27.56 44378 22'
      SSH_CONNECTION='9.42.27.56 44378 172.17.0.2 22'
      TERM=dumb
      UID=1000
      USER=test
      _=sudo
      [09/27/18 02:53:32] [SSH] Checking java version of /home/test/jdk/bin/java
      Couldn't figure out the Java version of /home/test/jdk/bin/java
      bash: /home/test/jdk/bin/java: No such file or directory
      
      [09/27/18 02:53:33] [SSH] Checking java version of java
      [09/27/18 02:53:34] [SSH] java -version returned 1.8.0_151.
      [09/27/18 02:53:34] [SSH] Starting sftp client.
      [09/27/18 02:53:34] [SSH] Copying latest remoting.jar...
      [09/27/18 02:53:36] [SSH] Copied 776,265 bytes.
      Expanded the channel window size to 4MB
      [09/27/18 02:53:36] [SSH] Starting agent process: cd "/home/test" && java  -jar remoting.jar -workDir /home/test
      Sep 27, 2018 6:54:09 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using /home/test/remoting as a remoting work directory
      Both error and output logs will be printed to /home/test/remoting
      ERROR: null
      java.util.concurrent.CancellationException
      	at java.util.concurrent.FutureTask.report(FutureTask.java:121)
      	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
      	at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:904)
      	at hudson.slaves.DelegatingComputerLauncher.launch(DelegatingComputerLauncher.java:64)
      	at io.jenkins.docker.connector.DockerComputerConnector$1.launch(DockerComputerConnector.java:117)
      	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
      	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      [09/27/18 02:57:02] Launch failed - cleaning up connection
      Slave JVM has not reported exit code. Is it still running?
      [09/27/18 02:57:02] [SSH] Connection closed.
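
Reading the trace above: the exception is raised by `FutureTask.get`, called from `SSHLauncher.launch`, which is what `get()` does when the future it is waiting on has been cancelled. The log also shows the failure at 02:57:02, exactly 210 seconds after the connection was opened at 02:53:32, which matches `launchTimeoutSeconds=210`; this suggests, though nothing in this issue confirms it, that the launch timeout cancelled the task. The following is a minimal sketch of that general Java behavior only, not the plugin's code; the class and method names are hypothetical.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it does not reproduce SSHLauncher internals.
class CancelledLaunchDemo {

    /** Submits a slow task, cancels it from outside, and reports what get() does. */
    static String runAndCancel() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for a launch that does not finish on its own.
            Future<Boolean> launch = pool.submit(() -> {
                TimeUnit.SECONDS.sleep(60);
                return true;
            });

            // Something else (a timeout watchdog, for instance) gives up on the task.
            launch.cancel(true);

            try {
                launch.get();
                return "completed";
            } catch (CancellationException e) {
                // The exception carries no message, consistent with "ERROR: null" in the log.
                return "cancelled, message=" + e.getMessage();
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAndCancel());
    }
}
```

The point of the sketch is that the exception says only that the waiting side was cancelled; it does not say why the agent process was slow to come up, which is why the comments below ask for a thread dump.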
      
      
      

        Attachments

          Activity

          oleg_nenashev Oleg Nenashev added a comment -

          Also CC Ivan Fernandez Calvo. It looks more like an SSH Slaves plugin issue.

          ifernandezcalvo Ivan Fernandez Calvo added a comment -

          I am 90% sure that this is related to https://issues.jenkins-ci.org/browse/JENKINS-49235 and there is a workaround: https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/TROUBLESHOOTING.md#threads-stuck-at-credentialsprovidertrackall
          durgadas Durgadas Kamath added a comment -

          Ivan Fernandez Calvo I tried the above workaround but that didn't solve the problem.

          ifernandezcalvo Ivan Fernandez Calvo added a comment -

          So you set the property `-Dhudson.plugins.sshslaves.SSHLauncher.trackCredentials=false` in your Jenkins instance's JVM parameters and the issue persists. In that case I need the following info to try to understand and replicate what is happening: https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/TROUBLESHOOTING.md#common-info-needed-to-troubleshooting-a-bug

          What I see in the log is that the agent tries to connect and after about 4 minutes it is killed (probably by the pingThread), but it seems the connection never completes. You said that this happens when you have a huge queue, so we will probably need a thread dump of the instance, taken while the issue is happening, to see which threads are blocked.

          https://wiki.jenkins.io/display/JENKINS/Obtaining+a+thread+dump
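
For readers unfamiliar with what a thread dump contains, here is a rough sketch in plain Java. It lists every thread of the JVM it runs in, with its state and stack, which is the information needed to spot threads stuck in BLOCKED or WAITING. It is illustrative only: the class name is hypothetical, and it cannot inspect a separate Jenkins process. For a running Jenkins instance, follow the wiki page above or use a JDK tool such as `jstack <pid>`.

```java
import java.util.Map;

// Hypothetical helper; it only sees threads of the JVM it is executed in.
class ThreadDumpSketch {

    /** Formats one thread and its stack in a layout loosely resembling a thread dump. */
    static String format(Thread t, StackTraceElement[] stack) {
        StringBuilder sb = new StringBuilder();
        sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
        for (StackTraceElement frame : stack) {
            sb.append("\tat ").append(frame).append('\n');
        }
        return sb.toString();
    }

    /** Dumps every live thread of the current JVM. */
    static String dumpAll() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            sb.append(format(e.getKey(), e.getValue())).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAll());
    }
}
```

In a dump taken during a failed launch, the entries of interest would be the threads whose stacks pass through `SSHLauncher.launch` and any threads whose state is BLOCKED.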

          xman_pires yong wu added a comment -

           

          I also ran into a similar problem while adding a node (SLES12.3). The Java version on the slave was up to 1.8; not sure whether this is related to the SSH Slaves plugin or not...

          ifernandezcalvo Ivan Fernandez Calvo added a comment -

          yong wu Could you grab a thread dump while the agent is stuck trying to start? I see in the log that you have about a minute to get it.

          https://wiki.jenkins.io/display/JENKINS/Obtaining+a+thread+dump


            People

            • Assignee:
              ifernandezcalvo Ivan Fernandez Calvo
              Reporter:
              durgadas Durgadas Kamath
            • Votes:
              1
              Watchers:
              5

              Dates

              • Created:
                Updated: