Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-54897

Unexpected error in launching a agent after restart

    Details

    • Type: Bug
    • Status: Resolved (View Workflow)
    • Priority: Major
    • Resolution: Fixed
    • Component/s: ssh-slaves-plugin
    • Labels:
      None
    • Environment:
      Jenkins ver. 2.138.2
      ssh-slaves-plugin ver: 1.29.1
      Linux RHEL 7.5
    • Similar Issues:
    • Released As:
      ssh-slaves-1.30.2

      Description

      Launch agent has issues after restarting Jenkins. If you try to launch several times, it will eventually work.

       

      ssh rules: no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty

       

      Node config.xml

      <?xml version='1.1' encoding='UTF-8'?>
      <slave>
        <name>cac-slave-01</name>
        <description></description>
        <remoteFS>/app/var/lib/jenkins/slave_node</remoteFS>
        <numExecutors>5</numExecutors>
        <mode>NORMAL</mode>
        <retentionStrategy class="hudson.slaves.RetentionStrategy$Always"/>
        <launcher class="hudson.plugins.sshslaves.SSHLauncher" plugin="ssh-slaves@1.29.1">
          <host>jenkins-slave.example.com</host>
          <port>22</port>
          <credentialsId>jenkins-master-ssh-key</credentialsId>
          <launchTimeoutSeconds>210</launchTimeoutSeconds>
          <maxNumRetries>10</maxNumRetries>
          <retryWaitTime>15</retryWaitTime>
          <sshHostKeyVerificationStrategy class="hudson.plugins.sshslaves.verifiers.KnownHostsFileKeyVerificationStrategy"/>
        </launcher>
        <label></label>
        <nodeProperties/>
      </slave>

       

      Error log 1:

      SSHLauncher{host='jenkins-slave.example.com', port=22, credentialsId='jenkins-master-ssh-key', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.KnownHostsFileKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
      [11/27/18 16:46:51] [SSH] Opening SSH connection to jenkins-slave.example.com:22.
      ERROR: null
      java.util.concurrent.CancellationException
      	at java.util.concurrent.FutureTask.report(FutureTask.java:121)
      	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
      	at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:902)
      	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
      	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      [11/27/18 16:50:21] Launch failed - cleaning up connection
      [11/27/18 16:50:21] SSH Launch of cac-slave-01 on jenkins-slave.example.com failed in 210,000 ms

      Error log 2:

      SSHLauncher{host='jenkins-slave.example.com', port=22, credentialsId='jenkins-master-ssh-key', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.KnownHostsFileKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
      [11/27/18 16:18:08] [SSH] Opening SSH connection to jenkins-slave.example.com:22.
      [11/27/18 16:18:29] [SSH] SSH host key matches key in Known Hosts file. Connection will be allowed.
      [11/27/18 16:18:29] [SSH] SSH host key matches key in Known Hosts file. Connection will be allowed.
      [11/27/18 16:18:29] [SSH] Authentication successful.
      [11/27/18 16:18:29] [SSH] Authentication successful.
      [11/27/18 16:18:29] [SSH] The remote user's environment is:
      [11/27/18 16:18:29] [SSH] The remote user's environment is:
      BASH=/usr/bin/bash
      BASHOPTS=cmdhist:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
      BASH_ALIASES=()
      BASH_ARGC=()
      BASH_ARGV=()
      BASH_CMDS=()
      BASH_EXECUTION_STRING=set
      BASH_LINENO=()
      BASH_SOURCE=()
      BASH_VERSINFO=([0]="4" [1]="2" [2]="46" [3]="2" [4]="release" [5]="x86_64-redhat-linux-gnu")
      BASH_VERSION='4.2.46(2)-release'
      DIRSTACK=()
      EUID=1000
      GROUPS=()
      HOME=/var/lib/jenkins_slave
      HOSTNAME=jenkins-slave.example.com
      HOSTTYPE=x86_64
      ID=1000
      IFS=$' \t\n'
      LANG=en_US.UTF-8
      LESSOPEN='||/usr/bin/lesspipe.sh %s'
      LOGNAME=jenkins_slave_local
      MACHTYPE=x86_64-redhat-linux-gnu
      MAIL=/var/mail/jenkins_slave_local
      OPTERR=1
      OPTIND=1
      OSTYPE=linux-gnu
      PATH=/usr/local/bin:/usr/bin
      PIPESTATUS=([0]="0")
      PPID=15365
      PS4='+ '
      PWD=/var/lib/jenkins_slave
      SHELL=/bin/bash
      SHELLOPTS=braceexpand:hashall:interactive-comments
      SHLVL=1
      SSH_CLIENT='165.115.33.181 36584 22'
      SSH_CONNECTION='165.115.33.181 36584 165.115.33.182 22'
      TERM=dumb
      TMOUT=1800
      UID=1000
      USER=jenkins_slave_local
      XDG_RUNTIME_DIR=/run/user/1000
      XDG_SESSION_ID=3293
      _=/etc/bashrc
      [11/27/18 16:18:29] [SSH] Checking java version of /app/var/lib/jenkins/slave_node/jdk/bin/java
      BASH=/usr/bin/bash
      BASHOPTS=cmdhist:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
      BASH_ALIASES=()
      BASH_ARGC=()
      BASH_ARGV=()
      BASH_CMDS=()
      BASH_EXECUTION_STRING=set
      BASH_LINENO=()
      BASH_SOURCE=()
      BASH_VERSINFO=([0]="4" [1]="2" [2]="46" [3]="2" [4]="release" [5]="x86_64-redhat-linux-gnu")
      BASH_VERSION='4.2.46(2)-release'
      DIRSTACK=()
      EUID=1000
      GROUPS=()
      HOME=/var/lib/jenkins_slave
      HOSTNAME=jenkins-slave.example.com
      HOSTTYPE=x86_64
      ID=1000
      IFS=$' \t\n'
      LANG=en_US.UTF-8
      LESSOPEN='||/usr/bin/lesspipe.sh %s'
      LOGNAME=jenkins_slave_local
      MACHTYPE=x86_64-redhat-linux-gnu
      MAIL=/var/mail/jenkins_slave_local
      OPTERR=1
      OPTIND=1
      OSTYPE=linux-gnu
      PATH=/usr/local/bin:/usr/bin
      PIPESTATUS=([0]="0")
      PPID=15366
      PS4='+ '
      PWD=/var/lib/jenkins_slave
      SHELL=/bin/bash
      SHELLOPTS=braceexpand:hashall:interactive-comments
      SHLVL=1
      SSH_CLIENT='165.115.33.181 36878 22'
      SSH_CONNECTION='165.115.33.181 36878 165.115.33.182 22'
      TERM=dumb
      TMOUT=1800
      UID=1000
      USER=jenkins_slave_local
      XDG_RUNTIME_DIR=/run/user/1000
      XDG_SESSION_ID=3292
      _=/etc/bashrc
      [11/27/18 16:18:29] [SSH] Checking java version of /app/var/lib/jenkins/slave_node/jdk/bin/java
      Couldn't figure out the Java version of /app/var/lib/jenkins/slave_node/jdk/bin/java
      bash: /app/var/lib/jenkins/slave_node/jdk/bin/java: No such file or directory[11/27/18 16:18:29] [SSH] Checking java version of java
      Couldn't figure out the Java version of /app/var/lib/jenkins/slave_node/jdk/bin/java
      bash: /app/var/lib/jenkins/slave_node/jdk/bin/java: No such file or directory[11/27/18 16:18:29] [SSH] Checking java version of java
      [11/27/18 16:18:29] [SSH] java -version returned 1.8.0_181.
      [11/27/18 16:18:29] [SSH] Starting sftp client.
      [11/27/18 16:18:29] [SSH] java -version returned 1.8.0_181.
      [11/27/18 16:18:29] [SSH] Starting sftp client.
      [11/27/18 16:18:29] [SSH] Copying latest remoting.jar...
      [11/27/18 16:18:29] [SSH] Copying latest remoting.jar...
      [11/27/18 16:18:29] [SSH] Copied 776,265 bytes.
      Expanded the channel window size to 4MB
      [11/27/18 16:18:29] [SSH] Starting agent process: cd "/app/var/lib/jenkins/slave_node" && java  -jar remoting.jar -workDir /app/var/lib/jenkins/slave_node
      [11/27/18 16:18:29] [SSH] Copied 776,265 bytes.
      Expanded the channel window size to 4MB
      [11/27/18 16:18:29] [SSH] Starting agent process: cd "/app/var/lib/jenkins/slave_node" && java  -jar remoting.jar -workDir /app/var/lib/jenkins/slave_node
      Nov 27, 2018 4:18:30 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using /app/var/lib/jenkins/slave_node/remoting as a remoting work directory
      Both error and output logs will be printed to /app/var/lib/jenkins/slave_node/remoting
      Nov 27, 2018 4:18:30 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using /app/var/lib/jenkins/slave_node/remoting as a remoting work directory
      Both error and output logs will be printed to /app/var/lib/jenkins/slave_node/remoting
      <===[JENKINS REMOTING CAPACITY]===>channel started
      <===[JENKINS REMOTING CAPACITY]===>channel started
      Remoting version: 3.25
      Remoting version: 3.25
      This is a Unix agent
      This is a Unix agent
      Evacuated stdout
      Evacuated stdout
      ERROR: Unexpected error in launching a agent. This is probably a bug in Jenkins.
      java.lang.IllegalStateException: Already connected
      	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:678)
      	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:432)
      	at hudson.plugins.sshslaves.SSHLauncher.startAgent(SSHLauncher.java:1032)
      	at hudson.plugins.sshslaves.SSHLauncher.access$500(SSHLauncher.java:128)
      	at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:866)
      	at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:831)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      [11/27/18 16:18:33] Launch failed - cleaning up connection
      [11/27/18 16:18:33] [SSH] Connection closed.
      Connection terminated
      channel stopped
      Connection terminated
      ERROR: Unexpected error in launching a agent. This is probably a bug in Jenkins.
      java.lang.NullPointerException
      	at org.jenkinsci.modules.systemd_slave_installer.SlaveInstallerFactoryImpl.createIfApplicable(SlaveInstallerFactoryImpl.java:33)
      	at org.jenkinsci.modules.slave_installer.SlaveInstallerFactory.createIfApplicable(SlaveInstallerFactory.java:29)
      	at org.jenkinsci.modules.slave_installer.SlaveInstallerFactory.createFor(SlaveInstallerFactory.java:46)
      	at org.jenkinsci.modules.slave_installer.impl.ComputerListenerImpl.onOnline(ComputerListenerImpl.java:30)
      	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:693)
      	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:432)
      	at hudson.plugins.sshslaves.SSHLauncher.startAgent(SSHLauncher.java:1032)
      	at hudson.plugins.sshslaves.SSHLauncher.access$500(SSHLauncher.java:128)
      	at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:866)
      	at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:831)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      [11/27/18 16:18:33] Launch failed - cleaning up connection
      [11/27/18 16:18:33] [SSH] Connection closed.
      SSHLauncher{host='jenkins-slave.example.com', port=22, credentialsId='jenkins-master-ssh-key', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.KnownHostsFileKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
      [11/27/18 16:19:03] [SSH] Opening SSH connection to jenkins-slave.example.com:22.
      [11/27/18 16:19:03] [SSH] SSH host key matches key in Known Hosts file. Connection will be allowed.
      [11/27/18 16:19:03] [SSH] Authentication successful.
      [11/27/18 16:19:03] [SSH] The remote user's environment is:
      BASH=/usr/bin/bash
      BASHOPTS=cmdhist:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
      BASH_ALIASES=()
      BASH_ARGC=()
      BASH_ARGV=()
      BASH_CMDS=()
      BASH_EXECUTION_STRING=set
      BASH_LINENO=()
      BASH_SOURCE=()
      BASH_VERSINFO=([0]="4" [1]="2" [2]="46" [3]="2" [4]="release" [5]="x86_64-redhat-linux-gnu")
      BASH_VERSION='4.2.46(2)-release'
      DIRSTACK=()
      EUID=1000
      GROUPS=()
      HOME=/var/lib/jenkins_slave
      HOSTNAME=jenkins-slave.example.com
      HOSTTYPE=x86_64
      ID=1000
      IFS=$' \t\n'
      LANG=en_US.UTF-8
      LESSOPEN='||/usr/bin/lesspipe.sh %s'
      LOGNAME=jenkins_slave_local
      MACHTYPE=x86_64-redhat-linux-gnu
      MAIL=/var/mail/jenkins_slave_local
      OPTERR=1
      OPTIND=1
      OSTYPE=linux-gnu
      PATH=/usr/local/bin:/usr/bin
      PIPESTATUS=([0]="0")
      PPID=15645
      PS4='+ '
      PWD=/var/lib/jenkins_slave
      SHELL=/bin/bash
      SHELLOPTS=braceexpand:hashall:interactive-comments
      SHLVL=1
      SSH_CLIENT='165.115.33.181 37012 22'
      SSH_CONNECTION='165.115.33.181 37012 165.115.33.182 22'
      TERM=dumb
      TMOUT=1800
      UID=1000
      USER=jenkins_slave_local
      XDG_RUNTIME_DIR=/run/user/1000
      XDG_SESSION_ID=3294
      _=/etc/bashrc
      [11/27/18 16:19:03] [SSH] Checking java version of /app/var/lib/jenkins/slave_node/jdk/bin/java
      Couldn't figure out the Java version of /app/var/lib/jenkins/slave_node/jdk/bin/java
      bash: /app/var/lib/jenkins/slave_node/jdk/bin/java: No such file or directory[11/27/18 16:19:03] [SSH] Checking java version of java
      [11/27/18 16:19:03] [SSH] java -version returned 1.8.0_181.
      [11/27/18 16:19:03] [SSH] Starting sftp client.
      [11/27/18 16:19:03] [SSH] Copying latest remoting.jar...
      [11/27/18 16:19:03] [SSH] Copied 776,265 bytes.
      Expanded the channel window size to 4MB
      [11/27/18 16:19:03] [SSH] Starting agent process: cd "/app/var/lib/jenkins/slave_node" && java  -jar remoting.jar -workDir /app/var/lib/jenkins/slave_node
      Nov 27, 2018 4:19:03 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using /app/var/lib/jenkins/slave_node/remoting as a remoting work directory
      Both error and output logs will be printed to /app/var/lib/jenkins/slave_node/remoting
      <===[JENKINS REMOTING CAPACITY]===>channel started
      Remoting version: 3.25
      This is a Unix agent
      Evacuated stdout
      Agent successfully connected and online

        Attachments

          Issue Links

            Activity

            Hide
            ifernandezcalvo Ivan Fernandez Calvo added a comment -

            Did you see if the plugin retry to connect automatically? if should be because has the default configuration to make it `launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15`, after 15 seconds of the fail it must retry to connect

            The duplicate messages bothered me, it is like two processes were running at the same time, Which JDK do you use?

            [11/27/18 16:18:08] [SSH] Opening SSH connection to jenkins-slave.example.com:22.
            [11/27/18 16:18:29] [SSH] SSH host key matches key in Known Hosts file. Connection will be allowed.
            [11/27/18 16:18:29] [SSH] SSH host key matches key in Known Hosts file. Connection will be allowed.
            [11/27/18 16:18:29] [SSH] Authentication successful.
            [11/27/18 16:18:29] [SSH] Authentication successful.
            [11/27/18 16:18:29] [SSH] The remote user's environment is:
            [11/27/18 16:18:29] [SSH] The remote user's environment is:
            
            Show
            ifernandezcalvo Ivan Fernandez Calvo added a comment - Did you see if the plugin retry to connect automatically? if should be because has the default configuration to make it `launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15`, after 15 seconds of the fail it must retry to connect The duplicate messages bothered me, it is like two processes were running at the same time, Which JDK do you use? [11/27/18 16:18:08] [SSH] Opening SSH connection to jenkins-slave.example.com:22. [11/27/18 16:18:29] [SSH] SSH host key matches key in Known Hosts file. Connection will be allowed. [11/27/18 16:18:29] [SSH] SSH host key matches key in Known Hosts file. Connection will be allowed. [11/27/18 16:18:29] [SSH] Authentication successful. [11/27/18 16:18:29] [SSH] Authentication successful. [11/27/18 16:18:29] [SSH] The remote user's environment is: [11/27/18 16:18:29] [SSH] The remote user's environment is:
            Hide
            kpsimoulis Konstantinos Psimoulis added a comment -
            openjdk version "1.8.0_181"
            OpenJDK Runtime Environment (build 1.8.0_181-b13)
            OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
            

            I think this is because I clicked Launch Agent and at the same time it was configured to re-try. I guess there are no semaphores to handle that

            Show
            kpsimoulis Konstantinos Psimoulis added a comment - openjdk version "1.8.0_181" OpenJDK Runtime Environment (build 1.8.0_181-b13) OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode) I think this is because I clicked Launch Agent and at the same time it was configured to re-try. I guess there are no semaphores to handle that
            Hide
            ifernandezcalvo Ivan Fernandez Calvo added a comment - - edited

            yep, I'm starting to understand what happens in every `CancellationException`, I am working on a consistent way to replicate it on a test to make a proper fix.

            Show
            ifernandezcalvo Ivan Fernandez Calvo added a comment - - edited yep, I'm starting to understand what happens in every `CancellationException`, I am working on a consistent way to replicate it on a test to make a proper fix.
            Hide
            aenlic Bob Ashforth added a comment -

            Can you recommend any workarounds? This is crippling a major lift-and-shift project for us.

            Show
            aenlic Bob Ashforth added a comment - Can you recommend any workarounds? This is crippling a major lift-and-shift project for us.
            Hide
            ifernandezcalvo Ivan Fernandez Calvo added a comment -

            On the logs attached the retries are not working so the SSHLauncher should be blocked by other thread, it is possibly related to the credentials tracking issue JENKINS-49235 if you get a thread dump when the issue happens you can compare it to this troubleshooting threads-stuck-at-credentialsprovidertrackall

            Show
            ifernandezcalvo Ivan Fernandez Calvo added a comment - On the logs attached the retries are not working so the SSHLauncher should be blocked by other thread, it is possibly related to the credentials tracking issue JENKINS-49235 if you get a thread dump when the issue happens you can compare it to this troubleshooting threads-stuck-at-credentialsprovidertrackall

              People

              • Assignee:
                ifernandezcalvo Ivan Fernandez Calvo
                Reporter:
                kpsimoulis Konstantinos Psimoulis
              • Votes:
                1 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: