JENKINS-50984: SSHLauncher/Fingerprint Thread Locking Stopping Dynamic Slave Launch

      Description

      Curious if someone can help me unpack this. We recently upgraded Jenkins. We use the Docker plugin to dynamically provision slaves, and we're now running into a situation where the slaves never finish provisioning (the SSH connection is never established). A thread dump shows a very large number of BLOCKED threads in the SSHLauncher teardown and in fingerprinting; here are the dumps:

      "Computer.threadPoolForRemoting [#1020]" daemon prio=5 BLOCKED
      	hudson.plugins.sshslaves.SSHLauncher.tearDownConnection(SSHLauncher.java:1407)
      	hudson.plugins.sshslaves.SSHLauncher.afterDisconnect(SSHLauncher.java:1403)
      	com.nirima.jenkins.plugins.docker.launcher.DockerComputerLauncher.afterDisconnect(DockerComputerLauncher.java:71)
      	hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:665)
      	jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	java.lang.Thread.run(Thread.java:748)
      
      "Computer.threadPoolForRemoting [#1019]" daemon prio=5 BLOCKED
      	hudson.model.Fingerprint.save(Fingerprint.java:1238)
      	hudson.BulkChange.commit(BulkChange.java:98)
      	com.cloudbees.plugins.credentials.CredentialsProvider.trackAll(CredentialsProvider.java:1533)
      	com.cloudbees.plugins.credentials.CredentialsProvider.track(CredentialsProvider.java:1478)
      	hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:866)
      	com.nirima.jenkins.plugins.docker.launcher.DockerComputerLauncher.launch(DockerComputerLauncher.java:66)
      	hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:288)
      	jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      	java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	java.lang.Thread.run(Thread.java:748)
      

      The thread dump has many of these (when things get bad it reaches the hundreds). We're currently planning a Docker plugin upgrade and moving away from the SSH launcher, but I'm looking for ideas as to why this may be happening.
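Not Jenkins code, but for anyone trying to quantify this from inside the JVM rather than eyeballing thread dumps: a small, self-contained sketch that counts live threads currently BLOCKED inside a given class. The class names passed in (e.g. `hudson.model.Fingerprint`, `hudson.plugins.sshslaves.SSHLauncher`) are taken from the dumps above; everything else is stock `java.lang.management` API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class BlockedThreadCounter {
    // Count live threads that are in the BLOCKED state with the given
    // class somewhere on their stack, e.g. "hudson.model.Fingerprint".
    public static long countBlockedIn(String classNamePrefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // dumpAllThreads(false, false): skip monitor/synchronizer details,
        // we only need states and stack traces.
        ThreadInfo[] infos = mx.dumpAllThreads(false, false);
        long count = 0;
        for (ThreadInfo info : infos) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                if (frame.getClassName().startsWith(classNamePrefix)) {
                    count++;
                    break;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("Blocked in Fingerprint.save: "
                + countBlockedIn("hudson.model.Fingerprint"));
    }
}
```

This could be run from the Jenkins script console (as Groovy, which accepts the same calls) to watch the blocked-thread count grow without taking full dumps.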

            Activity

            maxfields2000 Maxfield Stewart added a comment -

            I believe this is related to this issue: https://issues.jenkins-ci.org/browse/JENKINS-49235

            The SSH Slaves plugin started storing credential fingerprints, and I think that's causing a race condition when the containers shut down/start up, though I'm not entirely sure. It might just be that the Docker plugin version I'm using (0.15) doesn't handle this shutdown/spin-up well and the newer ones do.

            (The newer ones have very different expectations of SSH slaves, which would require us to completely refactor our container environments, so we'd rather move off SSH connectors entirely.)

            I'm curious whether this problem goes away if I just revert the SSH Slaves plugin.

            Hide
            maxfields2000 Maxfield Stewart added a comment -

            We're testing a revert to SSH Slaves plugin v1.20 today to see if this problem goes away. I'll post back here on how well it works, though the problem is somewhat intermittent and it may take a few days to confirm.

            I'm currently thinking there's a race condition between the slave being shut down (and its workspace removed) and the SSH Slaves plugin's attempt to create a fingerprint of the credential: either the workspace or the remoting channel is closed before the Fingerprint.save function can finish, and it ends up blocked waiting for a resource that is now gone.
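To illustrate the pattern the dumps suggest (not Jenkins's actual locking, just a minimal, hypothetical sketch): if every launch/teardown funnels through one shared lock and the current holder stalls on I/O that will never complete, every subsequent thread piles up in the BLOCKED state, exactly as in the dumps above. The lock name and the sleep standing in for stuck I/O are invented for the demonstration.

```java
public class ContentionSketch {
    // Stand-in for whatever lock Fingerprint.save serializes on.
    private static final Object FINGERPRINT_LOCK = new Object();

    // Stand-in for a save() call; stallMillis simulates slow or stuck I/O
    // performed while the lock is held.
    public static void save(long stallMillis) throws InterruptedException {
        synchronized (FINGERPRINT_LOCK) {
            Thread.sleep(stallMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        // One thread grabs the lock and stalls, as a save against a
        // torn-down channel might.
        Thread holder = new Thread(() -> {
            try { save(2000); } catch (InterruptedException ignored) { }
        });
        holder.start();
        Thread.sleep(100); // let the holder acquire the lock

        // A second save attempt now contends on the monitor...
        Thread waiter = new Thread(() -> {
            try { save(0); } catch (InterruptedException ignored) { }
        });
        waiter.start();
        Thread.sleep(100);

        // ...and shows up as BLOCKED, like the threads in the dumps.
        System.out.println(waiter.getState()); // prints BLOCKED
        holder.join();
        waiter.join();
    }
}
```

With hundreds of launches and teardowns in flight, one stalled holder is enough to produce the hundreds of BLOCKED `Computer.threadPoolForRemoting` threads seen here.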

            Hide
            maxfields2000 Maxfield Stewart added a comment -

            After running for a day rolled back to SSH Slaves plugin v1.20, I can confirm that things were much more stable: we saw zero blocked threads today, and all nodes provisioned and connected correctly. So I'm pretty confident the v1.21 plugin and the corresponding credentials-fingerprinting changes caused the issue.

            Hide
            ifernandezcalvo Ivan Fernandez Calvo added a comment -

            There is a PR to allow disabling credentials tracking: https://github.com/jenkinsci/ssh-slaves-plugin/pull/94


              People

              • Assignee:
                ifernandezcalvo Ivan Fernandez Calvo
              • Reporter:
                maxfields2000 Maxfield Stewart
              • Votes:
                0
              • Watchers:
                3
