Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-59259

sshAgent step does not survive agent disconnect/reconnect

    Details

    • Type: Bug
    • Status: Resolved (View Workflow)
    • Priority: Minor
    • Resolution: Fixed
    • Component/s: ssh-agent-plugin
    • Labels:
      None
    • Environment:
      Jenkins 2.176.2
      ssh-agent:1.17
      durable-task: 1.30
      workflow-durable-task-step:2:32
    • Similar Issues:
    • Released As:
      1.19

      Description

      The following pipeline code fails if the connection to the agent is lost and reconnected during execution of the shell script:

      node {
          sshagent(['someCredential']) {
              sh '''
      for i in `seq 1 120`;
      do
          sleep 1
          echo $i
      done
      '''
              echo "Done"
          }
          echo "Done"
      }
      

      The build log is as follows:

      Started by user Jon
      Running in Durability level: MAX_SURVIVABILITY
      [Pipeline] Start of Pipeline
      [Pipeline] node
      Running on build-srv123 in /some/path/workspace/job/path
      [Pipeline] {
      [Pipeline] sshagent
      [ssh-agent] Using credentials someCredential
      [ssh-agent] Looking for ssh-agent implementation...
      [ssh-agent]   Exec ssh-agent (binary ssh-agent on a remote machine)
      $ ssh-agent
      SSH_AUTH_SOCK=/tmp/ssh-gDb94Onc5Loa/agent.30438
      SSH_AGENT_PID=30440
      Running ssh-add (command line suppressed)
      Identity added: /some/path/workspace/job/path-tmp/private_key_970164985924018321.key (/some/path/workspace/job/path-tmp/private_key_970164985924018321.key)
      [ssh-agent] Started.
      [Pipeline] {
      [Pipeline] sh
      + seq 1 120
      + sleep 1
      + echo 1
      1
      + sleep 1
      + echo 2
      2
      + sleep 1
      + echo 3
      3
      + sleep 1
      + echo 4
      4
      + sleep 1
      + echo 5
      5
      + sleep 1
      Cannot contact build-srv123: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on build-srv123 failed. The channel is closing down or has closed down
      + echo 6
      6
      + sleep 1
      + echo 7
      7
      + sleep 1
      + echo 8
      8
      ... truncated for your reading sanity...
      + sleep 1
      + echo 119
      119
      + sleep 1
      + echo 120
      120
      [Pipeline] echo
      Done
      [Pipeline] }
      [Pipeline] // sshagent
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] End of Pipeline
      [BFA] Scanning build for known causes...
      [BFA] No failure causes found
      [BFA] Done. 0s
      java.io.EOFException
      	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2681)
      	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3156)
      	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:862)
      	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
      	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
      	at hudson.remoting.Command.readFrom(Command.java:140)
      	at hudson.remoting.Command.readFrom(Command.java:126)
      	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
      	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
      Caused: java.io.IOException: Unexpected termination of the channel
      	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
      Caused: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on build-srv123 failed. The channel is closing down or has closed down
      	at hudson.remoting.Channel.call(Channel.java:950)
      	at hudson.Launcher$RemoteLauncher.launch(Launcher.java:1060)
      	at hudson.Launcher$ProcStarter.start(Launcher.java:455)
      	at com.cloudbees.jenkins.plugins.sshagent.exec.ExecRemoteAgent.stop(ExecRemoteAgent.java:130)
      	at com.cloudbees.jenkins.plugins.sshagent.SSHAgentStepExecution.cleanUp(SSHAgentStepExecution.java:198)
      	at com.cloudbees.jenkins.plugins.sshagent.SSHAgentStepExecution.access$000(SSHAgentStepExecution.java:24)
      	at com.cloudbees.jenkins.plugins.sshagent.SSHAgentStepExecution$Callback.finished(SSHAgentStepExecution.java:108)
      	at org.jenkinsci.plugins.workflow.steps.BodyExecutionCallback$TailCall.onSuccess(BodyExecutionCallback.java:118)
      	at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$SuccessAdapter.receive(CpsBodyExecution.java:377)
      	at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:87)
      	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:113)
      	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:83)
      	at sun.reflect.GeneratedMethodAccessor1513.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
      	at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
      	at com.cloudbees.groovy.cps.Next.step(Next.java:83)
      	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
      	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
      	at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:129)
      	at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:268)
      	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
      	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:186)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:370)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:93)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:282)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:270)
      	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:66)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      Finished: FAILURE
      

      As can be seen in the log, the connection fails, but once the agent comes back online the build log is fetched and continuously pulled until the shell script finishes on the agent. Once the shell script is done SSHAgent throws exception in its teardown phase.

      This behavior is different from when the master restarts during ongoing build, in that situation SSHAgent step works as expected. Once the agent comes back online ssh-agent plugin runs necessary commands to restores operation:

      ...
      + sleep 1
      + echo 17
      17
      + sleep 1
      Resuming build at Fri Sep 06 08:27:37 CEST 2019 after Jenkins restart
      Waiting to resume part of some ยป job #15: Finished waiting
      Ready to run at Fri Sep 06 08:27:40 CEST 2019
      [ssh-agent] Using credentials someCredential
      [ssh-agent] Looking for ssh-agent implementation...
      [ssh-agent]   Exec ssh-agent (binary ssh-agent on a remote machine)
      $ ssh-agent
      SSH_AUTH_SOCK=/tmp/ssh-wNzzYEFltayP/agent.10155
      SSH_AGENT_PID=10157
      Running ssh-add (command line suppressed)
      Identity added: /some/path/workspace/job/path-tmp/private_key_155210530577449396.key (/some/path/workspace/job/path-tmp/private_key_155210530577449396.key)
      [ssh-agent] Started.
      + echo 18
      18
      + sleep 1
      + echo 19
      19
      + sleep 1
      + echo 20
      ....
      

        Attachments

          Activity

          Hide
          jglick Jesse Glick added a comment -

          Jon Sten your proposal sounds reasonable.

          Really I would prefer to delete JNRRemoteAgent and MinaRemoteAgent and just document that users are expected to have an ssh-agent executable in their path.

          Show
          jglick Jesse Glick added a comment - Jon Sten your proposal sounds reasonable. Really I would prefer to delete JNRRemoteAgent and MinaRemoteAgent and just document that users are expected to have an ssh-agent executable in their path.
          Hide
          jons Jon Sten added a comment -

          Sorry about this mess, the proxy objects weren't on my radar when I implemented the fix.

          Anyways, I've made a PR to fix the problem, it gets quite messy, so if any of you se an improvement, please let me know!

          Robin Karlsson, if you have the time to test, I've attached a build of the new version. I would be happy if you could test it

          Really I would prefer to delete JNRRemoteAgent and MinaRemoteAgent and just document that users are expected to have an ssh-agent executable in their path.

          That was my colleges first comment; "why the bleep is it so hard to add ssh credentials to a session". Those classes definitely makes things more complicated...

          Show
          jons Jon Sten added a comment - Sorry about this mess, the proxy objects weren't on my radar when I implemented the fix. Anyways, I've made a PR to fix the problem, it gets quite messy, so if any of you se an improvement, please let me know! Robin Karlsson , if you have the time to test, I've attached a build of the new version. I would be happy if you could test it Really I would prefer to delete JNRRemoteAgent and MinaRemoteAgent and just document that users are expected to have an ssh-agent executable in their path. That was my colleges first comment; "why the bleep is it so hard to add ssh credentials to a session". Those classes definitely makes things more complicated...
          Hide
          jglick Jesse Glick added a comment -

          Thanks for quick turnaround!

          Show
          jglick Jesse Glick added a comment - Thanks for quick turnaround!
          Hide
          snago Robin Karlsson added a comment -

          Jon Sten I've installed the attached build and confirmed that it fixes the problem. Thanks!

          I also found the following in the build console log, so your suspicion about JNR being used was correct:

          [Pipeline] sshagent
          09:26:28  [ssh-agent] Using credentials jenkins (jenkins ssh key)
          09:26:28  [ssh-agent] Looking for ssh-agent implementation...
          09:26:28  [ssh-agent]   Exec ssh-agent (binary ssh-agent on a remote machine)
          09:26:28  $ docker exec fd174bb2336438837e17a078f3ed4072e094f1333a515a28e0ddb6f28f5c6907 ssh-agent
          09:26:28  [ssh-agent]   Java/JNR ssh-agent
          09:26:29  [ssh-agent] Registered BouncyCastle on the remote agent
          09:26:30  [ssh-agent] Started.
          
          Show
          snago Robin Karlsson added a comment - Jon Sten I've installed the attached build and confirmed that it fixes the problem. Thanks! I also found the following in the build console log, so your suspicion about JNR being used was correct: [Pipeline] sshagent 09:26:28 [ssh-agent] Using credentials jenkins (jenkins ssh key) 09:26:28 [ssh-agent] Looking for ssh-agent implementation... 09:26:28 [ssh-agent] Exec ssh-agent (binary ssh-agent on a remote machine) 09:26:28 $ docker exec fd174bb2336438837e17a078f3ed4072e094f1333a515a28e0ddb6f28f5c6907 ssh-agent 09:26:28 [ssh-agent] Java/JNR ssh-agent 09:26:29 [ssh-agent] Registered BouncyCastle on the remote agent 09:26:30 [ssh-agent] Started.
          Hide
          jons Jon Sten added a comment -

          Great, good to hear! Jesse made an official release of the fix yesterday, so you could use that one now, version 1.19.

          Show
          jons Jon Sten added a comment - Great, good to hear! Jesse made an official release of the fix yesterday, so you could use that one now, version 1.19.

            People

            • Assignee:
              jons Jon Sten
              Reporter:
              jons Jon Sten
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: