Jenkins / JENKINS-29810

Agent which can survive Jenkins restarts

    Details


      Description

      As of JENKINS-28689 there is a Workflow step, sshagent, that binds the agent. It survives Jenkins restarts in the common cases:

      node {
        sshagent('...') {
          sh 'ssh user@host command' // restart Jenkins after the connection is made
        }
      }
      

      or

      node {
        sshagent('...') {
          sleep 999 // ← restart Jenkins here
          sh 'ssh user@host command'
        }
      }
      

      but in this case

      node {
        sshagent('...') {
          sh '''
      sleep 999 # ← restart Jenkins here
      ssh ...
      '''
        }
      }
      

      the shell script is launched with one $SSH_AUTH_SOCK. Jenkins is then restarted, killing the agent server. After the restart a new server is started on a new socket address, defining a new $SSH_AUTH_SOCK for subsequently forked processes, but the existing script keeps running with the old value, so when it launches ssh, ssh fails to connect to the old server and dies.

      The solution would be to reuse the same socket address across restarts.
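      A minimal sketch of what reusing the socket address could look like, written in Python rather than as plugin code and assuming a Unix agent. The helper names agent_socket_path and bind_agent_socket, and the idea of deriving the path from a per-build run_id, are hypothetical and not taken from the ssh-agent plugin. The path depends only on the identifier, so a server restarted for the same build lands on the address that already-running scripts captured in $SSH_AUTH_SOCK. The stale socket file left by the killed server has to be removed before binding again.

```python
import hashlib
import os
import socket


def agent_socket_path(run_id: str, base_dir: str) -> str:
    # Hypothetical: derive the path only from a stable build identifier, so a
    # restarted agent server computes the same address that running scripts
    # already hold in $SSH_AUTH_SOCK.
    digest = hashlib.sha256(run_id.encode("utf-8")).hexdigest()[:16]
    return os.path.join(base_dir, f"agent.{digest}.sock")


def bind_agent_socket(path: str) -> socket.socket:
    # A server killed by a restart leaves its socket file behind, and binding
    # on top of it fails with EADDRINUSE, so remove the stale file first.
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen()
    return server


if __name__ == "__main__":
    import tempfile

    path = agent_socket_path("example-job#1", tempfile.mkdtemp())
    bind_agent_socket(path).close()   # first server dies in a restart
    server = bind_agent_socket(path)  # new server, same address
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(path)              # the old $SSH_AUTH_SOCK value still works
    client.close()
    server.close()
```

      Unconditionally unlinking the path assumes only one server per build ever owns it. This sketch covers only the first scenario; a key request arriving while no server is listening still fails.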

      An even less common case is

      node {
        sshagent('...') {
          sh '''
      sleep 999
      # ← restart Jenkins here
      ssh ...
      '''
        }
      }
      

      where ssh requests use of the private key while Jenkins is still restarting. That can only be solved by forking an external process for the agent server so that it survives the loss of the slave agent.

      A related issue is that the current implementation will probably not survive a disconnection and reconnection of the slave agent with the Jenkins master still running, since it relies on onResume and lacks a ComputerListener. The forked agent approach would of course address that as well.


            Activity

            jglick Jesse Glick created issue -
            jglick Jesse Glick made changes -
            Link: (none) → This issue is blocking JENKINS-28689
            recena Manuel Recena Soto made changes -
            Assignee: (none) → Manuel Jesús Recena Soto [ recena ]
            jglick Jesse Glick made changes -
            Epic Link: (none) → JENKINS-35399 [ 171192 ]
            rtyler R. Tyler Croy made changes -
            Workflow: JNJira [ 164886 ] → JNJira + In-Review [ 181724 ]
            abayer Andrew Bayer made changes -
            Labels: workflow → pipeline workflow
            abayer Andrew Bayer made changes -
            Labels: pipeline workflow → pipeline
            jglick Jesse Glick added a comment -

            That can only be solved by forking an external process for the agent server

            Done in JENKINS-36997, so SSHAgentStepExecution.onResume should probably be reworked for ExecRemoteAgent: it should create a new ExecRemoteAgent instance with a fresh Launcher and FilePath but the same socket and agentEnv.

            Needs some more tests covering these scenarios.

            jglick Jesse Glick made changes -
            Link: (none) → This issue depends on JENKINS-36997

              People

              • Assignee: recena Manuel Recena Soto
                Reporter: jglick Jesse Glick
              • Votes: 0
                Watchers: 2
