  Jenkins / JENKINS-43889

ssh-agent-plugin leaking some ssh-agent processes

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: core, ssh-agent-plugin
    • Labels:
      None
    • Environment:
      Jenkins 2.32.3, 2.190.2
      ssh-agent-plugin 1.15, 1.17

      Description

      When a job with the SSHAgentBuildWrapper enabled fails very early (for instance during SCM checkout), an ssh-agent process is left behind. The issue is that the SSHAgentEnvironment is instantiated very early (from preCheckout), but its tearDown method will only be called if execution reaches BuildExecution.doRun (which comes after the SCM checkout phase in AbstractBuildExecution.run).
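To make the ordering problem concrete, here is a minimal sketch in plain Java, assuming a heavily simplified build lifecycle. `LeakDemo`, `AgentEnvironment`, `run`, `doRun` and `liveAgentsAfter` are hypothetical names invented for illustration, not Jenkins APIs. The only point it models is that an environment created before checkout is torn down only if execution reaches the doRun stand-in.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, heavily simplified model of the build phases described above.
 * These are NOT the real Jenkins classes; they only mirror the order in which
 * preCheckout, SCM checkout and doRun happen.
 */
public class LeakDemo {
    /** Stand-in for SSHAgentEnvironment: construction starts an "agent". */
    static class AgentEnvironment {
        static int live = 0;            // "ssh-agent processes" still running

        AgentEnvironment() { live++; }  // started from preCheckout
        void tearDown() { live--; }     // stops the agent
    }

    static final List<AgentEnvironment> environments = new ArrayList<>();

    /** Mirrors AbstractBuildExecution.run: preCheckout, checkout, then doRun. */
    static void run(boolean checkoutFails) {
        environments.add(new AgentEnvironment()); // preCheckout: agent started
        try {
            if (checkoutFails) {
                throw new IllegalStateException("SCM checkout failed");
            }
            doRun();
        } catch (IllegalStateException e) {
            // the build is marked as failed, but nothing tears the environment down
        }
    }

    /** Mirrors BuildExecution.doRun: the only place tearDown is called in this model. */
    static void doRun() {
        try {
            // wrappers' setUp and the build steps would run here
        } finally {
            for (int i = environments.size() - 1; i >= 0; i--) {
                environments.get(i).tearDown();
            }
        }
    }

    /** Runs one build and returns how many agents are still running afterwards. */
    public static int liveAgentsAfter(boolean checkoutFails) {
        AgentEnvironment.live = 0;
        environments.clear();
        run(checkoutFails);
        return AgentEnvironment.live;
    }

    public static void main(String[] args) {
        System.out.println("checkout ok:     " + liveAgentsAfter(false) + " agent(s) left"); // 0
        System.out.println("checkout failed: " + liveAgentsAfter(true) + " agent(s) left");  // 1
    }
}
```

In this model a successful checkout leaves no agent running, while a failed checkout leaves one behind, which is the leak described above.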

      Before ssh-agent-plugin 1.14, there was no ssh-agent process, so the issue of some SSHAgentEnvironment instances not being torn down was less visible (though there were probably already other, less obvious resource leaks, with AgentServer not being properly closed).

      This kind of issue, where an Environment is not properly torn down, can happen whenever an Environment is instantiated not from BuildWrapper.setUp but from an earlier phase (like BuildWrapper.preCheckout or RunListener.setUpEnvironment). As such, maybe this is something that should be fixed in core (perhaps in AbstractBuildExecution.run) rather than specifically in the ssh-agent-plugin, but I don't know...

      I've written and attached a "generic workaround" RunListener, which tries to detect this situation from onComplete and calls tearDown on every Environment that has not already been torn down. It's not something I propose for inclusion, but rather code to exhibit the issue. If an ssh-agent-specific fix is preferred, a similar approach might be an option (but targeting SSHAgentEnvironment only).
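The idea behind such a workaround can be sketched as an idempotent tear-down plus a completion hook that cleans up whatever is left. This is not the attached RunListener: `TearDownFallbackDemo`, `TrackedEnvironment`, `tearDownOnce` and the `onComplete` method below are hypothetical plain-Java stand-ins, and the sketch assumes an environment can record whether it was already torn down, which a real implementation would have to work out for itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plain-Java model of the "tear down leftovers on completion" idea. */
public class TearDownFallbackDemo {
    /** Stand-in for an Environment that owns an ssh-agent process. */
    public static class TrackedEnvironment {
        private boolean tornDown = false;
        private int releaseCount = 0;

        /** Idempotent tear-down: only the first call actually releases the resource. */
        public boolean tearDownOnce() {
            if (tornDown) {
                return false;   // already done, e.g. by the doRun path
            }
            tornDown = true;
            releaseCount++;     // a real environment would stop its agent here
            return true;
        }

        public int releaseCount() { return releaseCount; }
    }

    /**
     * Plays the role of a RunListener's onComplete in this sketch: it runs after every
     * build, whichever phase failed, and tears down anything still set up.
     * Returns how many environments it had to clean up itself.
     */
    public static int onComplete(List<TrackedEnvironment> environments) {
        int cleaned = 0;
        for (int i = environments.size() - 1; i >= 0; i--) {
            if (environments.get(i).tearDownOnce()) {
                cleaned++;
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        // Build A failed during SCM checkout: doRun never ran, nothing was torn down.
        List<TrackedEnvironment> failedEarly = new ArrayList<>();
        failedEarly.add(new TrackedEnvironment());

        // Build B reached doRun, which already tore its environment down.
        List<TrackedEnvironment> completed = new ArrayList<>();
        completed.add(new TrackedEnvironment());
        completed.get(0).tearDownOnce();

        System.out.println("build A, cleaned by onComplete: " + onComplete(failedEarly)); // 1
        System.out.println("build B, cleaned by onComplete: " + onComplete(completed));   // 0
    }
}
```

Making the tear-down idempotent is what lets the completion hook run unconditionally: a build that reached doRun is left alone, and only builds that failed earlier get cleaned up.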


          Activity

          deepraut89 Deepak Raut added a comment -

          Facing the same leftover ssh-agent process with version 1.15, but in a different scenario. In a multi-configuration job, the parent (main) build starts an ssh-agent at the beginning but does not stop it at the end. Each individual configuration build starts its agent at the beginning and stops it at the end, but the same does not happen in the parent build.

          tom_gl Thomas de Grenier de Latour added a comment - edited

          I had somewhat forgotten about this issue because we have been working around it with a RunListener similar to the one I attached earlier. For anyone interested, though, the report is still relevant: I just checked with the ssh-agent plugin code from master and Jenkins 2.190.2.

          Here is a failing test case one can try, to be added in SSHAgentBuildWrapperTest:

              @Issue("JENKINS-43889")
              @Test
              public void sshAgentStoppedOnEarlyBuildFailure() throws Exception {
                  List<String> credentialIds = new ArrayList<String>();
                  credentialIds.add(CREDENTIAL_ID);
          
                  SSHUserPrivateKey key = new BasicSSHUserPrivateKey(CredentialsScope.GLOBAL, credentialIds.get(0), "cloudbees",
                          new BasicSSHUserPrivateKey.DirectEntryPrivateKeySource(getPrivateKey()), "cloudbees", "test");
                  SystemCredentialsProvider.getInstance().getCredentials().add(key);
                  SystemCredentialsProvider.getInstance().save();
          
                  FreeStyleProject job = r.createFreeStyleProject("I_will_die_during_SCM_checkout");
                  job.setAssignedNode(r.createSlave());
          
                  SSHAgentBuildWrapper sshAgent = new SSHAgentBuildWrapper(credentialIds, false);
                  job.getBuildWrappersList().add(sshAgent);
          
                  // make sure this job fails during SCM checkout
                  job.setScm(new FailingSCM());
          
                  Future<? extends FreeStyleBuild> build = job.scheduleBuild2(0);
                  r.assertBuildStatus(Result.FAILURE, build);
                  r.assertLogContains(Messages.SSHAgentBuildWrapper_Started(), build.get());
                  r.assertLogContains(Messages.SSHAgentBuildWrapper_Stopped(), build.get());
              }
          
              static class FailingSCM extends SCM {
                  @Override
                  public ChangeLogParser createChangeLogParser() {
                      return null;
                  }
                  // default implementation of checkout(...) method will fail, that's what we want
              }
          
          

          (you will then have some `ssh-agent` processes to kill after running this test)

          I'm still not sure where this should get fixed:

          • either in core, by moving the Environment.tearDown calls up from BuildExecution.doRun to AbstractBuildExecution.run
          • or in the ssh-agent plugin itself, if what it does (i.e., adding an Environment to the build from its BuildWrapper.preCheckout implementation rather than from BuildWrapper.setUp, so that it's already set up during SCM checkout) is really bad/unsupported
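The first option can be sketched in the same kind of simplified model: tear-down moves into a `finally` block in the run stand-in, so it happens whether checkout fails or doRun completes. `CoreTearDownDemo` and its members are hypothetical, and this only illustrates the control flow; it is not a proposed patch to AbstractBuild.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the core-side option; not the real AbstractBuild code. */
public class CoreTearDownDemo {
    static int live = 0; // "ssh-agent processes" still running

    /** Stand-in for an Environment created from preCheckout. */
    static class AgentEnvironment {
        AgentEnvironment() { live++; }
        void tearDown() { live--; }
    }

    static final List<AgentEnvironment> environments = new ArrayList<>();

    /** Stand-in for AbstractBuildExecution.run with tearDown moved up from doRun. */
    static void run(boolean checkoutFails) {
        environments.add(new AgentEnvironment()); // preCheckout: agent started
        try {
            if (checkoutFails) {
                throw new IllegalStateException("SCM checkout failed");
            }
            doRun();
        } catch (IllegalStateException e) {
            // the build is marked as failed
        } finally {
            // runs whether checkout failed or doRun completed, in reverse order
            for (int i = environments.size() - 1; i >= 0; i--) {
                environments.get(i).tearDown();
            }
            environments.clear(); // each environment is torn down exactly once
        }
    }

    /** Stand-in for BuildExecution.doRun, which no longer calls tearDown. */
    static void doRun() {
        // wrappers' setUp and the build steps would run here
    }

    /** Runs one build and returns how many agents are still running afterwards. */
    public static int liveAgentsAfter(boolean checkoutFails) {
        live = 0;
        environments.clear();
        run(checkoutFails);
        return live;
    }

    public static void main(String[] args) {
        System.out.println("checkout ok:     " + liveAgentsAfter(false) + " agent(s) left");
        System.out.println("checkout failed: " + liveAgentsAfter(true) + " agent(s) left");
    }
}
```

A real change would likely also need to handle the environments that BuildWrapper.setUp adds inside doRun, which are currently torn down there, so that every Environment is torn down exactly once rather than twice or not at all.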

           

           

          tom_gl Thomas de Grenier de Latour added a comment -

          To be extra clear in my explanations, here is how the ssh-agent gets launched, starting from AbstractBuild.AbstractBuildExecution#run:

          • AbstractBuild.AbstractBuildExecution#run(...) - AbstractBuild.java#L498
          • SCMCheckoutStrategy#preCheckout(...) - SCMCheckoutStrategy.java#L76
          • SSHAgentBuildWrapper#preCheckout(...) - SSHAgentBuildWrapper.java#L228
          • SSHAgentBuildWrapper#createSSHAgentEnvironment(...) - SSHAgentBuildWrapper.java#L248
          • SSHAgentBuildWrapper.SSHAgentEnvironment#SSHAgentEnvironment(...) - SSHAgentBuildWrapper.java#L363

          And here is how it gets stopped (when it does), again starting from AbstractBuild.AbstractBuildExecution#run (a few lines below):

          • AbstractBuild.AbstractBuildExecution#run(...) - AbstractBuild.java#L504
          • Build#doRun(...) - Build.java#L174
          • SSHAgentBuildWrapper.SSHAgentEnvironment#tearDown(...) - SSHAgentBuildWrapper.java#L417

          tom_gl Thomas de Grenier de Latour added a comment -

          Added "core" to Component/s, because I really don't know who's wrong here (the plugin code or Jenkins code).


            People

            • Assignee:
              Unassigned
              Reporter:
              tom_gl Thomas de Grenier de Latour
            • Votes:
              4
              Watchers:
              3
