JENKINS-71118

EC2 plugin leaves zombie processes and fails to terminate instances


      After upgrading the EC2 plugin from 2.0.2 to 2.0.4, builds that failed, were terminated, or were aborted still show zombie processes in Build Executor Status. These builds appear to be tied to their node agents, because those agents no longer idle out and terminate as configured; they persist until the node agents are deleted. Even after a node agent is terminated, its process in Build Executor Status persists. As far as I know, the only code change between 2.0.2 and 2.0.4 that could be causing this is in hudson.plugins.ec2.EC2RetentionStrategy. I have attached an image of the change.

      /**
       * Called when a new {@link EC2Computer} object is introduced (such as when Hudson started, or when
       * a new agent is added.)
       *
       * When Jenkins has just started, we don't want to spin up all the instances, so we only start if
       * the EC2 instance is already running
       */
      @Override
      public void start(EC2Computer c) {
          //Jenkins is in the process of starting up
          if (Jenkins.get().getInitLevel() != InitMilestone.COMPLETED) {
              InstanceState state = null;
              try {
                  state = c.getState();
              } catch (AmazonClientException | InterruptedException | NullPointerException e) {
                  LOGGER.log(Level.FINE, "Error getting EC2 instance state for " + c.getName(), e);
              }
              if (!(InstanceState.PENDING.equals(state) || InstanceState.RUNNING.equals(state))) {
                  LOGGER.info("Ignoring start request for " + c.getName()
                          + " during Jenkins startup due to EC2 instance state of " + state);
                  return;
              }
          }

          LOGGER.info("Start requested for " + c.getName());
          c.connect(false);
      }

      In 2.0.4, 'NullPointerException' was removed from the catch clause above. I don't know exactly how this would cause the problem, or whether it is definitely the cause, but it is currently my only suspect.
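
      To illustrate why that one-word change could matter, here is a minimal standalone sketch of the Java semantics involved. It is not plugin code: CatchClauseDemo, lookupState, startCatchingNpe and startWithoutNpe are hypothetical names, and plain strings stand in for the EC2 instance state. It only shows that an exception type dropped from a multi-catch is no longer swallowed and instead propagates to the caller; whether that is what actually leaves the executors stuck is unconfirmed.

```java
// Hypothetical demo, not taken from the EC2 plugin source.
class CatchClauseDemo {

    // Stand-in for a state lookup that may hit a null reference internally.
    static String lookupState(boolean fail) {
        if (fail) {
            throw new NullPointerException("no instance data");
        }
        return "running";
    }

    // Same shape as the method quoted above: NullPointerException is caught,
    // so state stays null and the start request is ignored quietly.
    static String startCatchingNpe(boolean fail) {
        String state = null;
        try {
            state = lookupState(fail);
        } catch (IllegalStateException | NullPointerException e) {
            // swallowed; the real code logs at FINE level here
        }
        if (!("pending".equals(state) || "running".equals(state))) {
            return "ignored";
        }
        return "connected";
    }

    // Same method with NullPointerException dropped from the catch list:
    // the exception now escapes before the state check is reached.
    static String startWithoutNpe(boolean fail) {
        String state = null;
        try {
            state = lookupState(fail);
        } catch (IllegalStateException e) {
            // swallowed
        }
        if (!("pending".equals(state) || "running".equals(state))) {
            return "ignored";
        }
        return "connected";
    }

    public static void main(String[] args) {
        System.out.println(startCatchingNpe(true));
        try {
            startWithoutNpe(true);
        } catch (NullPointerException e) {
            System.out.println("propagated to caller: " + e.getMessage());
        }
    }
}
```

      If something similar happens in the plugin, the caller of start() would see an uncaught exception where it previously saw a normal return, which might be worth checking in the Jenkins log around the time an agent gets stuck.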

            thoulen FABRIZIO MANFREDI
            shinobi10 David