  Jenkins / JENKINS-57795

Orphaned EC2 instances after Jenkins restart

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Component/s: ec2-plugin
    • Labels:
      None
    • Environment:
      Jenkins ver. 2.176.1
      ec2 plugin 1.43, 1.44, 1.45

      Description

      Sometimes, after a Jenkins restart, the plugin is unable to spawn more agents.

      The plugin just loops on the following log output:

      SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Attempting to provision slave needed by excess workload of 1 units
      May 31, 2019 2:23:53 PM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave
      SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Cannot provision - no capacity for instances: 0
      May 31, 2019 2:23:53 PM WARNING hudson.plugins.ec2.EC2Cloud provision
      Can't raise nodes for SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}
      

      If I go to the EC2 console and terminate the instance manually, the plugin spawns a new one and uses it.

      There seems to be a mismatch in the plugin logic: the part that counts instances and enforces the instance cap sees the existing EC2 instance, but the part that picks up running EC2 instances for reuse does not find it.

      We use a single subnet, security group, and VPC (I've seen some reports of this causing problems).

      We use the instanceCap = 1 setting because we are testing the plugin; this may make the problem more visible than it would be with a higher cap.
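      To make the suspected mismatch concrete, here is a minimal Python model of it. It is not the plugin's code: the `Ec2Instance` type and the `count_for_cap`, `find_reusable`, and `provision` functions are hypothetical, and the model assumes (as described above) that the cap check counts every live instance launched from the template's AMI, while reuse only considers instances Jenkins already tracks as nodes. After a restart that set of tracked nodes is empty, so with instanceCap = 1 the orphaned instance takes the only slot: capacity = 1 - 1 = 0, which matches the "no capacity for instances: 0" log line.

```python
from dataclasses import dataclass


@dataclass
class Ec2Instance:
    # Hypothetical stand-in for an EC2 instance description.
    instance_id: str
    state: str  # e.g. "pending", "running", "terminated"
    ami: str


LIVE_STATES = ("pending", "running")


def count_for_cap(instances, ami):
    # Assumed cap check: counts every live instance launched from the AMI,
    # whether or not Jenkins has a node for it.
    return sum(1 for i in instances if i.ami == ami and i.state in LIVE_STATES)


def find_reusable(instances, ami, known_node_ids):
    # Assumed reuse lookup: only running instances that Jenkins already
    # tracks as nodes are eligible.
    for i in instances:
        if i.ami == ami and i.state == "running" and i.instance_id in known_node_ids:
            return i
    return None


def provision(instances, ami, instance_cap, known_node_ids):
    reusable = find_reusable(instances, ami, known_node_ids)
    if reusable is not None:
        return ("reuse", reusable.instance_id)
    capacity = instance_cap - count_for_cap(instances, ami)
    if capacity <= 0:
        # Mirrors "Cannot provision - no capacity for instances: 0".
        return ("no capacity", capacity)
    return ("launch", capacity)


# After a restart Jenkins tracks no nodes, but the old instance is still running.
# "i-orphan" is a placeholder instance ID.
orphan = Ec2Instance("i-orphan", "running", "ami-0efbb291c6e8cc847")
print(provision([orphan], orphan.ami, instance_cap=1, known_node_ids=set()))
# ('no capacity', 0)
```

      In this model, terminating the orphan manually moves it to the "terminated" state, so it no longer counts toward the cap and `provision` returns ('launch', 1), mirroring the manual workaround described above.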


          Activity

          jbochenski (Jakub Bochenski) added a comment (edited)

          By the way, I didn't think of this earlier, but shouldn't the plugin terminate the EC2 instance on Jenkins shutdown? Otherwise it could stay running indefinitely.

          jbochenski (Jakub Bochenski) added a comment (edited)

          Actually, I've noticed another problem. After I manually terminate the instance, the plugin spawns 4 new instances that get terminated immediately before it finally brings up a fifth one. I thought it was a fluke at first, but it seems to be consistently reproducible.
          Log: https://gist.github.com/jakub-bochenski/c24b1f8e24e7be77aa2522df2c8caaed

          The plugin seems to terminate the instance for no apparent reason:

          Feb 25, 2020 3:18:30 PM INFO hudson.plugins.ec2.EC2Cloud log
          Launching remoting agent (via Trilead SSH2 Connection):  java  -jar /tmp/remoting.jar -workDir /opt/jenkins
          Feb 25, 2020 3:18:32 PM INFO hudson.plugins.ec2.EC2OndemandSlave terminate
          Terminated EC2 instance (terminated): i-046afc69c32c1acdd
          Feb 25, 2020 3:18:32 PM INFO hudson.plugins.ec2.EC2OndemandSlave terminate
          Removed EC2 instance from jenkins master: i-046afc69c32c1acdd

          Also note this, despite instanceCap = 1:

          Feb 25, 2020 3:18:21 PM INFO hudson.slaves.NodeProvisioner lambda$update$6
          EC2 (ec2) - ec2 (ami-028d96c69234f9d1a) provisioning successfully completed. We have now 2 computer(s)
          jbochenski (Jakub Bochenski) added a comment

          I tried this a few more times. So far it reproduces 100% of the time (which, in a way, is good).

          raihaan (Raihaan Shouhell) added a comment

          The plugin can't terminate all instances on shutdown or restart, because doing so could cause the shutdown to stall.

          Do you have any logs from the retention strategy? I see in your logs that the instances were stopped, but I'm not certain why.

          jbochenski (Jakub Bochenski) added a comment (edited)

          "Do you have any logs from the retention strategy? I see in your logs that the instances were stopped, but I'm not certain why."

          I'm not sure what you mean. I have pasted all of the Jenkins log output here already.
          Do you want me to enable DEBUG-level logging for some components?


            People

            • Assignee:
              thoulen FABRIZIO MANFREDI
              Reporter:
              jbochenski Jakub Bochenski
            • Votes:
              0
              Watchers:
              6
