Orphaned EC2 instances after Jenkins restart

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Component/s: ec2-plugin
    • Labels:
      None
    • Environment:
      Jenkins ver. 2.176.1
      ec2 plugin 1.43, 1.44, 1.45
      Description

      Sometimes after a Jenkins restart the plugin won't be able to spawn more agents.

      The plugin will just loop on this:

      SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Attempting to provision slave needed by excess workload of 1 units
      May 31, 2019 2:23:53 PM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave
      SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Cannot provision - no capacity for instances: 0
      May 31, 2019 2:23:53 PM WARNING hudson.plugins.ec2.EC2Cloud provision
      Can't raise nodes for SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}
      

      If I go to the EC2 console and terminate the instance manually the plugin will spawn a new one and use it.

      There seems to be a mismatch in the plugin logic. The part responsible for counting instances and checking the cap sees the EC2 instance. However, the part responsible for picking up running EC2 instances doesn't seem to be able to find it.
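
The suspected mismatch can be illustrated with a small, hypothetical model. This is not the plugin's code: the `Instance` class and the functions `count_toward_cap`, `find_reusable`, and `provision` are invented for illustration. The sketch assumes that the cap check counts every live EC2 instance for the AMI, while the reuse check only considers instances that Jenkins still has a node record for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical stand-in for an EC2 instance as seen by the plugin."""
    instance_id: str
    ami: str
    state: str  # e.g. "running", "stopped", "terminated"


def count_toward_cap(instances, ami):
    # Assumed counting side: every non-terminated instance of the AMI uses a slot.
    return sum(1 for i in instances if i.ami == ami and i.state != "terminated")


def find_reusable(instances, ami, known_node_ids):
    # Assumed reuse side: only instances Jenkins still tracks as nodes qualify.
    return [
        i for i in instances
        if i.ami == ami
        and i.state in ("running", "stopped")
        and i.instance_id in known_node_ids
    ]


def provision(instances, ami, instance_cap, known_node_ids):
    """Return a description of what this toy provisioner would do."""
    reusable = find_reusable(instances, ami, known_node_ids)
    if reusable:
        return "reuse " + reusable[0].instance_id
    if count_toward_cap(instances, ami) >= instance_cap:
        return "cannot provision: no capacity"
    return "launch new instance"


if __name__ == "__main__":
    ami = "ami-0efbb291c6e8cc847"
    orphan = Instance("i-orphan", ami, "running")
    # After a restart, assume Jenkins has lost its node record for the instance.
    print(provision([orphan], ami, instance_cap=1, known_node_ids=set()))
```

Under these assumptions, an orphaned instance occupies the only slot but is never picked up, so the provisioner is stuck until the instance is terminated by hand, which matches the behaviour reported here.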

      We use a single subnet, security group, and VPC (I've seen some reports about this causing problems).

      We are using instanceCap = 1 while we test the plugin, which might make this problem more visible than it would be with a higher cap.

          Activity

          raihaan Raihaan Shouhell added a comment - - edited

          [^ec2.hpi] cedric lecoz

          For your latest issue, the linked HPI should solve it. The issue you are seeing appears to occur when starting from a stopped instance: because the AWS APIs are eventually consistent, the plugin occasionally sees a freshly started instance as still stopped. I added a retry for newly started instances to deal with this.
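
A retry of this kind could look roughly like the following sketch. It is not the code from the linked HPI: the function `wait_until_not_stopped`, its parameters, and the `describe_state` callback are hypothetical names used only for illustration, and the callback stands in for whatever AWS call returns the instance state.

```python
import time


def wait_until_not_stopped(describe_state, instance_id, attempts=5,
                           delay_seconds=1.0, sleep=time.sleep):
    """Poll the instance state after a start request.

    An eventually consistent API may still report "stopped" for a moment,
    so the state is re-read up to `attempts` times in total before giving up.
    Returns the last state that was observed.
    """
    state = describe_state(instance_id)
    for _ in range(attempts - 1):
        if state != "stopped":
            break
        sleep(delay_seconds)
        state = describe_state(instance_id)
    return state


if __name__ == "__main__":
    # Simulated API that lags behind the start request for two reads.
    observed = iter(["stopped", "stopped", "pending"])
    print(wait_until_not_stopped(lambda _id: next(observed), "i-example",
                                 delay_seconds=0))
```

If the state still reads "stopped" after the last attempt, the caller has to decide whether to treat the instance as genuinely stopped or to fail the provisioning attempt.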

          sirzic cedric lecoz added a comment -

          OK, thanks. I will try it as soon as possible, but that may not be before the weekend; Jenkins is slightly too busy during the week.

          sirzic cedric lecoz added a comment - - edited

          Hi Raihaan Shouhell,
          Is the ec2.hpi plugin you attached here the same as the one built by https://github.com/jenkinsci/ec2-plugin/pull/398 ?
          It's easier to add to my (automated) CI environment when the plugin comes directly from ci.jenkins.io, and easier to track too.

          I am asking because it does not look like PR-398 includes what I tested from PR-397.

          Thanks,
          C/

          raihaan Raihaan Shouhell added a comment -

          cedric lecoz, yes, it is. I attached it directly because CI was struggling to build it yesterday. I have removed the attachment.

          sirzic cedric lecoz added a comment -

          Hi Raihaan Shouhell,
          Using the 1.46-rc1050.43f9773eed95 plugin, I reproduced the issue when starting a new EC2 instance after the previous one was terminated (the case I believed was fixed in PR-397). See the attached log start_fresh_1.46-rc1050.43f9773eed95.txt.

          The issue when starting from a stopped slave has not yet been reproduced.

          BR,
          Cedric.


            People

            • Assignee:
              thoulen FABRIZIO MANFREDI
            • Reporter:
              jbochenski Jakub Bochenski
            • Votes:
              0
            • Watchers:
              5
