Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-57795

Orphaned EC2 instances after Jenkins restart

    Details

    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Critical
    • Resolution: Unresolved
    • Component/s: ec2-plugin
    • Labels:
      None
    • Environment:
      Jenkins ver. 2.176.1
      ec2 plugin 1.43, 1.44, 1.45
    • Similar Issues:

      Description

      Sometimes after a Jenkins restart the plugin won't be able to spawn more agents.

      The plugin will just loop on this:

      SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Attempting to provision slave needed by excess workload of 1 units
      May 31, 2019 2:23:53 PM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave
      SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Cannot provision - no capacity for instances: 0
      May 31, 2019 2:23:53 PM WARNING hudson.plugins.ec2.EC2Cloud provision
      Can't raise nodes for SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}
      

      If I go to the EC2 console and terminate the instance manually the plugin will spawn a new one and use it.

      It seems like there is some mismatch in the plugin logic. The part responsible for calculating the number of instances and checking the cap sees the EC2 instance. However the part responsible for picking up running EC2 instances doesn't seem to be able to find it.

      We use a single subnet, security group and vpc (I've seen some reports about this causing problems).

      We use instanceCap = 1 setting as we are testing the plugin, this might make this problem more visible than with a higher cap.

        Attachments

          Activity

          Hide
          jbochenski Jakub Bochenski added a comment - - edited

          I'm still seeing issues in 1.45. The instance is in running state but plugin can't see t be able to connect it.

          I can't test it on 1.46 because of JENKINS-59564

          Sep 30, 2019 9:32:03 AM INFO hudson.plugins.ec2.EC2Cloud provision
          
          SlaveTemplate{ami='ami-02769bd03e603e42f', labels='docker docker-bakery'}. Attempting to provision slave needed by excess workload of 1 units
          
          Sep 30, 2019 9:32:04 AM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo
          
          SlaveTemplate{ami='ami-02769bd03e603e42f', labels='docker docker-bakery'}. Considering launching
          
          Sep 30, 2019 9:32:04 AM INFO hudson.plugins.ec2.SlaveTemplate setupRootDevice
          
          AMI had xvda
          
          Sep 30, 2019 9:32:04 AM INFO hudson.plugins.ec2.SlaveTemplate setupRootDevice
          
          {DeleteOnTermination: true,SnapshotId: snap-0f2d5ab1c6f918116,VolumeSize: 8,VolumeType: gp2,Encrypted: false,}
          
          Sep 30, 2019 9:32:04 AM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo
          
          SlaveTemplate{ami='ami-02769bd03e603e42f', labels='docker docker-bakery'}. Setting Instance Initiated Shutdown Behavior : ShutdownBehavior.Terminate
          
          Sep 30, 2019 9:32:04 AM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo
          
          SlaveTemplate{ami='ami-02769bd03e603e42f', labels='docker docker-bakery'}. Looking for existing instances with describe-instance: {Filters: [{Name: image-id,Values: [ami-02769bd03e603e42f]}, {Name: instance-type,Values: [t3.micro]}, {Name: key-name,Values: [j4a-bochja]}, {Name: subnet-id,Values: [subnet-0a167eb56a247e891]}, {Name: instance.group-id,Values: [sg-0a4f5e5ac5bb602e4]}, {Name: tag:Name,Values: [ew1-j4a-jenkins-slave-ec2]}, {Name: tag:DeploymentName,Values: [ew1-j4a]}, {Name: tag:CostCenter,Values: [31505]}, {Name: tag:DeploymentType,Values: [dev]}, {Name: tag:DeploymentGroup,Values: [ew1-j4a]}, {Name: tag:jenkins_server_url,Values: [https://acme.com/]}, {Name: tag:jenkins_slave_type,Values: [demand_ec2 (ami-02769bd03e603e42f)]}],InstanceIds: [],}
          
          Sep 30, 2019 9:32:05 AM INFO hudson.plugins.ec2.CloudHelper getInstance
          
          Unexpected number of reservations reported by EC2 for instance id 'i-0571c7e8c36a6b783', expected 1 result, found []. Instance seems to be dead.
          
          Sep 30, 2019 9:32:05 AM WARNING hudson.plugins.ec2.EC2Cloud provision
          
          SlaveTemplate{ami='ami-02769bd03e603e42f', labels='docker docker-bakery'}. Exception during provisioning
          com.amazonaws.AmazonClientException: Unexpected number of reservations reported by EC2 for instance id 'i-0571c7e8c36a6b783', expected 1 result, found []. Instance seems to be dead.
          	at hudson.plugins.ec2.CloudHelper.getInstance(CloudHelper.java:54)
          	at hudson.plugins.ec2.CloudHelper.getInstanceWithRetry(CloudHelper.java:25)
          	at hudson.plugins.ec2.EC2AbstractSlave.fetchLiveInstanceData(EC2AbstractSlave.java:566)
          	at hudson.plugins.ec2.EC2AbstractSlave.<init>(EC2AbstractSlave.java:165)
          	at hudson.plugins.ec2.EC2OndemandSlave.<init>(EC2OndemandSlave.java:56)
          	at hudson.plugins.ec2.SlaveTemplate.newOndemandSlave(SlaveTemplate.java:1104)
          	at hudson.plugins.ec2.SlaveTemplate.toSlaves(SlaveTemplate.java:773)
          	at hudson.plugins.ec2.SlaveTemplate.provisionOndemand(SlaveTemplate.java:745)
          	at hudson.plugins.ec2.SlaveTemplate.provisionOndemand(SlaveTemplate.java:585)
          	at hudson.plugins.ec2.SlaveTemplate.provision(SlaveTemplate.java:540)
          	at hudson.plugins.ec2.EC2Cloud.getNewOrExistingAvailableSlave(EC2Cloud.java:589)
          	at hudson.plugins.ec2.EC2Cloud.provision(EC2Cloud.java:615)
          	at hudson.slaves.NodeProvisioner$StandardStrategyImpl.apply(NodeProvisioner.java:715)
          	at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:320)
          	at hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:62)
          	at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:807)
          	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
          	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
          	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
          	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          	at java.lang.Thread.run(Thread.java:748)
          
          Sep 30, 2019 9:32:13 AM INFO hudson.plugins.ec2.EC2Cloud provision
          
          SlaveTemplate{ami='ami-02769bd03e603e42f', labels='docker docker-bakery'}. Attempting to provision slave needed by excess workload of 1 units
          
          Sep 30, 2019 9:32:14 AM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave
          
          SlaveTemplate{ami='ami-02769bd03e603e42f', labels='docker docker-bakery'}. Cannot provision - no capacity for instances: 0
          
          Sep 30, 2019 9:32:14 AM WARNING hudson.plugins.ec2.EC2Cloud provision
          
          Can't raise nodes for SlaveTemplate{ami='ami-02769bd03e603e42f', labels='docker docker-bakery'}
          
          Sep 30, 2019 9:32:23 AM INFO hudson.plugins.ec2.EC2Cloud provision
          
          SlaveTemplate{ami='ami-02769bd03e603e42f', labels='docker docker-bakery'}. Attempting to provision slave needed by excess workload of 1 units
          
          Sep 30, 2019 9:32:24 AM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave
          
          SlaveTemplate{ami='ami-02769bd03e603e42f', labels='docker docker-bakery'}. Cannot provision - no capacity for instances: 0
          
          Show
          jbochenski Jakub Bochenski added a comment - - edited I'm still seeing issues in 1.45. The instance is in running state but plugin can't see t be able to connect it. I can't test it on 1.46 because of JENKINS-59564 Sep 30, 2019 9:32:03 AM INFO hudson.plugins.ec2.EC2Cloud provision SlaveTemplate{ami= 'ami-02769bd03e603e42f' , labels= 'docker docker-bakery' }. Attempting to provision slave needed by excess workload of 1 units Sep 30, 2019 9:32:04 AM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo SlaveTemplate{ami= 'ami-02769bd03e603e42f' , labels= 'docker docker-bakery' }. Considering launching Sep 30, 2019 9:32:04 AM INFO hudson.plugins.ec2.SlaveTemplate setupRootDevice AMI had xvda Sep 30, 2019 9:32:04 AM INFO hudson.plugins.ec2.SlaveTemplate setupRootDevice {DeleteOnTermination: true ,SnapshotId: snap-0f2d5ab1c6f918116,VolumeSize: 8,VolumeType: gp2,Encrypted: false ,} Sep 30, 2019 9:32:04 AM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo SlaveTemplate{ami= 'ami-02769bd03e603e42f' , labels= 'docker docker-bakery' }. Setting Instance Initiated Shutdown Behavior : ShutdownBehavior.Terminate Sep 30, 2019 9:32:04 AM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo SlaveTemplate{ami= 'ami-02769bd03e603e42f' , labels= 'docker docker-bakery' }. Looking for existing instances with describe-instance: {Filters: [{Name: image-id,Values: [ami-02769bd03e603e42f]}, {Name: instance-type,Values: [t3.micro]}, {Name: key-name,Values: [j4a-bochja]}, {Name: subnet-id,Values: [subnet-0a167eb56a247e891]}, {Name: instance.group-id,Values: [sg-0a4f5e5ac5bb602e4]}, {Name: tag:Name,Values: [ew1-j4a-jenkins-slave-ec2]}, {Name: tag:DeploymentName,Values: [ew1-j4a]}, {Name: tag:CostCenter,Values: [31505]}, {Name: tag:DeploymentType,Values: [dev]}, {Name: tag:DeploymentGroup,Values: [ew1-j4a]}, {Name: tag:jenkins_server_url,Values: [https: //acme.com/]}, {Name: tag:jenkins_slave_type,Values: [demand_ec2 (ami-02769bd03e603e42f)]}],InstanceIds: [],} Sep 30, 2019 9:32:05 AM INFO hudson.plugins.ec2.CloudHelper getInstance Unexpected number of reservations reported by EC2 for instance id 'i-0571c7e8c36a6b783' , expected 1 result, found []. Instance seems to be dead. Sep 30, 2019 9:32:05 AM WARNING hudson.plugins.ec2.EC2Cloud provision SlaveTemplate{ami= 'ami-02769bd03e603e42f' , labels= 'docker docker-bakery' }. Exception during provisioning com.amazonaws.AmazonClientException: Unexpected number of reservations reported by EC2 for instance id 'i-0571c7e8c36a6b783' , expected 1 result, found []. Instance seems to be dead. at hudson.plugins.ec2.CloudHelper.getInstance(CloudHelper.java:54) at hudson.plugins.ec2.CloudHelper.getInstanceWithRetry(CloudHelper.java:25) at hudson.plugins.ec2.EC2AbstractSlave.fetchLiveInstanceData(EC2AbstractSlave.java:566) at hudson.plugins.ec2.EC2AbstractSlave.<init>(EC2AbstractSlave.java:165) at hudson.plugins.ec2.EC2OndemandSlave.<init>(EC2OndemandSlave.java:56) at hudson.plugins.ec2.SlaveTemplate.newOndemandSlave(SlaveTemplate.java:1104) at hudson.plugins.ec2.SlaveTemplate.toSlaves(SlaveTemplate.java:773) at hudson.plugins.ec2.SlaveTemplate.provisionOndemand(SlaveTemplate.java:745) at hudson.plugins.ec2.SlaveTemplate.provisionOndemand(SlaveTemplate.java:585) at hudson.plugins.ec2.SlaveTemplate.provision(SlaveTemplate.java:540) at hudson.plugins.ec2.EC2Cloud.getNewOrExistingAvailableSlave(EC2Cloud.java:589) at hudson.plugins.ec2.EC2Cloud.provision(EC2Cloud.java:615) at hudson.slaves.NodeProvisioner$StandardStrategyImpl.apply(NodeProvisioner.java:715) at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:320) at hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:62) at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:807) at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72) at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang. Thread .run( Thread .java:748) Sep 30, 2019 9:32:13 AM INFO hudson.plugins.ec2.EC2Cloud provision SlaveTemplate{ami= 'ami-02769bd03e603e42f' , labels= 'docker docker-bakery' }. Attempting to provision slave needed by excess workload of 1 units Sep 30, 2019 9:32:14 AM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave SlaveTemplate{ami= 'ami-02769bd03e603e42f' , labels= 'docker docker-bakery' }. Cannot provision - no capacity for instances: 0 Sep 30, 2019 9:32:14 AM WARNING hudson.plugins.ec2.EC2Cloud provision Can 't raise nodes for SlaveTemplate{ami=' ami-02769bd03e603e42f ', labels=' docker docker-bakery'} Sep 30, 2019 9:32:23 AM INFO hudson.plugins.ec2.EC2Cloud provision SlaveTemplate{ami= 'ami-02769bd03e603e42f' , labels= 'docker docker-bakery' }. Attempting to provision slave needed by excess workload of 1 units Sep 30, 2019 9:32:24 AM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave SlaveTemplate{ami= 'ami-02769bd03e603e42f' , labels= 'docker docker-bakery' }. Cannot provision - no capacity for instances: 0
          Hide
          sirzic cedric lecoz added a comment -

          same problem (starting a new instance when previous instance has been terminated). seen on PR399.
          C/

          Show
          sirzic cedric lecoz added a comment - same problem (starting a new instance when previous instance has been terminated). seen on PR399. C/
          Hide
          sirzic cedric lecoz added a comment -

          Hi,
          Status update, since last week I have not reproduced the issue when starting a stopped instance.
          I have reproduced a dozen of times the issue when the previous instance was terminated.

          I just saw there was a new PR-399 1.46-rc1052.8c6d855421ac associated to this ticket, so I pushed it to our Jenkins, will keep you updated.
          C/

          Show
          sirzic cedric lecoz added a comment - Hi, Status update, since last week I have not reproduced the issue when starting a stopped instance. I have reproduced a dozen of times the issue when the previous instance was terminated. I just saw there was a new PR-399 1.46-rc1052.8c6d855421ac associated to this ticket, so I pushed it to our Jenkins, will keep you updated. C/
          Hide
          sirzic cedric lecoz added a comment -

          Hi Raihaan Shouhell,
          Using the 1.46-rc1050.43f9773eed95 plugin, I reproduced the issue when starting a new EC2 after the previous one was terminated, see attached log start_fresh_1.46-rc1050.43f9773eed95.txt. (what I believe was fixed in PR-397).

          Issue from a stopped slave has not yet been reproduced.

          BR,
          Cedric.

          Show
          sirzic cedric lecoz added a comment - Hi Raihaan Shouhell , Using the 1.46-rc1050.43f9773eed95 plugin, I reproduced the issue when starting a new EC2 after the previous one was terminated, see attached log start_fresh_1.46-rc1050.43f9773eed95.txt . (what I believe was fixed in PR-397). Issue from a stopped slave has not yet been reproduced. BR, Cedric.
          Hide
          raihaan Raihaan Shouhell added a comment -

          cedric lecoz yes it is i attached it directly because CI was struggling to build it yesterday. I have removed the attachment.

          Show
          raihaan Raihaan Shouhell added a comment - cedric lecoz yes it is i attached it directly because CI was struggling to build it yesterday. I have removed the attachment.

            People

            • Assignee:
              thoulen FABRIZIO MANFREDI
              Reporter:
              jbochenski Jakub Bochenski
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated: