Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-44219

EC2 plugin keeps terminating and launching build slaves

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Minor Minor
    • ec2-plugin
    • None

      Yesterday I discovered my jenkins master node had run out of memory and crashed. The entire EC2 instance at AWS was unresponsive and I couldn't even SSH into it. So I stopped it and started it anew.

      Sure enough jenkins came back seemingly normally until I noticed it was replacing the one worker node every time it was considered idle. I then replaced the master node with a fresh one created with puppet and using data from a backup but no change. I even updated the ec2 plugin to the latest version without any luck.

      Eventually I found the following in the jenkins logs which seems to me almost like it is running 2 instance monitor instances with a shared counter. Both monitors appear to want to create the same EC2 instance and then increment the counter from 0 to 1 to 2.

      Then after the 30 minute idle time has expired it seems to think 2 is too many so it shuts down the worker leaving me with no workers. Then it thinks 0 workers is not good so it launches a new one and increments the counter back to 2 and so on and so forth.

      To be honest, I have no idea how to make this stop as it's pretty annoying.

       

      May 11, 2017 5:46:53 PM hudson.model.AsyncPeriodicWork$1 run
      INFO: Started EC2 alive slaves monitor
      May 11, 2017 5:46:53 PM hudson.model.AsyncPeriodicWork$1 run
      INFO: Finished EC2 alive slaves monitor. 162 ms
      May 11, 2017 5:56:53 PM hudson.model.AsyncPeriodicWork$1 run
      INFO: Started EC2 alive slaves monitor
      May 11, 2017 5:56:53 PM hudson.model.AsyncPeriodicWork$1 run
      INFO: Finished EC2 alive slaves monitor. 138 ms
      May 11, 2017 6:06:53 PM hudson.model.AsyncPeriodicWork$1 run
      INFO: Started EC2 alive slaves monitor
      May 11, 2017 6:06:53 PM hudson.model.AsyncPeriodicWork$1 run
      INFO: Finished EC2 alive slaves monitor. 136 ms
      May 11, 2017 6:10:30 PM hudson.plugins.ec2.EC2RetentionStrategy internalCheck
      INFO: Idle timeout of ubuntu 16.04 eu-west-1a (i-1234) after 30 idle minutes
      May 11, 2017 6:10:30 PM hudson.plugins.ec2.EC2AbstractSlave idleTimeout
      INFO: EC2 instance idle time expired: i-1234
      May 11, 2017 6:10:30 PM hudson.plugins.ec2.EC2OndemandSlave terminate
      INFO: Terminated EC2 instance (terminated): i-1234
      May 11, 2017 6:10:30 PM hudson.plugins.ec2.EC2OndemandSlave terminate
      INFO: Removed EC2 instance from jenkins master: i-1234
      May 11, 2017 6:16:53 PM hudson.model.AsyncPeriodicWork$1 run
      INFO: Started EC2 alive slaves monitor
      May 11, 2017 6:16:53 PM hudson.model.AsyncPeriodicWork$1 run
      INFO: Finished EC2 alive slaves monitor. 0 ms
      May 11, 2017 6:17:06 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: Attempting to provision slave from template hudson.plugins.ec2.SlaveTemplate@7229b566 needed by excess workload of 1 units of label 'null'
      May 11, 2017 6:17:06 PM hudson.plugins.ec2.EC2Cloud provision
      WARNING: Label is null - can't calculate how many executors slave will have. Using 3 number of executors
      Considering launching ami-1967056a for template ubuntu 16.04 eu-west-1a
      May 11, 2017 6:17:06 PM hudson.plugins.ec2.SlaveTemplate logProvisionInfo
      INFO: Considering launching ami-1967056a for template ubuntu 16.04 eu-west-1a
      Setting Instance Initiated Shutdown Behavior : ShutdownBehavior.Terminate
      Looking for existing instances with describe-instance: {InstanceIds: [],Filters: [{Name: image-id,Values: [ami-1967056a]}, {Name: subnet-id,Values: [subnet-****]}, {Name: key-name,Values: [build-workers]}, {Name: instance-type,Values:
       [t2.medium]}, {Name: tag:Name,Values: [build worker]}],}
      May 11, 2017 6:17:06 PM hudson.plugins.ec2.SlaveTemplate logProvisionInfo
      INFO: Setting Instance Initiated Shutdown Behavior : ShutdownBehavior.Terminate
      No existing instance found - created new instance: {InstanceId: i-7890,ImageId: ami-1967056a,State: {Code: 0,Name: pending},PrivateDnsName: ip-10-1-1-197.eu-west-1.compute.internal,PublicDnsName: ,StateTransitionReason: ,KeyN
      ame: build-workers,AmiLaunchIndex: 0,ProductCodes: [],InstanceType: t2.medium,LaunchTime: Thu May 11 18:17:07 UTC 2017,Placement: {AvailabilityZone: eu-west-1a,GroupName: ,Tenancy: default,},Monitoring: {State: disabled},SubnetId: subnet-
      ****,VpcId: vpc-****,PrivateIpAddress: 10.1.1.197,StateReason: {Code: pending,Message: pending},Architecture: x86_64,RootDeviceType: ebs,RootDeviceName: /dev/sda1,BlockDeviceMappings: [],VirtualizationType: hvm,ClientToken: e91191
      22-d47b-4086-9068-fcb33b1ad4d0,Tags: [{Key: jenkins_slave_type,Value: demand_ubuntu 16.04 eu-west-1a}, {Key: Name,Value: build worker}],SecurityGroups: [{GroupName: default,GroupId: sg-****}],SourceDestCheck: true,Hypervisor: xen,Netw
      orkInterfaces: [{NetworkInterfaceId: eni-****,SubnetId: subnet-****,VpcId: vpc-****,Description: ,OwnerId: ****,Status: in-use,MacAddress: ****,PrivateIpAddress: 10.1.1.197,PrivateDnsName: ip-10-1-1-197.****.compute.internal,SourceDestCheck: true,Groups: [{GroupName: default,GroupId: sg-****}],Attachment: {AttachmentId: eni-attach-****,DeviceIndex: 0,Status: attaching,AttachTime: Thu May 11 18:17:07 UTC 2017,DeleteOnTerminatio
      n: true},PrivateIpAddresses: [{PrivateIpAddress: 10.1.1.197,PrivateDnsName: ip-10-1-1-197.****.compute.internal,Primary: true,}],Ipv6Addresses: []}],IamInstanceProfile: {Arn: arn:aws:iam::****:instance-profile/ci-slave,Id: AI
      ****},EbsOptimized: false,}
      May 11, 2017 6:17:07 PM hudson.plugins.ec2.SlaveTemplate logProvisionInfo
      INFO: No existing instance found - created new instance: {InstanceId: i-7890,ImageId: ami-1967056a,State: {Code: 0,Name: pending},PrivateDnsName: ip-10-1-1-197.****.compute.internal,PublicDnsName: ,StateTransitionReason:
       ,KeyName: build-workers,AmiLaunchIndex: 0,ProductCodes: [],InstanceType: t2.medium,LaunchTime: Thu May 11 18:17:07 UTC 2017,Placement: {AvailabilityZone: eu-west-1a,GroupName: ,Tenancy: default,},Monitoring: {State: disabled},SubnetId: s
      ubnet-****,VpcId: vpc-****,PrivateIpAddress: 10.1.1.197,StateReason: {Code: pending,Message: pending},Architecture: x86_64,RootDeviceType: ebs,RootDeviceName: /dev/sda1,BlockDeviceMappings: [],VirtualizationType: hvm,ClientToken:
      ****,Tags: [{Key: jenkins_slave_type,Value: demand_ubuntu 16.04 eu-west-1a}, {Key: Name,Value: build worker}],SecurityGroups: [{GroupName: default,GroupId: sg-****}],SourceDestCheck: true,Hypervisor: xe
      n,NetworkInterfaces: [{NetworkInterfaceId: eni-****,SubnetId: subnet-****,VpcId: vpc-****,Description: ,OwnerId: ****,Status: in-use,MacAddress: ****,PrivateIpAddress: 10.1.1.197,PrivateDnsName: ip-10-1-1-
      197.****.compute.internal,SourceDestCheck: true,Groups: [{GroupName: default,GroupId: sg-****}],Attachment: {AttachmentId: eni-attach-****,DeviceIndex: 0,Status: attaching,AttachTime: Thu May 11 18:17:07 UTC 2017,DeleteOnTerm
      ination: true},PrivateIpAddresses: [{PrivateIpAddress: 10.1.1.197,PrivateDnsName: ip-10-1-1-197.****.compute.internal,Primary: true,}],Ipv6Addresses: []}],IamInstanceProfile: {Arn: arn:aws:iam::****:instance-profile/ci-slave,
      Id: ****},EbsOptimized: false,}
      May 11, 2017 6:17:07 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: We have now 1 computers
      May 11, 2017 6:17:07 PM hudson.plugins.ec2.EC2RetentionStrategy start
      INFO: Start requested for ubuntu 16.04 eu-west-1a (i-7890)
      May 11, 2017 6:17:07 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: Added node named: ubuntu 16.04 eu-west-1a (i-7890), We have now 2 computers
      May 11, 2017 6:17:07 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: Attempting provision - finished, excess workload: -2
      May 11, 2017 6:17:07 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      INFO: Started provisioning ubuntu 16.04 eu-west-1a (ami-1967056a) from ec2-thinkerit with 3 executors. Remaining excess workload: -2
      May 11, 2017 6:17:17 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Launching instance: i-7890
      May 11, 2017 6:17:17 PM hudson.plugins.ec2.EC2Cloud log
      INFO: bootstrap()
      May 11, 2017 6:17:17 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Getting keypair...
      May 11, 2017 6:17:17 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Using key: build-workers
      ****
      May 11, 2017 6:17:17 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Authenticating as ubuntu
      May 11, 2017 6:17:17 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Connecting to ec2-****.****.compute.amazonaws.com on port 22, with timeout 10000.
      May 11, 2017 6:17:27 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Failed to connect via ssh: The kexTimeout (10000 ms) expired.
      May 11, 2017 6:17:27 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Waiting for SSH to come up. Sleeping 5.
      May 11, 2017 6:17:33 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Connecting to ec2-****.****.compute.amazonaws.com on port 22, with timeout 10000.
      May 11, 2017 6:17:43 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Failed to connect via ssh: The kexTimeout (10000 ms) expired.
      May 11, 2017 6:17:43 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Waiting for SSH to come up. Sleeping 5.
      May 11, 2017 6:17:48 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Connecting to ec2-****.****.compute.amazonaws.com on port 22, with timeout 10000.
      May 11, 2017 6:17:48 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Connected via SSH.
      May 11, 2017 6:17:48 PM hudson.plugins.ec2.EC2Cloud log
      INFO: connect fresh as root
      May 11, 2017 6:17:48 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Connecting to ec2-****.****.compute.amazonaws.com on port 22, with timeout 10000.
      May 11, 2017 6:17:48 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Connected via SSH.
      May 11, 2017 6:17:48 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Creating tmp directory (/tmp) if it does not exist
      May 11, 2017 6:17:48 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Executing init script
      May 11, 2017 6:21:27 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Verifying that java exists
      May 11, 2017 6:21:27 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Copying slave.jar
      May 11, 2017 6:21:27 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Launching slave agent (via Trilead SSH2 Connection): java  -jar /tmp/slave.jar
      May 11, 2017 6:21:36 PM hudson.slaves.NodeProvisioner$2 run
      INFO: ubuntu 16.04 eu-west-1a (ami-1967056a) provisioning successfully completed. We have now 2 computer(s)

       

            francisu Francis Upton
            sgo_jenkins Sebastian Gozin
            Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated: