Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-34667

Provision attempt is made when possible slaves count is 0.

    XMLWordPrintable

    Details

    • Similar Issues:

      Description

      We a have setup with multiple AMI that are tag based. Here is the scenario that is causing a problem.

      Setup:
      AMI_1 has tag ONE and only runs Jenkins jobs with that tag. This AMI has an instance cap of 1 and the executor count on the slave is 1 as well, meaning there can be at most one Jenkins job with tag ONE running at a time.

      AMI_2 has no tags and is used as much as possible. It also has an instance cap of 6 and each slave has 4 executors.

      The overall max cap is 10.

      Jenkins is currently running one instance of AMI_1 (at cap) and two running instances and 2 stopped instances of AMI_2 (below cap). All executors are active. In the queue there is 1 job waiting for AMI_1 and 6 jobs waiting for AMI_2.

      Problem:
      What happens is the plugin tries provision a new slave for AMI_1 but cannot because it is at cap, and then DOES NOT attempt to provision (start) any AMI_2 slaves.

      I looked through the code and saw the following in EC2Cloud.java:

      EC2AbstractSlave.java
      private synchronized EC2AbstractSlave provisionSlaveIfPossible(SlaveTemplate template) {
              /*
               * Note this is synchronized between counting the instances and then allocating the node. Once the node is
               * allocated, we don't look at that instance as available for provisioning.
               */
              int possibleSlavesCount = getPossibleNewSlavesCount(template);
              if (possibleSlavesCount < 0) {
                  LOGGER.log(Level.INFO, "Cannot provision - no capacity for instances: " + possibleSlavesCount);
                  return null;
              }
      
              try {
                  return template.provision(StreamTaskListener.fromStdout(), possibleSlavesCount > 0);
              } catch (IOException e) {
                  LOGGER.log(Level.WARNING, "Exception during provisioning", e);
                  return null;
              }
          }
      

      I think that

      possibleSlavesCount < 0

      should be

      possibleSlavesCount <= 0

      This issue does resolve itself once the queue for AMI_1 is empty, but it can mean an unnecessary long wait for jobs to complete when there should available capacity.

        Attachments

          Activity

          Hide
          rpilachowski Robert Pilachowski added a comment -

          This code change did not solve the problem.

          I tested a slight different scenario than above where both AMIs had provisioned slaves to the cap amount, but all were in the stopped state, i.e. no more slaves could be provisioned. When I started several builds spread across both AMIs, none of the slaves restarted. I suspect this happened because provisionSlaveIfPossible ended without trying to start any stopped slaves. When the old version of the check, (possibleSlavesCount < 0), was done, it would still try to start a stopped slave.

          I believe there is an issue in the logic where provisioning new slaves and starting stopped slaves is treated the same. Instead they should be treated differently and whenever there is a stopped slaves and excess work exists. Or, another option is to have countCurrentEC2Slaves not count stopped slaves.

          Show
          rpilachowski Robert Pilachowski added a comment - This code change did not solve the problem. I tested a slight different scenario than above where both AMIs had provisioned slaves to the cap amount, but all were in the stopped state, i.e. no more slaves could be provisioned. When I started several builds spread across both AMIs, none of the slaves restarted. I suspect this happened because provisionSlaveIfPossible ended without trying to start any stopped slaves. When the old version of the check, (possibleSlavesCount < 0), was done, it would still try to start a stopped slave. I believe there is an issue in the logic where provisioning new slaves and starting stopped slaves is treated the same. Instead they should be treated differently and whenever there is a stopped slaves and excess work exists. Or, another option is to have countCurrentEC2Slaves not count stopped slaves.
          Hide
          rpilachowski Robert Pilachowski added a comment -

          In case I wasn't clear, I suggest removing this change. I ran both test scenarios using the JENKINS-34667 branch, but the code change removed and both scenarios worked correctly.

          Show
          rpilachowski Robert Pilachowski added a comment - In case I wasn't clear, I suggest removing this change. I ran both test scenarios using the JENKINS-34667 branch, but the code change removed and both scenarios worked correctly.
          Hide
          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Francis Upton IV
          Path:
          src/main/java/hudson/plugins/ec2/EC2Cloud.java
          http://jenkins-ci.org/commit/ec2-plugin/f45093e4567290112ca4302e03acacacb50bbc5d
          Log:
          JENKINS-34667 Provision attempt is made when possible slaves count is 0. (back out change)

          Show
          scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Francis Upton IV Path: src/main/java/hudson/plugins/ec2/EC2Cloud.java http://jenkins-ci.org/commit/ec2-plugin/f45093e4567290112ca4302e03acacacb50bbc5d Log: JENKINS-34667 Provision attempt is made when possible slaves count is 0. (back out change)
          Hide
          francisu Francis Upton added a comment -

          OK, done on master. I will release a new version.

          Show
          francisu Francis Upton added a comment - OK, done on master. I will release a new version.
          Hide
          francisu Francis Upton added a comment -

          Version 1.33 released with this fix.

          Show
          francisu Francis Upton added a comment - Version 1.33 released with this fix.

            People

            • Assignee:
              francisu Francis Upton
              Reporter:
              rpilachowski Robert Pilachowski
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: