Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-26416

Jenkins Does Not Confirm A Node's Executor Matches Its Label Requirements Before Running A Job On It

    Details

    • Similar Issues:

      Description

      I created a job which contains a groovy script. The groovy script's job is to put a large number of builds into the Jenkins queue which are set to wait until machines with a specific label are found to run them. Only one of these jobs is set to run at a time on each machine.

      By default, machines are set with the label "uninstalled". Once this job runs on them, that label is changed from "uninstalled" to "installing" and when finished "installed". The job is only allowed to run on nodes with the label "uninstalled" on them.

      Right now, if a single machine has the "uninstalled" label attached to it, all of the jobs in the queue will set themselves to run on that node. Once a single job in the queue starts on that machine, that "uninstalled" label is set to "installing", which SHOULD take it out of the running for the other jobs to run on it. When that job completes, however, the jobs in the queue will still run on that node, even though it no longer has an "uninstalled" label.

      Jenkins' queue should be changed so that before a job is transferred from the "Buildable" queue into the executor, it should check first to make sure that it still complies with the labels on the node it plans to build on. At random times Jenkins will catch on before the build actually goes off, but more often than not, all of the jobs in the queue will run on this one node instead of waiting for other ones to come in.

      I have been looking at the Jenkins code for the queue and the maintain() method seems to be what needs changing here. It appears as though the JobOffer occurs before a last minute check for blockers that occurs on line 62 of the following code. Can anyone confirm if what I am looking at is right?

      Queue.java
      public synchronized void maintain() {
              LOGGER.log(Level.FINE, "Queue maintenance started {0}", this);
      
              // The executors that are currently waiting for a job to run.
              Map<Executor,JobOffer> parked = new HashMap<Executor,JobOffer>();
      
              {// update parked
                  for (Computer c : Jenkins.getInstance().getComputers()) {
                      for (Executor e : c.getExecutors()) {
                          if (e.isParking()) {
                              parked.put(e,new JobOffer(e));
                          }
                      }
                  }
              }
      
      
              {// blocked -> buildable
                  for (BlockedItem p : new ArrayList<BlockedItem>(blockedProjects.values())) {// copy as we'll mutate the list
                      if (!isBuildBlocked(p) && allowNewBuildableTask(p.task)) {
                          // ready to be executed
                          Runnable r = makeBuildable(new BuildableItem(p));
                          if (r != null) {
                              p.leave(this);
                              r.run();
                          }
                      }
                  }
              }
      
              // waitingList -> buildable/blocked
              while (!waitingList.isEmpty()) {
                  WaitingItem top = peek();
      
                  if (top.timestamp.compareTo(new GregorianCalendar())>0)
                      break; // finished moving all ready items from queue
      
                  top.leave(this);
                  Task p = top.task;
                  if (!isBuildBlocked(top) && allowNewBuildableTask(p)) {
                      // ready to be executed immediately
                      Runnable r = makeBuildable(new BuildableItem(top));
                      if (r != null) {
                          r.run();
                      } else {
                          new BlockedItem(top).enter(this);
                      }
                  } else {
                      // this can't be built now because another build is in progress
                      // set this project aside.
                      new BlockedItem(top).enter(this);
                  }
              }
      
              final QueueSorter s = sorter;
              if (s != null)
              	s.sortBuildableItems(buildables);
      
              // allocate buildable jobs to executors
              for (BuildableItem p : new ArrayList<BuildableItem>(buildables)) {// copy as we'll mutate the list in the loop
                  // one last check to make sure this build is not blocked.
                  if (isBuildBlocked(p)) {
                      p.leave(this);
                      new BlockedItem(p).enter(this);
                      LOGGER.log(Level.FINE, "Catching that {0} is blocked in the last minute", p);
                      continue;
                  }
      
                  List<JobOffer> candidates = new ArrayList<JobOffer>(parked.size());
                  for (JobOffer j : parked.values())
                      if (j.canTake(p))
                          candidates.add(j);
      
                  MappingWorksheet ws = new MappingWorksheet(p, candidates);
                  Mapping m = loadBalancer.map(p.task, ws);
                  if (m == null) {
                      // if we couldn't find the executor that fits,
                      // just leave it in the buildables list and
                      // check if we can execute other projects
                      LOGGER.log(Level.FINER, "Failed to map {0} to executors. candidates={1} parked={2}", new Object[]{p, candidates, parked.values()});
                      continue;
                  }
      
                  // found a matching executor. use it.
                  WorkUnitContext wuc = new WorkUnitContext(p);
                  m.execute(wuc);
      
                  p.leave(this);
                  if (!wuc.getWorkUnits().isEmpty())
                      makePending(p);
                  else
                      LOGGER.log(Level.FINE, "BuildableItem {0} with empty work units!?", p);
              }
          }
      

      If anyone knows any ways around this in the meantime through the use of Groovy or whatever, your advice would be greatly appreciated.

      Thank You!
      -Kevin

        Attachments

          Activity

          Hide
          danielbeck Daniel Beck added a comment -

          This has nothing to do with Groovy Plugin. It's not an issue report either. To get support in how to use Jenkins, email the mailing lists or join #jenkins on Freenode.

          That said, Jenkins caches label assignments. It may work if you force the cache to update. Check the sources.

          Show
          danielbeck Daniel Beck added a comment - This has nothing to do with Groovy Plugin. It's not an issue report either. To get support in how to use Jenkins, email the mailing lists or join #jenkins on Freenode. That said, Jenkins caches label assignments. It may work if you force the cache to update. Check the sources.
          Hide
          krandino Kevin Randino added a comment -

          Hey Daniel,

          I'm realizing now that I should have set it as a "core" issue instead of a groovy-plugin one, my mistake on that. I still believe that this is a Jenkins issue however as Jenkins core is not acting as expected. I would never expect Jenkins to run a job on a node that is not suited for that job through the assigned labels. Cached or not, Jenkins should still run a sanity check on its jobs before running them.

          Show
          krandino Kevin Randino added a comment - Hey Daniel, I'm realizing now that I should have set it as a "core" issue instead of a groovy-plugin one, my mistake on that. I still believe that this is a Jenkins issue however as Jenkins core is not acting as expected. I would never expect Jenkins to run a job on a node that is not suited for that job through the assigned labels. Cached or not, Jenkins should still run a sanity check on its jobs before running them.
          Hide
          danielbeck Daniel Beck added a comment -

          Jenkins core is not acting as expected

          jenkins.model.Jenkins.trimLabels() documents this specific issue (Label.reset() does not, presumably because it's non-public). The problem is that, even if it's so accessible through the use of plugins, you are programming Jenkins using its internal API. So you cannot disregard internal structures or their constraints.

          Additionally, you're way beyond what would usually be possible to do with labels – so it's not unreasonable that your use case is not covered by Jenkins internals without some additional work and understanding on your part.

          If Jenkins wouldn't update effective labels after submitting the slave node config form, it'd be different.

          Show
          danielbeck Daniel Beck added a comment - Jenkins core is not acting as expected jenkins.model.Jenkins.trimLabels() documents this specific issue ( Label.reset() does not, presumably because it's non-public). The problem is that, even if it's so accessible through the use of plugins, you are programming Jenkins using its internal API. So you cannot disregard internal structures or their constraints. Additionally, you're way beyond what would usually be possible to do with labels – so it's not unreasonable that your use case is not covered by Jenkins internals without some additional work and understanding on your part. If Jenkins wouldn't update effective labels after submitting the slave node config form, it'd be different.
          Hide
          krandino Kevin Randino added a comment -

          Fair enough, I suppose I am expecting more out of Jenkins than the average user would need, and the average case in which they would encounter this issue is nearly nil. I apologize for the false bug report, when working with Groovy changes I will stick to the proper mailing lists or IRC from now on. Sorry for the waste of time, your help is greatly appreciated though.

          Show
          krandino Kevin Randino added a comment - Fair enough, I suppose I am expecting more out of Jenkins than the average user would need, and the average case in which they would encounter this issue is nearly nil. I apologize for the false bug report, when working with Groovy changes I will stick to the proper mailing lists or IRC from now on. Sorry for the waste of time, your help is greatly appreciated though.
          Hide
          danielbeck Daniel Beck added a comment -

          Take a guess why I know so well how labels caching works

          So, yeah, Jenkins opts for providing access to the massive internal (plugin + Groovy scripts) API – basically everything it has, including the ugly parts – rather than a shiny facade that isn't nearly powerful enough. I think it's generally the right choice, but it comes with issues like this one.

          Note that the main problem here may be that using labels for something as dynamic as this looks like the wrong approach (as the caching indicates, they are thought of as fairly static). Check out the extension points modifying queue behavior (e.g. QueueTaskDispatcher), maybe what you're doing would be best implemented in a plugin?

          Show
          danielbeck Daniel Beck added a comment - Take a guess why I know so well how labels caching works So, yeah, Jenkins opts for providing access to the massive internal (plugin + Groovy scripts) API – basically everything it has, including the ugly parts – rather than a shiny facade that isn't nearly powerful enough. I think it's generally the right choice, but it comes with issues like this one. Note that the main problem here may be that using labels for something as dynamic as this looks like the wrong approach (as the caching indicates, they are thought of as fairly static). Check out the extension points modifying queue behavior (e.g. QueueTaskDispatcher), maybe what you're doing would be best implemented in a plugin?

            People

            • Assignee:
              Unassigned
              Reporter:
              krandino Kevin Randino
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: