JENKINS-6598

Let Node and NodeProperty have more control over whether a node can run a task

    Details

    • Type: Patch
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Component/s: core
    • Labels:
      None

      Description

      Right now, the only logic to determine whether a Node can run a particular Queue.Task is in JobOffer.canTake(Task). The logic is as follows:

      1. Check if the task has an assigned label; if it does and this node is not in the label, the node can't take the task
      2. If the task does not have an assigned label and this node only allows tied jobs (Mode.EXCLUSIVE), the node can't take the task
      3. If the node is offline or not accepting tasks, the node can't take the task
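
      As a rough illustration of the three checks above, here is a minimal sketch. Node, Task, and Mode below are simplified stand-ins written for this example, not the real Hudson classes, and the field names are assumptions.

```java
import java.util.Collections;
import java.util.Set;

// Simplified stand-ins for the types named in the description; not the real Hudson classes.
class CurrentCanTakeSketch {
    enum Mode { NORMAL, EXCLUSIVE }

    // A task optionally tied to a label; null means "no assigned label".
    static class Task {
        final String assignedLabel;
        Task(String assignedLabel) { this.assignedLabel = assignedLabel; }
    }

    static class Node {
        final Set<String> labels;
        final Mode mode;
        final boolean online;
        final boolean acceptingTasks;
        Node(Set<String> labels, Mode mode, boolean online, boolean acceptingTasks) {
            this.labels = labels;
            this.mode = mode;
            this.online = online;
            this.acceptingTasks = acceptingTasks;
        }
    }

    // Applies the three checks in the order they are listed above.
    static boolean canTake(Node node, Task task) {
        String label = task.assignedLabel;
        if (label != null && !node.labels.contains(label)) {
            return false; // check 1: the task is tied to a label this node does not have
        }
        if (label == null && node.mode == Mode.EXCLUSIVE) {
            return false; // check 2: the node only accepts tied jobs
        }
        return node.online && node.acceptingTasks; // check 3
    }

    public static void main(String[] args) {
        Node linux = new Node(Collections.singleton("linux"), Mode.EXCLUSIVE, true, true);
        System.out.println(canTake(linux, new Task("linux")));   // true: label matches
        System.out.println(canTake(linux, new Task(null)));      // false: untied task, exclusive node
        System.out.println(canTake(linux, new Task("windows"))); // false: label mismatch
    }
}
```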

      I would like to add Node.canTake(Task) and NodeProperty.canTake(Task) methods. The JobOffer.canTake(Task) method would be changed to call Node.canTake(), moving checks #1 and #2 into the Node.canTake() implementation. Node.canTake() would then call NodeProperty.canTake(Task) on all of its assigned properties; if any of them return false, Node.canTake(Task) will also return false. The default implementation in the NodeProperty base class will return true.

      This allows Node subclasses and custom NodeProperties to control whether or not a particular Task should go to a particular Node, making it possible to do things like capabilities-based job assignment as opposed to the manually-intensive use of tying and node labels.
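
      The proposed delegation, together with the capabilities-based use case, could look roughly like the sketch below. Only the canTake() method names come from the proposal; the stand-in classes, the CapabilityProperty type, and the requiredCapabilities field are hypothetical and exist only for illustration. This is not the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-ins; only the canTake() method names come from the proposal.
class ProposedCanTakeSketch {
    static class Task {
        final Set<String> requiredCapabilities; // hypothetical field for this example
        Task(String... required) {
            this.requiredCapabilities = new HashSet<>(Arrays.asList(required));
        }
    }

    static class NodeProperty {
        // Default implementation: never vetoes a task.
        boolean canTake(Task task) {
            return true;
        }
    }

    static class Node {
        final List<NodeProperty> properties = new ArrayList<>();

        boolean canTake(Task task) {
            // Checks #1 and #2 (label and Mode.EXCLUSIVE) would move here; omitted for brevity.
            for (NodeProperty property : properties) {
                if (!property.canTake(task)) {
                    return false; // any single property can veto the task
                }
            }
            return true;
        }
    }

    // Hypothetical property: accepts a task only if the node provides everything it requires.
    static class CapabilityProperty extends NodeProperty {
        private final Set<String> provided;
        CapabilityProperty(String... provided) {
            this.provided = new HashSet<>(Arrays.asList(provided));
        }
        @Override
        boolean canTake(Task task) {
            return provided.containsAll(task.requiredCapabilities);
        }
    }

    public static void main(String[] args) {
        Node node = new Node();
        node.properties.add(new CapabilityProperty("jdk6", "maven"));
        System.out.println(node.canTake(new Task("jdk6")));           // true
        System.out.println(node.canTake(new Task("jdk6", "xvfb")));   // false: xvfb not provided
        System.out.println(new Node().canTake(new Task("anything"))); // true: no properties, no veto
    }
}
```

      With this shape, a node needs no labels to pick up matching work: the property decides per task.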

      I'm attaching a patch I've made to our internal copy of Hudson to make this change. I believe I have commit privileges to commit this if nobody objects to this change; otherwise, I can get one of the other Yahoo! folks to do it.

        Activity

        mdillon mdillon created issue -
        mdillon mdillon added a comment -

        I just wanted to point out one way this could be improved that I didn't include in the patch.

        As it stands, if all nodes reject a task, it will sit in the queue as a BuildableItem (as it should), but its cause of blockage will be the generic message "Waiting for next available executor". The problem is that the existing JobOffer.canTake() only returns a boolean, so the code assumes that if there is no assigned label for the job and it was not taken by an online node, then it must be waiting for an executor.

        One approach to fixing this would be to have Node.canTake(Task) and in turn NodeProperty.canTake(Task) return a CauseOfBlockage. I don't think that it's possible in general to use this CauseOfBlockage as the queue item tooltip, because that would involve folding together multiple CauseOfBlockage instances from all blocking nodes, but it would be possible to show a message like "Rejected by all available executors". The same thing could also be accomplished by adding a Node.getCauseOfBlockage(Task) method, but then BuildableItem.getCauseOfBlockage() would have to call it on all nodes and the settings of the node could have changed since canTake() was called.
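
        To make the first approach concrete, here is a minimal sketch of how per-node answers might be collapsed into a single queue-level message. Everything in it is an assumption for illustration: the stand-in CauseOfBlockage and Node types, the convention that null means "no objection", and the summarize() helper are not part of Hudson or of the patch.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins; the real CauseOfBlockage and Node classes are richer than this.
class BlockageSketch {
    static class CauseOfBlockage {
        final String description;
        CauseOfBlockage(String description) { this.description = description; }
    }

    // In this sketch, null means "no objection" and non-null explains the rejection.
    interface Node {
        CauseOfBlockage canTake(String task);
    }

    static final String WAITING = "Waiting for next available executor";
    static final String REJECTED = "Rejected by all available executors";

    // Collapses the per-node answers into one message rather than merging them.
    static CauseOfBlockage summarize(List<Node> onlineNodes, String task) {
        if (onlineNodes.isEmpty()) {
            return new CauseOfBlockage(WAITING);
        }
        for (Node node : onlineNodes) {
            if (node.canTake(task) == null) {
                // At least one node would accept, so the item is only waiting for a free executor.
                return new CauseOfBlockage(WAITING);
            }
        }
        return new CauseOfBlockage(REJECTED);
    }

    public static void main(String[] args) {
        Node accepts = task -> null;
        Node rejects = task -> new CauseOfBlockage("missing capability");
        System.out.println(summarize(Arrays.asList(accepts, rejects), "build").description);
        System.out.println(summarize(Arrays.asList(rejects, rejects), "build").description);
    }
}
```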

        abayer abayer added a comment -

        +1 - this would make JENKINS-6586 much, much easier. Well, ok, it'd make it work, is probably the more accurate way to put it, given the bizarre problems I'm having with dynamically adding/removing labels and the resulting changes not actually mattering in terms of whether a job gets run. The code looks good to me, and the functionality will be an excellent addition.

        The CauseOfBlockage stuff could either go in the same change as this, or perhaps more cleanly, a separate change. I'd tend towards the latter.

        kohsuke Kohsuke Kawaguchi added a comment -

        For the JENKINS-6586 use case, this change by itself does not suffice. You'd need an extension point not scoped to a node, something like:

        interface QueueTaskDispatcher extends ExtensionPoint {
          boolean canTake(Node node, Task task);
        }

        Technically speaking, this would make it possible for custom Node implementations and NodeProperty implementations to insert the canTake logic without the proposed changes, although there's not much harm in leaving it in, either.
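
        A rough sketch of how such a node-independent extension point might be consulted follows. The Node and Task stand-ins, the allowedByAll() helper, and the example dispatcher are hypothetical; the interface drops ExtensionPoint so the example compiles on its own, and the real lookup of registered extensions is not shown.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified stand-ins; not the real Hudson classes.
class DispatcherSketch {
    static class Node {
        final String name;
        Node(String name) { this.name = name; }
    }

    static class Task {
        final String name;
        Task(String name) { this.name = name; }
    }

    // Same shape as the interface suggested above, minus ExtensionPoint.
    interface QueueTaskDispatcher {
        boolean canTake(Node node, Task task);
    }

    // Every registered dispatcher gets a veto over every (node, task) pair.
    static boolean allowedByAll(List<QueueTaskDispatcher> dispatchers, Node node, Task task) {
        for (QueueTaskDispatcher dispatcher : dispatchers) {
            if (!dispatcher.canTake(node, task)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical rule: keep deploy tasks off the node named "master".
        QueueTaskDispatcher noDeploysOnMaster =
                (node, task) -> !(node.name.equals("master") && task.name.startsWith("deploy"));
        List<QueueTaskDispatcher> dispatchers = Arrays.asList(noDeploysOnMaster);

        System.out.println(allowedByAll(dispatchers, new Node("master"), new Task("deploy-site"))); // false
        System.out.println(allowedByAll(dispatchers, new Node("agent-1"), new Task("deploy-site"))); // true
        System.out.println(allowedByAll(Collections.<QueueTaskDispatcher>emptyList(),
                new Node("master"), new Task("deploy-site"))); // true: nothing registered, no veto
    }
}
```

        Because the dispatcher sees both the node and the task, the rule does not have to be attached to any particular node.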

        kohsuke Kohsuke Kawaguchi made changes -
        Field Original Value New Value
        Assignee kohsuke [ kohsuke ]
        mdillon mdillon added a comment -

        I'd be happy with either approach. Dean Yu actually suggested an approach similar to this when we discussed the idea of involving Node and NodeProperty in the canTake decision. The reason I went with the approach I did was by analogy with the extensions that have been added to JobProperty over the years to allow it to participate more fully in the build lifecycle.

        scm_issue_link SCM/JIRA link daemon added a comment -

        Code changed in hudson
        User: kohsuke
        Path:
        trunk/hudson/main/core/src/main/java/hudson/model/Node.java
        trunk/hudson/main/core/src/main/java/hudson/model/Queue.java
        trunk/hudson/main/core/src/main/java/hudson/model/queue/QueueTaskDispatcher.java
        trunk/hudson/main/core/src/main/java/hudson/slaves/NodeProperty.java
        trunk/hudson/main/core/src/main/resources/hudson/model/Messages.properties
        trunk/hudson/main/test/src/test/java/hudson/slaves/NodeCanTakeTaskTest.java
        trunk/www/changelog.html
        http://jenkins-ci.org/commit/31304
        Log:
        [FIXED JENKINS-6598] applied a patch from Mike Dillon, plus the separate independent extension point. In 1.360.

        scm_issue_link SCM/JIRA link daemon made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        abayer abayer made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            kohsuke Kohsuke Kawaguchi
            Reporter:
            mdillon mdillon