JENKINS-7667

Build waits for next available executor even if several are available

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: slave-squatter-plugin
    • Labels:
      None
    • Environment:
      Hudson 1.379, slave-squatter 1.1, heavy-job 1.0

      Description

      (Couldn't find slave-squatter or heavy-job as components, so posted in core.)
      I think there's a problem in the cooperation between these two plugins.
      I started using them yesterday to replace a more complicated mix of "heavy-slaves".

      I have one slave with 8 executors. I configured it, using slave-squatter, to reserve 3 of these during office hours, leaving 5 for use. I then noticed a job with weight 3 claiming to be waiting for the next available executor, even though it was the only job in the queue and there were no builds running on the slave (or anywhere, actually).
      I tried lowering the slave's reservation to 2 (i.e. 6 executors free), which caused the build to start - but having to keep an eye on these situations is not why I run Hudson.

      The only issue I could find that smells a bit like this was JENKINS-7033 (Job in build queue is not executed).

          Activity

          torbent added a comment -

          They smell a bit similar, but maybe they're not the same issue?

          torbent added a comment -

          slave-squatter definitely is involved here.

          Without reservations:

          • a job of weight 2 will happily run on a 2-executor slave.

          With reservations:

          • a job of weight 2 will NOT run on a 2-executor slave.
          • a job of weight 1 will NOT run on a 1-executor slave.
          • a job of weight 1 WILL run on a 2-executor slave.
          • a job of weight 2 requires 4 available executors.
          • a job of weight 3 requires 6 available executors.

          Smells like a doubling, or something being subtracted twice?
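
          A minimal sketch of the "subtracted twice" hypothesis, using the numbers from the original report (made-up names, not actual Hudson or slave-squatter code):

          public class DoubleCountDemo {
              public static void main(String[] args) {
                  int totalExecutors = 8; // the slave from the original report
                  int reservation = 3;    // squatter reservation during office hours
                  int jobWeight = 3;

                  // Suspected bug: the reservation is counted once by the plugin...
                  int loadFromSquatter = reservation;
                  // ...and a second time by another predictor, instead of zero here.
                  int loadFromCore = reservation;

                  int predictedLoad = loadFromSquatter + loadFromCore; // 6, not 3
                  int free = totalExecutors - predictedLoad;           // 2, not 5

                  // Prints "can start: false": the weight-3 job waits even though
                  // 5 executors look idle. With reservation = 2 it starts
                  // (8 - 2*2 = 4 >= 3), matching the original report.
                  System.out.println("can start: " + (jobWeight <= free));
              }
          }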

          torbent added a comment -

          It gets better (or worse).
          The slave (with 8 executors) was running some builds; there was a single executor available. I then tried manually requesting a build of a simple job (weight 1).
          This caused all idle executor threads inside the Hudson server to crash! The build executor status showed "Dead!" with a link to this report:

          java.lang.IllegalArgumentException: fromIndex(0) > toIndex(-1)
              at java.util.SubList.<init>(Unknown Source)
              at java.util.RandomAccessSubList.<init>(Unknown Source)
              at java.util.AbstractList.subList(Unknown Source)
              at hudson.model.queue.MappingWorksheet.<init>(MappingWorksheet.java:311)
              at hudson.model.Queue.pop(Queue.java:753)
              at hudson.model.Executor.grabJob(Executor.java:175)
              at hudson.model.Executor.run(Executor.java:113)

          The running builds continued to completion without problems, and their threads appeared to stay functional. The dead threads were unsalvageable and I had to restart Hudson.
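
          That exception is exactly what List.subList produces when asked for a negative-length range. A minimal reproduction of the suspected mechanics (the index arithmetic here is an assumption about what MappingWorksheet ends up computing, not its actual code):

          import java.util.Arrays;
          import java.util.List;

          public class SubListCrashDemo {
              public static void main(String[] args) {
                  List<Integer> executors = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
                  int predictedLoad = 9; // double-counted reservations exceed 8

                  int usable = executors.size() - predictedLoad; // -1
                  // Throws java.lang.IllegalArgumentException:
                  //     fromIndex(0) > toIndex(-1)
                  // i.e. the exact message in the "Dead!" report above. Clamping
                  // predictedLoad to executors.size() would keep the range valid.
                  executors.subList(0, usable);
              }
          }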

          jsiirola added a comment -

          Simple patch that corrects the reported bug for the situation where a slave only has a single active reservation. It will not work if the slave has multiple overlapping reservations.

          jsiirola added a comment -

          The root cause is that active reservations are double-counted: once by hudson.plugins.slave_squatter.LoadPredictorImpl, and again by hudson.model.queue.LoadPredictor.CurrentlyRunningTasks. That leads to two separate failures. First, double-counting the reservations makes the scheduler think the machine is busier than it actually is (the original ticket). Second, the double counting can push the maximum predicted load in hudson.model.queue.MappingWorksheet above the total number of executors, and MappingWorksheet does not clamp the prediction back down to the actual executor count (see JENKINS-8882), which leads to the "Dead!" executors.

          I have a simple patch (attached) that works as long as the slave only ever has a single active reservation; it will fail if the slave has overlapping reservations. Basically, the patch is a hack that reports "negative load" for currently running reservations to counterbalance the extra load reported due to the double counting, as sketched below. A better approach would be to fundamentally redesign the slave-squatter's LoadPredictor so that it only ever reports predicted load and excludes all currently-running reservations.
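
          A sketch of that counterbalancing idea, not the attached patch itself; it assumes the LoadPredictor extension point of this era (predict(Computer, long, long)) and the FutureLoad(startTime, duration, numExecutors) constructor, and currentReservationSize is a hypothetical helper:

          import hudson.Extension;
          import hudson.model.Computer;
          import hudson.model.queue.FutureLoad;
          import hudson.model.queue.LoadPredictor;
          import java.util.Collections;

          // Report currently active reservations as NEGATIVE load, cancelling
          // the second count added by core's CurrentlyRunningTasks predictor.
          // As noted above, this only holds while at most one reservation is
          // active at a time.
          @Extension
          public class CounterBalancePredictor extends LoadPredictor {
              @Override
              public Iterable<FutureLoad> predict(Computer computer, long start, long end) {
                  int active = currentReservationSize(computer);
                  if (active == 0)
                      return Collections.<FutureLoad>emptyList();
                  // A negative executor count offsets the duplicate accounting.
                  return Collections.singletonList(new FutureLoad(start, end - start, -active));
              }

              // Hypothetical helper: the real plugin would consult its reservation
              // schedule for this computer at the current time.
              private int currentReservationSize(Computer computer) {
                  return 0;
              }
          }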

          jsiirola added a comment -

          Assigning to Kohsuke (slave-squatter author) so he can comment on the best way to address the double-counting of reservations.

          cpringle added a comment -

          I've just installed this plugin onto Jenkins and am getting the same behaviour as described in the initial bug report. I have 6 executors, but only 2 are available during the day. I currently have executors 2 and 6 free (the rest are reserved), and there are 2 jobs stuck in the build queue even though there are 2 spare executors.

          Are we able to get a fix for this?

          François-Xavier Choinière added a comment -

          Same problem here:
          I have a Jenkins server on Ubuntu 32-bit and a slave on Windows 7 64-bit.
          The slave connects to the server correctly.
          When a job starts, its status always says it is waiting for the next executor on host <HOSTNAME>.
          So... I'm unable to use any of my slaves!

          Please fix this ASAP!!!

          Daniel Beck added a comment -

          Can this issue still be reproduced on recent Jenkins + plugins versions?


            People

             • Assignee:
               Kohsuke Kawaguchi
             • Reporter:
               torbent
             • Votes:
               4
             • Watchers:
               5
