Type: Bug
Resolution: Unresolved
Priority: Minor
Labels: None
Environment: Jenkins 1.651
In some scale-testing experiments, we've discovered that a truly massive queue (i.e., on the order of 1,000+ tasks) can cause significant delays before tasks leave the queue and actually land on an executor. In the example I tried, I created a Pipeline job that forked out three parallel branches on nodes, and then kicked that job off 1,000 times:
// Returns a closure that grabs a node, writes a 50MB file, and stashes it.
def branch(name) {
    return {
        node {
            sh "sleep 30 && head -c 52428800 /dev/urandom > ${name}.bin"
            //archive "${name}.bin"
            stash includes: "${name}.bin", name: "${name}"
        }
    }
}

stage "Thinking"
for (i = 0; i < 5; i++) {
    sleep time: 250, unit: 'MILLISECONDS'
    echo "Thinking $i"
}

// Fork three parallel branches, each of which enters the build queue
// waiting for an executor.
stage "Working"
def branches = [:]
branches["b1"] = branch("b1")
branches["b2"] = branch("b2")
branches["b3"] = branch("b3")
parallel branches

stage "Resting"
for (i = 0; i < 5; i++) {
    sleep time: 150, unit: 'MILLISECONDS'
    echo "Resting $i"
}
I kicked the builds off with a loop inside another Pipeline job calling the build step, so the initial population of the queue wasn't instantaneous. At first, every task that entered the queue was allocated to one of the 60 executors I had available more or less immediately. But once the queue had filled up, I added another 40 executors - by then there were over 2,000 tasks in the queue, and it took a couple of minutes for those 40 executors to be allocated.
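For reference, here is a minimal sketch of that driver loop. The job name queue-scale-test is hypothetical, and I'm assuming the build step's wait: false parameter so each downstream build is queued without blocking the loop:

// Hypothetical driver Pipeline job: queues up 1,000 builds of the test job above.
// 'queue-scale-test' is an assumed name; wait: false returns as soon as the
// downstream build is queued rather than waiting for it to finish.
for (int i = 0; i < 1000; i++) {
    build job: 'queue-scale-test', wait: false
}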
I then created another agent with 100 executors and added it - the queue was at around 2,300 items by that point, and it again took a few minutes to fill the executors. Freed executors on the existing agent also took some time to fill back up. I grabbed a thread dump while that was going on - it's attached.
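For anyone reproducing this: Jenkins serves a thread dump at $JENKINS_URL/threadDump, and an equivalent dump can be printed from the script console with a snippet like the one below (a minimal sketch, not necessarily what was used here):

// Script-console sketch: print each live thread's name, state, and stack,
// roughly equivalent to jstack output.
Thread.getAllStackTraces().each { thread, frames ->
    println "\"${thread.name}\" ${thread.state}"
    frames.each { frame -> println "    at ${frame}" }
    println ""
}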
I'm going to try this again with freestyle rather than Pipeline jobs once the build queue has drained.