Environment: Jenkins 1.397, heavy-job 1.0, slave-squatter trunk head (1.2-SNAPSHOT 10/07/2010)
RHEL 6 (java-1.6.0-openjdk 1.6.0_17)
We have run the slave-squatter and heavy-job plugins without issue since they were released under Hudson 1.377. After a recent upgrade to Jenkins 1.397, we have started to see large numbers of executor threads die from an exception thrown within hudson.model.queue.MappingWorksheet.
This behavior has been observed before in JENKINS-7667.
I believe the immediate cause of the exception is that the load predictor routine in MappingWorksheet can come up with a maximum load that exceeds the number of available executors. The resulting idle count (minIdle) is then negative, and when the code attempts to reduce the apparent executor pool to that size, the underlying list throws the exception.
The simple solution is to clamp minIdle to 0 if it is less than 0.
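A minimal sketch of the clamp, not the actual MappingWorksheet code: the class name ClampSketch, the method trimToIdle, and the use of List.subList to shrink the pool are all assumptions made for illustration. It only shows that clamping the idle count at 0 yields an empty pool rather than an invalid range when the predicted load exceeds the executor count.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the pool-trimming step; names are illustrative only.
class ClampSketch {

    /**
     * Returns the part of the executor pool that remains idle after
     * subtracting the predicted maximum load.
     */
    static <T> List<T> trimToIdle(List<T> executors, int predictedMaxLoad) {
        int minIdle = executors.size() - predictedMaxLoad;

        // Without this clamp, a predicted load larger than the pool makes
        // minIdle negative and subList(0, minIdle) throws.
        minIdle = Math.max(0, minIdle);

        return executors.subList(0, minIdle);
    }

    public static void main(String[] args) {
        List<String> executors = Arrays.asList("e0", "e1");

        // Predicted load fits in the pool: one executor stays idle.
        System.out.println(trimToIdle(executors, 1).size()); // prints 1

        // Predicted load exceeds the pool: clamped to an empty pool.
        System.out.println(trimToIdle(executors, 5).size()); // prints 0
    }
}
```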
Clamping will prevent the exception that is killing executors, and it is probably worth implementing if for no other reason than that the load predictor is an extension point, so we cannot guarantee that implementers will always return a load <= the number of executors. However, it does not address the underlying logical issue in the current load predictors, i.e., why the predicted load is greater than the number of executors in the first place.