We recently added some jobs to our Jenkins instance that run periodically to shut down some virtual machines used by our automated tests, restore the VMs to a clean snapshot, and bring them back online. During this process we want to prevent any unit tests that need those VMs from running, since they would simply fail because the services they depend on would be offline.
Currently our solution is to set up a throttled build category, assign that category to every job that requires those VMs as well as to the jobs that perform the automated maintenance, and configure the category so that only one of its jobs may run anywhere on our build farm at a time. So if a unit test is using one of the VMs when the maintenance job kicks in, the maintenance job waits until the test completes before performing the cleanup. Conversely, if the cleanup job is running when a unit test that depends on the target VM is triggered, the test blocks until the cleanup is complete, ensuring all necessary services are online before the tests run.
This works fine as described. The problem is that the unit test jobs that use these VMs should be able to run in parallel with one another, since the tests themselves are independent of one another. Using the throttling plugin as described not only prevents the tests from running while the cleanup operation is running, it also prevents the tests from running side by side with each other.
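To make the side effect concrete, here is a toy model of what our current setup amounts to. This is not the throttling plugin's code or API, just a sketch of the "at most one job from the category at a time" rule; the class name and job names are made up for illustration.

```python
class ThrottleCategory:
    """Toy model of a throttle category with a global concurrency cap.

    Hypothetical class for illustration only, not a Jenkins API.
    """

    def __init__(self, max_total):
        self.max_total = max_total
        self.running = []  # names of jobs currently holding a slot

    def start(self, job):
        # A job may start only while the category is below its cap.
        if len(self.running) >= self.max_total:
            return False
        self.running.append(job)
        return True

    def finish(self, job):
        self.running.remove(job)


category = ThrottleCategory(max_total=1)
print(category.start("unit-test-a"))  # True: the category is idle
print(category.start("vm-cleanup"))   # False: cleanup waits, as intended
print(category.start("unit-test-b"))  # False: an independent test waits too
```

The last line is the unwanted part: the cap cannot tell a second test apart from the cleanup job, so everything in the category is serialized.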
Now, we probably could use the build blocker plugin to configure our unit test jobs so that they won't run while the cleanup operation is running, but using this plugin for the inverse case is cumbersome at best. Suppose we have 10 unit test jobs using a VM, and 1 job which manages the cleanup of the VM. All 10 test jobs can easily be set up to not run when the single cleanup job is running, but making sure that every relevant CI job is listed in the cleanup job's configuration, so that the cleanup never runs in parallel with any of them, is error-prone. As new jobs are added, jobs are renamed, etc., we would have to remember to keep the cleanup job up to date. Human error always rears its ugly head, so sooner or later someone will forget to do so, which could result in intermittent, difficult-to-debug test failures, something we try to avoid.
Ideally, what I'd like to see is some option that would allow us to put all of our unit test jobs into a category and have that entire group of jobs able to run in parallel with each other, while all of them block on the VM cleanup job and vice versa. Then all we'd need to do is ensure that every job that makes use of a particular VM is placed in the correct category, and Jenkins could sort out the rest automatically, similar to the way the throttling plugin currently works.
To make the solution a bit more generic, maybe the throttling plugin could be extended so that "groups" of jobs could be defined: any job in group #1 would block while any job from group #2 is running and vice versa, but jobs within the same group would be left free to run in parallel. Then we could add all of our unit test jobs to group #1, put just the one VM cleanup job in group #2, and presto! No extra work required. If we later wanted to add more "VM maintenance"-like jobs, we could simply put them in group #2 with the cleanup job, or put them in their own independent groups, depending on the desired functionality.
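In case it helps, the group rule described above can be sketched in a few lines. This is only a model of the scheduling decision I have in mind, not an existing plugin feature; `GroupGate`, its methods, and the group names are all hypothetical.

```python
from collections import Counter


class GroupGate:
    """Sketch of the proposed rule: jobs in the same group may overlap,
    jobs from different groups may not. Hypothetical, not a Jenkins API."""

    def __init__(self):
        self._running = Counter()  # group name -> number of running jobs

    def can_start(self, group):
        # Allowed only if no job from a *different* group is running.
        return all(g == group or n == 0 for g, n in self._running.items())

    def start(self, group):
        if not self.can_start(group):
            return False  # the job would stay in the queue
        self._running[group] += 1
        return True

    def finish(self, group):
        if self._running[group] <= 0:
            raise ValueError(f"no running job in group {group!r}")
        self._running[group] -= 1


gate = GroupGate()
print(gate.start("tests"))    # True
print(gate.start("tests"))    # True: same group, runs in parallel
print(gate.start("cleanup"))  # False: blocked while tests are running
gate.finish("tests")
gate.finish("tests")
print(gate.start("cleanup"))  # True: all tests have finished
print(gate.start("tests"))    # False: blocked until cleanup completes
```

One thing this simple version does not address is fairness: if test jobs keep arriving, the cleanup job could wait indefinitely, so a real implementation would presumably need to stop admitting new tests once a cleanup is queued.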