We have an intermittent problem with slaves hanging after the job itself has finished. During the post-processing step, the console log shows this line:
Description set: vap_current_iter-2012_03_29_19_01_03
And then nothing. A normal run continues like this:
Description set: prod_pull-2012_03_28_19_01_03
Notifying upstream build armada_Launch_prod_pull #13 of job completion
Project armada_Launch_prod_pull still waiting for 1 builds to complete
Notifying upstream projects of job completion
Notifying upstream of completion: armada_Launch_prod_pull #13
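The difference between the two logs above can be checked mechanically. Here is a minimal sketch, assuming the console log is available as plain text and that the job is triggered by an upstream project (so a healthy run always prints a "Notifying upstream" line after "Description set:"). The function name looks_hung is hypothetical, and a build that is still in its post-processing step would also match for a short while, so this is only a hint, not a diagnosis.

```python
def looks_hung(console_log: str) -> bool:
    """Return True if the log stops after 'Description set:' with no
    'Notifying upstream' line following it (assumed hang signature)."""
    # Drop blank lines and surrounding whitespace.
    lines = [ln.strip() for ln in console_log.splitlines() if ln.strip()]

    # Locate the last 'Description set:' line, if there is one.
    last_desc = None
    for i, ln in enumerate(lines):
        if ln.startswith("Description set:"):
            last_desc = i
    if last_desc is None:
        return False  # build has not reached the step we care about

    # A healthy run logs at least one completion notice afterwards.
    tail = lines[last_desc + 1:]
    return not any(ln.startswith("Notifying upstream") for ln in tail)
```

Running it over the console text of each build that is still marked as in progress should flag the hung ones without having to open every log by hand.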
I set up a logger for hudson.model.Run, and it currently shows this:
Repeated for every hung slave.
The main hudson log doesn't have any additional information.
Disconnecting the slave has no effect.
Trying to do an orderly shutdown of Jenkins has no effect either (Jenkins itself appears to hang on shutdown).
The only way we have found to recover is to kill -9 the Tomcat process.
The thread dump for one of the slaves (they are all the same) is:
Any ideas on how to better recover or prevent this would be greatly appreciated.