Type: Bug
Resolution: Fixed
Priority: Major
Jenkins: 1.565.2 (Also seen in 1.532.2)
Slave OS: Windows Server 2008 R2 64-bit
slave.jar version: 2.43 (also seen with 2.32 when running the master on 1.532.2)
java -version output on the slave:
java version "1.6.0"
Java(TM) SE Runtime Environment (build pwa6460sr10fp1ifix-20120429_01(SR10 P1+IV20338))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Windows Server 2008 R2 amd64-64 jvmwa6460sr10fp1-20120202_101568 (JIT enabled, AOT enabled)
J9VM - 20120202_101568
JIT - r9_20111107_21307ifx1
GC - 20120202_AA)
JCL - 20120429_01
Periodically we experience the following scenario:
- Slave is restarted (usually after the box it is on is restarted)
- Slave connects to Jenkins and Jenkins runs a job on it
- Job starts but never gets past 'Building remotely on ...' (we've left it for 3+ hours; the job usually takes about 1 minute to run)
We have to manually kill the slave process and restart it to be able to actually use the slave.
We've seen this on several slave machines with the same OS and Java version as above, but it doesn't happen after every restart. A thread dump of the slave taken while it was in this state is attached.
The job console log never gets past this:
Started by upstream project "START" build number 159
originally caused by:
Started by upstream project "SCH_RESTART_JAT" build number 18
originally caused by:
Started by user XXXX
Building remotely on INTERNAL_JAT_REP (jat) in workspace c:\jenkins_slave\workspace\CHECK_JENKINS_SLAVE
The node log looks OK:
JNLP agent connected from /XXX.XXX.XXX.XXX
<===[JENKINS REMOTING CAPACITY]===>Slave.jar version: 2.43
This is a Windows slave
Effective SlaveRestarter on INTERNAL_JAT_REP: []
Slave successfully connected and online
The slave connects via JNLP, which we launch from a scheduled task at system startup.
It seems similar to the issue in this thread (although in our case we can cancel the job from the Jenkins UI before terminating the slave), but that thread is from years ago and doesn't mention a resolution: http://jenkins-ci.361315.n4.nabble.com/Job-is-getting-stuck-before-it-can-even-start-building-td3053144.html