-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
LTS 1.565.2
slave 2.48
Slave stops responding for whatever reason.
First ping thread notices slave does not respond:
Nov 26, 2014 2:00:59 PM INFO hudson.slaves.ChannelPinger$1 onDead
Ping failed. Terminating the channel.
java.util.concurrent.TimeoutException: Ping started on 1417006619758 hasn't completed at 1417006859758
at hudson.remoting.PingThread.ping(PingThread.java:120)
at hudson.remoting.PingThread.run(PingThread.java:81)
(would be nice if we could see which channel/slave)
A bit later:
Nov 26, 2014 2:14:34 PM WARNING hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor monitor
Failed to monitor SV-ARG-DEV-D23 for Clock Difference
hudson.remoting.ChannelClosedException: channel is already closed
at hudson.remoting.Channel.send(Channel.java:541)
at hudson.remoting.Request.callAsync(Request.java:208)
at hudson.remoting.Channel.callAsync(Channel.java:766)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:280)
Caused by: java.io.IOException
at hudson.remoting.Channel.close(Channel.java:1027)
at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110)
at hudson.remoting.PingThread.ping(PingThread.java:120)
at hudson.remoting.PingThread.run(PingThread.java:81)
Caused by: java.util.concurrent.TimeoutException: Ping started on 1417006619758 hasn't completed at 1417006859758
... 2 more
Yet node is not shown as offline, and still shows as executor busy.
Threaddump shows executor thread on master:
Executor #0 for DWI01164 : executing OKA.R201501/Team D/_publics/Public - 4 #103 / waiting for hudson.remoting.Channel@3b91e986:DWI01164
"Executor #0 for DWI01164 : executing OKA.R201501/Team D/_publics/Public - 4 #103 / waiting for hudson.remoting.Channel@3b91e986:DWI01164" Id=16260 Group=main TIMED_WAITING on hudson.remoting.UserRequest@e4ca0c4
at java.lang.Object.wait(Native Method)
- waiting on hudson.remoting.UserRequest@e4ca0c4
at hudson.remoting.Request.call(Request.java:146)
at hudson.remoting.Channel.call(Channel.java:739)
at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:168)
at com.sun.proxy.$Proxy56.join(Unknown Source)
at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:956)
at hudson.Launcher$ProcStarter.join(Launcher.java:367)
at hudson.tasks.Maven.perform(Maven.java:328)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:756)
at hudson.model.Build$BuildExecution.build(Build.java:198)
at hudson.model.Build$BuildExecution.doRun(Build.java:159)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
at hudson.model.Run.execute(Run.java:1706)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:232)
It looks like ChannelPinger calls channel.close() which sets channel.outClosed , yet Request is stuck in a loop checking for channel.isInClosed(), which was not set.
Should Request check for channel.isClosingOrClosed()?