Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-25803

Slave ping timeout ; build hangs

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Major Major
    • remoting
    • None
    • LTS 1.565.2
      slave 2.48

      Slave stops responding for whatever reason.
      First ping thread notices slave does not respond:

      Nov 26, 2014 2:00:59 PM INFO hudson.slaves.ChannelPinger$1 onDead
      Ping failed. Terminating the channel.
      java.util.concurrent.TimeoutException: Ping started on 1417006619758 hasn't completed at 1417006859758
      at hudson.remoting.PingThread.ping(PingThread.java:120)
      at hudson.remoting.PingThread.run(PingThread.java:81)

      (would be nice if we could see which channel/slave)

      A bit later:

      Nov 26, 2014 2:14:34 PM WARNING hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor monitor
      Failed to monitor SV-ARG-DEV-D23 for Clock Difference
      hudson.remoting.ChannelClosedException: channel is already closed
      at hudson.remoting.Channel.send(Channel.java:541)
      at hudson.remoting.Request.callAsync(Request.java:208)
      at hudson.remoting.Channel.callAsync(Channel.java:766)
      at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
      at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:280)
      Caused by: java.io.IOException
      at hudson.remoting.Channel.close(Channel.java:1027)
      at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110)
      at hudson.remoting.PingThread.ping(PingThread.java:120)
      at hudson.remoting.PingThread.run(PingThread.java:81)
      Caused by: java.util.concurrent.TimeoutException: Ping started on 1417006619758 hasn't completed at 1417006859758
      ... 2 more

      Yet node is not shown as offline, and still shows as executor busy.
      Threaddump shows executor thread on master:
      Executor #0 for DWI01164 : executing OKA.R201501/Team D/_publics/Public - 4 #103 / waiting for hudson.remoting.Channel@3b91e986:DWI01164

      "Executor #0 for DWI01164 : executing OKA.R201501/Team D/_publics/Public - 4 #103 / waiting for hudson.remoting.Channel@3b91e986:DWI01164" Id=16260 Group=main TIMED_WAITING on hudson.remoting.UserRequest@e4ca0c4
      at java.lang.Object.wait(Native Method)

      • waiting on hudson.remoting.UserRequest@e4ca0c4
        at hudson.remoting.Request.call(Request.java:146)
        at hudson.remoting.Channel.call(Channel.java:739)
        at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:168)
        at com.sun.proxy.$Proxy56.join(Unknown Source)
        at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:956)
        at hudson.Launcher$ProcStarter.join(Launcher.java:367)
        at hudson.tasks.Maven.perform(Maven.java:328)
        at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
        at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:756)
        at hudson.model.Build$BuildExecution.build(Build.java:198)
        at hudson.model.Build$BuildExecution.doRun(Build.java:159)
        at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
        at hudson.model.Run.execute(Run.java:1706)
        at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
        at hudson.model.ResourceController.execute(ResourceController.java:88)
        at hudson.model.Executor.run(Executor.java:232)

      It looks like ChannelPinger calls channel.close() which sets channel.outClosed , yet Request is stuck in a loop checking for channel.isInClosed(), which was not set.
      Should Request check for channel.isClosingOrClosed()?

            Unassigned Unassigned
            wannessels Wannes Sels
            Votes:
            2 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated: