JENKINS-25656

SSH slaves plugin hangs the shutdown of Jenkins

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Component: ssh-slaves-plugin
    • None

      I've encountered a situation in which Jenkins shutdown hangs. The following stack traces, spanning three threads, show where each thread is stuck:

      "Shutdown" prio=10 tid=0x00007f52b803b000 nid=0x6a2a waiting for monitor entry [0x00007f532157b000]
         java.lang.Thread.State: BLOCKED (on object monitor)
      	at com.trilead.ssh2.Connection.getHostname(Connection.java:961)
      	- waiting to lock <0x000000060c4bfba8> (a com.trilead.ssh2.Connection)
      	at hudson.plugins.sshslaves.PluginImpl.closeRegisteredConnections(PluginImpl.java:47)
      	- locked <0x000000060c78cac8> (a java.lang.Class for hudson.plugins.sshslaves.PluginImpl)
      	at hudson.plugins.sshslaves.PluginImpl.stop(PluginImpl.java:38)
      	at hudson.PluginWrapper.stop(PluginWrapper.java:382)
      	at hudson.PluginManager.stop(PluginManager.java:650)
      	at jenkins.model.Jenkins.cleanUp(Jenkins.java:2695)
      	at hudson.WebAppMain.contextDestroyed(WebAppMain.java:344)
      	at org.mortbay.jetty.handler.ContextHandler.doStop(ContextHandler.java:545)
      	at org.mortbay.jetty.webapp.WebAppContext.doStop(WebAppContext.java:451)
      	at org.mortbay.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:65)
      	at org.mortbay.jetty.handler.HandlerCollection.doStop(HandlerCollection.java:164)
      	at org.mortbay.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:65)
      	at org.mortbay.jetty.handler.HandlerCollection.doStop(HandlerCollection.java:164)
      	at org.mortbay.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:65)
      	at org.mortbay.jetty.handler.HandlerWrapper.doStop(HandlerWrapper.java:129)
      	at org.mortbay.jetty.Server.doStop(Server.java:242)
      	at org.mortbay.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:65)
      	at org.mortbay.jetty.Server$ShutdownHookThread.run(Server.java:509)
      
      
      
      "Jenkins-Remoting-Thread-8" daemon prio=10 tid=0x00007f52b54ad000 nid=0x698d waiting for monitor entry [0x00007f5322756000]
         java.lang.Thread.State: BLOCKED (on object monitor)
      	at com.trilead.ssh2.channel.ChannelManager.closeChannel(ChannelManager.java:311)
      	- waiting to lock <0x000000060c4c29a0> (a java.lang.Object)
      	at com.trilead.ssh2.channel.ChannelManager.closeAllChannels(ChannelManager.java:278)
      	at com.trilead.ssh2.Connection.close(Connection.java:564)
      	at com.trilead.ssh2.Connection.close(Connection.java:558)
      	- locked <0x000000060c4bfba8> (a com.trilead.ssh2.Connection)
      	at hudson.plugins.sshslaves.SSHLauncher.afterDisconnect(SSHLauncher.java:611)
      	- locked <0x000000060c4bfa48> (a hudson.plugins.sshslaves.SSHLauncher)
      	at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:552)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
      	at java.lang.Thread.run(Thread.java:722)
      
      
      
      "Monitoring thread for Free Swap Space started on Mon Nov 17 16:53:46 PST 2014" daemon prio=10 tid=0x00007f5308005000 nid=0x69e6 runnable [0x00007f532167c000]
         java.lang.Thread.State: RUNNABLE
      	at java.net.SocketOutputStream.socketWrite0(Native Method)
      	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
      	at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
      	at com.trilead.ssh2.crypto.cipher.CipherOutputStream.flush(CipherOutputStream.java:75)
      	at com.trilead.ssh2.transport.TransportConnection.sendMessage(TransportConnection.java:193)
      	at com.trilead.ssh2.transport.TransportConnection.sendMessage(TransportConnection.java:107)
      	at com.trilead.ssh2.transport.TransportManager.sendMessage(TransportManager.java:666)
      	- locked <0x000000060c4bfe68> (a java.lang.Object)
      	at com.trilead.ssh2.channel.ChannelManager.sendData(ChannelManager.java:438)
      	- locked <0x000000060c4c29a0> (a java.lang.Object)
      	at com.trilead.ssh2.channel.ChannelOutputStream.write(ChannelOutputStream.java:63)
      	at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1857)
      	at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1766)
      	at java.io.ObjectOutputStream.writeNonProxyDesc(ObjectOutputStream.java:1273)
      	at java.io.ObjectOutputStream.writeClassDesc(ObjectOutputStream.java:1227)
      	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1411)
      	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
      	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:346)
      	at hudson.remoting.Command.writeTo(Command.java:83)
      	at hudson.remoting.ClassicCommandTransport.write(ClassicCommandTransport.java:52)
      	at hudson.remoting.Channel.send(Channel.java:528)
      	- locked <0x000000060c500428> (a hudson.remoting.Channel)
      	at hudson.remoting.Request.callAsync(Request.java:208)
      	at hudson.remoting.Channel.callAsync(Channel.java:749)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:280)
      

      The SSH server was on a VM that was gone. In this case clearly the TCP socket on the Jenkins master hasn't gotten the news, so it thinks it is still connected.

      It had been hung for a good 3 minutes before I killed it with "kill -9".

      I'm not sure what the right thing to do here is. The socket write can block, and it can block for a long time before the sender decides that the connection is lost. The attempt to close connections in PluginImpl.stop() should probably be done in other threads, so that the main thread can move on with the shutdown if a close hangs.
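
A minimal sketch of that idea, not the fix that was actually applied to the plugin: each close action runs on a daemon worker thread, and the caller waits only a bounded time before moving on. The class name BoundedShutdown, the method closeAllWithTimeout, and the Runnable close actions are hypothetical stand-ins for whatever PluginImpl.stop() would do with its registered connections. Daemon threads matter here because a thread blocked in a native socket write may not respond to interruption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BoundedShutdown {

    /**
     * Runs each close action on its own daemon worker thread and waits at most
     * timeoutMillis in total. Returns true if every action finished in time.
     * (Hypothetical helper; names are illustrative only.)
     */
    public static boolean closeAllWithTimeout(List<Runnable> closeActions, long timeoutMillis)
            throws InterruptedException {
        if (closeActions.isEmpty()) {
            return true;
        }
        ExecutorService pool = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "connection-closer");
            t.setDaemon(true); // a stuck close must not keep the JVM alive
            return t;
        });
        CountDownLatch done = new CountDownLatch(closeActions.size());
        for (Runnable action : closeActions) {
            pool.execute(() -> {
                try {
                    action.run();
                } catch (RuntimeException e) {
                    // Best effort: one failing close should not affect the others.
                } finally {
                    done.countDown();
                }
            });
        }
        pool.shutdown(); // accept no new tasks; already submitted tasks keep running
        return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable fast = () -> { };
        // Simulates a close that blocks, such as a write to a dead peer.
        Runnable stuck = () -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // nothing to clean up in this simulation
            }
        };
        System.out.println(closeAllWithTimeout(Arrays.asList(fast), 1000));        // true
        System.out.println(closeAllWithTimeout(Arrays.asList(fast, stuck), 200));  // false
    }
}
```

With this shape the shutdown thread never takes a connection's monitor itself, so a close that is stuck behind a blocked socket write costs at most the timeout. The trade-off is that a connection whose close did not finish is simply abandoned when the JVM exits.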

            Assignee: Ivan Fernandez Calvo (ifernandezcalvo)
            Reporter: Kohsuke Kawaguchi (kohsuke)