Jenkins / JENKINS-69890

Websocket Agent disconnection: possible deadlock at agent side


    • remoting:3071.v7e9b_0dc08466, 2.375.1

      Static agents using WebSockets are sporadically disconnected and do not reconnect.
      Agent thread dumps show a potential deadlock:

      java.lang.Thread.State: WAITING (parking)
      at jdk.internal.misc.Unsafe.park(java.base@11.0.16/Native Method)
       - parking to wait for <0x00000000c7f230f0> (a java.util.concurrent.CountDownLatch$Sync)
      at java.util.concurrent.locks.LockSupport.park(java.base@11.0.16/LockSupport.java:194)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.16/AbstractQueuedSynchronizer.java:885)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(java.base@11.0.16/AbstractQueuedSynchronizer.java:1039)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@11.0.16/AbstractQueuedSynchronizer.java:1345)
      at java.util.concurrent.CountDownLatch.await(java.base@11.0.16/CountDownLatch.java:232)
      at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusFuture.get(TyrusFuture.java:53)
      at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusRemoteEndpoint$Basic.processFuture(TyrusRemoteEndpoint.java:149)
      at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusRemoteEndpoint$Basic.sendBinary(TyrusRemoteEndpoint.java:131)
      at hudson.remoting.Engine$1AgentEndpoint$Transport.write(Engine.java:646)
      at hudson.remoting.AbstractByteBufferCommandTransport.write(AbstractByteBufferCommandTransport.java:303)
      at hudson.remoting.Channel.send(Channel.java:765)
       - locked <0x00000000c5f65b08> (a hudson.remoting.Channel)
      at hudson.remoting.Request$2.run(Request.java:389)
      at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
      at hudson.remoting.InterceptingExecutorService$$Lambda$90/0x000000084016c440.call(Unknown Source)
      at java.util.concurrent.FutureTask.run(java.base@11.0.16/FutureTask.java:264)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.16/ThreadPoolExecutor.java:1128)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.16/ThreadPoolExecutor.java:628)
      at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:126)
      at hudson.remoting.Engine$1$$Lambda$91/0x000000084016b840.run(Unknown Source)
      at java.lang.Thread.run(java.base@11.0.16/Thread.java:829)
      Locked ownable synchronizers:
       - <0x00000000c7f231d0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
      

      And

      java.lang.Thread.State: BLOCKED (on object monitor)
      at hudson.remoting.Channel.terminate(Channel.java:1068)
       - waiting to lock <0x00000000c5f65b08> (a hudson.remoting.Channel)
      at hudson.remoting.Channel$1.terminate(Channel.java:620)
      at hudson.remoting.AbstractByteBufferCommandTransport.terminate(AbstractByteBufferCommandTransport.java:314)
      at hudson.remoting.Engine$1AgentEndpoint.onClose(Engine.java:629)
      at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusEndpointWrapper.onClose(TyrusEndpointWrapper.java:1235)
      at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.onClose(TyrusWebSocket.java:110)
      at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.ProtocolHandler.close(ProtocolHandler.java:481)
       - locked <0x00000000c5f6c7c0> (a io.jenkins.remoting.shaded.org.glassfish.tyrus.core.ProtocolHandler)
      at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.close(TyrusWebSocket.java:244)
      

      Although the thread dumps do not show it explicitly, this looks like a deadlock. The first thread holds the `Channel` monitor while waiting on a Tyrus future (`TyrusFuture.get`, called from `sendBinary`), and the second holds the Tyrus `ProtocolHandler` monitor while waiting to lock the `Channel`.
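
The suspected pattern can be reproduced in isolation. The following is only a minimal sketch, not the remoting or Tyrus code: `channelLock`, `protocolLock`, and `writeDone` are hypothetical stand-ins for the `Channel` monitor, the `ProtocolHandler` monitor, and the latch behind `TyrusFuture.get`. It also assumes, without confirmation from the dumps, that the pending write can only complete once the closing thread makes progress. The latch wait is bounded here so the demo terminates; the wait in the stack trace above has no timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class LockThenAwaitSketch {
    // Hypothetical stand-ins for the objects seen in the thread dumps.
    private final Object channelLock = new Object();   // plays the Channel monitor
    private final Object protocolLock = new Object();  // plays the ProtocolHandler monitor
    private final CountDownLatch writeDone = new CountDownLatch(1); // plays the send future

    /** Returns true if the sender was still waiting when the timeout expired. */
    boolean senderStalls(long timeoutMillis) {
        CountDownLatch senderHoldsChannel = new CountDownLatch(1);
        CountDownLatch closerHoldsProtocol = new CountDownLatch(1);
        boolean[] stalled = {false};

        // Like Channel.send: take the channel monitor, then wait for the write.
        Thread sender = new Thread(() -> {
            synchronized (channelLock) {
                senderHoldsChannel.countDown();
                try {
                    closerHoldsProtocol.await();
                    // Bounded only so this demo ends; an unbounded wait would hang here.
                    stalled[0] = !writeDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        // Like the onClose path: hold the protocol monitor, then ask for the channel monitor.
        Thread closer = new Thread(() -> {
            synchronized (protocolLock) {
                closerHoldsProtocol.countDown();
                try {
                    senderHoldsChannel.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (channelLock) { // blocked until the sender gives up
                    writeDone.countDown();   // assumed completion path, reached too late
                }
            }
        });

        sender.start();
        closer.start();
        try {
            sender.join();
            closer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for demo threads", e);
        }
        return stalled[0];
    }
}
```

Calling `new LockThenAwaitSketch().senderStalls(200)` returns `true` under these assumptions: the sender never sees the latch released while it holds `channelLock`, because the only thread that could release it is blocked on that same monitor.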

            amuniz Antonio Muñiz