  1. Jenkins
  2. JENKINS-23177

All slave threads locked in hudson.remoting.PipeWindow$Real.get()

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • remoting
    • Version 1.532.1

      Today one of our slaves locked up completely.

      • Jobs would launch but get no further than
      Building remotely on XXX in workspace YYY
       Starting build job ZZZ
      
      • No apparent problematic entries in the master log
      • Status showed the slave as online
      • No apparent problematic entries in the slave log; entries simply stopped at the time the problem started

      A thread dump showed that all threads were stuck in the following stack frame (full stack trace attached):

      "pool-1-thread-10786" prio=3 tid=0x08461800 nid=0x4e43 in Object.wait() [0xb5088000]
         java.lang.Thread.State: WAITING (on object monitor)
      	at java.lang.Object.wait(Native Method)
      	- waiting on <0xbade43b0> (a hudson.remoting.PipeWindow$Real)
      	at java.lang.Object.wait(Object.java:485)
      	at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:177)
      	- locked <0xbade43b0> (a hudson.remoting.PipeWindow$Real)
      	at hudson.remoting.ProxyOutputStream._write(ProxyOutputStream.java:118)
      	- locked <0xbade43d8> (a hudson.remoting.ProxyOutputStream)
      	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:103)
      	at hudson.Util.copyStream(Util.java:454)
      	at hudson.FilePath$28.call(FilePath.java:1623)
      	at hudson.FilePath$28.call(FilePath.java:1617)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:118)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:48)
      	at hudson.remoting.Request$2.run(Request.java:326)
      	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
      	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
      	at java.util.concurrent.FutureTask.run(Unknown Source)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      	at hudson.remoting.Engine$1$1.run(Engine.java:60)
      	at java.lang.Thread.run(Unknown Source)
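
When every thread in a dump looks alike, it can help to count how many threads actually contain a given frame rather than eyeballing the dump. The following is a minimal sketch using only the standard `Thread.getAllStackTraces()` call; the class name `StuckFrameCount` and the method `countThreadsIn` are hypothetical, and the code only sees threads of the JVM it runs in, so it would have to execute inside the affected slave process.

```java
import java.util.Map;

public class StuckFrameCount {

    /** Counts live threads in this JVM whose stack contains className.methodName. */
    static long countThreadsIn(String className, String methodName) {
        long count = 0;
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().equals(className)
                        && frame.getMethodName().equals(methodName)) {
                    count++;
                    break; // count each thread at most once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Class and method names taken from the stack trace in this report.
        long stuck = countThreadsIn("hudson.remoting.PipeWindow$Real", "get");
        System.out.println("threads in PipeWindow$Real.get(): " + stuck);
    }
}
```

Comparing that count with the total number of pool threads would show whether the whole pool is parked in the same frame or only part of it.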
      

      Looking at the code of PipeWindow$Real.get(), it seems possible for threads to block in get() and never be woken up once the pipe window fills up, but I can't point to a concrete defect.
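
To illustrate the kind of hang suspected here, below is a simplified sketch of a sender-side flow-control window. It is not the hudson.remoting implementation: the class `WindowSketch` and its methods are hypothetical, written only to show how an untimed `wait()` parks a writer forever if the acknowledgement that would call `increase()` never arrives, and how a timed variant would at least surface the stall.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a flow-control window; NOT the hudson.remoting code. */
public class WindowSketch {
    private int available;

    public WindowSketch(int initial) {
        this.available = initial;
    }

    /** Blocks until at least min bytes are free. Without an ack this never returns. */
    public synchronized int get(int min) throws InterruptedException {
        while (available < min) {
            wait(); // untimed wait, as in the frame shown in the trace
        }
        return available;
    }

    /** Bounded variant: returns -1 if no space became available in time. */
    public synchronized int get(int min, long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (available < min) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                return -1;
            }
            wait(remainingMs);
        }
        return available;
    }

    /** Called by the writer after sending n bytes. */
    public synchronized void decrease(int n) {
        available -= n;
    }

    /** Called when the reader acknowledges n bytes; wakes blocked writers. */
    public synchronized void increase(int n) {
        available += n;
        notifyAll();
    }

    public static void main(String[] args) throws Exception {
        WindowSketch window = new WindowSketch(4);
        window.decrease(4); // the writer has filled the window

        Thread writer = new Thread(() -> {
            try {
                window.get(1); // parks until an ack arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "writer");
        writer.setDaemon(true);
        writer.start();

        Thread.sleep(200);
        System.out.println("writer state without ack: " + writer.getState());
        System.out.println("timed get without ack: " + window.get(1, 100));

        window.increase(4); // simulate the ack finally arriving
        writer.join(1000);
        System.out.println("writer state after ack: " + writer.getState());
    }
}
```

In this model the writer can only make progress if the other side keeps acknowledging data, so a lost or never-sent acknowledgement would produce exactly the symptom described: threads waiting with nothing logged on either end. Whether that is what happens in the real code is the open question of this report.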

      I searched the existing issues and found JENKINS-9540 and JENKINS-22807, but those seem different: they come with particular messages in the logs.

      Could this be a deadlock in the slave remoting code?

            Assignee: Unassigned
            Reporter: Joe Ammann (jammann)
            Votes: 1
            Watchers: 4
