Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-5977

Dead lock condition due to pipe clogging

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Fixed
    • Icon: Major Major
    • core
    • None

      The way the remoting is written, a certain race condition can result in a dead lock.

      The root cause of this is that Channel.ReaderThread performs blocking write operation to OutputStream when it's executing the ProxyOutputStream.Chunk command. This synchronous behavior was necessary to ensure the proper in-order data arrival guarantee.

      Now, the situation is, the master initiates a FilePath.copyRecursiveTo from the master to a slave. This creates a piped stream pair over Channel, where the sender on the master are directly sending data over Channel, which gets to FastPipedOutputStream on the slave, then the forwarded Callable on the slave is reading from the corresponding FastPipedInputStream.

      The callable executing on the slave needs to load a class from the master, which results in this callable making an callback to the master to fetch a class file. But while this class file arrives as a Command, the master can send enough byte sequence to FastPipedOutputStream to the point that the buffer fills up and the write operation blocks.

      At this point, the class file is stuck in the network waiting to be read by the command reader thread, but that won't be able to do that until there'll be some space in the buffer, and there won't be any space in the buffer until the class file is retrieved, hence the dead lock.

        1. log.zip
          15 kB
        2. deadlockpatch.txt
          3 kB
        3. 34838_hang_on_archiving.txt
          18 kB
        4. 34838_hang_on_emma_publisher.txt
          11 kB
        5. HUDSON-5977.stack
          17 kB

            kohsuke Kohsuke Kawaguchi
            kohsuke Kohsuke Kawaguchi
            Votes:
            27 Vote for this issue
            Watchers:
            26 Start watching this issue

              Created:
              Updated:
              Resolved: