Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-5977

Dead lock condition due to pipe clogging

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: core
    • Labels:
      None

      Description

      The way the remoting is written, a certain race condition can result in a dead lock.

      The root cause of this is that Channel.ReaderThread performs blocking write operation to OutputStream when it's executing the ProxyOutputStream.Chunk command. This synchronous behavior was necessary to ensure the proper in-order data arrival guarantee.

      Now, the situation is, the master initiates a FilePath.copyRecursiveTo from the master to a slave. This creates a piped stream pair over Channel, where the sender on the master are directly sending data over Channel, which gets to FastPipedOutputStream on the slave, then the forwarded Callable on the slave is reading from the corresponding FastPipedInputStream.

      The callable executing on the slave needs to load a class from the master, which results in this callable making an callback to the master to fetch a class file. But while this class file arrives as a Command, the master can send enough byte sequence to FastPipedOutputStream to the point that the buffer fills up and the write operation blocks.

      At this point, the class file is stuck in the network waiting to be read by the command reader thread, but that won't be able to do that until there'll be some space in the buffer, and there won't be any space in the buffer until the class file is retrieved, hence the dead lock.

      1. 34838_hang_on_archiving.txt
        18 kB
        glundh
      2. 34838_hang_on_emma_publisher.txt
        11 kB
        glundh
      3. deadlockpatch.txt
        3 kB
        glundh
      4. HUDSON-5977.stack
        17 kB
        olamy

        Issue Links

          Activity

          Hide
          esmalling Eric Smalling added a comment -

          I too am seeing the increase/decrease logs in 1.377. Too bad too since I need 1.379 to fix JENKINS-7546

          Show
          esmalling Eric Smalling added a comment - I too am seeing the increase/decrease logs in 1.377. Too bad too since I need 1.379 to fix JENKINS-7546
          Hide
          esmalling Eric Smalling added a comment -

          Unable to test JENKINS-7546 due to this issue in 1.379

          Show
          esmalling Eric Smalling added a comment - Unable to test JENKINS-7546 due to this issue in 1.379
          Hide
          swpalmer swpalmer added a comment - - edited

          I'm seeing the following exceptions after a successful build with Hudson 1.379. They seem to be related to the problems with the pipes to the slave nodes.

          FATAL: command execution failed
          hudson.util.IOException2: Failed to join the process
          at hudson.Proc$RemoteProc.join(Proc.java:355)
          at hudson.Launcher$ProcStarter.join(Launcher.java:280)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:82)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
          at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
          at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:601)
          at hudson.model.Build$RunnerImpl.build(Build.java:174)
          at hudson.model.Build$RunnerImpl.doRun(Build.java:138)
          at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:416)
          at hudson.model.Run.run(Run.java:1280)
          at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          at hudson.model.ResourceController.execute(ResourceController.java:88)
          at hudson.model.Executor.run(Executor.java:137)
          Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
          at hudson.remoting.Request$1.get(Request.java:218)
          at hudson.remoting.Request$1.get(Request.java:172)
          at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
          at hudson.Proc$RemoteProc.join(Proc.java:347)
          ... 12 more
          Caused by: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
          at hudson.remoting.Request.abort(Request.java:257)
          at hudson.remoting.Channel.terminate(Channel.java:681)
          at hudson.remoting.Channel$ReaderThread.run(Channel.java:973)
          Caused by: java.net.SocketException: Connection reset
          at java.net.SocketInputStream.read(SocketInputStream.java:168)
          at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
          at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
          at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
          at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
          at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
          at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
          at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
          at hudson.remoting.Channel$ReaderThread.run(Channel.java:948)
          FATAL: Unable to delete script file C:\DOCUME~1\Hudson\LOCALS~1\Temp\hudson8908395142501843194.bat
          hudson.util.IOException2: remote file operation failed: C:\DOCUME~1\Hudson\LOCALS~1\Temp\hudson8908395142501843194.bat at hudson.remoting.Channel@15509e1:DCM-IG-01
          at hudson.FilePath.act(FilePath.java:749)
          at hudson.FilePath.act(FilePath.java:735)
          at hudson.FilePath.delete(FilePath.java:990)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
          at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
          at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:601)
          at hudson.model.Build$RunnerImpl.build(Build.java:174)
          at hudson.model.Build$RunnerImpl.doRun(Build.java:138)
          at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:416)
          at hudson.model.Run.run(Run.java:1280)
          at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          at hudson.model.ResourceController.execute(ResourceController.java:88)
          at hudson.model.Executor.run(Executor.java:137)
          Caused by: hudson.remoting.ChannelClosedException: channel is already closed
          at hudson.remoting.Channel.send(Channel.java:467)
          at hudson.remoting.Request.call(Request.java:105)
          at hudson.remoting.Channel.call(Channel.java:630)
          at hudson.FilePath.act(FilePath.java:742)
          ... 13 more
          Caused by: java.net.SocketException: Connection reset
          at java.net.SocketInputStream.read(SocketInputStream.java:168)
          at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
          at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
          at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
          at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
          at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
          at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
          at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
          at hudson.remoting.Channel$ReaderThread.run(Channel.java:948)
          FATAL: channel is already closed
          hudson.remoting.ChannelClosedException: channel is already closed
          at hudson.remoting.Channel.send(Channel.java:467)
          at hudson.remoting.Request.call(Request.java:105)
          at hudson.remoting.Channel.call(Channel.java:630)
          at hudson.Launcher$RemoteLauncher.kill(Launcher.java:744)
          at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:421)
          at hudson.model.Run.run(Run.java:1280)
          at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          at hudson.model.ResourceController.execute(ResourceController.java:88)
          at hudson.model.Executor.run(Executor.java:137)
          Caused by: java.net.SocketException: Connection reset
          at java.net.SocketInputStream.read(SocketInputStream.java:168)
          at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
          at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
          at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
          at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
          at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
          at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
          at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
          at hudson.remoting.Channel$ReaderThread.run(Channel.java:948)
          Page generated: Oct 5, 2010 9:11:10 AMHudson ver. 1.379

          Show
          swpalmer swpalmer added a comment - - edited I'm seeing the following exceptions after a successful build with Hudson 1.379. They seem to be related to the problems with the pipes to the slave nodes. FATAL: command execution failed hudson.util.IOException2: Failed to join the process at hudson.Proc$RemoteProc.join(Proc.java:355) at hudson.Launcher$ProcStarter.join(Launcher.java:280) at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:82) at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19) at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:601) at hudson.model.Build$RunnerImpl.build(Build.java:174) at hudson.model.Build$RunnerImpl.doRun(Build.java:138) at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:416) at hudson.model.Run.run(Run.java:1280) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:137) Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset at hudson.remoting.Request$1.get(Request.java:218) at hudson.remoting.Request$1.get(Request.java:172) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55) at hudson.Proc$RemoteProc.join(Proc.java:347) ... 12 more Caused by: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset at hudson.remoting.Request.abort(Request.java:257) at hudson.remoting.Channel.terminate(Channel.java:681) at hudson.remoting.Channel$ReaderThread.run(Channel.java:973) Caused by: java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:168) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249) at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542) at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351) at hudson.remoting.Channel$ReaderThread.run(Channel.java:948) FATAL: Unable to delete script file C:\DOCUME~1\Hudson\LOCALS~1\Temp\hudson8908395142501843194.bat hudson.util.IOException2: remote file operation failed: C:\DOCUME~1\Hudson\LOCALS~1\Temp\hudson8908395142501843194.bat at hudson.remoting.Channel@15509e1:DCM-IG-01 at hudson.FilePath.act(FilePath.java:749) at hudson.FilePath.act(FilePath.java:735) at hudson.FilePath.delete(FilePath.java:990) at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92) at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19) at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:601) at hudson.model.Build$RunnerImpl.build(Build.java:174) at hudson.model.Build$RunnerImpl.doRun(Build.java:138) at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:416) at hudson.model.Run.run(Run.java:1280) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:137) Caused by: hudson.remoting.ChannelClosedException: channel is already closed at hudson.remoting.Channel.send(Channel.java:467) at hudson.remoting.Request.call(Request.java:105) at hudson.remoting.Channel.call(Channel.java:630) at hudson.FilePath.act(FilePath.java:742) ... 13 more Caused by: java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:168) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249) at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542) at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351) at hudson.remoting.Channel$ReaderThread.run(Channel.java:948) FATAL: channel is already closed hudson.remoting.ChannelClosedException: channel is already closed at hudson.remoting.Channel.send(Channel.java:467) at hudson.remoting.Request.call(Request.java:105) at hudson.remoting.Channel.call(Channel.java:630) at hudson.Launcher$RemoteLauncher.kill(Launcher.java:744) at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:421) at hudson.model.Run.run(Run.java:1280) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:137) Caused by: java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:168) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249) at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542) at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351) at hudson.remoting.Channel$ReaderThread.run(Channel.java:948) Page generated: Oct 5, 2010 9:11:10 AMHudson ver. 1.379
          Hide
          kohsuke Kohsuke Kawaguchi added a comment -

          Logging issue is covered in JENKINS-7662 and is already fixed. The connection drop problem is not a deadlock problem, so please file a separate issue for that. Thanks!

          Show
          kohsuke Kohsuke Kawaguchi added a comment - Logging issue is covered in JENKINS-7662 and is already fixed. The connection drop problem is not a deadlock problem, so please file a separate issue for that. Thanks!
          Hide
          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: alanharder
          Path:
          src/main/java/hudson/plugins/copyartifact/FilePathCopyMethod.java
          http://jenkins-ci.org/commit/copyartifact-plugin/8b2ceaaeb1b58f6ab075e5cb260691c36e25fca4
          Log:
          Remove workaround code for JENKINS-5977 now that this issue is fixed
          in the minimum core version required by this plugin.

          Show
          scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: alanharder Path: src/main/java/hudson/plugins/copyartifact/FilePathCopyMethod.java http://jenkins-ci.org/commit/copyartifact-plugin/8b2ceaaeb1b58f6ab075e5cb260691c36e25fca4 Log: Remove workaround code for JENKINS-5977 now that this issue is fixed in the minimum core version required by this plugin.

            People

            • Assignee:
              kohsuke Kohsuke Kawaguchi
              Reporter:
              kohsuke Kohsuke Kawaguchi
            • Votes:
              27 Vote for this issue
              Watchers:
              26 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: