Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-5073

hudson.util.IOException2: Failed to join the process - on a Windows slave

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Component/s: remoting
    • Labels:
      None
    • Environment:
      Windows XP master - Windows XP slave, Linux

      Description

      When executing a free-style job in a Windows agent, this error appers:

      //wc/psp/Branches/Release/PSP/content/EaTrax/MichaelFranti.wav#1 - added as P:\Release\PSP\content\EaTrax\MichaelFranti.wav
      //wc/psp/Branches/Release/PSP/content/EaTrax/MiikeSnow.wav#1 - added as P:\Release\PSPFATAL: command execution failed
      hudson.util.IOException2: Failed to join the process
      	at hudson.Proc$RemoteProc.join(Proc.java:311)
      	at hudson.Launcher$ProcStarter.join(Launcher.java:275)
      	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:83)
      	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
      	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
      	at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:577)
      	at hudson.model.Build$RunnerImpl.build(Build.java:165)
      	at hudson.model.Build$RunnerImpl.doRun(Build.java:133)
      	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:417)
      	at hudson.model.Run.run(Run.java:1176)
      	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      	at hudson.model.ResourceController.execute(ResourceController.java:88)
      	at hudson.model.Executor.run(Executor.java:123)
      Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.io.EOFException
      	at hudson.remoting.Request$1.get(Request.java:218)
      	at hudson.remoting.Request$1.get(Request.java:172)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
      	at hudson.Proc$RemoteProc.join(Proc.java:303)
      	... 12 more
      Caused by: hudson.remoting.RequestAbortedException: java.io.EOFException
      	at hudson.remoting.Request.abort(Request.java:257)
      	at hudson.remoting.Channel.terminate(Channel.java:594)
      	at hudson.remoting.Channel$ReaderThread.run(Channel.java:872)
      Caused by: java.io.EOFException
      	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
      	at java.io.ObjectInputStream.readObject0(Unknown Source)
      	at java.io.ObjectInputStream.readObject(Unknown Source)
      	at hudson.remoting.Channel$ReaderThread.run(Channel.java:852)
      FATAL: Unable to delete script file C:\DOCUME~1\pcfarm08\LOCALS~1\Temp\2\hudson2101488692771117152.bat
      hudson.util.IOException2: remote file operation failed
      	at hudson.FilePath.act(FilePath.java:677)
      	at hudson.FilePath.act(FilePath.java:665)
      	at hudson.FilePath.delete(FilePath.java:922)
      	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:93)
      	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
      	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
      	at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:577)
      	at hudson.model.Build$RunnerImpl.build(Build.java:165)
      	at hudson.model.Build$RunnerImpl.doRun(Build.java:133)
      	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:417)
      	at hudson.model.Run.run(Run.java:1176)
      	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      	at hudson.model.ResourceController.execute(ResourceController.java:88)
      	at hudson.model.Executor.run(Executor.java:123)
      Caused by: hudson.remoting.ChannelClosedException: channel is already closed
      	at hudson.remoting.Channel.send(Channel.java:408)
      	at hudson.remoting.Request.call(Request.java:105)
      	at hudson.remoting.Channel.call(Channel.java:547)
      	at hudson.FilePath.act(FilePath.java:672)
      	... 13 more
      [locks-and-latches] Releasing all the locks
      [locks-and-latches] All the locks released
      FATAL: channel is already closed
      hudson.remoting.ChannelClosedException: channel is already closed
      	at hudson.remoting.Channel.send(Channel.java:408)
      	at hudson.remoting.Request.call(Request.java:105)
      	at hudson.remoting.Channel.call(Channel.java:547)
      	at hudson.Launcher$RemoteLauncher.kill(Launcher.java:734)
      	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:422)
      	at hudson.model.Run.run(Run.java:1176)
      	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      	at hudson.model.ResourceController.execute(ResourceController.java:88)
      	at hudson.model.Executor.run(Executor.java:123)
      

      The job executes a batch command and it tooks 1h 17 min to fail in Hudson 1.336.

        Issue Links

          Activity

          Hide
          lacostej lacostej added a comment - - edited

          I was looking at the Proc class, and there seems to be a bug in the StdinCopyThread that could be cause the associated LocalProc to close its stream too early.

          See :

                      @Override
                      public void run() {
                          try {
                              try {
                                  byte[] buf = new byte[8192];
                                  int len;
                                  while ((len = in.read(buf)) > 0) {
                                      out.write(buf, 0, len);
                                      out.flush();
                                  }
                              } finally {
                                  in.close();
                                  out.close();
                              }
                          } catch (IOException e) {
                              // TODO: what to do?
                          }
                      }
          

          If the read returns 0, e.g. because of a buffering delay etc, then the while loop exists before we are done.

          Same for StreamCopyThread. There may be other occurences in the code. I will search for them.

          Not sure how these classes are used but it could definitively affect a long build, explaining the lack of reproducibility.

          I created JENKINS-8686 to track and deal with this issue. We will see if that solves this problem or not. Otherwise we might have to ask for some logs (there are a few well placed log calls in that part of the code).

          Show
          lacostej lacostej added a comment - - edited I was looking at the Proc class, and there seems to be a bug in the StdinCopyThread that could be cause the associated LocalProc to close its stream too early. See : @Override public void run() { try { try { byte [] buf = new byte [8192]; int len; while ((len = in.read(buf)) > 0) { out.write(buf, 0, len); out.flush(); } } finally { in.close(); out.close(); } } catch (IOException e) { // TODO: what to do ? } } If the read returns 0, e.g. because of a buffering delay etc, then the while loop exists before we are done. Same for StreamCopyThread. There may be other occurences in the code. I will search for them. Not sure how these classes are used but it could definitively affect a long build, explaining the lack of reproducibility. I created JENKINS-8686 to track and deal with this issue. We will see if that solves this problem or not. Otherwise we might have to ask for some logs (there are a few well placed log calls in that part of the code).
          Hide
          soundrabbit lukas rytz added a comment - - edited

          [Edit 2: We only had the issues with JNLP-Slaves. Instead, we now installed Cygwin+OpenSSH on the Windows slave and connect using the SSH Slaves plugin. This works fine.]

          We see similar problems.

          Master: Ubuntu x64, 2.6.32-29-server
          Slave: Windows 7 SP1 x64, VM running in VirtualBox 4.0.4, Java 1.6.0_24-x64 (also tried 32bit JVM, same problem).

          [Edit: i disabled antivirus, Windows Defender, UAC, I run the jenkins slave as administrator. These don't help.]

          We also have a Windows XP 32 bit VM, similar configuration, and there it never happens. On the Win7 slave it happens at every build.

          Here are 5 build, the stack traces are exactly identical 1 2 3 4 5

          It looks like it does not happen at random points, but at certain points in the build.

          FATAL: Unable to delete script file C:\Windows\TEMP\hudson182664989132439196.bat
          hudson.util.IOException2: remote file operation failed: C:\Windows\TEMP\hudson182664989132439196.bat at hudson.remoting.Channel@42728b7:win7
          	at hudson.FilePath.act(FilePath.java:753)
          	at hudson.FilePath.act(FilePath.java:739)
          	at hudson.FilePath.delete(FilePath.java:994)
          	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
          	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
          	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
          	at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:649)
          	at hudson.model.Build$RunnerImpl.build(Build.java:177)
          	at hudson.model.Build$RunnerImpl.doRun(Build.java:139)
          	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:423)
          	at hudson.model.Run.run(Run.java:1362)
          	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          	at hudson.model.ResourceController.execute(ResourceController.java:88)
          	at hudson.model.Executor.run(Executor.java:145)
          Caused by: hudson.remoting.ChannelClosedException: channel is already closed
          	at hudson.remoting.Channel.send(Channel.java:466)
          	at hudson.remoting.Request.call(Request.java:105)
          	at hudson.remoting.Channel.call(Channel.java:630)
          	at hudson.FilePath.act(FilePath.java:746)
          	... 13 more
          Caused by: java.io.IOException: Unexpected termination of the channel
          	at hudson.remoting.Channel$ReaderThread.run(Channel.java:985)
          Caused by: java.io.EOFException
          	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2553)
          	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
          	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
          	at hudson.remoting.Channel$ReaderThread.run(Channel.java:979)
          FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
          hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
          	at hudson.remoting.Request.call(Request.java:137)
          	at hudson.remoting.Channel.call(Channel.java:630)
          	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
          	at $Proxy26.join(Unknown Source)
          	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:850)
          	at hudson.Launcher$ProcStarter.join(Launcher.java:336)
          	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:82)
          	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
          	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
          	at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:649)
          	at hudson.model.Build$RunnerImpl.build(Build.java:177)
          	at hudson.model.Build$RunnerImpl.doRun(Build.java:139)
          	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:423)
          	at hudson.model.Run.run(Run.java:1362)
          	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          	at hudson.model.ResourceController.execute(ResourceController.java:88)
          	at hudson.model.Executor.run(Executor.java:145)
          Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
          	at hudson.remoting.Request.abort(Request.java:257)
          	at hudson.remoting.Channel.terminate(Channel.java:681)
          	at hudson.remoting.Channel$ReaderThread.run(Channel.java:1003)
          Caused by: java.io.IOException: Unexpected termination of the channel
          	at hudson.remoting.Channel$ReaderThread.run(Channel.java:985)
          Caused by: java.io.EOFException
          	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2553)
          	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
          	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
          	at hudson.remoting.Channel$ReaderThread.run(Channel.java:979)
          
          Show
          soundrabbit lukas rytz added a comment - - edited [Edit 2: We only had the issues with JNLP-Slaves. Instead, we now installed Cygwin+OpenSSH on the Windows slave and connect using the SSH Slaves plugin. This works fine.] We see similar problems. Master: Ubuntu x64, 2.6.32-29-server Slave: Windows 7 SP1 x64, VM running in VirtualBox 4.0.4, Java 1.6.0_24-x64 (also tried 32bit JVM, same problem). [Edit: i disabled antivirus, Windows Defender, UAC, I run the jenkins slave as administrator. These don't help.] We also have a Windows XP 32 bit VM, similar configuration, and there it never happens. On the Win7 slave it happens at every build. Here are 5 build, the stack traces are exactly identical 1 2 3 4 5 It looks like it does not happen at random points, but at certain points in the build. FATAL: Unable to delete script file C:\Windows\TEMP\hudson182664989132439196.bat hudson.util.IOException2: remote file operation failed: C:\Windows\TEMP\hudson182664989132439196.bat at hudson.remoting.Channel@42728b7:win7 at hudson.FilePath.act(FilePath.java:753) at hudson.FilePath.act(FilePath.java:739) at hudson.FilePath.delete(FilePath.java:994) at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92) at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19) at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:649) at hudson.model.Build$RunnerImpl.build(Build.java:177) at hudson.model.Build$RunnerImpl.doRun(Build.java:139) at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:423) at hudson.model.Run.run(Run.java:1362) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:145) Caused by: hudson.remoting.ChannelClosedException: channel is already closed at hudson.remoting.Channel.send(Channel.java:466) at hudson.remoting.Request.call(Request.java:105) at hudson.remoting.Channel.call(Channel.java:630) at hudson.FilePath.act(FilePath.java:746) ... 13 more Caused by: java.io.IOException: Unexpected termination of the channel at hudson.remoting.Channel$ReaderThread.run(Channel.java:985) Caused by: java.io.EOFException at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2553) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350) at hudson.remoting.Channel$ReaderThread.run(Channel.java:979) FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel at hudson.remoting.Request.call(Request.java:137) at hudson.remoting.Channel.call(Channel.java:630) at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158) at $Proxy26.join(Unknown Source) at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:850) at hudson.Launcher$ProcStarter.join(Launcher.java:336) at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:82) at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19) at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:649) at hudson.model.Build$RunnerImpl.build(Build.java:177) at hudson.model.Build$RunnerImpl.doRun(Build.java:139) at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:423) at hudson.model.Run.run(Run.java:1362) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:145) Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel at hudson.remoting.Request.abort(Request.java:257) at hudson.remoting.Channel.terminate(Channel.java:681) at hudson.remoting.Channel$ReaderThread.run(Channel.java:1003) Caused by: java.io.IOException: Unexpected termination of the channel at hudson.remoting.Channel$ReaderThread.run(Channel.java:985) Caused by: java.io.EOFException at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2553) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350) at hudson.remoting.Channel$ReaderThread.run(Channel.java:979)
          Hide
          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          core/src/main/java/hudson/tasks/CommandInterpreter.java
          http://jenkins-ci.org/commit/jenkins/1b22559581abf87344c57ac0dcf21d1612f2c80b
          Log:
          JENKINS-5073

          Suppressed a confusing 2nd error message.

          Compare: https://github.com/jenkinsci/jenkins/compare/786c1d899367...1b22559581ab

          Show
          scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: core/src/main/java/hudson/tasks/CommandInterpreter.java http://jenkins-ci.org/commit/jenkins/1b22559581abf87344c57ac0dcf21d1612f2c80b Log: JENKINS-5073 Suppressed a confusing 2nd error message. Compare: https://github.com/jenkinsci/jenkins/compare/786c1d899367...1b22559581ab
          Hide
          dogfood dogfood added a comment -

          Integrated in jenkins_main_trunk #2887
          JENKINS-5073 (Revision 1b22559581abf87344c57ac0dcf21d1612f2c80b)

          Result = SUCCESS
          kohsuke : 1b22559581abf87344c57ac0dcf21d1612f2c80b
          Files :

          • core/src/main/java/hudson/tasks/CommandInterpreter.java
          Show
          dogfood dogfood added a comment - Integrated in jenkins_main_trunk #2887 JENKINS-5073 (Revision 1b22559581abf87344c57ac0dcf21d1612f2c80b) Result = SUCCESS kohsuke : 1b22559581abf87344c57ac0dcf21d1612f2c80b Files : core/src/main/java/hudson/tasks/CommandInterpreter.java
          Hide
          jglick Jesse Glick added a comment -

          Essentially a duplicate of JENKINS-1948.

          Show
          jglick Jesse Glick added a comment - Essentially a duplicate of JENKINS-1948 .

            People

            • Assignee:
              Unassigned
              Reporter:
              deccico deccico
            • Votes:
              45 Vote for this issue
              Watchers:
              43 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: