Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-5934

Job hangs on copying artifacts build step if artifacts exist

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Fixed
    • Component/s: copyartifact-plugin
    • Labels:
      None
    • Environment:
    • Similar Issues:

      Description

      I just tried the copy-artifact plugin and added it to an existing job which checks out from git, pulls in some artifacts (previously using `curl`) and then runs a script.

      Config:

      Artifacts to copy:  **/bin/*.apk
      Stable build only:  Yes
      Target directory:   <blank>
      

      Now the job hangs for five minutes before failing with the following stacktrace:

      ERROR: Failed to copy artifacts from MBP_1.0_APK with filter: **/bin/*.apk
      hudson.util.IOException2: hudson.remoting.RequestAbortedException: java.io.EOFException
      	at hudson.FilePath.copyRecursiveTo(FilePath.java:1467)
      	at hudson.FilePath.copyRecursiveTo(FilePath.java:1393)
      	at hudson.plugins.copyartifact.CopyArtifact.perform(CopyArtifact.java:117)
      	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
      	at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:582)
      	at hudson.model.Build$RunnerImpl.build(Build.java:165)
      	at hudson.model.Build$RunnerImpl.doRun(Build.java:132)
      	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:416)
      	at hudson.model.Run.run(Run.java:1240)
      	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      	at hudson.model.ResourceController.execute(ResourceController.java:88)
      	at hudson.model.Executor.run(Executor.java:122)
      Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.io.EOFException
      	at hudson.remoting.Request$1.get(Request.java:218)
      	at hudson.remoting.Request$1.get(Request.java:172)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
      	at hudson.FilePath.copyRecursiveTo(FilePath.java:1465)
      	... 11 more
      Caused by: hudson.remoting.RequestAbortedException: java.io.EOFException
      	at hudson.remoting.Request.abort(Request.java:257)
      	at hudson.remoting.Channel.terminate(Channel.java:598)
      	at hudson.remoting.Channel$ReaderThread.run(Channel.java:876)
      Caused by: java.io.EOFException
      	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2570)
      	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1314)
      	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
      	at hudson.remoting.Channel$ReaderThread.run(Channel.java:856)
      FATAL: channel is already closed
      hudson.remoting.ChannelClosedException: channel is already closed
      	at hudson.remoting.Channel.send(Channel.java:412)
      	at hudson.remoting.Request.call(Request.java:105)
      	at hudson.remoting.Channel.call(Channel.java:551)
      	at hudson.Launcher$RemoteLauncher.kill(Launcher.java:735)
      	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:421)
      	at hudson.model.Run.run(Run.java:1240)
      	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      	at hudson.model.ResourceController.execute(ResourceController.java:88)
      	at hudson.model.Executor.run(Executor.java:122)
      

      Without using the copy-artifact plugin, these same jobs on the same slaves run without problem, including archiving and fingerprinting artifacts.

      I also tried setting the path without wildcards, pointing directly to one of the artifacts I want to copy, but it still hangs. The job does not hang if I change the "Artifacts to copy" to a non-existent path; it just says "Copied 0 artifacts from <job>".


      If I abort the job before this five minute timeout, then the job is left hanging on "Archiving artifacts". Pressing abort again hangs on "Recording fingerprints". Aborting a third time stops the job.

      The stack trace for this case is below:

      Recording changes in branch origin/master
      [artifact_test] $ git log --pretty=format:%H f2c3a8ecb5cccfde2bf372c7e3c32b6d9754d9d8..f2c3a8ecb5cccfde2bf372c7e3c32b6d9754d9d8
      [artifact_test] $ /bin/sh -xe /tmp/hudson29151.sh
      + echo Before artifact copy task...
      Before artifact copy task...
      Build was aborted
      Archiving artifacts
      ERROR: Publisher hudson.tasks.ArtifactArchiver aborted due to exception
      java.lang.InterruptedException
      	at java.lang.Object.wait(Native Method)
      	at hudson.remoting.Request.call(Request.java:122)
      	at hudson.remoting.Channel.call(Channel.java:551)
      	at hudson.EnvVars.getRemote(EnvVars.java:196)
      	at hudson.model.Computer.getEnvironment(Computer.java:728)
      	at hudson.model.Run.getEnvironment(Run.java:1627)
      	at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:663)
      	at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:116)
      	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
      	at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:582)
      	at hudson.model.AbstractBuild$AbstractRunner.performAllBuildStep(AbstractBuild.java:563)
      	at hudson.model.AbstractBuild$AbstractRunner.performAllBuildStep(AbstractBuild.java:550)
      	at hudson.model.Build$RunnerImpl.post2(Build.java:152)
      	at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:528)
      	at hudson.model.Run.run(Run.java:1247)
      	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      	at hudson.model.ResourceController.execute(ResourceController.java:88)
      	at hudson.model.Executor.run(Executor.java:122)
      Recording fingerprints
      ERROR: Publisher hudson.tasks.Fingerprinter aborted due to exception
      java.lang.InterruptedException
      	at java.lang.Object.wait(Native Method)
      	at hudson.remoting.Request.call(Request.java:122)
      	at hudson.remoting.Channel.call(Channel.java:551)
      	at hudson.FilePath.act(FilePath.java:736)
      	at hudson.FilePath.act(FilePath.java:729)
      	at hudson.tasks.Fingerprinter.record(Fingerprinter.java:163)
      	at hudson.tasks.Fingerprinter.perform(Fingerprinter.java:114)
      	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
      	at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:582)
      	at hudson.model.AbstractBuild$AbstractRunner.performAllBuildStep(AbstractBuild.java:563)
      	at hudson.model.AbstractBuild$AbstractRunner.performAllBuildStep(AbstractBuild.java:550)
      	at hudson.model.Build$RunnerImpl.post2(Build.java:152)
      	at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:528)
      	at hudson.model.Run.run(Run.java:1247)
      	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      	at hudson.model.ResourceController.execute(ResourceController.java:88)
      	at hudson.model.Executor.run(Executor.java:122)
      Finished: ABORTED
      

      Let me know if there's any other info I can provide.

        Attachments

          Activity

          Hide
          mindless Alan Harder added a comment -

          My plugin is basically a UI+buildstep wrapper for the FilePath.copyRecursiveTo core API.. none of the "real work" is done in the plugin. So I asked Kohsuke what to make of all this and he suggested:

          I suspect the slave JVM has exploded
          5 mins must be the time VM spent doing GC after GC before it quit
          I think you should ask the guy to look at the slave log

          Can you find anything in the logs for these slave nodes?

          Show
          mindless Alan Harder added a comment - My plugin is basically a UI+buildstep wrapper for the FilePath.copyRecursiveTo core API.. none of the "real work" is done in the plugin. So I asked Kohsuke what to make of all this and he suggested: I suspect the slave JVM has exploded 5 mins must be the time VM spent doing GC after GC before it quit I think you should ask the guy to look at the slave log Can you find anything in the logs for these slave nodes?
          Hide
          orrc Christopher Orr added a comment -

          I don't know if there's a way to get more verbose slave output, but this is all I see in the slave log, 5 minutes and 6 seconds after the job was started:

          Ping failed. Terminating
          ERROR: [03/16/10 19:20:26] slave agent was terminated
          java.io.EOFException at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2570) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1314) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368) at hudson.remoting.Channel$ReaderThread.run(Channel.java:856)
          [03/16/10 19:20:30] Launching slave agent
          

          I just tried this again on a different slave and it too killed itself (with the same log message as above) but this time in a mere 2 minutes, 34 seconds. The master log doesn't show anything interesting; just the same stack trace as the slave, accompanying a SEVERE error message.

          Mar 16, 2010 7:33:31 PM hudson.remoting.Channel$ReaderThread run
          SEVERE: I/O error in channel slave-0134
          

          For the record, the copy-artifact build did run successfully once earlier today after upgrading to 1.351. But not again since.

          Show
          orrc Christopher Orr added a comment - I don't know if there's a way to get more verbose slave output, but this is all I see in the slave log, 5 minutes and 6 seconds after the job was started: Ping failed. Terminating ERROR: [03/16/10 19:20:26] slave agent was terminated java.io.EOFException at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2570) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1314) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368) at hudson.remoting.Channel$ReaderThread.run(Channel.java:856) [03/16/10 19:20:30] Launching slave agent I just tried this again on a different slave and it too killed itself (with the same log message as above) but this time in a mere 2 minutes, 34 seconds. The master log doesn't show anything interesting; just the same stack trace as the slave, accompanying a SEVERE error message. Mar 16, 2010 7:33:31 PM hudson.remoting.Channel$ReaderThread run SEVERE: I/O error in channel slave-0134 For the record, the copy-artifact build did run successfully once earlier today after upgrading to 1.351. But not again since.
          Hide
          raspy Krzysztof Malinowski added a comment - - edited

          Had a similar problem when trying to write similar plugin to yours Anyway it seems that copyRecursiveTo hangs when copying from master to remote node. When using your plugin copying hanged and some time after it reported:

          ERROR: Failed to copy artifacts from BOX_SDL-build with filter: **
          hudson.remoting.ChannelClosedException: channel is already closed
          	at hudson.remoting.Channel.send(Channel.java:412)
          	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:112)
          	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:100)
          	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
          	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
          	at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:161)
          	at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:118)
          	at java.util.zip.GZIPOutputStream.write(GZIPOutputStream.java:72)
          	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
          	at org.apache.tools.tar.TarBuffer.writeBlock(TarBuffer.java:410)
          	at org.apache.tools.tar.TarBuffer.writeRecord(TarBuffer.java:351)
          	at hudson.org.apache.tools.tar.TarOutputStream.writeEOFRecord(TarOutputStream.java:356)
          	at hudson.org.apache.tools.tar.TarOutputStream.finish(TarOutputStream.java:137)
          	at hudson.org.apache.tools.tar.TarOutputStream.close(TarOutputStream.java:149)
          	at hudson.FilePath$TarWriter.close(FilePath.java:1586)
          	at hudson.FilePath.writeToTar(FilePath.java:1661)
          	at hudson.FilePath.copyRecursiveTo(FilePath.java:1463)
          	at hudson.FilePath.copyRecursiveTo(FilePath.java:1393)
          	at hudson.plugins.copyartifact.CopyArtifact.perform(CopyArtifact.java:117)
          	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
          	at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:582)
          	at hudson.model.Build$RunnerImpl.build(Build.java:165)
          	at hudson.model.Build$RunnerImpl.doRun(Build.java:132)
          	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:416)
          	at hudson.model.Run.run(Run.java:1240)
          	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          	at hudson.model.ResourceController.execute(ResourceController.java:88)
          	at hudson.model.Executor.run(Executor.java:122)
          FATAL: channel is already closed
          hudson.remoting.ChannelClosedException: channel is already closed
          	at hudson.remoting.Channel.send(Channel.java:412)
          	at hudson.remoting.Request.call(Request.java:105)
          	at hudson.remoting.Channel.call(Channel.java:551)
          	at hudson.Launcher$RemoteLauncher.kill(Launcher.java:735)
          	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:421)
          	at hudson.model.Run.run(Run.java:1240)
          	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          	at hudson.model.ResourceController.execute(ResourceController.java:88)
          	at hudson.model.Executor.run(Executor.java:122)
          

          I did some investigation around this part of Hudson core when trying to write a plugin and it seemed that files were actually copied, but after the copying was complete the process hanged on a synchronization object. I think some help from core team is required to resolve that.

          Show
          raspy Krzysztof Malinowski added a comment - - edited Had a similar problem when trying to write similar plugin to yours Anyway it seems that copyRecursiveTo hangs when copying from master to remote node. When using your plugin copying hanged and some time after it reported: ERROR: Failed to copy artifacts from BOX_SDL-build with filter: ** hudson.remoting.ChannelClosedException: channel is already closed at hudson.remoting.Channel.send(Channel.java:412) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:112) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:100) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109) at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:161) at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:118) at java.util.zip.GZIPOutputStream.write(GZIPOutputStream.java:72) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) at org.apache.tools.tar.TarBuffer.writeBlock(TarBuffer.java:410) at org.apache.tools.tar.TarBuffer.writeRecord(TarBuffer.java:351) at hudson.org.apache.tools.tar.TarOutputStream.writeEOFRecord(TarOutputStream.java:356) at hudson.org.apache.tools.tar.TarOutputStream.finish(TarOutputStream.java:137) at hudson.org.apache.tools.tar.TarOutputStream.close(TarOutputStream.java:149) at hudson.FilePath$TarWriter.close(FilePath.java:1586) at hudson.FilePath.writeToTar(FilePath.java:1661) at hudson.FilePath.copyRecursiveTo(FilePath.java:1463) at hudson.FilePath.copyRecursiveTo(FilePath.java:1393) at hudson.plugins.copyartifact.CopyArtifact.perform(CopyArtifact.java:117) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19) at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:582) at hudson.model.Build$RunnerImpl.build(Build.java:165) at hudson.model.Build$RunnerImpl.doRun(Build.java:132) at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:416) at hudson.model.Run.run(Run.java:1240) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:122) FATAL: channel is already closed hudson.remoting.ChannelClosedException: channel is already closed at hudson.remoting.Channel.send(Channel.java:412) at hudson.remoting.Request.call(Request.java:105) at hudson.remoting.Channel.call(Channel.java:551) at hudson.Launcher$RemoteLauncher.kill(Launcher.java:735) at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:421) at hudson.model.Run.run(Run.java:1240) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:122) I did some investigation around this part of Hudson core when trying to write a plugin and it seemed that files were actually copied, but after the copying was complete the process hanged on a synchronization object. I think some help from core team is required to resolve that.
          Hide
          mindless Alan Harder added a comment -

          I did test copy-to-slave before releasing the plugin, and it seemed to work fine.. this was on a Mac and the slave was on localhost. Today I tried 1.343 on Solaris in SJSWS container with remote slave, and I DID see this behavior. Kohsuke said the slave is asking the master to send a class file..
          But then I tried current svn using hudson-dev:run, and 1.343 using java -jar and both worked fine. Finally, I tried the build of current svn in SJSWS and it hung again..

          SO, I suspect if you use java -jar instead of Tomcat that it should work.. now, why does running in a container make the slave hang until it is killed?

          Show
          mindless Alan Harder added a comment - I did test copy-to-slave before releasing the plugin, and it seemed to work fine.. this was on a Mac and the slave was on localhost. Today I tried 1.343 on Solaris in SJSWS container with remote slave, and I DID see this behavior. Kohsuke said the slave is asking the master to send a class file.. But then I tried current svn using hudson-dev:run, and 1.343 using java -jar and both worked fine. Finally, I tried the build of current svn in SJSWS and it hung again.. SO, I suspect if you use java -jar instead of Tomcat that it should work.. now, why does running in a container make the slave hang until it is killed?
          Hide
          mindless Alan Harder added a comment -

          See JENKINS-5977 for core problem we think causes this.

          Show
          mindless Alan Harder added a comment - See JENKINS-5977 for core problem we think causes this.
          Hide
          mindless Alan Harder added a comment -

          ..and that problem is related to a buffer filling up. So I take back what I said about this being a problem only when running in SJSWS/Tomcat.. it was just that the testing I did in my dev installs used small artifacts. In my SJSWS install I have some real (ie larger) artifacts that do trigger this problem.

          Show
          mindless Alan Harder added a comment - ..and that problem is related to a buffer filling up. So I take back what I said about this being a problem only when running in SJSWS/Tomcat.. it was just that the testing I did in my dev installs used small artifacts. In my SJSWS install I have some real (ie larger) artifacts that do trigger this problem.
          Hide
          mindless Alan Harder added a comment -

          I have a workaround that seems to resolve the issue, so this plugin can be used on Hudson releases before whatever-version-fixes-JENKINS-5977. I'll release this change tomorrow.

          Show
          mindless Alan Harder added a comment - I have a workaround that seems to resolve the issue, so this plugin can be used on Hudson releases before whatever-version-fixes- JENKINS-5977 . I'll release this change tomorrow.
          Hide
          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in hudson
          User: : mindless
          Path:
          trunk/hudson/plugins/copyartifact/src/main/java/hudson/plugins/copyartifact/CopyArtifact.java
          trunk/hudson/plugins/copyartifact/src/main/webapp/JENKINS-5977/deleteme.tmp
          http://jenkins-ci.org/commit/28825
          Log:
          [copyartifact] [FIXED JENKINS-5934] Add workaround for JENKINS-5977
          so slaves won't hang on copy of larger artifacts.

          Show
          scm_issue_link SCM/JIRA link daemon added a comment - Code changed in hudson User: : mindless Path: trunk/hudson/plugins/copyartifact/src/main/java/hudson/plugins/copyartifact/CopyArtifact.java trunk/hudson/plugins/copyartifact/src/main/webapp/ JENKINS-5977 /deleteme.tmp http://jenkins-ci.org/commit/28825 Log: [copyartifact] [FIXED JENKINS-5934] Add workaround for JENKINS-5977 so slaves won't hang on copy of larger artifacts.

            People

            • Assignee:
              mindless Alan Harder
              Reporter:
              orrc Christopher Orr
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: