Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-60221

Copy artifact on agent sometimes hangs while copying

    Details

    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Minor
    • Resolution: Unresolved
    • Component/s: remoting
    • Labels:
      None
    • Environment:
      Jenkins: 2.205

      master:
      - Windows 2012 R2
      - AdoptOpenJDK 1.8.0_232-b09 x64
      - started as service

      agent:
      - Windows 2012 R2
      - Java 1.8.0_211
      - started as scheduled task
    • Similar Issues:

      Description

      Sometimes a build executed on an agent will just spin while copying artifacts from another project.

       

      The build consists of steps:

      • get sources from source control
      • copy artifact from another project
      • copy artifact from another project

      In this case (after 13 hours) the first artifact has been partially copied (588 mb out of 633 mb) and the second artifact still needs to be copied (timestamp is old). The systemInfo url on the agent shows the following:

      Thread Dump

      HTTP Method Canceller

      "HTTP Method Canceller" Id=235 Group=main TIMED_WAITING at java.lang.Thread.sleep(Native Method) at com.microsoft.tfs.core.ws.runtime.transport.HTTPConnectionCanceller.run(HTTPConnectionCanceller.java:116)

      Idle HTTP Connection Closer

      "Idle HTTP Connection Closer" Id=234 Group=main TIMED_WAITING at java.lang.Thread.sleep(Native Method) at com.microsoft.tfs.core.ws.runtime.transport.IdleHTTPConnectionCloser.run(IdleHTTPConnectionCloser.java:143)

      IOHub#1: Selector[keys:0, gen:0] / pool-1-thread-1

      "IOHub#1: Selector[keys:0, gen:0] / pool-1-thread-1" Id=11 Group=main RUNNABLE (in native) at sun.nio.ch.WindowsSelectorImpl$SubSelector.poll0(Native Method) at sun.nio.ch.WindowsSelectorImpl$SubSelector.poll(Unknown Source) at sun.nio.ch.WindowsSelectorImpl$SubSelector.access$400(Unknown Source) at sun.nio.ch.WindowsSelectorImpl.doSelect(Unknown Source) at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source) - locked sun.nio.ch.Util$3@1cdd12e - locked java.util.Collections$UnmodifiableSet@7b7cd8 - locked sun.nio.ch.WindowsSelectorImpl@1ebadef at sun.nio.ch.SelectorImpl.select(Unknown Source) at sun.nio.ch.SelectorImpl.select(Unknown Source) at org.jenkinsci.remoting.protocol.IOHub.run(IOHub.java:461) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:97) at hudson.remoting.Engine$1$$Lambda$5/13931505.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Number of locked synchronizers = 1 - java.util.concurrent.ThreadPoolExecutor$Worker@12b2613

      main

      "main" Id=1 Group=main WAITING on hudson.remoting.Engine@1a6e70d at java.lang.Object.wait(Native Method) - waiting on hudson.remoting.Engine@1a6e70d at java.lang.Thread.join(Unknown Source) at java.lang.Thread.join(Unknown Source) at hudson.remoting.jnlp.Main.main(Main.java:277) at hudson.remoting.jnlp.Main._main(Main.java:270) at hudson.remoting.Launcher.run(Launcher.java:387) at hudson.remoting.Launcher.main(Launcher.java:298)

      MultiThreadedHttpConnectionManager cleanup

      "MultiThreadedHttpConnectionManager cleanup" Id=238 Group=main WAITING on java.lang.ref.ReferenceQueue$Lock@126ae38 at java.lang.Object.wait(Native Method) - waiting on java.lang.ref.ReferenceQueue$Lock@126ae38 at java.lang.ref.ReferenceQueue.remove(Unknown Source) at java.lang.ref.ReferenceQueue.remove(Unknown Source) at com.microsoft.tfs.core.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1199)

      Ping thread for channel hudson.remoting.Channel@86080a:JNLP4-connect connection to nlc1devefejen01/10.2.113.181:49989

      "Ping thread for channel hudson.remoting.Channel@86080a:JNLP4-connect connection to nlc1devefejen01/10.2.113.181:49989" Id=21 Group=main TIMED_WAITING at java.lang.Thread.sleep(Native Method) at hudson.remoting.PingThread.run(PingThread.java:94)

      pool-1-thread-1143 for JNLP4-connect connection to nlc1devefejen01/10.2.113.181:49989 id=26922

      "pool-1-thread-1143 for JNLP4-connect connection to nlc1devefejen01/10.2.113.181:49989 id=26922" Id=1179 Group=main RUNNABLE at sun.management.ThreadImpl.dumpThreads0(Native Method) at sun.management.ThreadImpl.dumpAllThreads(Unknown Source) at hudson.Functions.getThreadInfos(Functions.java:1295) at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:98) at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:95) at hudson.remoting.UserRequest.perform(UserRequest.java:211) at hudson.remoting.UserRequest.perform(UserRequest.java:54) at hudson.remoting.Request$2.run(Request.java:369) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:97) at hudson.remoting.Engine$1$$Lambda$5/13931505.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Number of locked synchronizers = 1 - java.util.concurrent.ThreadPoolExecutor$Worker@878f15

      pool-1-thread-1144

      "pool-1-thread-1144" Id=1180 Group=main TIMED_WAITING on java.util.concurrent.SynchronousQueue$TransferStack@16b204a at sun.misc.Unsafe.park(Native Method) - waiting on java.util.concurrent.SynchronousQueue$TransferStack@16b204a at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source) at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source) at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source) at java.util.concurrent.SynchronousQueue.poll(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:97) at hudson.remoting.Engine$1$$Lambda$5/13931505.run(Unknown Source) at java.lang.Thread.run(Unknown Source)

      pool-1-thread-1145

      "pool-1-thread-1145" Id=1181 Group=main TIMED_WAITING on java.util.concurrent.SynchronousQueue$TransferStack@16b204a at sun.misc.Unsafe.park(Native Method) - waiting on java.util.concurrent.SynchronousQueue$TransferStack@16b204a at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source) at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source) at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source) at java.util.concurrent.SynchronousQueue.poll(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:97) at hudson.remoting.Engine$1$$Lambda$5/13931505.run(Unknown Source) at java.lang.Thread.run(Unknown Source)

      pool-1-thread-1146

      "pool-1-thread-1146" Id=1182 Group=main TIMED_WAITING on java.util.concurrent.SynchronousQueue$TransferStack@16b204a at sun.misc.Unsafe.park(Native Method) - waiting on java.util.concurrent.SynchronousQueue$TransferStack@16b204a at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source) at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source) at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source) at java.util.concurrent.SynchronousQueue.poll(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:97) at hudson.remoting.Engine$1$$Lambda$5/13931505.run(Unknown Source) at java.lang.Thread.run(Unknown Source)

      pool-1-thread-1147

      "pool-1-thread-1147" Id=1183 Group=main TIMED_WAITING on java.util.concurrent.SynchronousQueue$TransferStack@16b204a at sun.misc.Unsafe.park(Native Method) - waiting on java.util.concurrent.SynchronousQueue$TransferStack@16b204a at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source) at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source) at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source) at java.util.concurrent.SynchronousQueue.poll(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:97) at hudson.remoting.Engine$1$$Lambda$5/13931505.run(Unknown Source) at java.lang.Thread.run(Unknown Source)

      pool-1-thread-1148

      "pool-1-thread-1148" Id=1184 Group=main TIMED_WAITING on java.util.concurrent.SynchronousQueue$TransferStack@16b204a at sun.misc.Unsafe.park(Native Method) - waiting on java.util.concurrent.SynchronousQueue$TransferStack@16b204a at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source) at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source) at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source) at java.util.concurrent.SynchronousQueue.poll(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:97) at hudson.remoting.Engine$1$$Lambda$5/13931505.run(Unknown Source) at java.lang.Thread.run(Unknown Source)

      pool-1-thread-1149

      "pool-1-thread-1149" Id=1185 Group=main TIMED_WAITING on java.util.concurrent.SynchronousQueue$TransferStack@16b204a at sun.misc.Unsafe.park(Native Method) - waiting on java.util.concurrent.SynchronousQueue$TransferStack@16b204a at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source) at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source) at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source) at java.util.concurrent.SynchronousQueue.poll(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:97) at hudson.remoting.Engine$1$$Lambda$5/13931505.run(Unknown Source) at java.lang.Thread.run(Unknown Source)

      pool-1-thread-1150

      "pool-1-thread-1150" Id=1186 Group=main TIMED_WAITING on java.util.concurrent.SynchronousQueue$TransferStack@16b204a at sun.misc.Unsafe.park(Native Method) - waiting on java.util.concurrent.SynchronousQueue$TransferStack@16b204a at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source) at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source) at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source) at java.util.concurrent.SynchronousQueue.poll(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:97) at hudson.remoting.Engine$1$$Lambda$5/13931505.run(Unknown Source) at java.lang.Thread.run(Unknown Source)

      pool-1-thread-3

      "pool-1-thread-3" Id=13 Group=main RUNNABLE (in native) at sun.nio.ch.SocketDispatcher.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(Unknown Source) at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source) at sun.nio.ch.IOUtil.read(Unknown Source) at sun.nio.ch.SocketChannelImpl.read(Unknown Source) - locked java.lang.Object@a4a1f9 at hudson.remoting.SocketChannelStream$1.read(SocketChannelStream.java:35) at sun.nio.ch.ChannelInputStream.read(Unknown Source) at sun.nio.ch.ChannelInputStream.read(Unknown Source) at sun.nio.ch.ChannelInputStream.read(Unknown Source) - locked sun.nio.ch.ChannelInputStream@b7a057 at java.nio.channels.Channels$ReadableByteChannelImpl.read(Unknown Source) - locked java.lang.Object@b226ae at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:245) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:97) at hudson.remoting.Engine$1$$Lambda$5/13931505.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Number of locked synchronizers = 1 - java.util.concurrent.ThreadPoolExecutor$Worker@1a0714e

      RemoteInvocationHandler 2

      "RemoteInvocationHandler 2" Id=347 Group=main TIMED_WAITING on java.lang.ref.ReferenceQueue$Lock@199b483 at java.lang.Object.wait(Native Method) - waiting on java.lang.ref.ReferenceQueue$Lock@199b483 at java.lang.ref.ReferenceQueue.remove(Unknown Source) at hudson.remoting.RemoteInvocationHandler$Unexporter.run(RemoteInvocationHandler.java:598) at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) at java.util.concurrent.FutureTask.run(Unknown Source) at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:111) at java.lang.Thread.run(Unknown Source)

      Thread-1

      "Thread-1" Id=10 Group=main TIMED_WAITING on hudson.remoting.Channel@86080a at java.lang.Object.wait(Native Method) - waiting on hudson.remoting.Channel@86080a at hudson.remoting.Channel.join(Channel.java:1139) at hudson.remoting.Engine.innerRun(Engine.java:611) at hudson.remoting.Engine.run(Engine.java:488)

      Windows IOHub Watcher for IOHub#1: Selector[keys:0, gen:0] / pool-1-thread-2

      "Windows IOHub Watcher for IOHub#1: Selector[keys:0, gen:0] / pool-1-thread-2" Id=12 Group=main TIMED_WAITING on java.lang.Object@870f2e at java.lang.Object.wait(Native Method) - waiting on java.lang.Object@870f2e at org.jenkinsci.remoting.protocol.IOHub$IOHubSelectorWatcher.run(IOHub.java:549) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:97) at hudson.remoting.Engine$1$$Lambda$5/13931505.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Number of locked synchronizers = 1 - java.util.concurrent.ThreadPoolExecutor$Worker@c466f4

      Attach Listener

      "Attach Listener" Id=5 Group=system RUNNABLE

      Finalizer

      "Finalizer" Id=3 Group=system WAITING on java.lang.ref.ReferenceQueue$Lock@1c92e83 at java.lang.Object.wait(Native Method) - waiting on java.lang.ref.ReferenceQueue$Lock@1c92e83 at java.lang.ref.ReferenceQueue.remove(Unknown Source) at java.lang.ref.ReferenceQueue.remove(Unknown Source) at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)

      Reference Handler

      "Reference Handler" Id=2 Group=system WAITING on java.lang.ref.Reference$Lock@e6d2b5 at java.lang.Object.wait(Native Method) - waiting on java.lang.ref.Reference$Lock@e6d2b5 at java.lang.Object.wait(Unknown Source) at java.lang.ref.Reference.tryHandlePending(Unknown Source) at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)

      Signal Dispatcher

      "Signal Dispatcher" Id=4 Group=system RUNNABLE

      Remote Class Loader Statistics

      Loading Type Time (s) Count
      Classes 12.0 2222 (prefetch cache: 1167)
      Resources 0.1 28

       

      After cancelling the build, the e-mail log shows:

      FATAL: Failed to copy file:/C:/jenkins/jobs/7.27/jobs/done/jobs/build_bl-11.7-sql/builds/7/archive/build/oe11.7/ef-bl.mss.pl to d:\jenkins\workspace\7.27\done\edis_11.7-sql2016\ef-bl.mss.pl
      java.lang.InterruptedException
      at java.lang.Object.wait(Native Method)
      at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:225)
      at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:124)
      Also: java.nio.channels.ClosedChannelException
      at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
      at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:179)
      at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:795)
      at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@471bde16:JNLP4-connect connection from 10.2.34.112/10.2.34.112:55730": channel is already closed
      at hudson.remoting.Channel.send(Channel.java:717)
      at hudson.remoting.ProxyOutputStream.doClose(ProxyOutputStream.java:174)
      at hudson.remoting.ProxyOutputStream.error(ProxyOutputStream.java:169)
      at hudson.remoting.ProxyOutputStream.close(ProxyOutputStream.java:160)
      at hudson.remoting.RemoteOutputStream.close(RemoteOutputStream.java:116)
      at hudson.FilePath.copyFrom(FilePath.java:985)
      Caused: java.io.InterruptedIOException
      at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:149)
      at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:108)
      at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2315)
      at org.apache.commons.io.IOUtils.copy(IOUtils.java:2270)
      at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2291)
      at org.apache.commons.io.IOUtils.copy(IOUtils.java:2246)
      at hudson.FilePath.copyFrom(FilePath.java:984)
      at hudson.plugins.copyartifact.CopyArtifact.copyOne(CopyArtifact.java:617)
      Caused: java.io.IOException: Failed to copy file:/C:/jenkins/jobs/7.27/jobs/done/jobs/build_bl-11.7-sql/builds/7/archive/build/oe11.7/ef-bl.mss.pl to d:\jenkins\workspace\7.27\done\edis_11.7-sql2016\ef-bl.mss.pl
      at hudson.plugins.copyartifact.CopyArtifact.copyOne(CopyArtifact.java:634)
      at hudson.plugins.copyartifact.CopyArtifact.copy(CopyArtifact.java:577)
      at hudson.plugins.copyartifact.CopyArtifact.perform(CopyArtifact.java:537)
      at hudson.plugins.copyartifact.CopyArtifact.perform(CopyArtifact.java:475)
      at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:78)
      at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
      at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:741)
      at hudson.model.Build$BuildExecution.build(Build.java:206)
      at hudson.model.Build$BuildExecution.doRun(Build.java:163)
      at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
      at hudson.model.Run.execute(Run.java:1853)
      at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
      at hudson.model.ResourceController.execute(ResourceController.java:97)
      at hudson.model.Executor.run(Executor.java:427)

       

        Attachments

          Activity

          Hide
          ikedam ikedam added a comment -

          Stefan Drissen
          Could you attach the thread dump of the master node?
          It might sound strange, but who actually controls job executions is the master node.

          Show
          ikedam ikedam added a comment - Stefan Drissen Could you attach the thread dump of the master node? It might sound strange, but who actually controls job executions is the master node.
          Hide
          smd Stefan Drissen added a comment -

          Ah, drat. When it happens again I will add it. 

          Show
          smd Stefan Drissen added a comment - Ah, drat. When it happens again I will add it. 
          Hide
          smd Stefan Drissen added a comment -

          Right - back again - during copying build artifacts an agent went offline, upon being brought online again, the job is still hanging on the copy build artifact job. The threadDump from the master - for this build / agent - is:

          Executor #0 for nlc1dprm020 : executing 7.28/work/report_12.1-sql2017 #68

          "Executor #0 for nlc1dprm020 : executing 7.28/work/report_12.1-sql2017 #68" Id=79412 Group=main TIMED_WAITING on hudson.remoting.PipeWindow$Real@75b811f2 at java.lang.Object.wait(Native Method) - waiting on hudson.remoting.PipeWindow$Real@75b811f2 at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:225) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:124) - locked hudson.remoting.ProxyOutputStream@4cfc2a8b at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:108) at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2315) at org.apache.commons.io.IOUtils.copy(IOUtils.java:2270) at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2291) at org.apache.commons.io.IOUtils.copy(IOUtils.java:2246) at hudson.FilePath.copyFrom(FilePath.java:984) at hudson.plugins.copyartifact.CopyArtifact.copyOne(CopyArtifact.java:617) at hudson.plugins.copyartifact.CopyArtifact.copy(CopyArtifact.java:577) at hudson.plugins.copyartifact.CopyArtifact.perform(CopyArtifact.java:537) at hudson.plugins.copyartifact.CopyArtifact.perform(CopyArtifact.java:475) at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:78) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20) at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:741) at hudson.model.Build$BuildExecution.build(Build.java:206) at hudson.model.Build$BuildExecution.doRun(Build.java:163) at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504) at hudson.model.Run.execute(Run.java:1853) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43) at hudson.model.ResourceController.execute(ResourceController.java:97) at hudson.model.Executor.run(Executor.java:427)

           

          I have saved the rest of the (rather large) threadDump - so can provide more if required.

          Show
          smd Stefan Drissen added a comment - Right - back again - during copying build artifacts an agent went offline, upon being brought online again, the job is still hanging on the copy build artifact job. The threadDump from the master - for this build / agent - is: Executor #0 for nlc1dprm020 : executing 7.28/work/report_12.1-sql2017 #68 "Executor #0 for nlc1dprm020 : executing 7.28/work/report_12.1-sql2017 #68" Id=79412 Group=main TIMED_WAITING on hudson.remoting.PipeWindow$Real@75b811f2 at java.lang.Object.wait(Native Method) - waiting on hudson.remoting.PipeWindow$Real@75b811f2 at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:225) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:124) - locked hudson.remoting.ProxyOutputStream@4cfc2a8b at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:108) at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2315) at org.apache.commons.io.IOUtils.copy(IOUtils.java:2270) at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2291) at org.apache.commons.io.IOUtils.copy(IOUtils.java:2246) at hudson.FilePath.copyFrom(FilePath.java:984) at hudson.plugins.copyartifact.CopyArtifact.copyOne(CopyArtifact.java:617) at hudson.plugins.copyartifact.CopyArtifact.copy(CopyArtifact.java:577) at hudson.plugins.copyartifact.CopyArtifact.perform(CopyArtifact.java:537) at hudson.plugins.copyartifact.CopyArtifact.perform(CopyArtifact.java:475) at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:78) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20) at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:741) at hudson.model.Build$BuildExecution.build(Build.java:206) at hudson.model.Build$BuildExecution.doRun(Build.java:163) at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504) at hudson.model.Run.execute(Run.java:1853) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43) at hudson.model.ResourceController.execute(ResourceController.java:97) at hudson.model.Executor.run(Executor.java:427)   I have saved the rest of the (rather large) threadDump - so can provide more if required.
          Hide
          ikedam ikedam added a comment -

          From the stacktrace, it looks an issue of hudson.remoting, and I changed the component of the ticket and expect someone in chage of remoting will be assigned.

          Show
          ikedam ikedam added a comment - From the stacktrace, it looks an issue of hudson.remoting, and I changed the component of the ticket and expect someone in chage of remoting will be assigned.
          Hide
          ikedam ikedam added a comment -

          Hmm... no one is assigned...

          Stefan Drissen

          > during copying build artifacts an agent went offline, upon being brought online again, the job is still hanging on the copy build artifact job

          This sounds really important to me as reproducing instructions make it easy to resolve the issue.
          I tried same operations but cannot reproduce the issue.
          Let me know followings as it might help resolving the issue
          (actually, I don't know much about the remoting library, but it would help someone who know much about remoting):

          • How do you launch agent with scheduled tasks? By executing java -jar agent.jar -jnlpUrl ... ?
          • Why did the agent go offline? Accidentally (e.g. the windows machine crashed), or by operation of an administrator?
          • How often is the issue reproduced if you make the agent offline during builds?
          • Does it hang if you make the agent offline before CopyArtifact starts?
          • Does it hang if you make the agent offline after CopyArtifact finished? (I suppose it won't)
          Show
          ikedam ikedam added a comment - Hmm... no one is assigned... Stefan Drissen > during copying build artifacts an agent went offline, upon being brought online again, the job is still hanging on the copy build artifact job This sounds really important to me as reproducing instructions make it easy to resolve the issue. I tried same operations but cannot reproduce the issue. Let me know followings as it might help resolving the issue (actually, I don't know much about the remoting library, but it would help someone who know much about remoting): How do you launch agent with scheduled tasks? By executing java -jar agent.jar -jnlpUrl ... ? Why did the agent go offline? Accidentally (e.g. the windows machine crashed), or by operation of an administrator? How often is the issue reproduced if you make the agent offline during builds? Does it hang if you make the agent offline before CopyArtifact starts? Does it hang if you make the agent offline after CopyArtifact finished? (I suppose it won't)
          Hide
          ikedam ikedam added a comment -

          And I believe you'd better to use Jenkins LTS versions (2.190.3 for now), as non-LTS versions are often unstable.
          I want to know if the issue occurs even with LTS versions.

          Show
          ikedam ikedam added a comment - And I believe you'd better to use Jenkins LTS versions (2.190.3 for now), as non-LTS versions are often unstable. I want to know if the issue occurs even with LTS versions.
          Hide
          smd Stefan Drissen added a comment - - edited
          1. the agent is launched as a windows scheduled task:
            • program: "C:\Program Files (x86)\Java\jre1.8.0_211\bin\java.exe"
            • arguments: -jar agent.jar -jnlpUrl http://<master>:8080/computer/nlc1dprm020/slave-agent.jnlp -secret <secret>
          2. the agent is running on a VM in our data center - if someone sneezes the VM may burp - in this case the agent was visibly offline in Jenkins - in the previous hang, the master did not consider the agent offline
          3. this occurs sporadically - usually I just restart the build or the agents - in this case I thought I would dig a bit deeper - when it occurs it is always at the copy artifact stage - the artifacts are three large-ish files (650 MB, 65 MB and 10 MB)
          4. disconnecting the agent from the UI has no effect, the job just continues - ending the scheduled task for the agent immediately throws a java.nio.channels.ClosedChannelException
          5. ending the scheduled task during CopyArtifacts results in the build hanging - the console from the master is spinning, the target artifact in the workspace of the agent has a partial size. 

          Since I'm already at 2.205, I will consider staying on an LTS version when the next one comes up.

          Show
          smd Stefan Drissen added a comment - - edited the agent is launched as a windows scheduled task: program: "C:\Program Files (x86)\Java\jre1.8.0_211\bin\java.exe" arguments: -jar agent.jar -jnlpUrl http://<master>:8080/computer/nlc1dprm020/slave-agent.jnlp -secret <secret> the agent is running on a VM in our data center - if someone sneezes the VM may burp - in this case the agent was visibly offline in Jenkins - in the previous hang, the master did not consider the agent offline this occurs sporadically - usually I just restart the build or the agents - in this case I thought I would dig a bit deeper -  when it occurs it is always at the copy artifact stage - the artifacts are three large-ish files (650 MB, 65 MB and 10 MB) disconnecting the agent from the UI has no effect, the job just continues - ending the scheduled task for the agent immediately throws a java.nio.channels.ClosedChannelException ending the scheduled task  during CopyArtifacts results in the build hanging - the console from the master is spinning, the target artifact in the workspace of the agent has a partial size.  Since I'm already at 2.205, I will consider staying on an LTS version when the next one comes up.

            People

            • Assignee:
              jthompson Jeff Thompson
              Reporter:
              smd Stefan Drissen
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated: