Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-61314

EC2 Plugin: Windows java.io.IOException: Pipe Closed

XMLWordPrintable

      We are seeing 2 occurrences of a java.io.IOException when using Windows nodes as agents.
      1. When new nodes are being spun up for Windows jobs, it appears that Jenkins will assign a newly spun up node (after establishing a connection) to service the build, but the connection gets terminated before the build even begins.

      Started by user some_user
      Running as SYSTEM
      Building remotely on EC2 (amazon-ec2) - windows-label (i-0de1740c2107602b3) (windows-label) in workspace D:\dev\jenkins\workspace\WindowsStressTests\SimpleWindowsBuild11
      FATAL: java.io.IOException: Pipe is already closed
      hudson.remoting.FastPipedInputStream$ClosedBy: The pipe was closed at...
      	at hudson.remoting.FastPipedOutputStream.error(FastPipedOutputStream.java:100)
      	at hudson.remoting.FastPipedOutputStream.close(FastPipedOutputStream.java:90)
      	at hudson.plugins.ec2.util.Closeables.closeQuietly(Closeables.java:23)
      	at hudson.plugins.ec2.win.winrm.WindowsProcess$2.run(WindowsProcess.java:146)
      Caused: java.io.IOException: Pipe is already closed
      	at hudson.remoting.FastPipedOutputStream.write(FastPipedOutputStream.java:154)
      	at hudson.remoting.FastPipedOutputStream.write(FastPipedOutputStream.java:138)
      	at hudson.remoting.ChunkedOutputStream.sendFrame(ChunkedOutputStream.java:89)
      	at hudson.remoting.ChunkedOutputStream.drain(ChunkedOutputStream.java:85)
      	at hudson.remoting.ChunkedOutputStream.write(ChunkedOutputStream.java:54)
      	at java.io.OutputStream.write(OutputStream.java:75)
      	at hudson.remoting.ChunkedCommandTransport.writeBlock(ChunkedCommandTransport.java:45)
      	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.write(AbstractSynchronousByteArrayCommandTransport.java:46)
      	at hudson.remoting.Channel.send(Channel.java:721)
      	at hudson.remoting.Channel.close(Channel.java:1436)
      Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to EC2 (amazon-ec2) - windows-label (i-0de1740c2107602b3)
      		at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741)
      		at hudson.remoting.Request.call(Request.java:202)
      		at hudson.remoting.Channel.call(Channel.java:954)
      		at hudson.FilePath.act(FilePath.java:1069)
      		at hudson.FilePath.act(FilePath.java:1058)
      		at hudson.FilePath.mkdirs(FilePath.java:1243)
      		at hudson.model.AbstractProject.checkout(AbstractProject.java:1199)
      		at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
      		at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
      		at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
      		at hudson.model.Run.execute(Run.java:1853)
      		at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
      		at hudson.model.ResourceController.execute(ResourceController.java:97)
      		at hudson.model.Executor.run(Executor.java:427)
      Caused: hudson.remoting.RequestAbortedException
      	at hudson.remoting.Request.abort(Request.java:340)
      	at hudson.remoting.Channel.terminate(Channel.java:1038)
      	at hudson.remoting.Channel.close(Channel.java:1444)
      	at hudson.remoting.Channel.close(Channel.java:1403)
      	at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:843)
      	at hudson.slaves.SlaveComputer.access$100(SlaveComputer.java:108)
      	at hudson.slaves.SlaveComputer$2.run(SlaveComputer.java:734)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      Finished: FAILURE
      

      Afterwards, the node is left in an offline state with the message "Connection was broken" until we manually reconnect it (via the "Launch Agent" button in the Jenkins UI). We are consistently see this problem. 

      Log from the node:

      EC2 (amazon-ec2) - windows-label(i-0531ef3f07b2cdaee) booted at 1582839470000
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Waiting for password to be available. Sleeping 10s.
      Connecting to (10.93.145.196) with WinRM as david_webb6
      WinRM service responded. Waiting for WinRM service to stabilize on EC2 (amazon-ec2) - windows-label (i-0531ef3f07b2cdaee)
      WinRM should now be ok on EC2 (amazon-ec2) - windows-label (i-0531ef3f07b2cdaee)
      Connected with WinRM.
      Creating tmp directory if it does not exist
      remoting.jar sent remotely. Bootstrapping it
      Launching via WinRM:java  -jar C:\Windows\Temp\remoting.jar -workDir D:\dev\jenkins
      <===[JENKINS REMOTING CAPACITY]===>Remoting version: 3.36.1
      This is a Windows agent
      ERROR: Failed to monitor for Free Swap Space
      java.util.concurrent.TimeoutException
      	at hudson.remoting.Request$1.get(Request.java:316)
      	at hudson.remoting.Request$1.get(Request.java:240)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:78)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)
      ERROR: Failed to monitor for Free Disk Space
      java.util.concurrent.TimeoutException
      	at hudson.remoting.Request$1.get(Request.java:316)
      	at hudson.remoting.Request$1.get(Request.java:240)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:78)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)
      ERROR: Failed to monitor for Free Temp Space
      java.util.concurrent.TimeoutException
      	at hudson.remoting.Request$1.get(Request.java:316)
      	at hudson.remoting.Request$1.get(Request.java:240)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:78)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)
      Agent successfully connected and online
      ERROR: Connection terminated
      hudson.remoting.FastPipedInputStream$ClosedBy: The pipe was closed at...
      	at hudson.remoting.FastPipedOutputStream.error(FastPipedOutputStream.java:100)
      	at hudson.remoting.FastPipedOutputStream.close(FastPipedOutputStream.java:90)
      	at hudson.plugins.ec2.util.Closeables.closeQuietly(Closeables.java:23)
      	at hudson.plugins.ec2.win.winrm.WindowsProcess$2.run(WindowsProcess.java:146)
      Caused: java.io.IOException: Pipe is already closed
      	at hudson.remoting.FastPipedOutputStream.write(FastPipedOutputStream.java:154)
      	at hudson.remoting.FastPipedOutputStream.write(FastPipedOutputStream.java:138)
      	at hudson.remoting.ChunkedOutputStream.sendFrame(ChunkedOutputStream.java:89)
      	at hudson.remoting.ChunkedOutputStream.drain(ChunkedOutputStream.java:85)
      	at hudson.remoting.ChunkedOutputStream.write(ChunkedOutputStream.java:54)
      	at java.io.OutputStream.write(OutputStream.java:75)
      	at hudson.remoting.ChunkedCommandTransport.writeBlock(ChunkedCommandTransport.java:45)
      	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.write(AbstractSynchronousByteArrayCommandTransport.java:46)
      	at hudson.remoting.Channel.send(Channel.java:721)
      	at hudson.remoting.Channel.close(Channel.java:1436)
      	at hudson.remoting.Channel.close(Channel.java:1403)
      	at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:843)
      	at hudson.slaves.SlaveComputer.access$100(SlaveComputer.java:108)
      	at hudson.slaves.SlaveComputer$2.run(SlaveComputer.java:734)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      ERROR: Connection terminated
      java.io.EOFException
      	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2736)
      	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3211)
      	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:896)
      	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
      	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
      	at hudson.remoting.Command.readFrom(Command.java:140)
      	at hudson.remoting.Command.readFrom(Command.java:126)
      	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
      	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
      Caused: java.io.IOException: Unexpected termination of the channel
      	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
      

      2. In the middle of a build, the node will go into a disconnected state and see the same error. This occurrence is less frequent and we're unable to consistently recreate it and it may be a different issue altogether but I wanted to at least bring it to your attention in case the information was helpful in solving our problem. 

       

        1. example-ssh-userdata.ps1
          1 kB
          Joseph LaFreniere
        2. Screen Shot 2020-03-03 at 1.14.50 PM.png
          176 kB
          Pierson Yieh
        3. Screen Shot 2020-03-03 at 1.15.48 PM.png
          350 kB
          Pierson Yieh

            thoulen FABRIZIO MANFREDI
            pyieh Pierson Yieh
            Votes:
            8 Vote for this issue
            Watchers:
            12 Start watching this issue

              Created:
              Updated: