# Delays in FileMonitoringTask.WriteLog can cause process output to be resent indefinitely

#### Details

• Type: Bug
• Status: Closed
• Priority: Major
• Resolution: Fixed
• Environment: Windows Server 2012 R2, Jenkins 2.7.2

# Problem

Sometimes, after a certain number of builds, a build never ends on the node: it keeps resending the same 20-30 lines of output forever.
The problem seems to occur more often when I restart Jenkins while some tasks are running, but I'm not sure.
It also seems to be linked to high load or network activity on the master/node.
We can see the difference between the timestamper date (added when the line is received by the master) and the log date (written during the PowerShell execution).

Some log files grow bigger than 10GB before I kill the process.
(Yes, they are really stored in JENKINS_HOME.)

I think it's worth mentioning that I have 3 Jenkins nodes on the same machine, and that my JENKINS_HOME is located on a network drive (CIFS/SMB).

# Investigation

## Steps

Wireshark shows that the node keeps sending the same logs forever, so the Jenkins master is not (directly) the culprit.
After attaching the debugger to the slave, I found that the method FileMonitoringTask$FileMonitoringController$WriteLog.invoke is called in an infinite loop in durable-task-plugin\src\main\java\org\jenkinsci\plugins\durabletask\FileMonitoringTask.java.
The same file is read again and again with a lastLocation of 1930670. lastLocation represents the number of bytes already read, but I don't understand why it doesn't increase.
The process is terminated, and the log file is no bigger than 3MB (it can be seen in the upper left corner of the screenshot).
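For context, the polling mechanism works roughly like this (a simplified sketch of the cursor-based read, not the plugin's exact code; the class and method names here are made up for illustration):

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;

public class TailSketch {
    // Each poll reads the bytes between `lastLocation` and the current end of
    // the file, copies them to the sink, and returns the new cursor position.
    // A null return means "keep the old cursor", so the same region would be
    // read and sent again on the next poll.
    static Long readNew(File f, long lastLocation, OutputStream sink) throws IOException {
        long len = f.length();
        if (len <= lastLocation) {
            return null; // nothing new since the last poll
        }
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            raf.seek(lastLocation);
            byte[] buf = new byte[(int) (len - lastLocation)];
            raf.readFully(buf);
            // In the real plugin this write goes over the remoting channel to
            // the master; if it throws AFTER bytes have already left the node,
            // the new cursor is never returned and the bytes get re-sent.
            sink.write(buf);
        }
        return len; // the caller stores this as the next lastLocation
    }
}
```

So for lastLocation to stay stuck at 1930670, either the file never grows past it, or the new cursor value never makes it back to whoever stores it.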

Update 1: It seems that Jenkins reads the whole file. If the read fails, it returns 0, so lastLocation is not updated, even though the data has already been sent. Jenkins then retries reading the file, fails again, and so on. I suspect Jenkins is failing to close the file descriptor. That's only a supposition for now.
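If that supposition is right, the feedback loop would look like this (a hypothetical illustration, not Jenkins code: the data reaches the master, but the result carrying the new cursor is lost, so the cursor never advances):

```java
import java.io.ByteArrayOutputStream;

public class ResendLoop {
    // Simulates `polls` monitoring rounds in which the node sends the bytes
    // after the cursor each round, but the response carrying the new cursor
    // is lost every time, so the cursor never moves and the same bytes are
    // sent again and again.
    static byte[] simulate(byte[] chunk, int polls) {
        ByteArrayOutputStream master = new ByteArrayOutputStream(); // what the master accumulates
        long lastLocation = 0; // never advances: every response is "lost"
        for (int i = 0; i < polls; i++) {
            master.write(chunk, 0, chunk.length); // the data reaches the master anyway
            Long result = null;                   // simulated lost/failed poll result
            if (result != null) {
                lastLocation = result;            // never happens in this failure mode
            }
        }
        return master.toByteArray();
    }
}
```

Three polls of a 17-byte chunk leave the master with 51 bytes: the same output, duplicated. Repeat every few seconds and a 3MB log becomes a 10GB log.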

Update 2: The problem seems to come from the network: I've captured a java.io.InterruptedIOException inside this loop, in hudson.remoting.ProxyOutputStream.

Update 3: It seems the Jenkins master is the guilty one. I've connected my debugger to it: the error occurs when the master tries to write the log into its JENKINS_HOME, when executing the green line on the following screenshot.

The error is caught in DurableTaskStep$Execution.check, as it appears to be a workspace error: Jenkins cannot find the workspace folder because it is searching for the node's workspace (C:\ci\int12 ocoint...) in the master's local file system. So the master saves the log but interrupts the task and tells the slave that the task was interrupted. The slave thinks the logs were not saved and resends them to the master, which again fails to find the node workspace in its local filesystem, and so on.

Update 4: It seems that when the master deserializes the node's response to a WriteLog call, it gets a null object instead of the new log position.

Update 5: I've modified the code of durable-task-plugin to print more logs to the console:

```java
// org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke
@Override
public Long invoke(File f, VirtualChannel channel) throws IOException, InterruptedException {
    try {
        long len = f.length();
        if (len > lastLocation) {
            RandomAccessFile raf = new RandomAccessFile(f, "r");
            try {
                raf.seek(lastLocation);
                long toRead = len - lastLocation;
                if (toRead > Integer.MAX_VALUE) { // >2Gb of output at once is unlikely
                    throw new IOException("large reads not yet implemented");
                }
                // TODO is this efficient for large amounts of output? Would it be better to stream data, or return a byte[] from the callable?
                byte[] buf = new byte[(int) toRead];
                raf.readFully(buf);
                sink.write(buf);
            } finally {
                raf.close();
            }
            LOGGER.log(Level.SEVERE, "QDU WILL RETURN AS NEW CURSOR POSITION {0}", len);
            return len;
        } else {
            LOGGER.log(Level.SEVERE, "QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED");
            return null;
        }
    } catch (IOException e) {
        LOGGER.log(Level.SEVERE, "QDU IOEXCEPTION {0}", e);
        throw e;
    } catch (Exception e) {
        LOGGER.log(Level.SEVERE, "QDU UNKNOWN EXCEPTION {0}", e);
    }
    return null;
}
```

On the Jenkins node, I have the following logs:

```
Aug 22, 2016 2:56:39 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED
Aug 22, 2016 2:56:41 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
SEVERE: QDU IOEXCEPTION {0}
java.io.InterruptedIOException
	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147)
	at java.io.OutputStream.write(OutputStream.java:75)
	at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116)
	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772)
	at hudson.remoting.UserRequest.perform(UserRequest.java:153)
	at hudson.remoting.UserRequest.perform(UserRequest.java:50)
	at hudson.remoting.Request$2.run(Request.java:332)
	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at hudson.remoting.Engine$1$1.run(Engine.java:85)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209)
	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122)
	... 14 more
```

The same two messages ("QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED" and "QDU IOEXCEPTION" with an identical java.io.InterruptedIOException stack trace) keep alternating every few seconds for the rest of the capture (2:56:52, 2:56:54, 2:57:02, 2:57:09, 2:57:12, 2:57:22, 2:57:24, 2:57:33, 2:57:39, 2:57:43, 2:57:53, ...).

# How to reproduce

I've tried to reproduce this bug for 3 days but couldn't. I had to put my production instance in debug mode to inspect the error. As I don't understand why it fails, I can't write a reproduction protocol yet. I hope that someone googles this error with the same keywords and posts a comment here if they have the same problem and more information.
If it fails, it will return 0. I suspect that Jenkins is failing to close the file descriptor. So the lastLocation is not updated. But the data are sent. Jenkins retries to read the file, fail again, etc. That's only a supposition for now. Edit 2: It seems that it comes from the network, as I've captured a java.io.InterruptedIOException in this loop. !Capture d'écran de 2016-08-20 19-02-41.png|thumbnail! h1. How to reproduce I've tried to reproduce this bug for 3 days but didn't achieve to. Quentin Dufour made changes -  Description h1. Problem The build never ends on the node. It sends forever the same 20-30 lines. !Capture d'écran de 2016-08-20 19-02-41.png|thumbnail! Some logs files can be bigger than 10GB before I kill the process. !Capture d'écran de 2016-08-20 19-02-41.png|thumbnail! h1. Investigation h2. Steps I've found on Wireshark that the node keeps sending the same logs forever. So the jenkins master is not (directly) the culprit. After enabling the debugguer on the slave, I've found that the method FileMonitoringTask$FileMonitoringController$WriteLog.invoke is called in an infinite loop somewhere in this file: durable-task-plugin\src\main\java\org\jenkinsci\plugins\durabletask\FileMonitoringTask.java The same file is read again and again with a lastLocation of 1930670. lastLocation represent the bytes already read. But I don't understand why it doesn't increase. Edit 1: It seems that Jenkins read the whole file. If it fails, it will return 0. I suspect that Jenkins is failing to close the file descriptor. So the lastLocation is not updated. But the data are sent. Jenkins retries to read the file, fail again, etc. That's only a supposition for now. Edit 2: It seems that it comes from the network, as I've captured a java.io.InterruptedIOException in this loop. !Capture d'écran de 2016-08-20 19-02-41.png|thumbnail! h1. How to reproduce I've tried to reproduce this bug for 3 days but didn't achieve to. h1. Problem The build never ends on the node. 
It sends forever the same 20-30 lines. !Capture d'écran de 2016-08-20 16-35-18.png|thumbnail! Some logs files can be bigger than 10GB before I kill the process. !jenkins_10GB_log.png|thumbnail! h1. Investigation h2. Steps I've found on Wireshark that the node keeps sending the same logs forever. So the jenkins master is not (directly) the culprit. After enabling the debugguer on the slave, I've found that the method FileMonitoringTask$FileMonitoringController$WriteLog.invoke is called in an infinite loop somewhere in this file: durable-task-plugin\src\main\java\org\jenkinsci\plugins\durabletask\FileMonitoringTask.java The same file is read again and again with a lastLocation of 1930670. lastLocation represent the bytes already read. But I don't understand why it doesn't increase. !Capture d'écran de 2016-08-20 17-04-34.png|thumbnail! Edit 1: It seems that Jenkins read the whole file. If it fails, it will return 0. I suspect that Jenkins is failing to close the file descriptor. So the lastLocation is not updated. But the data are sent. Jenkins retries to read the file, fail again, etc. That's only a supposition for now. Edit 2: It seems that it comes from the network, as I've captured a java.io.InterruptedIOException in this loop. !Capture d'écran de 2016-08-20 19-02-41.png|thumbnail! h1. How to reproduce I've tried to reproduce this bug for 3 days but didn't achieve to. Quentin Dufour made changes -  Description h1. Problem The build never ends on the node. It sends forever the same 20-30 lines. !Capture d'écran de 2016-08-20 16-35-18.png|thumbnail! Some logs files can be bigger than 10GB before I kill the process. !jenkins_10GB_log.png|thumbnail! h1. Investigation h2. Steps I've found on Wireshark that the node keeps sending the same logs forever. So the jenkins master is not (directly) the culprit. 
After enabling the debugguer on the slave, I've found that the method FileMonitoringTask$FileMonitoringController$WriteLog.invoke is called in an infinite loop somewhere in this file: durable-task-plugin\src\main\java\org\jenkinsci\plugins\durabletask\FileMonitoringTask.java The same file is read again and again with a lastLocation of 1930670. lastLocation represent the bytes already read. But I don't understand why it doesn't increase. !Capture d'écran de 2016-08-20 17-04-34.png|thumbnail! Edit 1: It seems that Jenkins read the whole file. If it fails, it will return 0. I suspect that Jenkins is failing to close the file descriptor. So the lastLocation is not updated. But the data are sent. Jenkins retries to read the file, fail again, etc. That's only a supposition for now. Edit 2: It seems that it comes from the network, as I've captured a java.io.InterruptedIOException in this loop. !Capture d'écran de 2016-08-20 19-02-41.png|thumbnail! h1. How to reproduce I've tried to reproduce this bug for 3 days but didn't achieve to. h1. Problem The build never ends on the node. It sends forever the same 20-30 lines. !Capture d'écran de 2016-08-20 16-35-18.png|thumbnail! Some logs files can be bigger than 10GB before I kill the process. !jekins_10GB_log.png|thumbnail! h1. Investigation h2. Steps I've found on Wireshark that the node keeps sending the same logs forever. So the jenkins master is not (directly) the culprit. After enabling the debugguer on the slave, I've found that the method FileMonitoringTask$FileMonitoringController$WriteLog.invoke is called in an infinite loop somewhere in this file: durable-task-plugin\src\main\java\org\jenkinsci\plugins\durabletask\FileMonitoringTask.java The same file is read again and again with a lastLocation of 1930670. lastLocation represent the bytes already read. But I don't understand why it doesn't increase. !Capture d'écran de 2016-08-20 17-04-34.png|thumbnail! Edit 1: It seems that Jenkins read the whole file. 
If it fails, it will return 0. I suspect that Jenkins is failing to close the file descriptor. So the lastLocation is not updated. But the data are sent. Jenkins retries to read the file, fail again, etc. That's only a supposition for now. Edit 2: It seems that it comes from the network, as I've captured a java.io.InterruptedIOException in this loop. !Capture d'écran de 2016-08-20 19-02-41.png|thumbnail! h1. How to reproduce I've tried to reproduce this bug for 3 days but didn't achieve to. Quentin Dufour made changes -  Description h1. Problem The build never ends on the node. It sends forever the same 20-30 lines. !Capture d'écran de 2016-08-20 16-35-18.png|thumbnail! Some logs files can be bigger than 10GB before I kill the process. !jekins_10GB_log.png|thumbnail! h1. Investigation h2. Steps I've found on Wireshark that the node keeps sending the same logs forever. So the jenkins master is not (directly) the culprit. After enabling the debugguer on the slave, I've found that the method FileMonitoringTask$FileMonitoringController$WriteLog.invoke is called in an infinite loop somewhere in this file: durable-task-plugin\src\main\java\org\jenkinsci\plugins\durabletask\FileMonitoringTask.java The same file is read again and again with a lastLocation of 1930670. lastLocation represent the bytes already read. But I don't understand why it doesn't increase. !Capture d'écran de 2016-08-20 17-04-34.png|thumbnail! Edit 1: It seems that Jenkins read the whole file. If it fails, it will return 0. I suspect that Jenkins is failing to close the file descriptor. So the lastLocation is not updated. But the data are sent. Jenkins retries to read the file, fail again, etc. That's only a supposition for now. Edit 2: It seems that it comes from the network, as I've captured a java.io.InterruptedIOException in this loop. !Capture d'écran de 2016-08-20 19-02-41.png|thumbnail! h1. How to reproduce I've tried to reproduce this bug for 3 days but didn't achieve to. h1. 
Problem The build never ends on the node. It sends forever the same 20-30 lines. We can see the difference between the timestamper date (put when received by the master) and the log date (written during the Powershell execution) !Capture d'écran de 2016-08-20 16-35-18.png|thumbnail! Some logs files can be bigger than 10GB before I kill the process. (Yes, it's really stored in the JENKINS_HOME) !jekins_10GB_log.png|thumbnail! h1. Investigation h2. Steps I've found on Wireshark that the node keeps sending the same logs forever. So the jenkins master is not (directly) the culprit. After enabling the debugguer on the slave, I've found that the method FileMonitoringTask$FileMonitoringController$WriteLog.invoke is called in an infinite loop somewhere in this file: durable-task-plugin\src\main\java\org\jenkinsci\plugins\durabletask\FileMonitoringTask.java The same file is read again and again with a lastLocation of 1930670. lastLocation represent the bytes already read. But I don't understand why it doesn't increase. The process is terminated, the log file is not bigger than 3MB (and can be seen in the upper left corner of the screenshot) !Capture d'écran de 2016-08-20 17-04-34.png|thumbnail! Edit 1: It seems that Jenkins read the whole file. If it fails, it will return 0. I suspect that Jenkins is failing to close the file descriptor. So the lastLocation is not updated. But the data are sent. Jenkins retries to read the file, fail again, etc. That's only a supposition for now. Edit 2: It seems that it comes from the network, as I've captured a java.io.InterruptedIOException in this loop in hudson.remoting.ProxyOutputStream. !Capture d'écran de 2016-08-20 19-02-41.png|thumbnail! h1. How to reproduce I've tried to reproduce this bug for 3 days but didn't achieve to. Quentin Dufour made changes -  Attachment Capture d'écran de 2016-08-20 19-35-36.png [ 33678 ] Quentin Dufour made changes -  Description h1. Problem The build never ends on the node. 
It sends forever the same 20-30 lines. We can see the difference between the timestamper date (put when received by the master) and the log date (written during the Powershell execution) !Capture d'écran de 2016-08-20 16-35-18.png|thumbnail! Some logs files can be bigger than 10GB before I kill the process. (Yes, it's really stored in the JENKINS_HOME) !jekins_10GB_log.png|thumbnail! h1. Investigation h2. Steps I've found on Wireshark that the node keeps sending the same logs forever. So the jenkins master is not (directly) the culprit. After enabling the debugguer on the slave, I've found that the method FileMonitoringTask$FileMonitoringController$WriteLog.invoke is called in an infinite loop somewhere in this file: durable-task-plugin\src\main\java\org\jenkinsci\plugins\durabletask\FileMonitoringTask.java The same file is read again and again with a lastLocation of 1930670. lastLocation represent the bytes already read. But I don't understand why it doesn't increase. The process is terminated, the log file is not bigger than 3MB (and can be seen in the upper left corner of the screenshot) !Capture d'écran de 2016-08-20 17-04-34.png|thumbnail! Edit 1: It seems that Jenkins read the whole file. If it fails, it will return 0. I suspect that Jenkins is failing to close the file descriptor. So the lastLocation is not updated. But the data are sent. Jenkins retries to read the file, fail again, etc. That's only a supposition for now. Edit 2: It seems that it comes from the network, as I've captured a java.io.InterruptedIOException in this loop in hudson.remoting.ProxyOutputStream. !Capture d'écran de 2016-08-20 19-02-41.png|thumbnail! h1. How to reproduce I've tried to reproduce this bug for 3 days but didn't achieve to. h1. Problem The build never ends on the node. It sends forever the same 20-30 lines. 
We can see the difference between the timestamper date (put when received by the master) and the log date (written during the Powershell execution) !Capture d'écran de 2016-08-20 16-35-18.png|thumbnail! Some logs files can be bigger than 10GB before I kill the process. (Yes, it's really stored in the JENKINS_HOME) !jekins_10GB_log.png|thumbnail! h1. Investigation h2. Steps I've found on Wireshark that the node keeps sending the same logs forever. So the jenkins master is not (directly) the culprit. After enabling the debugguer on the slave, I've found that the method FileMonitoringTask$FileMonitoringController$WriteLog.invoke is called in an infinite loop somewhere in this file: durable-task-plugin\src\main\java\org\jenkinsci\plugins\durabletask\FileMonitoringTask.java The same file is read again and again with a lastLocation of 1930670. lastLocation represent the bytes already read. But I don't understand why it doesn't increase. The process is terminated, the log file is not bigger than 3MB (and can be seen in the upper left corner of the screenshot) !Capture d'écran de 2016-08-20 17-04-34.png|thumbnail! Update 1: It seems that Jenkins read the whole file. If it fails, it will return 0. I suspect that Jenkins is failing to close the file descriptor. So the lastLocation is not updated. But the data are sent. Jenkins retries to read the file, fail again, etc. That's only a supposition for now. Update 2: It seems that it comes from the network, as I've captured a java.io.InterruptedIOException in this loop in hudson.remoting.ProxyOutputStream. !Capture d'écran de 2016-08-20 19-02-41.png|thumbnail! Update 3: It seems that the Jenkins Master is guilty. I've connected my debugger to this one. The error occurs when it tries to write the log in its JENKINS_HOME. !Capture d'écran de 2016-08-20 19-35-36.png|thumbnail! h1. How to reproduce I've tried to reproduce this bug for 3 days but didn't achieve to. 
Quentin Dufour made changes -  Attachment Capture d'écran de 2016-08-20 19-35-13.png [ 33679 ] Quentin Dufour made changes -  Description h1. Problem The build never ends on the node. It sends forever the same 20-30 lines. We can see the difference between the timestamper date (put when received by the master) and the log date (written during the Powershell execution) !Capture d'écran de 2016-08-20 16-35-18.png|thumbnail! Some logs files can be bigger than 10GB before I kill the process. (Yes, it's really stored in the JENKINS_HOME) !jekins_10GB_log.png|thumbnail! h1. Investigation h2. Steps I've found on Wireshark that the node keeps sending the same logs forever. So the jenkins master is not (directly) the culprit. After enabling the debugguer on the slave, I've found that the method FileMonitoringTask$FileMonitoringController$WriteLog.invoke is called in an infinite loop somewhere in this file: durable-task-plugin\src\main\java\org\jenkinsci\plugins\durabletask\FileMonitoringTask.java The same file is read again and again with a lastLocation of 1930670. lastLocation represent the bytes already read. But I don't understand why it doesn't increase. The process is terminated, the log file is not bigger than 3MB (and can be seen in the upper left corner of the screenshot) !Capture d'écran de 2016-08-20 17-04-34.png|thumbnail! Update 1: It seems that Jenkins read the whole file. If it fails, it will return 0. I suspect that Jenkins is failing to close the file descriptor. So the lastLocation is not updated. But the data are sent. Jenkins retries to read the file, fail again, etc. That's only a supposition for now. Update 2: It seems that it comes from the network, as I've captured a java.io.InterruptedIOException in this loop in hudson.remoting.ProxyOutputStream. !Capture d'écran de 2016-08-20 19-02-41.png|thumbnail! Update 3: It seems that the Jenkins Master is guilty. I've connected my debugger to this one. 
*Update 3:* It seems that the Jenkins master is the guilty one. I've connected my debugger to it. The error occurs when the master tries to write the log into its JENKINS_HOME, when executing the green line on the following screenshot.

!Capture d'écran de 2016-08-20 19-35-36.png|thumbnail!

The error is caught in DurableTaskStep$Execution.check, as it seems to be a workspace error. Jenkins doesn't find the workspace folder, because it is searching for the node's workspace in its own local file system: C:\\ci\\int12\\ocoint...

So the master saves the log but interrupts the task, and tells the slave that it has interrupted the task. The slave thinks the log has not been saved, and resends it to the master, which again doesn't find the node workspace in its local filesystem, and so on.

!Capture d'écran de 2016-08-20 19-35-13.png|thumbnail!

h1. How to reproduce

I've tried to reproduce this bug for 3 days but didn't manage to. I had to put my production instance in debug mode to inspect the error. As I don't understand why it fails, I can't create a reproduction protocol yet. I hope that someone googles this error with the same keywords and posts a comment here to mention that they have the same error and more information :)
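To illustrate the loop described above, here is a minimal, hypothetical sketch (not the plugin's actual code) of the offset-tracked log-copy idea behind FileMonitoringTask's WriteLog: bytes after lastLocation are sent to a sink, and the offset is only advanced once the write succeeds. If the write keeps failing (e.g. an InterruptedIOException on the remoting channel, or the workspace lookup failing on the master), the offset never moves and the same chunk is re-sent on every poll, forever.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical sketch of an offset-tracked log copier. LogCopier, writeLog,
// and the sinkFails flag are invented names for illustration only.
class LogCopier {
    private long lastLocation = 0; // bytes already successfully copied

    /** Sends any bytes past lastLocation to the sink; returns the new offset. */
    long writeLog(byte[] logFile, ByteArrayOutputStream sink, boolean sinkFails)
            throws IOException {
        if (lastLocation >= logFile.length) {
            return lastLocation;          // nothing new to send
        }
        int len = (int) (logFile.length - lastLocation);
        if (sinkFails) {
            // Simulates the channel/master failing mid-write: the data may
            // already have been (partially) delivered, but the offset is NOT
            // advanced, so the next poll re-sends the very same bytes.
            throw new IOException("simulated channel failure");
        }
        sink.write(logFile, (int) lastLocation, len);
        lastLocation += len;              // advance only after a clean write
        return lastLocation;
    }
}
```

Under this model, a master that repeatedly reports failure after the bytes were already delivered explains exactly the symptom observed: the same 20-30 lines duplicated indefinitely while lastLocation stays frozen.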
*Update 4:* It seems that when the master deserializes the node's response to a WriteLog call, it gets a null object back instead of the logs...

!Capture d'écran de 2016-08-20 23-41-53.png|thumbnail!
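If a null reply really is what the master sees, the resend loop follows directly: the master can't distinguish "the remote write failed" from "the reply was lost", so it keeps the old offset. A hedged sketch of that master-side bookkeeping (hypothetical names; the plugin's real return contract may differ):

```java
/**
 * Hypothetical sketch of the master-side bookkeeping suggested by Update 4.
 * A reply that deserializes to null cannot advance the cursor, so the
 * same log range is requested again on the next check.
 */
public class NullReplySketch {
    private long lastLocation = 0;

    /** Apply the agent's reply: a new offset, or null if deserialization failed. */
    public void applyReply(Long newOffset) {
        if (newOffset != null && newOffset > lastLocation) {
            lastLocation = newOffset; // healthy round trip: cursor advances
        }
        // null reply: offset unchanged -> the bytes get re-uploaded
    }

    public long getLastLocation() { return lastLocation; }

    public static void main(String[] args) {
        NullReplySketch master = new NullReplySketch();
        master.applyReply(1930670L); // a successful acknowledgment
        master.applyReply(null);     // a null reply, as observed in Update 4
        System.out.println(master.getLastLocation());
    }
}
```

Note how this matches the earlier observation that lastLocation stayed pinned at 1930670 even though the data kept being transmitted.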
*Update 5:* I've modified the code of durable-task-plugin to get more logging in the console:

{code}
SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED
Aug 22, 2016 9:04:54 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
SEVERE: QDU IOEXCEPTION {0}
java.io.InterruptedIOException
	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147)
	at java.io.OutputStream.write(OutputStream.java:75)
	at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116)
	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772)
	at hudson.remoting.UserRequest.perform(UserRequest.java:153)
	at hudson.remoting.UserRequest.perform(UserRequest.java:50)
	at hudson.remoting.Request$2.run(Request.java:332)
	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at hudson.remoting.Engine$1$1.run(Engine.java:85)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209)
	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122)
	... 14 more
{code}

h1. How to reproduce

I've tried to reproduce this bug for 3 days but didn't manage to. I had to put my production instance in debug mode to inspect the error. As I don't understand why it fails, I can't write a reproduction protocol yet. I hope that someone googles this error with the same keywords and posts a comment here mentioning that they have the same error and more information :)
I hope that someone google this error with the same keywords and post a comment here to mention that he has the same error and more information :) h1. Problem Sometimes, after a certain amount of builds, the build never ends on the node. It sends forever the same 20-30 lines. It seems that the problem occures more when I restart Jenkins while there are some tasks running. We can see the difference between the timestamper date (put when received by the master) and the log date (written during the Powershell execution) !Capture d'écran de 2016-08-20 16-35-18.png|thumbnail! Some logs files can be bigger than 10GB before I kill the process. (Yes, it's really stored in the JENKINS_HOME) !jekins_10GB_log.png|thumbnail! _I think it's worth mentionning but I have 3 jenkins nodes on the same machine._ h1. Investigation h2. Steps I've found on Wireshark that the node keeps sending the same logs forever. So the jenkins master is not (directly) the culprit. After enabling the debugguer on the slave, I've found that the method FileMonitoringTask$FileMonitoringController$WriteLog.invoke is called in an infinite loop somewhere in this file: durable-task-plugin\src\main\java\org\jenkinsci\plugins\durabletask\FileMonitoringTask.java The same file is read again and again with a lastLocation of 1930670. lastLocation represent the bytes already read. But I don't understand why it doesn't increase. The process is terminated, the log file is not bigger than 3MB (and can be seen in the upper left corner of the screenshot) !Capture d'écran de 2016-08-20 17-04-34.png|thumbnail! *Update 1:* It seems that Jenkins read the whole file. If it fails, it will return 0. I suspect that Jenkins is failing to close the file descriptor. So the lastLocation is not updated. But the data are sent. Jenkins retries to read the file, fail again, etc. That's only a supposition for now. 
*Update 2:* It seems that it comes from the network, as I've captured a java.io.InterruptedIOException in this loop in hudson.remoting.ProxyOutputStream. !Capture d'écran de 2016-08-20 19-02-41.png|thumbnail! *Update 3:* It seems that the Jenkins Master is guilty. I've connected my debugger to this one. The error occurs when it tries to write the log in its JENKINS_HOME. When executing the green line on the following screenshot. !Capture d'écran de 2016-08-20 19-35-36.png|thumbnail! The error is catched in DurableTaskStep$Execution.check, as it seems to be a workspace error. It seems that Jenkins doesn't find the workspace folder, as it's searching the jenkins node workspace in its local file system. C:\\ci\\int12\\ocoint... So it saves the log but interrupt the task, send to the slave that it has interrupted the task. The slave thinks that the logs has not been saved, and resend it to the master, which don't find the node workspace in it's local filesystem, etc.  !Capture d'écran de 2016-08-20 19-35-13.png|thumbnail! *Update 4:* It seems that when the master deserializes the response of the node, it returns a null object instead of the logs when it comes to Write Logs...  !Capture d'écran de 2016-08-20 23-41-53.png|thumbnail! 
*Update 5:* I've modified the code of durable-task-plugin to have more logs in the console {noformat} SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED Aug 22, 2016 9:04:54 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU IOEXCEPTION {0} java.io.InterruptedIOException at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147) at java.io.OutputStream.write(OutputStream.java:75) at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772) at hudson.remoting.UserRequest.perform(UserRequest.java:153) at hudson.remoting.UserRequest.perform(UserRequest.java:50) at hudson.remoting.Request$2.run(Request.java:332) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at hudson.remoting.Engine$1$1.run(Engine.java:85) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException}} at java.lang.Object.wait(Native Method) at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122) ... 14 more {noformat} h1. How to reproduce I've tried to reproduce this bug for 3 days but didn't achieve to. I had to put my production in debug to inspect the error. As I don't understand why it fails, I can't create a reproduce protocol yet. 
I hope that someone google this error with the same keywords and post a comment here to mention that he has the same error and more information :) Quentin Dufour made changes -  Description h1. Problem Sometimes, after a certain amount of builds, the build never ends on the node. It sends forever the same 20-30 lines. It seems that the problem occures more when I restart Jenkins while there are some tasks running. We can see the difference between the timestamper date (put when received by the master) and the log date (written during the Powershell execution) !Capture d'écran de 2016-08-20 16-35-18.png|thumbnail! Some logs files can be bigger than 10GB before I kill the process. (Yes, it's really stored in the JENKINS_HOME) !jekins_10GB_log.png|thumbnail! _I think it's worth mentionning but I have 3 jenkins nodes on the same machine._ h1. Investigation h2. Steps I've found on Wireshark that the node keeps sending the same logs forever. So the jenkins master is not (directly) the culprit. After enabling the debugguer on the slave, I've found that the method FileMonitoringTask$FileMonitoringController$WriteLog.invoke is called in an infinite loop somewhere in this file: durable-task-plugin\src\main\java\org\jenkinsci\plugins\durabletask\FileMonitoringTask.java The same file is read again and again with a lastLocation of 1930670. lastLocation represent the bytes already read. But I don't understand why it doesn't increase. The process is terminated, the log file is not bigger than 3MB (and can be seen in the upper left corner of the screenshot) !Capture d'écran de 2016-08-20 17-04-34.png|thumbnail! *Update 1:* It seems that Jenkins read the whole file. If it fails, it will return 0. I suspect that Jenkins is failing to close the file descriptor. So the lastLocation is not updated. But the data are sent. Jenkins retries to read the file, fail again, etc. That's only a supposition for now. 
*Update 2:* It seems that it comes from the network, as I've captured a java.io.InterruptedIOException in this loop in hudson.remoting.ProxyOutputStream. !Capture d'écran de 2016-08-20 19-02-41.png|thumbnail! *Update 3:* It seems that the Jenkins Master is guilty. I've connected my debugger to this one. The error occurs when it tries to write the log in its JENKINS_HOME. When executing the green line on the following screenshot. !Capture d'écran de 2016-08-20 19-35-36.png|thumbnail! The error is catched in DurableTaskStep$Execution.check, as it seems to be a workspace error. It seems that Jenkins doesn't find the workspace folder, as it's searching the jenkins node workspace in its local file system. C:\\ci\\int12\\ocoint... So it saves the log but interrupt the task, send to the slave that it has interrupted the task. The slave thinks that the logs has not been saved, and resend it to the master, which don't find the node workspace in it's local filesystem, etc.  !Capture d'écran de 2016-08-20 19-35-13.png|thumbnail! *Update 4:* It seems that when the master deserializes the response of the node, it returns a null object instead of the logs when it comes to Write Logs...  !Capture d'écran de 2016-08-20 23-41-53.png|thumbnail! 
*Update 5:* I've modified the code of durable-task-plugin to have more logs in the console {noformat} SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED Aug 22, 2016 9:04:54 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU IOEXCEPTION {0} java.io.InterruptedIOException at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147) at java.io.OutputStream.write(OutputStream.java:75) at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772) at hudson.remoting.UserRequest.perform(UserRequest.java:153) at hudson.remoting.UserRequest.perform(UserRequest.java:50) at hudson.remoting.Request$2.run(Request.java:332) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at hudson.remoting.Engine$1$1.run(Engine.java:85) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException}} at java.lang.Object.wait(Native Method) at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122) ... 14 more {noformat} h1. How to reproduce I've tried to reproduce this bug for 3 days but didn't achieve to. I had to put my production in debug to inspect the error. As I don't understand why it fails, I can't create a reproduce protocol yet. 
I hope that someone google this error with the same keywords and post a comment here to mention that he has the same error and more information :) h1. Problem Sometimes, after a certain amount of builds, the build never ends on the node. It sends forever the same 20-30 lines. It seems that the problem occures more when I restart Jenkins while there are some tasks running. We can see the difference between the timestamper date (put when received by the master) and the log date (written during the Powershell execution) !Capture d'écran de 2016-08-20 16-35-18.png|thumbnail! Some logs files can be bigger than 10GB before I kill the process. (Yes, it's really stored in the JENKINS_HOME) !jekins_10GB_log.png|thumbnail! _I think it's worth mentionning but I have 3 jenkins nodes on the same machine._ h1. Investigation h2. Steps I've found on Wireshark that the node keeps sending the same logs forever. So the jenkins master is not (directly) the culprit. After enabling the debugguer on the slave, I've found that the method FileMonitoringTask$FileMonitoringController$WriteLog.invoke is called in an infinite loop somewhere in this file: durable-task-plugin\src\main\java\org\jenkinsci\plugins\durabletask\FileMonitoringTask.java The same file is read again and again with a lastLocation of 1930670. lastLocation represent the bytes already read. But I don't understand why it doesn't increase. The process is terminated, the log file is not bigger than 3MB (and can be seen in the upper left corner of the screenshot) !Capture d'écran de 2016-08-20 17-04-34.png|thumbnail! *Update 1:* It seems that Jenkins read the whole file. If it fails, it will return 0. I suspect that Jenkins is failing to close the file descriptor. So the lastLocation is not updated. But the data are sent. Jenkins retries to read the file, fail again, etc. That's only a supposition for now. 
*Update 2:* It seems that it comes from the network, as I've captured a java.io.InterruptedIOException in this loop in hudson.remoting.ProxyOutputStream. !Capture d'écran de 2016-08-20 19-02-41.png|thumbnail! *Update 3:* It seems that the Jenkins Master is guilty. I've connected my debugger to this one. The error occurs when it tries to write the log in its JENKINS_HOME. When executing the green line on the following screenshot. !Capture d'écran de 2016-08-20 19-35-36.png|thumbnail! The error is catched in DurableTaskStep$Execution.check, as it seems to be a workspace error. It seems that Jenkins doesn't find the workspace folder, as it's searching the jenkins node workspace in its local file system. C:\\ci\\int12\\ocoint... So it saves the log but interrupt the task, send to the slave that it has interrupted the task. The slave thinks that the logs has not been saved, and resend it to the master, which don't find the node workspace in it's local filesystem, etc.  !Capture d'écran de 2016-08-20 19-35-13.png|thumbnail! *Update 4:* It seems that when the master deserializes the response of the node, it returns a null object instead of the logs when it comes to Write Logs...  !Capture d'écran de 2016-08-20 23-41-53.png|thumbnail! 
*Update 5:* I've modified the code of durable-task-plugin to have more logs in the console {noformat}   @Override public Long invoke(File f, VirtualChannel channel) throws IOException, InterruptedException {                 try {                     long len = f.length();                     if (len > lastLocation) {                         RandomAccessFile raf = new RandomAccessFile(f, "r");                         try {                             raf.seek(lastLocation);                             long toRead = len - lastLocation;                             if (toRead > Integer.MAX_VALUE) { // >2Gb of output at once is unlikely                                 throw new IOException("large reads not yet implemented");                             }                             // TODO is this efficient for large amounts of output? Would it be better to stream data, or return a byte[] from the callable?                             byte[] buf = new byte[(int) toRead];                             raf.readFully(buf);                             sink.write(buf);                         } finally {                             raf.close();                         }                         LOGGER.log(Level.SEVERE, "QDU WILL RETURN AS NEW CURSOR POSITION {0}", len);                         return len;                     } else {                         LOGGER.log(Level.SEVERE, "QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED");                         return null;                     }                 } catch(IOException e) {                     LOGGER.log(Level.SEVERE, "QDU IOEXCEPTION {0}", e);                     throw e;                 } catch(Exception e) {                     LOGGER.log(Level.SEVERE, "QDU UNKNOWN EXCEPTION {0}", e);                 }                 return null;             } {noformat} {noformat} SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED Aug 22, 2016 9:04:54 AM 
org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU IOEXCEPTION {0} java.io.InterruptedIOException at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147) at java.io.OutputStream.write(OutputStream.java:75) at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772) at hudson.remoting.UserRequest.perform(UserRequest.java:153) at hudson.remoting.UserRequest.perform(UserRequest.java:50) at hudson.remoting.Request$2.run(Request.java:332) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at hudson.remoting.Engine$1$1.run(Engine.java:85) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException}} at java.lang.Object.wait(Native Method) at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122) ... 14 more {noformat} h1. How to reproduce I've tried to reproduce this bug for 3 days but didn't achieve to. I had to put my production in debug to inspect the error. As I don't understand why it fails, I can't create a reproduce protocol yet. I hope that someone google this error with the same keywords and post a comment here to mention that he has the same error and more information :) Quentin Dufour made changes -  Description h1. Problem Sometimes, after a certain amount of builds, the build never ends on the node. 
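To make the suspected resend loop from Updates 1 and 5 easier to see, here is a minimal standalone sketch in plain Java. This is *not* Jenkins code: {{FlakySink}}, {{poll}}, and {{ResendLoopSketch}} are invented names, and the sink failure merely simulates the {{java.io.InterruptedIOException}} captured in {{hudson.remoting.ProxyOutputStream}}. The point is that the bytes cross the wire *before* the write fails, the cursor ({{lastLocation}}) is only advanced on success, so the same bytes are sent again on the next poll.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Minimal standalone sketch (NOT Jenkins code) of the resend loop described in
 * Updates 1 and 5: the poller reads everything after lastLocation, but if the
 * write to the remote sink fails, it returns null, lastLocation never advances,
 * and the same bytes are sent again on the next poll.
 */
public class ResendLoopSketch {

    /** Sink whose first write fails AFTER the data has already been buffered. */
    static class FlakySink extends OutputStream {
        final ByteArrayOutputStream received = new ByteArrayOutputStream();
        boolean failNext = true;
        @Override public void write(int b) throws IOException {
            write(new byte[]{(byte) b}, 0, 1);
        }
        @Override public void write(byte[] b, int off, int len) throws IOException {
            received.write(b, off, len);   // the data already crossed the wire...
            if (failNext) {
                failNext = false;          // ...but the call still reports failure
                throw new IOException("simulated InterruptedIOException");
            }
        }
    }

    /** Mirrors WriteLog.invoke's contract: new cursor on success, null on failure. */
    static Long poll(byte[] log, long lastLocation, OutputStream sink) {
        try {
            if (log.length > lastLocation) {
                sink.write(log, (int) lastLocation, (int) (log.length - lastLocation));
                return (long) log.length;  // cursor advances only on success
            }
            return null;                   // nothing new to send
        } catch (IOException e) {
            return null;                   // failure: cursor NOT advanced, bytes already sent
        }
    }

    public static void main(String[] args) {
        byte[] log = "same 20-30 lines\n".getBytes(StandardCharsets.UTF_8);
        FlakySink sink = new FlakySink();
        long lastLocation = 0;

        for (int i = 0; i < 2; i++) {      // two polling rounds
            Long cursor = poll(log, lastLocation, sink);
            if (cursor != null) lastLocation = cursor;
        }
        // Round 1 failed after writing, so round 2 resent the same 17 bytes.
        System.out.println("bytes sent = " + sink.received.size() + ", log length = " + log.length);
    }
}
```

With a permanently failing sink, the cursor never advances and the loop resends the same chunk forever, which would match the 10GB master-side log files described above.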
It sends forever the same 20-30 lines. It seems that the problem occures more when I restart Jenkins while there are some tasks running. We can see the difference between the timestamper date (put when received by the master) and the log date (written during the Powershell execution) !Capture d'écran de 2016-08-20 16-35-18.png|thumbnail! Some logs files can be bigger than 10GB before I kill the process. (Yes, it's really stored in the JENKINS_HOME) !jekins_10GB_log.png|thumbnail! _I think it's worth mentionning but I have 3 jenkins nodes on the same machine._ h1. Investigation h2. Steps I've found on Wireshark that the node keeps sending the same logs forever. So the jenkins master is not (directly) the culprit. After enabling the debugguer on the slave, I've found that the method FileMonitoringTask$FileMonitoringController$WriteLog.invoke is called in an infinite loop somewhere in this file: durable-task-plugin\src\main\java\org\jenkinsci\plugins\durabletask\FileMonitoringTask.java The same file is read again and again with a lastLocation of 1930670. lastLocation represent the bytes already read. But I don't understand why it doesn't increase. The process is terminated, the log file is not bigger than 3MB (and can be seen in the upper left corner of the screenshot) !Capture d'écran de 2016-08-20 17-04-34.png|thumbnail! *Update 1:* It seems that Jenkins read the whole file. If it fails, it will return 0. I suspect that Jenkins is failing to close the file descriptor. So the lastLocation is not updated. But the data are sent. Jenkins retries to read the file, fail again, etc. That's only a supposition for now. *Update 2:* It seems that it comes from the network, as I've captured a java.io.InterruptedIOException in this loop in hudson.remoting.ProxyOutputStream. !Capture d'écran de 2016-08-20 19-02-41.png|thumbnail! *Update 3:* It seems that the Jenkins Master is guilty. I've connected my debugger to this one. 
The error occurs when it tries to write the log in its JENKINS_HOME. When executing the green line on the following screenshot. !Capture d'écran de 2016-08-20 19-35-36.png|thumbnail! The error is catched in DurableTaskStep$Execution.check, as it seems to be a workspace error. It seems that Jenkins doesn't find the workspace folder, as it's searching the jenkins node workspace in its local file system. C:\\ci\\int12\\ocoint... So it saves the log but interrupt the task, send to the slave that it has interrupted the task. The slave thinks that the logs has not been saved, and resend it to the master, which don't find the node workspace in it's local filesystem, etc.  !Capture d'écran de 2016-08-20 19-35-13.png|thumbnail! *Update 4:* It seems that when the master deserializes the response of the node, it returns a null object instead of the logs when it comes to Write Logs...  !Capture d'écran de 2016-08-20 23-41-53.png|thumbnail! *Update 5:* I've modified the code of durable-task-plugin to have more logs in the console {noformat}   @Override public Long invoke(File f, VirtualChannel channel) throws IOException, InterruptedException {                 try {                     long len = f.length();                     if (len > lastLocation) {                         RandomAccessFile raf = new RandomAccessFile(f, "r");                         try {                             raf.seek(lastLocation);                             long toRead = len - lastLocation;                             if (toRead > Integer.MAX_VALUE) { // >2Gb of output at once is unlikely                                 throw new IOException("large reads not yet implemented");                             }                             // TODO is this efficient for large amounts of output? Would it be better to stream data, or return a byte[] from the callable?                             
byte[] buf = new byte[(int) toRead];                             raf.readFully(buf);                             sink.write(buf);                         } finally {                             raf.close();                         }                         LOGGER.log(Level.SEVERE, "QDU WILL RETURN AS NEW CURSOR POSITION {0}", len);                         return len;                     } else {                         LOGGER.log(Level.SEVERE, "QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED");                         return null;                     }                 } catch(IOException e) {                     LOGGER.log(Level.SEVERE, "QDU IOEXCEPTION {0}", e);                     throw e;                 } catch(Exception e) {                     LOGGER.log(Level.SEVERE, "QDU UNKNOWN EXCEPTION {0}", e);                 }                 return null;             } {noformat} {noformat} SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED Aug 22, 2016 9:04:54 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU IOEXCEPTION {0} java.io.InterruptedIOException at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147) at java.io.OutputStream.write(OutputStream.java:75) at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772) at hudson.remoting.UserRequest.perform(UserRequest.java:153) at hudson.remoting.UserRequest.perform(UserRequest.java:50) at hudson.remoting.Request$2.run(Request.java:332) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at hudson.remoting.Engine$1$1.run(Engine.java:85) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException}} at java.lang.Object.wait(Native Method) at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122) ... 14 more {noformat} h1. How to reproduce I've tried to reproduce this bug for 3 days but didn't achieve to. I had to put my production in debug to inspect the error. As I don't understand why it fails, I can't create a reproduce protocol yet. I hope that someone google this error with the same keywords and post a comment here to mention that he has the same error and more information :) h1. Problem Sometimes, after a certain amount of builds, the build never ends on the node. It sends forever the same 20-30 lines. It seems that the problem occures more when I restart Jenkins while there are some tasks running. We can see the difference between the timestamper date (put when received by the master) and the log date (written during the Powershell execution) !Capture d'écran de 2016-08-20 16-35-18.png|thumbnail! Some logs files can be bigger than 10GB before I kill the process. (Yes, it's really stored in the JENKINS_HOME) !jekins_10GB_log.png|thumbnail! _I think it's worth mentionning that I have 3 jenkins nodes on the same machine._ h1. Investigation h2. Steps I've found on Wireshark that the node keeps sending the same logs forever. So the jenkins master is not (directly) the culprit. 
After enabling the debugguer on the slave, I've found that the method FileMonitoringTask$FileMonitoringController$WriteLog.invoke is called in an infinite loop somewhere in this file: durable-task-plugin\src\main\java\org\jenkinsci\plugins\durabletask\FileMonitoringTask.java The same file is read again and again with a lastLocation of 1930670. lastLocation represent the bytes already read. But I don't understand why it doesn't increase. The process is terminated, the log file is not bigger than 3MB (and can be seen in the upper left corner of the screenshot) !Capture d'écran de 2016-08-20 17-04-34.png|thumbnail! *Update 1:* It seems that Jenkins read the whole file. If it fails, it will return 0. I suspect that Jenkins is failing to close the file descriptor. So the lastLocation is not updated. But the data are sent. Jenkins retries to read the file, fail again, etc. That's only a supposition for now. *Update 2:* It seems that it comes from the network, as I've captured a java.io.InterruptedIOException in this loop in hudson.remoting.ProxyOutputStream. !Capture d'écran de 2016-08-20 19-02-41.png|thumbnail! *Update 3:* It seems that the Jenkins Master is guilty. I've connected my debugger to this one. The error occurs when it tries to write the log in its JENKINS_HOME. When executing the green line on the following screenshot. !Capture d'écran de 2016-08-20 19-35-36.png|thumbnail! The error is catched in DurableTaskStep$Execution.check, as it seems to be a workspace error. It seems that Jenkins doesn't find the workspace folder, as it's searching the jenkins node workspace in its local file system. C:\\ci\\int12\\ocoint... So it saves the log but interrupt the task, send to the slave that it has interrupted the task. The slave thinks that the logs has not been saved, and resend it to the master, which don't find the node workspace in it's local filesystem, etc.  !Capture d'écran de 2016-08-20 19-35-13.png|thumbnail! 
*Update 4:* It seems that when the master deserializes the node's response to a WriteLog call, it gets a null object instead of the expected result...

!Capture d'écran de 2016-08-20 23-41-53.png|thumbnail!

*Update 5:* I've modified the code of durable-task-plugin to log more details to the console:

{noformat}
// org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog
@Override public Long invoke(File f, VirtualChannel channel) throws IOException, InterruptedException {
    try {
        long len = f.length();
        if (len > lastLocation) {
            RandomAccessFile raf = new RandomAccessFile(f, "r");
            try {
                raf.seek(lastLocation);
                long toRead = len - lastLocation;
                if (toRead > Integer.MAX_VALUE) { // >2GB of output at once is unlikely
                    throw new IOException("large reads not yet implemented");
                }
                // TODO is this efficient for large amounts of output? Would it be better to stream data, or return a byte[] from the callable?
                byte[] buf = new byte[(int) toRead];
                raf.readFully(buf);
                sink.write(buf);
            } finally {
                raf.close();
            }
            LOGGER.log(Level.SEVERE, "QDU WILL RETURN AS NEW CURSOR POSITION {0}", len);
            return len;
        } else {
            LOGGER.log(Level.SEVERE, "QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED");
            return null;
        }
    } catch (IOException e) {
        LOGGER.log(Level.SEVERE, "QDU IOEXCEPTION {0}", e);
        throw e;
    } catch (Exception e) {
        LOGGER.log(Level.SEVERE, "QDU UNKNOWN EXCEPTION {0}", e);
    }
    return null;
}
{noformat}

{noformat}
SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED
Aug 22, 2016 9:04:54 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
SEVERE: QDU IOEXCEPTION {0}
java.io.InterruptedIOException
	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147)
	at java.io.OutputStream.write(OutputStream.java:75)
	at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116)
	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772)
	at hudson.remoting.UserRequest.perform(UserRequest.java:153)
	at hudson.remoting.UserRequest.perform(UserRequest.java:50)
	at hudson.remoting.Request$2.run(Request.java:332)
	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at hudson.remoting.Engine$1$1.run(Engine.java:85)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209)
	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122)
	... 14 more
{noformat}

h1. How to reproduce

I've tried for 3 days to reproduce this bug, without success; I had to put my production instance into debug mode to inspect the error. As I don't understand why it fails, I can't write a reproduction protocol yet. I hope that someone will google this error with the same keywords and post a comment here saying they hit the same problem, ideally with more information. :)
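The feedback loop these updates suggest — the bytes are handed to the sink on every attempt, but the cursor only advances when the whole transfer succeeds — can be sketched with a minimal, self-contained simulation. This is an illustration only: `TailSimulation`, `FlakySink`, and `poll` are hypothetical names, not the plugin's actual classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/**
 * Minimal simulation (hypothetical names) of cursor-based log tailing:
 * read everything past the cursor, stream it to the sink, and advance
 * the cursor only if the transfer succeeds as a whole.
 */
public class TailSimulation {

    /** Sink that accepts the bytes but then reports a failure, mimicking
     *  an InterruptedIOException raised mid-transfer on the channel. */
    static class FlakySink extends ByteArrayOutputStream {
        int failuresLeft;
        FlakySink(int failures) { this.failuresLeft = failures; }
        @Override public synchronized void write(byte[] b, int off, int len) {
            super.write(b, off, len);          // bytes already left the node...
            if (failuresLeft-- > 0) {          // ...but the call still fails
                throw new RuntimeException(new IOException("interrupted mid-write"));
            }
        }
    }

    /** One poll: send log[cursor..] to the sink. Returns the new cursor on
     *  success, or the old cursor if the transfer failed. */
    static long poll(byte[] log, long cursor, FlakySink sink) {
        try {
            sink.write(log, (int) cursor, log.length - (int) cursor);
            return log.length;                 // advance only on full success
        } catch (RuntimeException e) {
            return cursor;                     // failure: cursor unchanged
        }
    }

    public static void main(String[] args) {
        byte[] log = "same 20-30 lines\n".getBytes(StandardCharsets.UTF_8);
        FlakySink sink = new FlakySink(2);     // first two polls fail
        long cursor = 0;
        for (int i = 0; i < 3; i++) {
            cursor = poll(log, cursor, sink);
        }
        // The sink received the same chunk three times: bytes were delivered
        // on every attempt, but the cursor advanced only on the last one.
        System.out.println(sink.size() / log.length);  // prints 3
    }
}
```

If the failure repeats on every poll, the cursor never moves and the same chunk is redelivered indefinitely — which would match the ever-growing master-side log observed above.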
14 more Aug 22, 2016 2:57:39 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED Aug 22, 2016 2:57:43 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU IOEXCEPTION {0} java.io.InterruptedIOException at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147) at java.io.OutputStream.write(OutputStream.java:75) at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772) at hudson.remoting.UserRequest.perform(UserRequest.java:153) at hudson.remoting.UserRequest.perform(UserRequest.java:50) at hudson.remoting.Request$2.run(Request.java:332) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at hudson.remoting.Engine$1$1.run(Engine.java:85) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException at java.lang.Object.wait(Native Method) at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122) ... 
14 more Aug 22, 2016 2:57:53 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU IOEXCEPTION {0} java.io.InterruptedIOException at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147) at java.io.OutputStream.write(OutputStream.java:75) at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772) at hudson.remoting.UserRequest.perform(UserRequest.java:153) at hudson.remoting.UserRequest.perform(UserRequest.java:50) at hudson.remoting.Request$2.run(Request.java:332) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at hudson.remoting.Engine$1$1.run(Engine.java:85) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException at java.lang.Object.wait(Native Method) at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122) ... 14 more {noformat} h1. How to reproduce I've tried to reproduce this bug for 3 days but didn't achieve to. I had to put my production in debug to inspect the error. As I don't understand why it fails, I can't create a reproduce protocol yet. I hope that someone google this error with the same keywords and post a comment here to mention that he has the same error and more information :)
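The resend loop described above can be sketched in a few lines of Java. This is a hypothetical simplification, not the plugin's actual code: `poll`, `FlakySink`, and the byte counts are invented for illustration. The point is that the cursor (`lastLocation`) only advances when the remote write succeeds, so a channel that keeps failing makes every polling cycle re-read and resend the exact same byte range.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the resend loop (simplified; NOT the plugin's real code).
// The cursor only advances when the remote write succeeds, so a failing channel
// makes every poll re-read and resend the same bytes -- the behaviour behind the
// "QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED" log above.
public class ResendSketch {
    static long lastLocation = 0; // bytes of the log file already shipped to the master

    // Returns the new cursor position, or null when nothing could be shipped
    // (the caller keeps the old lastLocation and retries the same range).
    static Long poll(byte[] log, OutputStream sink) {
        long len = log.length;
        if (len > lastLocation) {
            try {
                sink.write(log, (int) lastLocation, (int) (len - lastLocation));
            } catch (IOException e) {
                return null; // write failed: cursor stays put, bytes will be resent
            }
            return len;
        }
        return null; // no new output
    }

    // Simulates a congested remoting channel: the first 'failures' writes throw,
    // like the InterruptedIOException from hudson.remoting.ProxyOutputStream.
    static class FlakySink extends OutputStream {
        int failuresLeft;
        final ByteArrayOutputStream received = new ByteArrayOutputStream();
        FlakySink(int failures) { this.failuresLeft = failures; }
        @Override public void write(int b) throws IOException {
            if (failuresLeft > 0) { failuresLeft--; throw new IOException("simulated channel failure"); }
            received.write(b);
        }
    }

    public static void main(String[] args) {
        byte[] log = "same 20-30 lines\n".getBytes(StandardCharsets.UTF_8);
        FlakySink sink = new FlakySink(2);
        int attempts = 0;
        Long newPos = null;
        while (newPos == null) { // the durable-task watcher keeps polling
            attempts++;
            newPos = poll(log, sink);
        }
        lastLocation = newPos;
        // Two failed attempts re-targeted the same byte range before the third
        // finally advanced the cursor.
        System.out.println("attempts=" + attempts + " sent=" + sink.received.size()); // prints attempts=3 sent=17
    }
}
```

In the real report the situation is worse: per Update 1, the data may already have reached the master before the failure, so each retry duplicates the same 20-30 lines in the console instead of merely delaying them.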
h1. Problem

Sometimes, after a certain amount of builds, the build never ends on the node.
It resends the same 20-30 lines forever. The problem seems to occur more often when I restart Jenkins while some tasks are still running. We can see the difference between the timestamper date (added when the line is received by the master) and the log date (written during the PowerShell execution). !Capture d'écran de 2016-08-20 16-35-18.png|thumbnail! Some log files grow bigger than 10 GB before I kill the process. (Yes, it's really stored in the JENKINS_HOME.) !jekins_10GB_log.png|thumbnail! _I think it's worth mentioning that I have 3 Jenkins nodes on the same machine._

h1. Investigation

h2. Steps

With Wireshark I found that the node keeps sending the same logs forever, so the Jenkins master is not (directly) the culprit. After attaching the debugger to the slave, I found that the method FileMonitoringTask$FileMonitoringController$WriteLog.invoke is called in an infinite loop somewhere in this file: durable-task-plugin\src\main\java\org\jenkinsci\plugins\durabletask\FileMonitoringTask.java. The same file is read again and again with a lastLocation of 1930670. lastLocation represents the bytes already read, but I don't understand why it doesn't increase. The process is terminated and the log file is no bigger than 3 MB (it can be seen in the upper left corner of the screenshot). !Capture d'écran de 2016-08-20 17-04-34.png|thumbnail!

*Update 1:* It seems that Jenkins reads the whole remaining part of the file; if that fails, it returns null. I suspect that Jenkins fails to close the file descriptor, so lastLocation is not updated even though the data has been sent. Jenkins then re-reads the file, fails again, and so on. That's only a supposition for now.

*Update 2:* It seems to come from the network, as I've captured a java.io.InterruptedIOException in this loop, in hudson.remoting.ProxyOutputStream. !Capture d'écran de 2016-08-20 19-02-41.png|thumbnail!

*Update 3:* It seems that the Jenkins master is the guilty one. I've connected my debugger to it. The error occurs when it tries to write the log in its JENKINS_HOME, while executing the green line on the following screenshot. !Capture d'écran de 2016-08-20 19-35-36.png|thumbnail! The error is caught in DurableTaskStep$Execution.check, as it seems to be a workspace error: Jenkins doesn't find the workspace folder, because it's looking for the node's workspace in its own local file system (C:\\ci\\int12\\ocoint...). So it saves the log but interrupts the task and tells the slave that the task was interrupted. The slave thinks the logs have not been saved and resends them to the master, which again fails to find the node workspace in its local filesystem, and so on. !Capture d'écran de 2016-08-20 19-35-13.png|thumbnail!

*Update 4:* It seems that when the master deserializes the node's response, it gets a null object instead of the new log position when it comes to WriteLog... !Capture d'écran de 2016-08-20 23-41-53.png|thumbnail!

*Update 5:* I've modified the code of durable-task-plugin to log more to the console:

{noformat}
// org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke
@Override public Long invoke(File f, VirtualChannel channel) throws IOException, InterruptedException {
    try {
        long len = f.length();
        if (len > lastLocation) {
            RandomAccessFile raf = new RandomAccessFile(f, "r");
            try {
                raf.seek(lastLocation);
                long toRead = len - lastLocation;
                if (toRead > Integer.MAX_VALUE) { // >2Gb of output at once is unlikely
                    throw new IOException("large reads not yet implemented");
                }
                // TODO is this efficient for large amounts of output? Would it be better to stream data, or return a byte[] from the callable?
                byte[] buf = new byte[(int) toRead];
                raf.readFully(buf);
                sink.write(buf);
            } finally {
                raf.close();
            }
            LOGGER.log(Level.SEVERE, "QDU WILL RETURN AS NEW CURSOR POSITION {0}", len);
            return len;
        } else {
            LOGGER.log(Level.SEVERE, "QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED");
            return null;
        }
    } catch (IOException e) {
        LOGGER.log(Level.SEVERE, "QDU IOEXCEPTION {0}", e);
        throw e;
    } catch (Exception e) {
        LOGGER.log(Level.SEVERE, "QDU UNKNOWN EXCEPTION {0}", e);
    }
    return null;
}
{noformat}

On the Jenkins node, I have the following logs:

{noformat}
Aug 22, 2016 2:56:39 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED Aug 22, 2016 2:56:41 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU IOEXCEPTION {0} java.io.InterruptedIOException at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147) at java.io.OutputStream.write(OutputStream.java:75) at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772) at hudson.remoting.UserRequest.perform(UserRequest.java:153) at hudson.remoting.UserRequest.perform(UserRequest.java:50) at hudson.remoting.Request$2.run(Request.java:332) at 
hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at hudson.remoting.Engine$1$1.run(Engine.java:85) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException at java.lang.Object.wait(Native Method) at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122) ... 14 more Aug 22, 2016 2:56:52 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU IOEXCEPTION {0} java.io.InterruptedIOException at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147) at java.io.OutputStream.write(OutputStream.java:75) at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772) at hudson.remoting.UserRequest.perform(UserRequest.java:153) at hudson.remoting.UserRequest.perform(UserRequest.java:50) at hudson.remoting.Request$2.run(Request.java:332) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at hudson.remoting.Engine$1$1.run(Engine.java:85) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException at java.lang.Object.wait(Native Method) at 
hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122) ... 14 more Aug 22, 2016 2:56:54 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED Aug 22, 2016 2:57:02 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU IOEXCEPTION {0} java.io.InterruptedIOException at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147) at java.io.OutputStream.write(OutputStream.java:75) at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772) at hudson.remoting.UserRequest.perform(UserRequest.java:153) at hudson.remoting.UserRequest.perform(UserRequest.java:50) at hudson.remoting.Request$2.run(Request.java:332) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at hudson.remoting.Engine$1$1.run(Engine.java:85) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException at java.lang.Object.wait(Native Method) at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122) ... 
14 more Aug 22, 2016 2:57:09 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED Aug 22, 2016 2:57:12 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU IOEXCEPTION {0} java.io.InterruptedIOException at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147) at java.io.OutputStream.write(OutputStream.java:75) at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772) at hudson.remoting.UserRequest.perform(UserRequest.java:153) at hudson.remoting.UserRequest.perform(UserRequest.java:50) at hudson.remoting.Request$2.run(Request.java:332) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at hudson.remoting.Engine$1$1.run(Engine.java:85) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException at java.lang.Object.wait(Native Method) at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122) ... 
14 more Aug 22, 2016 2:57:22 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU IOEXCEPTION {0} java.io.InterruptedIOException at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147) at java.io.OutputStream.write(OutputStream.java:75) at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772) at hudson.remoting.UserRequest.perform(UserRequest.java:153) at hudson.remoting.UserRequest.perform(UserRequest.java:50) at hudson.remoting.Request$2.run(Request.java:332) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at hudson.remoting.Engine$1$1.run(Engine.java:85) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException at java.lang.Object.wait(Native Method) at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122) ... 
14 more Aug 22, 2016 2:57:24 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED Aug 22, 2016 2:57:33 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU IOEXCEPTION {0} java.io.InterruptedIOException at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147) at java.io.OutputStream.write(OutputStream.java:75) at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772) at hudson.remoting.UserRequest.perform(UserRequest.java:153) at hudson.remoting.UserRequest.perform(UserRequest.java:50) at hudson.remoting.Request$2.run(Request.java:332) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at hudson.remoting.Engine$1$1.run(Engine.java:85) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException at java.lang.Object.wait(Native Method) at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122) ... 
14 more Aug 22, 2016 2:57:39 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED Aug 22, 2016 2:57:43 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU IOEXCEPTION {0} java.io.InterruptedIOException at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147) at java.io.OutputStream.write(OutputStream.java:75) at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772) at hudson.remoting.UserRequest.perform(UserRequest.java:153) at hudson.remoting.UserRequest.perform(UserRequest.java:50) at hudson.remoting.Request$2.run(Request.java:332) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at hudson.remoting.Engine$1$1.run(Engine.java:85) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException at java.lang.Object.wait(Native Method) at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122) ... 
14 more Aug 22, 2016 2:57:53 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU IOEXCEPTION {0} java.io.InterruptedIOException at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147) at java.io.OutputStream.write(OutputStream.java:75) at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772) at hudson.remoting.UserRequest.perform(UserRequest.java:153) at hudson.remoting.UserRequest.perform(UserRequest.java:50) at hudson.remoting.Request$2.run(Request.java:332) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at hudson.remoting.Engine$1$1.run(Engine.java:85) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException at java.lang.Object.wait(Native Method) at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122) ... 14 more {noformat} h1. How to reproduce I've tried to reproduce this bug for 3 days but didn't achieve to. I had to put my production in debug to inspect the error. As I don't understand why it fails, I can't create a reproduce protocol yet. I hope that someone google this error with the same keywords and post a comment here to mention that he has the same error and more information :)
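My current hypothesis (Updates 1 and 5 above) can be sketched as a small simulation. This is an illustrative sketch, not the plugin's actual code: the class and method names below are mine, and it only models the suspected interaction, i.e. the tailer writes the new bytes to the remote sink *first* and only afterwards reports the new cursor position, so if the result is lost (here simulated by returning null after the write, as happens when the `InterruptedIOException` interrupts the call), the cursor never advances and the same bytes are delivered again on every poll:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical model of the resend loop: write-then-report-cursor means a
 * lost result re-delivers the same bytes on the next poll, indefinitely.
 */
public class ResendLoopDemo {
    // Stands in for the output.txt that the durable task appends to.
    static final byte[] LOG_FILE = "line1\nline2\n".getBytes(StandardCharsets.UTF_8);

    /**
     * Mimics the shape of WriteLog.invoke: copy everything past lastLocation
     * into the sink, then return the new cursor, or null when nothing new was
     * read. failAfterWrite simulates the result being lost AFTER the bytes
     * already reached the master.
     */
    static Long tail(long lastLocation, ByteArrayOutputStream sink, boolean failAfterWrite) {
        if (LOG_FILE.length > lastLocation) {
            sink.write(LOG_FILE, (int) lastLocation, (int) (LOG_FILE.length - lastLocation));
            if (failAfterWrite) {
                return null; // cursor never advances, but the bytes already left
            }
            return (long) LOG_FILE.length;
        }
        return null; // "WILL HAVE TO BE REUPLOADED"
    }

    public static void main(String[] args) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long cursor = 0;
        // Three polls where the result is lost after the write:
        // the same 12 bytes are delivered three times.
        for (int i = 0; i < 3; i++) {
            Long next = tail(cursor, sink, true);
            if (next != null) {
                cursor = next; // never reached while the failure persists
            }
        }
        System.out.println(sink.size()); // 36: 12 bytes, delivered 3 times
    }
}
```

Under this model the log file grows linearly but the master's copy grows by the full unacknowledged tail on every poll, which would explain a 3 MB file producing a 10 GB build log.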
14 more Aug 22, 2016 2:57:09 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED Aug 22, 2016 2:57:12 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU IOEXCEPTION {0} java.io.InterruptedIOException at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147) at java.io.OutputStream.write(OutputStream.java:75) at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772) at hudson.remoting.UserRequest.perform(UserRequest.java:153) at hudson.remoting.UserRequest.perform(UserRequest.java:50) at hudson.remoting.Request$2.run(Request.java:332) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at hudson.remoting.Engine$1$1.run(Engine.java:85) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException at java.lang.Object.wait(Native Method) at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122) ... 
14 more Aug 22, 2016 2:57:22 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU IOEXCEPTION {0} java.io.InterruptedIOException at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147) at java.io.OutputStream.write(OutputStream.java:75) at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772) at hudson.remoting.UserRequest.perform(UserRequest.java:153) at hudson.remoting.UserRequest.perform(UserRequest.java:50) at hudson.remoting.Request$2.run(Request.java:332) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at hudson.remoting.Engine$1$1.run(Engine.java:85) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException at java.lang.Object.wait(Native Method) at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122) ... 
14 more Aug 22, 2016 2:57:24 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED Aug 22, 2016 2:57:33 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU IOEXCEPTION {0} java.io.InterruptedIOException at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147) at java.io.OutputStream.write(OutputStream.java:75) at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772) at hudson.remoting.UserRequest.perform(UserRequest.java:153) at hudson.remoting.UserRequest.perform(UserRequest.java:50) at hudson.remoting.Request$2.run(Request.java:332) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at hudson.remoting.Engine$1$1.run(Engine.java:85) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException at java.lang.Object.wait(Native Method) at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122) ... 
14 more Aug 22, 2016 2:57:39 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED Aug 22, 2016 2:57:43 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU IOEXCEPTION {0} java.io.InterruptedIOException at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147) at java.io.OutputStream.write(OutputStream.java:75) at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772) at hudson.remoting.UserRequest.perform(UserRequest.java:153) at hudson.remoting.UserRequest.perform(UserRequest.java:50) at hudson.remoting.Request$2.run(Request.java:332) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at hudson.remoting.Engine$1$1.run(Engine.java:85) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException at java.lang.Object.wait(Native Method) at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122) ... 
14 more Aug 22, 2016 2:57:53 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke SEVERE: QDU IOEXCEPTION {0} java.io.InterruptedIOException at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147) at java.io.OutputStream.write(OutputStream.java:75) at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772) at hudson.remoting.UserRequest.perform(UserRequest.java:153) at hudson.remoting.UserRequest.perform(UserRequest.java:50) at hudson.remoting.Request$2.run(Request.java:332) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at hudson.remoting.Engine$1$1.run(Engine.java:85) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException at java.lang.Object.wait(Native Method) at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209) at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122) ... 14 more {noformat} h1. How to reproduce I've tried to reproduce this bug for 3 days but didn't achieve to. I had to put my production in debug to inspect the error. As I don't understand why it fails, I can't create a reproduce protocol yet. I hope that someone google this error with the same keywords and post a comment here to mention that he has the same error and more information :)
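The failure mode suggested by these logs can be sketched in isolation. The following is a minimal, self-contained simulation, not plugin code: {{ResendLoopDemo}} and its flaky sink are made up for illustration. The sink delivers the bytes to the "master" but then reports an InterruptedIOException, so the caller never advances {{lastLocation}} and the same bytes are pushed again on every poll:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.OutputStream;

public class ResendLoopDemo {
    static long lastLocation = 0;

    // Mirrors the shape of WriteLog.invoke(): send everything past the cursor,
    // and return the new cursor position only if the write "succeeds".
    static Long poll(byte[] logFile, OutputStream sink) throws IOException {
        long len = logFile.length;
        if (len > lastLocation) {
            sink.write(logFile, (int) lastLocation, (int) (len - lastLocation));
            return len;
        }
        return null; // nothing new to send
    }

    // Returns how many copies of the log the "master" ended up receiving.
    static int run() {
        byte[] log = "the same 20-30 lines\n".getBytes();
        ByteArrayOutputStream received = new ByteArrayOutputStream();

        // A sink that delivers the bytes but then reports an interruption,
        // like hudson.remoting.ProxyOutputStream blocking on a full pipe window.
        OutputStream flaky = new OutputStream() {
            @Override public void write(int b) { received.write(b); }
            @Override public void write(byte[] b, int off, int n) throws IOException {
                received.write(b, off, n);          // bytes reach the master...
                throw new InterruptedIOException(); // ...but the call still fails
            }
        };

        for (int attempt = 0; attempt < 3; attempt++) {
            try {
                Long cursor = poll(log, flaky);
                if (cursor != null) lastLocation = cursor;
            } catch (IOException e) {
                // cursor not advanced: the same bytes are re-sent next poll
            }
        }
        return received.size() / log.length;
    }

    public static void main(String[] args) {
        System.out.println(run()); // three polls yield three copies of the same output
    }
}
```

Three polls, three copies: the duplication factor grows without bound as long as the write keeps failing after the data was already flushed.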
Component/s: remoting [ 15489 ]
Summary: Jenkins node keeps sending the sames logs indefinitely → Jenkins node keeps sending the same logs indefinitely
Lutz Neugebauer added a comment -

Hello,

I’m experiencing this issue, too.

My suspicion is that this is caused by the way org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep currently works. In its check() method, the call to controller.writeLog(…) is subject to the Timer defined in that method. As far as I understand, the Timer interrupts the execution after 10 seconds, so a writeLog() that has not completed within those 10 seconds is interrupted. By then it has already written some amount of the log, but the next attempt restarts from the old offset.
If the log on the node has meanwhile grown, and machine or network load does not allow writeLog() to complete within 10 seconds, this loops indefinitely until disk space fills up. The dependency on those load conditions is what makes it so hard to reproduce.
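This interrupt-after-timeout pattern can be sketched as follows. It is a hypothetical stand-in: {{WatchdogSketch}} is not the plugin's code, and it uses a 50 ms timeout in place of the real 10 seconds. A timer thread interrupts the worker, so slow work is abandoned mid-flight and must restart from scratch on the next scheduled check:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class WatchdogSketch {
    // Runs "work" under a watchdog that interrupts it after 50 ms
    // (10 seconds in the real DurableTaskStep.check(), as described above).
    static String copy(long workMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        Thread worker = Thread.currentThread();
        ScheduledFuture<?> kill = timer.schedule(worker::interrupt, 50, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(workMillis); // stands in for the slow remote writeLog()
            return "completed";
        } catch (InterruptedException e) {
            return "interrupted"; // offset not advanced; the work repeats on the next check
        } finally {
            kill.cancel(false);
            timer.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(copy(1));   // fast write finishes within the timeout
        System.out.println(copy(500)); // slow write is interrupted and will be retried
    }
}
```

The key property is that the watchdog fires regardless of how much progress the worker already made, which is exactly why partially transferred log data gets thrown away and re-sent.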

I observed the following exception in the log approximately every 10 seconds while the problem occurred:

{noformat}
Sep 14, 2016 10:26:49 AM FINE org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution check
could not check [workspace path here]
java.lang.InterruptedException
	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400)
	at java.util.concurrent.FutureTask.get(FutureTask.java:187)
	at hudson.remoting.Request.call(Request.java:158)
	at hudson.remoting.Channel.call(Channel.java:780)
	at hudson.FilePath.act(FilePath.java:1007)
	at hudson.FilePath.act(FilePath.java:996)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.writeLog(FileMonitoringTask.java:132)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:264)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:233)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{noformat}

Note: the line numbers from classes of the durable-task plugin may vary compared to official versions, since I have local changes relating to https://github.com/jenkinsci/durable-task-plugin/pull/28.

Regards,
Lutz
Jesse Glick added a comment -

Unless it is reproducible from scratch I do not expect to spend time on it since I hope to rewrite this code as part of JENKINS-38381 anyway.

 Link This issue relates to JENKINS-38381
Lutz Neugebauer added a comment -

Reproducing a race condition reliably is usually not that easy. However, it can be done here with minimally invasive surgery: adding an 11-second sleep right before the return of FileMonitoringTask's WriteLog.invoke() method triggers the issue.

{noformat}
diff --git a/src/main/java/org/jenkinsci/plugins/durabletask/FileMonitoringTask.java b/src/main/java/org/jenkinsci/plugins/durabletask/FileMonitoringTask.java
index 163ca83..d55134c 100644
--- a/src/main/java/org/jenkinsci/plugins/durabletask/FileMonitoringTask.java
+++ b/src/main/java/org/jenkinsci/plugins/durabletask/FileMonitoringTask.java
@@ ... @@
                         } finally {
                             raf.close();
                         }
+                        Thread.sleep(11000); // 11-second delay (added line reconstructed from the description above)
                         return len;
                     } else {
                         return null;
{noformat}

For reproducing you may use the following pipeline script with your node:

{noformat}
node('yourNodeHere') {
    sh "echo 'This echo infinitely shows up in the log every 10 seconds.'"
}
{noformat}

Component/s: remoting [ 15489 ]
Summary: Jenkins node keeps sending the same logs indefinitely → Delays in WriteLog can
Summary: Delays in WriteLog can → Delays in FileMonitoringTask.WriteLog can cause process output to be resent indefinitely
Labels: robustness
Jesse Glick added a comment -

JENKINS-38381 does use a different code path.

 Link This issue is duplicated by JENKINS-44483
Sorin Sbarnea added a comment -

Is there any workaround that we can use for avoiding this bug?

Phil Grayson added a comment - edited

We experienced this issue, but after switching the Jenkins master to the G1 garbage collector, the issue went away.

https://jenkins.io/blog/2016/11/21/gc-tuning/ has details.

It might be a coincidence, and it wouldn't have solved the root cause of the issue. However, our master has never been more stable since switching to G1.

 Link This issue is duplicated by JENKINS-49153
Status: Open [ 1 ] → In Progress [ 3 ]
Status: In Progress [ 3 ] → In Review [ 10005 ]
 Link This issue is duplicated by JENKINS-46642
John Zila added a comment -

Any progress on this? We just started seeing this again yesterday, with multi-GB logs and interminable builds.

mishal shah added a comment -

We are also seeing this with Jenkins ver. 2.121.1. What is the current status of this issue? It has been under review since Feb 9th, 2018.

John Zila added a comment -

We ended up refactoring our Jenkinsfiles so that any potentially offending steps log to a file instead. Then we archiveArtifacts those log files.

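The workaround described above might look roughly like this in a Jenkinsfile (a sketch only: the verbose-build-step command and the build.log file name are illustrative, not from the original comment):

```groovy
node('yourNodeHere') {
    // Redirect the chatty step's output to a file instead of the build console,
    // so the durable-task log-copying path has almost nothing to stream.
    sh 'verbose-build-step > build.log 2>&1'

    // Make the captured log available from the build page afterwards.
    archiveArtifacts artifacts: 'build.log'
}
```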
mishal shah added a comment -

That's a great workaround. I hope we can resolve this issue soon, because without the live log it is hard for everyone to track the progress of a build. Also, this breaks plugins that analyze the console output for failures.

mishal shah added a comment -

John Zila Are you using https://github.com/ssbarnea/rhos-ci/blob/master/vars/sh2.groovy ?
John Zila added a comment -

Nope, although that does look pretty nice.

Jesse Glick added a comment -

Filed a new PR 74 to address this specific bug, even though planned log handling changes would replace the entire code path anyway.

 Link This issue relates to JENKINS-38381
 Link This issue relates to JENKINS-52165
Jesse Glick added a comment -

PR merged.

Status: In Review [ 10005 ] → Resolved [ 5 ]
Resolution: Fixed [ 1 ]
Sam Van Oort added a comment -

Sam Van Oort made changes -
Status: Resolved [ 5 ] → Closed [ 6 ]
Kurt Routley added a comment -

Sam Van Oort Just want to clarify from your comment: was the fix made in 1.23 and not 2.23? https://plugins.jenkins.io/durable-task

Sam Van Oort added a comment -

Kurt Routley Yes, 1.23. Sorry, typo.


#### People

• Assignee:
Jesse Glick
• Reporter:
Quentin Dufour