# Delays in FileMonitoringTask.WriteLog can cause process output to be resent indefinitely

#### Details

• Type: Bug
• Status: Closed
• Priority: Major
• Resolution: Fixed
• Environment:
Windows Server 2012 R2
Jenkins 2.7.2

# Problem

Sometimes, after a certain number of builds, a build never ends on the node.
It resends the same 20-30 lines forever.
The problem seems to occur more often when I restart Jenkins while some tasks are running, but I'm not sure.
It also seems to be linked to high load or network traffic on the master or node.
We can see the difference between the timestamper date (added when the output is received by the master) and the log date (written during the PowerShell execution).

Some log files can grow bigger than 10GB before I kill the process.
(Yes, it's really stored in the JENKINS_HOME.)

I think it's worth mentioning that I have 3 Jenkins nodes on the same machine, and that my JENKINS_HOME is located on a network drive (CIFS/SMB).

# Investigation

## Steps

Using Wireshark, I've found that the node keeps sending the same logs forever.
So the Jenkins master is not (directly) the culprit.
After enabling the debugger on the slave, I've found that the method FileMonitoringTask$FileMonitoringController$WriteLog.invoke is called in an infinite loop somewhere in this file:
The same file is read again and again with a lastLocation of 1930670. lastLocation represents the bytes already read, but I don't understand why it doesn't increase.
The process is terminated, and the log file is no bigger than 3MB (it can be seen in the upper left corner of the screenshot).
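For context, the polling mechanism behind lastLocation can be sketched roughly like this. This is a minimal illustration with hypothetical names (TailSketch, readNew), not the plugin's actual code: each poll reads only the bytes past a saved cursor and, on a fully completed read, advances the cursor.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.RandomAccessFile;

// Simplified sketch of incremental log tailing, in the spirit of WriteLog.invoke:
// read only the bytes past a saved cursor, advance the cursor on success.
public class TailSketch {
    long lastLocation = 0; // bytes already shipped to the master

    /** Returns the new bytes since the last successful read, advancing the cursor. */
    public byte[] readNew(File log) throws IOException {
        long len = log.length();
        if (len <= lastLocation) {
            return new byte[0]; // nothing new to send
        }
        try (RandomAccessFile raf = new RandomAccessFile(log, "r")) {
            raf.seek(lastLocation); // skip what was already sent
            byte[] buf = new byte[(int) (len - lastLocation)];
            raf.readFully(buf);
            lastLocation = len; // cursor only moves after a full read
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("tail", ".log");
        f.deleteOnExit();
        TailSketch t = new TailSketch();
        try (FileWriter w = new FileWriter(f, true)) { w.write("hello\n"); }
        System.out.println(new String(t.readNew(f)).trim()); // first chunk
        try (FileWriter w = new FileWriter(f, true)) { w.write("world\n"); }
        System.out.println(new String(t.readNew(f)).trim()); // only the new bytes
    }
}
```

If the cursor is never advanced (as observed above with lastLocation stuck at 1930670), every poll re-reads and resends the same bytes.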

Update 1: It seems that Jenkins reads the whole file; if the read fails, it returns 0. I suspect that Jenkins is failing to close the file descriptor, so lastLocation is not updated even though the data has been sent. Jenkins then retries reading the file, fails again, and so on. That's only a supposition for now.

Update 2: The problem seems to come from the network, as I've captured a java.io.InterruptedIOException in this loop, in hudson.remoting.ProxyOutputStream.

Update 3: It seems that the Jenkins master is the culprit. I've connected my debugger to it. The error occurs when it tries to write the log into its JENKINS_HOME, when executing the green line on the following screenshot.

The error is caught in DurableTaskStep$Execution.check, as it seems to be a workspace error. Jenkins doesn't find the workspace folder, because it is searching for the node's workspace in its own local file system (C:\\ci\\int12 ocoint...). So it saves the log but interrupts the task, and tells the node that the task has been interrupted. The node thinks the logs have not been saved and resends them to the master, which again fails to find the node workspace in its local filesystem, and so on.

Update 4: It seems that when the master deserializes the node's response to WriteLog, it gets a null object instead of the logs.

Update 5: I've modified the code of durable-task-plugin to get more logs in the console:

```java
// org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke
@Override public Long invoke(File f, VirtualChannel channel) throws IOException, InterruptedException {
    try {
        long len = f.length();
        if (len > lastLocation) {
            RandomAccessFile raf = new RandomAccessFile(f, "r");
            try {
                raf.seek(lastLocation);
                long toRead = len - lastLocation;
                if (toRead > Integer.MAX_VALUE) { // >2Gb of output at once is unlikely
                    throw new IOException("large reads not yet implemented");
                }
                // TODO is this efficient for large amounts of output? Would it be better to stream data, or return a byte[] from the callable?
                byte[] buf = new byte[(int) toRead];
                raf.readFully(buf);
                sink.write(buf);
            } finally {
                raf.close();
            }
            LOGGER.log(Level.SEVERE, "QDU WILL RETURN AS NEW CURSOR POSITION {0}", len);
            return len;
        } else {
            LOGGER.log(Level.SEVERE, "QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED");
            return null;
        }
    } catch (IOException e) {
        LOGGER.log(Level.SEVERE, "QDU IOEXCEPTION {0}", e);
        throw e;
    } catch (Exception e) {
        LOGGER.log(Level.SEVERE, "QDU UNKNOWN EXCEPTION {0}", e);
    }
    return null;
}
```

On the Jenkins node, I have the following logs (the stack trace is identical for every occurrence; only the first is shown in full):

```
Aug 22, 2016 2:56:39 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED
Aug 22, 2016 2:56:41 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
SEVERE: QDU IOEXCEPTION {0}
java.io.InterruptedIOException
	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147)
	at java.io.OutputStream.write(OutputStream.java:75)
	at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116)
	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772)
	at hudson.remoting.UserRequest.perform(UserRequest.java:153)
	at hudson.remoting.UserRequest.perform(UserRequest.java:50)
	at hudson.remoting.Request$2.run(Request.java:332)
	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at hudson.remoting.Engine$1$1.run(Engine.java:85)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209)
	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122)
	... 14 more

Aug 22, 2016 2:56:52 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
SEVERE: QDU IOEXCEPTION {0}
java.io.InterruptedIOException (same stack trace as above)

Aug 22, 2016 2:56:54 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED
Aug 22, 2016 2:57:02 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
SEVERE: QDU IOEXCEPTION {0}
java.io.InterruptedIOException (same stack trace as above)

Aug 22, 2016 2:57:09 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED
Aug 22, 2016 2:57:12 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
SEVERE: QDU IOEXCEPTION {0}
java.io.InterruptedIOException (same stack trace as above)

Aug 22, 2016 2:57:22 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
SEVERE: QDU IOEXCEPTION {0}
java.io.InterruptedIOException (same stack trace as above)

Aug 22, 2016 2:57:24 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED
Aug 22, 2016 2:57:33 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
SEVERE: QDU IOEXCEPTION {0}
java.io.InterruptedIOException (same stack trace as above)

Aug 22, 2016 2:57:39 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED
Aug 22, 2016 2:57:43 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
SEVERE: QDU IOEXCEPTION {0}
java.io.InterruptedIOException (same stack trace as above)

Aug 22, 2016 2:57:53 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
SEVERE: QDU IOEXCEPTION {0}
java.io.InterruptedIOException (same stack trace as above)
```

# How to reproduce

I've tried to reproduce this bug for 3 days but haven't managed to. I had to put my production instance in debug mode to inspect the error. As I don't understand why it fails, I can't create a reproduction protocol yet. I hope that someone will google this error with the same keywords and post a comment here mentioning that they have the same error, with more information.

#### Attachments

#### Issue Links

#### Activity

Lutz Neugebauer added a comment -

Hello, I'm experiencing this issue, too. My suspicion is that this is caused by the way org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep currently works. In its check() method, the call to controller.writeLog(…) is subject to the Timer defined in that method. As far as I understand, the Timer interrupts the execution after 10 seconds: if writeLog() has not completed within those 10 seconds, it is interrupted. Within those 10 seconds it has already written a certain amount of the log, but at the next attempt it will start over from the beginning. If the log on the node has meanwhile grown, and machine load or network load conditions don't allow writeLog() to complete within 10 seconds, this will loop indefinitely until disk space is filled up. The dependency on those conditions makes it that hard to reproduce.

I observed the following exception in the log approximately every 10 seconds when the problem occurred:

```
Sep 14, 2016 10:26:49 AM FINE org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution check
could not check [workspace path here]
java.lang.InterruptedException
	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400)
	at java.util.concurrent.FutureTask.get(FutureTask.java:187)
	at hudson.remoting.Request.call(Request.java:158)
	at hudson.remoting.Channel.call(Channel.java:780)
	at hudson.FilePath.act(FilePath.java:1007)
	at hudson.FilePath.act(FilePath.java:996)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.writeLog(FileMonitoringTask.java:132)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:264)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:233)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
```

Note: The line numbers from classes of the durable-task plugin may vary compared to official versions, since I have changes in there relating to https://github.com/jenkinsci/durable-task-plugin/pull/28.

Regards,
Lutz
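The timeout/retry race described above can be sketched as follows. This is a minimal illustration with hypothetical names (WriteLogLoopSketch, attemptWriteLog); the boolean sendSucceeds stands in for "the remote writeLog round trip finished before the 10-second Timer fired":

```java
import java.util.Arrays;

// Sketch of the loop: the cursor (lastLocation) only advances when the send
// completes, so a send that always exceeds the timeout resends the same bytes forever.
public class WriteLogLoopSketch {
    static long lastLocation = 0;           // cursor into the node-side log file
    static byte[] logFile = new byte[100];  // stand-in for the growing log file

    /** Simulates one polling cycle; sendSucceeds models "writeLog finished within 10s". */
    static Long attemptWriteLog(boolean sendSucceeds) {
        long len = logFile.length;
        if (len > lastLocation) {
            // Same chunk re-read from the same offset on every attempt.
            byte[] chunk = Arrays.copyOfRange(logFile, (int) lastLocation, (int) len);
            if (!sendSucceeds) {
                // The Timer interrupted the call: the master never records the new
                // cursor, so the same chunk will be re-read and resent next cycle.
                return null;
            }
            lastLocation = len; // only advanced on a completed round trip
            return len;
        }
        return null;
    }

    public static void main(String[] args) {
        // Three interrupted cycles: the cursor never moves, the same bytes repeat.
        for (int i = 0; i < 3; i++) attemptWriteLog(false);
        System.out.println("cursor after interrupted cycles = " + lastLocation);
        attemptWriteLog(true);
        System.out.println("cursor after a completed cycle  = " + lastLocation);
    }
}
```

Under this model the loop only ends when one round trip happens to complete within the timeout, which matches the observed sensitivity to machine and network load.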
Jesse Glick added a comment -

Unless it is reproducible from scratch I do not expect to spend time on it since I hope to rewrite this code as part of JENKINS-38381 anyway.

Lutz Neugebauer added a comment -

Reproducing a race condition reliably is usually not that easy. However, it can be done here with minimally invasive surgery. By adding an 11-second sleep right before the return of FileMonitoringTask's WriteLog.invoke() method, the issue can be triggered:

```diff
diff --git a/src/main/java/org/jenkinsci/plugins/durabletask/FileMonitoringTask.java b/src/main/java/org/jenkinsci/plugins/durabletask/FileMonitoringTask.java
index 163ca83..d55134c 100644
--- a/src/main/java/org/jenkinsci/plugins/durabletask/FileMonitoringTask.java
+++ b/src/main/java/org/jenkinsci/plugins/durabletask/FileMonitoringTask.java
@@ -137,6 +137,7 @@ public abstract class FileMonitoringTask extends DurableTask {
                 } finally {
                     raf.close();
                 }
+                Thread.sleep(11*1000);
                 return len;
             } else {
                 return null;
```

For reproducing you may use the following pipeline script with your node:

```groovy
node('yourNodeHere') {
    sh "echo 'This echo infinitely shows up in the log every 10 seconds.'"
}
```
Jesse Glick added a comment -

JENKINS-38381 does use a different code path.

Sorin Sbarnea added a comment -

Is there any workaround that we can use for avoiding this bug?

Phil Grayson added a comment - - edited

We experienced this issue but after changing the Jenkins master to use G1 garbage collector, the issue went away.

https://jenkins.io/blog/2016/11/21/gc-tuning/ has details.

It might be a coincidence and it wouldn't have solved the root cause of the issue. However, our master has never been more stable after switching to G1.

John Zila added a comment -

Any progress on this? We just started seeing this again yesterday, with multi-GB logs and interminable builds.

mishal shah added a comment -

We are also seeing this with Jenkins ver. 2.121.1, what is the current status of this issue? It has been under review since Feb 9th 2018.

John Zila added a comment -

We ended up refactoring our Jenkinsfiles so that any potentially offending steps log to a file instead. Then we archiveArtifacts those log files.

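A sketch of that workaround in a scripted pipeline (not from the original comment; the script name run_noisy_build.sh and the log name build-output.log are illustrative): the noisy step's output goes to a file rather than the build log, and the file is archived so it can still be inspected from the build page.

```groovy
node('yourNodeHere') {
    // Redirect the noisy step's output to a file instead of the build log.
    sh './run_noisy_build.sh > build-output.log 2>&1'
    // Keep the file with the build so the full output remains available.
    archiveArtifacts artifacts: 'build-output.log'
}
```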
mishal shah added a comment -

That's a great workaround. I wish we could resolve this issue soon, because without the live log it is hard for everyone to track the progress of the build. Also, this breaks plugins that analyze the console output for failures.

mishal shah added a comment -

John Zila Are you using https://github.com/ssbarnea/rhos-ci/blob/master/vars/sh2.groovy ?
John Zila added a comment -

Nope, although that does look pretty nice.

Jesse Glick added a comment -

Filed a new PR 74 to address this specific bug, even though planned log handling changes would replace the entire code path anyway.

Jesse Glick added a comment -

PR merged.

Sam Van Oort added a comment -

Kurt Routley added a comment -

Sam Van Oort Just want to clarify from comment, fix was made in 1.23 and not 2.23? https://plugins.jenkins.io/durable-task

Sam Van Oort added a comment -

Kurt Routley Yes, 1.23, sorry, typo


#### People

• Assignee:
Jesse Glick
• Reporter:
Quentin Dufour