Jenkins / JENKINS-37575

Delays in FileMonitoringTask.WriteLog can cause process output to be resent indefinitely

    Details

    • Released As:
      durable-task 1.23

      Description

      Problem

      Sometimes, after a certain number of builds, a build never ends on the node.
      It keeps resending the same 20-30 lines forever.
      The problem seems to occur more often when I restart Jenkins while some tasks are running, but I'm not sure.
      It also seems to be linked to high load or network traffic on the master/node.
      The gap is visible between the timestamper date (added when the line is received by the master) and the log date (written during the PowerShell execution).

      Some log files grow bigger than 10 GB before I kill the process.
      (Yes, they really are stored in JENKINS_HOME.)

      It's worth mentioning that I have 3 Jenkins nodes on the same machine, and that my JENKINS_HOME is located on a network drive (CIFS/SMB).

      Investigation

      Steps

      Using Wireshark, I found that the node keeps sending the same log lines forever,
      so the Jenkins master is not (directly) the culprit.
      After attaching a debugger to the slave, I found that the method FileMonitoringTask$FileMonitoringController$WriteLog.invoke is called in an infinite loop somewhere in this file:
      durable-task-plugin\src\main\java\org\jenkinsci\plugins\durabletask\FileMonitoringTask.java
      The same file is read again and again with a lastLocation of 1930670. lastLocation represents the bytes already read, but I don't understand why it never increases.
      The process has terminated, and the log file is no bigger than 3 MB (it can be seen in the upper-left corner of the screenshot).
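      To make the failure mode concrete, the tailing pattern looks roughly like this (a simplified Groovy sketch with assumed names, not the plugin's actual code); the cursor only advances after a poll fully succeeds:

      // Simplified sketch of an offset-based log tailer (assumed names; not the plugin's exact code).
      class LogTailer {
          long lastLocation = 0 // bytes already forwarded to the master

          // Copy any new bytes to the sink; the cursor advances only if the copy fully succeeds.
          void poll(File log, OutputStream sink) {
              try {
                  long len = log.length()
                  if (len <= lastLocation) {
                      return // nothing new since the last poll
                  }
                  new RandomAccessFile(log, 'r').withCloseable { raf ->
                      raf.seek(lastLocation)
                      byte[] buf = new byte[(int) (len - lastLocation)]
                      raf.readFully(buf)
                      sink.write(buf) // if this blocks and the thread is interrupted, the line below is never reached
                  }
                  lastLocation = len // the cursor only moves forward here
              } catch (IOException e) {
                  // cursor unchanged: the same bytes are read and resent on the next poll
              }
          }
      }

      If the copy is interrupted after the bytes were already pushed into the remoting channel but before the new offset is recorded, the master still prints the data, yet the next poll starts again from the old offset.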

      Update 1: It seems that Jenkins reads the whole file; if that fails, it returns 0. I suspect that Jenkins is failing to close the file descriptor, so lastLocation is not updated even though the data has been sent. Jenkins then retries reading the file, fails again, and so on. That's only a supposition for now.

      Update 2: It seems the problem comes from the network, as I've captured a java.io.InterruptedIOException in this loop, inside hudson.remoting.ProxyOutputStream.

      Update 3: It seems that the Jenkins master is the guilty party. I've connected my debugger to it. The error occurs when it tries to write the log into its JENKINS_HOME, while executing the green line in the following screenshot.

      The error is caught in DurableTaskStep$Execution.check, as it seems to be a workspace error. It looks like Jenkins can't find the workspace folder, because it is looking for the node's workspace in its own local file system: C:\\ci\\int12
      ocoint... So it saves the log but interrupts the task and tells the slave that it has interrupted the task. The slave thinks the logs have not been saved and resends them to the master, which again can't find the node workspace in its local filesystem, and so on.

      Update 4: It seems that when the master deserializes the node's response, it gets back a null object instead of the logs when it comes to WriteLog...

      Update 5: I've modified the durable-task-plugin code to produce more logging in the console:

      // org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke
      @Override public Long invoke(File f, VirtualChannel channel) throws IOException, InterruptedException {
          try {
              long len = f.length();
              if (len > lastLocation) {
                  RandomAccessFile raf = new RandomAccessFile(f, "r");
                  try {
                      raf.seek(lastLocation);
                      long toRead = len - lastLocation;
                      if (toRead > Integer.MAX_VALUE) { // >2Gb of output at once is unlikely
                          throw new IOException("large reads not yet implemented");
                      }
                      // TODO is this efficient for large amounts of output? Would it be better to stream data, or return a byte[] from the callable?
                      byte[] buf = new byte[(int) toRead];
                      raf.readFully(buf);
                      sink.write(buf);
                  } finally {
                      raf.close();
                  }
                  LOGGER.log(Level.SEVERE, "QDU WILL RETURN AS NEW CURSOR POSITION {0}", len);
                  return len;
              } else {
                  LOGGER.log(Level.SEVERE, "QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED");
                  return null;
              }
          } catch (IOException e) {
              LOGGER.log(Level.SEVERE, "QDU IOEXCEPTION {0}", e);
              throw e;
          } catch (Exception e) {
              LOGGER.log(Level.SEVERE, "QDU UNKNOWN EXCEPTION {0}", e);
          }
          return null;
      }
      

      On the Jenkins node, I have the following logs:

      Aug 22, 2016 2:56:39 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED
      Aug 22, 2016 2:56:41 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU IOEXCEPTION {0}
      java.io.InterruptedIOException
      	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:147)
      	at java.io.OutputStream.write(OutputStream.java:75)
      	at hudson.remoting.RemoteOutputStream.write(RemoteOutputStream.java:106)
      	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:137)
      	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog.invoke(FileMonitoringTask.java:116)
      	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:153)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:50)
      	at hudson.remoting.Request$2.run(Request.java:332)
      	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at hudson.remoting.Engine$1$1.run(Engine.java:85)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.InterruptedException
      	at java.lang.Object.wait(Native Method)
      	at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:209)
      	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:122)
      	... 14 more
      
      The identical java.io.InterruptedIOException stack trace followed each subsequent "QDU IOEXCEPTION" entry, so only the log headers are repeated below:

      Aug 22, 2016 2:56:52 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU IOEXCEPTION {0}
      Aug 22, 2016 2:56:54 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED
      Aug 22, 2016 2:57:02 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU IOEXCEPTION {0}
      Aug 22, 2016 2:57:09 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED
      Aug 22, 2016 2:57:12 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU IOEXCEPTION {0}
      Aug 22, 2016 2:57:22 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU IOEXCEPTION {0}
      Aug 22, 2016 2:57:24 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED
      Aug 22, 2016 2:57:33 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU IOEXCEPTION {0}
      Aug 22, 2016 2:57:39 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU WILL RETURN NULL AND WILL HAVE TO BE REUPLOADED
      Aug 22, 2016 2:57:43 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU IOEXCEPTION {0}
      Aug 22, 2016 2:57:53 AM org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$WriteLog invoke
      SEVERE: QDU IOEXCEPTION {0}
      

      How to reproduce

      I've tried to reproduce this bug for 3 days without success.
      I had to put my production instance in debug mode to inspect the error.
      As I don't understand why it fails, I can't write a reproduction protocol yet.
      I hope that someone will google this error with the same keywords and post a comment here saying that they hit the same error, with more information.

        Attachments

          Issue Links

            Activity

            lne Lutz Neugebauer added a comment -

            Hello,

            I’m experiencing this issue, too.

            My suspicion is that this is caused by the way org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep currently works. In its check() method the call to controller.writeLog(…) is subject to the Timer defined in that method. As far as I understand, the Timer interrupts the execution after 10 seconds, so if writeLog() has not completed within those 10 seconds it is interrupted. Within those 10 seconds it has already written a certain amount of the log, but at the next attempt it starts over from the same position.
            If the log on the node has meanwhile grown, and machine load or network conditions don’t allow writeLog() to complete within 10 seconds, this loops indefinitely until disk space fills up. The dependency on those conditions is what makes it so hard to reproduce.
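            Roughly, the interaction looks like this (a minimal Groovy sketch with made-up names and timings, not the actual plugin code):

            // Sketch: a watchdog Timer interrupts the polling thread after 10 seconds; the offset
            // is only recorded if the remote copy finished before the interrupt fired.
            long lastLocation = 0                        // offset of the bytes already shown in the build log
            final Thread poller = Thread.currentThread()
            Timer watchdog = new Timer(true)
            watchdog.schedule(new TimerTask() {
                @Override void run() { poller.interrupt() } // fires once the 10-second budget is used up
            }, 10000L)
            try {
                Thread.sleep(15000L)                     // stand-in for writeLog(): the copy outlasts the budget
                lastLocation += 1024                     // never reached: the interrupt arrives first
            } catch (InterruptedException e) {
                // lastLocation is unchanged, so the next check() re-reads from the old offset
                // and resends the same bytes, again and again while the transfer stays slow
            } finally {
                watchdog.cancel()
            }
            println "cursor is still at ${lastLocation}" // prints 0

            Any condition that keeps the copy above the 10-second budget (machine load, network load, or a deliberately added sleep) keeps the loop going.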

            I observed the following exception in log approximately every 10 seconds when the problem occurred:
            Sep 14, 2016 10:26:49 AM FINE org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution check

            could not check [workspace path here]
            java.lang.InterruptedException
            at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400)
            at java.util.concurrent.FutureTask.get(FutureTask.java:187)
            at hudson.remoting.Request.call(Request.java:158)
            at hudson.remoting.Channel.call(Channel.java:780)
            at hudson.FilePath.act(FilePath.java:1007)
            at hudson.FilePath.act(FilePath.java:996)
            at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.writeLog(FileMonitoringTask.java:132)
            at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:264)
            at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:233)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
            at java.util.concurrent.FutureTask.run(FutureTask.java:262)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:745)

            Note: The line numbers from classes of the durable-task plugin may vary compared to official versions since I have changes in there relating to https://github.com/jenkinsci/durable-task-plugin/pull/28.

            Regards,
            Lutz

            jglick Jesse Glick added a comment -

            Unless it is reproducible from scratch I do not expect to spend time on it since I hope to rewrite this code as part of JENKINS-38381 anyway.

            lne Lutz Neugebauer added a comment -

            Reproducing a race condition reliably is usually not that easy. However, it can be done here with minimally invasive surgery: adding an 11-second sleep right before the return of FileMonitoringTask's WriteLog.invoke() method triggers the issue:

            diff --git a/src/main/java/org/jenkinsci/plugins/durabletask/FileMonitoringTask.java b/src/main/java/org/jenkinsci/plugins/durabletask/FileMonitoringTask.java
            index 163ca83..d55134c 100644
            --- a/src/main/java/org/jenkinsci/plugins/durabletask/FileMonitoringTask.java
            +++ b/src/main/java/org/jenkinsci/plugins/durabletask/FileMonitoringTask.java
            @@ -137,6 +137,7 @@ public abstract class FileMonitoringTask extends DurableTask {
                                 } finally {
                                     raf.close();
                                 }
            +                    Thread.sleep(11*1000);
                                 return len;
                             } else {
                                 return null;
            

            To reproduce, you may use the following pipeline script with your node:

            node('yourNodeHere') {
                sh "echo 'This echo infinitely shows up in the log every 10 seconds.'"
            }
            
            jglick Jesse Glick added a comment -

            JENKINS-38381 does use a different code path.

            ssbarnea Sorin Sbarnea added a comment -

            Is there any workaround that we can use for avoiding this bug?

            philgrayson Phil Grayson added a comment - edited

            We experienced this issue, but after changing the Jenkins master to use the G1 garbage collector, the issue went away.

            https://jenkins.io/blog/2016/11/21/gc-tuning/ has details.

            It might be a coincidence, and it wouldn't have addressed the root cause of the issue. However, our master has never been more stable since switching to G1.

            jzila John Zila added a comment -

            Any progress on this? We just started seeing this again yesterday, with multi-GB logs and interminable builds.

            shahmishal mishal shah added a comment -

            We are also seeing this with Jenkins ver. 2.121.1. What is the current status of this issue? It has been under review since Feb 9th, 2018.

            jzila John Zila added a comment -

            We ended up refactoring our Jenkinsfiles so that any potentially offending steps log to a file instead. Then we `archiveArtifacts` those log files.
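            For example, a step along these lines (hypothetical commands and file names):

            node('some-node') {
                // Send the chatty step's output to a file on the node instead of the build console.
                sh 'make verbose-target > build.log 2>&1'
                // Keep the full output available from the build page without streaming it live.
                archiveArtifacts artifacts: 'build.log', allowEmptyArchive: true
            }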

            shahmishal mishal shah added a comment -

            That's a great workaround. I wish we could resolve this issue soon, because without the live log it is hard for everyone to track the progress of the build. Also, this breaks plugins which analyze the console output for failures.

            shahmishal mishal shah added a comment -

            John Zila Are you using https://github.com/ssbarnea/rhos-ci/blob/master/vars/sh2.groovy ?
            jzila John Zila added a comment -

            Nope, although that does look pretty nice.

            jglick Jesse Glick added a comment -

            Filed a new PR 74 to address this specific bug, even though planned log handling changes would replace the entire code path anyway.

            jglick Jesse Glick added a comment -

            PR merged.

            svanoort Sam Van Oort added a comment -

            Released as durable-task 2.23

            kroutley Kurt Routley added a comment -

            Sam Van Oort Just want to clarify from the comment: the fix was made in 1.23 and not 2.23? https://plugins.jenkins.io/durable-task

            svanoort Sam Van Oort added a comment -

            Kurt Routley Yes, 1.23, sorry, typo


              People

              • Assignee:
                jglick Jesse Glick
                Reporter:
                superboum Quentin Dufour
              • Votes:
                13
                Watchers:
                27
