Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-54513

Builds occupying executor but not running

    Details

    • Similar Issues:

      Description

      We have seen a couple of jobs which appear to have got executors stuck while starting the build.

      For example one buildhas console output (build #428)

      Started by upstream project "<folder>/<job1>" build number 181
      originally caused by:
      {{ Started by upstream project "<folder>/<job2>" build number 852}}
      {{ originally caused by:}}
      {{ Started by upstream project "<folder>/<job3>" build number 1569}}
      {{ originally caused by:}}
      {{ Started by user <user>}}
      {{ Replayed #1568}}
      Resume disabled by user, switching to high-performance, low-durability mode.
      [Pipeline] node
      Still waiting to schedule task
      Aborted by <user>
      Click here to forcibly terminate running steps
      Click here to forcibly kill entire build
      Hard kill!
      Finished: ABORTED

       

      And according the build history of that job it has been aborted. However, in the Build Executor Status an executor has been allocated and it believe that it is still running. Trying to abort from Build Executor Status doesn't do anything.

      This happened across 4 jobs which were run in parallel, triggered by another job and were started within 2 seconds of each other.

      I could not see any blocked threads on the node (and we have tried restarting the Jenkins agent service) to no avail.

      There are some blocked threads on the master, see threadDump.txt. It looks like this is some kind of deadlock condition with multiple jobs being triggered in a short space of time. (Can provide a fuller thread dump if useful but would take time to obfuscate).

      Note that it does not look like it was killing the job that caused it, it looks like it was on build startup, before or when it was being allocated a node, and killing the job didn't help.

        Attachments

        1. BuildExecutors.PNG
          BuildExecutors.PNG
          5 kB
        2. BuildHistory.PNG
          BuildHistory.PNG
          22 kB
        3. threadDump.txt
          11 kB

          Activity

          Hide
          rg Russell Gallop added a comment - - edited

          Just seen what seems to be this same issue again. On this occasion it was all one job running parallel on a couple of Linux machines. In this case the job was reporting:

          [TASK1] Still waiting to schedule task
          [TASK2] Still waiting to schedule task
          [TASK3] Still waiting to schedule task
          [TASK4] Still waiting to schedule task 

          There were only 3 executors available for this and according to the Build Executor Status they have all been assigned to "<folder> » <job> #476 (part)".

          In this case there were blocked threads in PrintStream.java:805 but not the ones I saw before in PrintStream.java:355 (threads on master):

           Executor #0 for <agent1> : executing PlaceholderExecutable:ExecutorStepExecution.PlaceholderTask{runId=<folder>/<job>#476,label=<label>,context=CpsStepContext[56:node]:Owner[<folder>/<job>/476:<folder>/<job> #476],cookie=null,auth=null}
          "Executor #0 for <agent1> : executing PlaceholderExecutable:ExecutorStepExecution.PlaceholderTask{runId=<folder>/<job>#476,label=<label>,context=CpsStepContext[56:node]:Owner[<folder>/<job>/476:<folder>/<job> #476],cookie=null,auth=null}" Id=5916973 Group=main BLOCKED on java.io.PrintStream@208dc6f1 owned by "jenkins.util.Timer [#9]" Id=80
          	at java.io.PrintStream.println(PrintStream.java:805)
          	-  blocked on java.io.PrintStream@208dc6f1
          	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask$PlaceholderExecutable.run(ExecutorStepExecution.java:694)
          	at hudson.model.ResourceController.execute(ResourceController.java:97)
          	at hudson.model.Executor.run(Executor.java:429)
          
          Executor #0 for <agent2> : executing PlaceholderExecutable:ExecutorStepExecution.PlaceholderTask{runId=<folder>/<job>#476,label=<label>,context=CpsStepContext[62:node]:Owner[<folder>/<job>/476:<folder>/<job> #476],cookie=null,auth=null}
          "Executor #0 for <agent2> : executing PlaceholderExecutable:ExecutorStepExecution.PlaceholderTask{runId=<folder>/<job>#476,label=<label>,context=CpsStepContext[62:node]:Owner[<folder>/<job>/476:<folder>/<job> #476],cookie=null,auth=null}" Id=5917131 Group=main BLOCKED on java.io.PrintStream@107ec0b6 owned by "jenkins.util.Timer [#10]" Id=81
          	at java.io.PrintStream.println(PrintStream.java:805)
          	-  blocked on java.io.PrintStream@107ec0b6
          	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask$PlaceholderExecutable.run(ExecutorStepExecution.java:694)
          	at hudson.model.ResourceController.execute(ResourceController.java:97)
          	at hudson.model.Executor.run(Executor.java:429)
          
          Executor #0 for <agent3> : executing PlaceholderExecutable:ExecutorStepExecution.PlaceholderTask{runId=<folder>/<job>#476,label=<label>,context=CpsStepContext[59:node]:Owner[<folder>/<job>/476:<folder>/<job> #476],cookie=null,auth=null}
          "Executor #0 for <agent3> : executing PlaceholderExecutable:ExecutorStepExecution.PlaceholderTask{runId=<folder>/<job>#476,label=<label>,context=CpsStepContext[59:node]:Owner[<folder>/<job>/476:<folder>/<job> #476],cookie=null,auth=null}" Id=5917098 Group=main BLOCKED on java.io.PrintStream@7e577653 owned by "jenkins.util.Timer [#2]" Id=65
          	at java.io.PrintStream.println(PrintStream.java:805)
          	-  blocked on java.io.PrintStream@7e577653
          	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask$PlaceholderExecutable.run(ExecutorStepExecution.java:694)
          	at hudson.model.ResourceController.execute(ResourceController.java:97)
          	at hudson.model.Executor.run(Executor.java:429)
          

          As before there are no processes being run by the Jenkins agent.

          The Timer threads look like:

          jenkins.util.Timer [#10]jenkins.util.Timer [#10]"jenkins.util.Timer [#10]" Id=81 Group=main RUNNABLE (in native) at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:326) at java.io.PrintStream.write(PrintStream.java:480) -  locked java.io.PrintStream@107ec0b6 at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291) at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104) -  locked java.io.OutputStreamWriter@1f820291 at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185) at java.io.PrintStream.newLine(PrintStream.java:546) -  locked java.io.PrintStream@107ec0b6 at java.io.PrintStream.println(PrintStream.java:807) -  locked java.io.PrintStream@107ec0b6 at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$1.run(ExecutorStepExecution.java:117) at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
           Number of locked synchronizers = 1 - java.util.concurrent.ThreadPoolExecutor$Worker@64a11e8a
          
          jenkins.util.Timer [#2]"jenkins.util.Timer [#2]" Id=65 Group=main RUNNABLE (in native) at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:326) at java.io.PrintStream.write(PrintStream.java:480) -  locked java.io.PrintStream@7e577653 at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291) at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104) -  locked java.io.OutputStreamWriter@6a5bc51 at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185) at java.io.PrintStream.newLine(PrintStream.java:546) -  locked java.io.PrintStream@7e577653 at java.io.PrintStream.println(PrintStream.java:807) -  locked java.io.PrintStream@7e577653 at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$1.run(ExecutorStepExecution.java:117) at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
           Number of locked synchronizers = 1 - java.util.concurrent.ThreadPoolExecutor$Worker@32819625
          
          jenkins.util.Timer [#9]"jenkins.util.Timer [#9]" Id=80 Group=main RUNNABLE (in native) at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:326) at java.io.PrintStream.write(PrintStream.java:480) -  locked java.io.PrintStream@208dc6f1 at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291) at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104) -  locked java.io.OutputStreamWriter@9b4ca4c at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185) at java.io.PrintStream.newLine(PrintStream.java:546) -  locked java.io.PrintStream@208dc6f1 at java.io.PrintStream.println(PrintStream.java:807) -  locked java.io.PrintStream@208dc6f1 at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$1.run(ExecutorStepExecution.java:117) at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
           Number of locked synchronizers = 1 - java.util.concurrent.ThreadPoolExecutor$Worker@718a3c9d 

          This appears to be trying to write something to disk. Can someone tell what and where this would be trying to write?

          Show
          rg Russell Gallop added a comment - - edited Just seen what seems to be this same issue again. On this occasion it was all one job running parallel on a couple of Linux machines. In this case the job was reporting: [TASK1] Still waiting to schedule task [TASK2] Still waiting to schedule task [TASK3] Still waiting to schedule task [TASK4] Still waiting to schedule task There were only 3 executors available for this and according to the Build Executor Status they have all been assigned to "<folder> » <job> #476 (part)". In this case there were blocked threads in PrintStream.java:805 but not the ones I saw before in PrintStream.java:355 (threads on master): Executor #0 for <agent1> : executing PlaceholderExecutable:ExecutorStepExecution.PlaceholderTask{runId=<folder>/<job>#476,label=<label>,context=CpsStepContext[56:node]:Owner[<folder>/<job>/476:<folder>/<job> #476],cookie= null ,auth= null } "Executor #0 for <agent1> : executing PlaceholderExecutable:ExecutorStepExecution.PlaceholderTask{runId=<folder>/<job>#476,label=<label>,context=CpsStepContext[56:node]:Owner[<folder>/<job>/476:<folder>/<job> #476],cookie= null ,auth= null }" Id=5916973 Group=main BLOCKED on java.io.PrintStream@208dc6f1 owned by "jenkins.util.Timer [#9]" Id=80 at java.io.PrintStream.println(PrintStream.java:805) - blocked on java.io.PrintStream@208dc6f1 at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask$PlaceholderExecutable.run(ExecutorStepExecution.java:694) at hudson.model.ResourceController.execute(ResourceController.java:97) at hudson.model.Executor.run(Executor.java:429) Executor #0 for <agent2> : executing PlaceholderExecutable:ExecutorStepExecution.PlaceholderTask{runId=<folder>/<job>#476,label=<label>,context=CpsStepContext[62:node]:Owner[<folder>/<job>/476:<folder>/<job> #476],cookie= null ,auth= null } "Executor #0 for <agent2> : executing PlaceholderExecutable:ExecutorStepExecution.PlaceholderTask{runId=<folder>/<job>#476,label=<label>,context=CpsStepContext[62:node]:Owner[<folder>/<job>/476:<folder>/<job> #476],cookie= null ,auth= null }" Id=5917131 Group=main BLOCKED on java.io.PrintStream@107ec0b6 owned by "jenkins.util.Timer [#10]" Id=81 at java.io.PrintStream.println(PrintStream.java:805) - blocked on java.io.PrintStream@107ec0b6 at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask$PlaceholderExecutable.run(ExecutorStepExecution.java:694) at hudson.model.ResourceController.execute(ResourceController.java:97) at hudson.model.Executor.run(Executor.java:429) Executor #0 for <agent3> : executing PlaceholderExecutable:ExecutorStepExecution.PlaceholderTask{runId=<folder>/<job>#476,label=<label>,context=CpsStepContext[59:node]:Owner[<folder>/<job>/476:<folder>/<job> #476],cookie= null ,auth= null } "Executor #0 for <agent3> : executing PlaceholderExecutable:ExecutorStepExecution.PlaceholderTask{runId=<folder>/<job>#476,label=<label>,context=CpsStepContext[59:node]:Owner[<folder>/<job>/476:<folder>/<job> #476],cookie= null ,auth= null }" Id=5917098 Group=main BLOCKED on java.io.PrintStream@7e577653 owned by "jenkins.util.Timer [#2]" Id=65 at java.io.PrintStream.println(PrintStream.java:805) - blocked on java.io.PrintStream@7e577653 at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask$PlaceholderExecutable.run(ExecutorStepExecution.java:694) at hudson.model.ResourceController.execute(ResourceController.java:97) at hudson.model.Executor.run(Executor.java:429) As before there are no processes being run by the Jenkins agent. The Timer threads look like: jenkins.util.Timer [#10]jenkins.util.Timer [#10] "jenkins.util.Timer [#10]" Id=81 Group=main RUNNABLE (in native ) at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:326) at java.io.PrintStream.write(PrintStream.java:480) -  locked java.io.PrintStream@107ec0b6 at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291) at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104) -  locked java.io.OutputStreamWriter@1f820291 at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185) at java.io.PrintStream.newLine(PrintStream.java:546) -  locked java.io.PrintStream@107ec0b6 at java.io.PrintStream.println(PrintStream.java:807) -  locked java.io.PrintStream@107ec0b6 at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$1.run(ExecutorStepExecution.java:117) at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang. Thread .run( Thread .java:748) Number of locked synchronizers = 1 - java.util.concurrent.ThreadPoolExecutor$Worker@64a11e8a jenkins.util.Timer [#2] "jenkins.util.Timer [#2]" Id=65 Group=main RUNNABLE (in native ) at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:326) at java.io.PrintStream.write(PrintStream.java:480) -  locked java.io.PrintStream@7e577653 at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291) at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104) -  locked java.io.OutputStreamWriter@6a5bc51 at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185) at java.io.PrintStream.newLine(PrintStream.java:546) -  locked java.io.PrintStream@7e577653 at java.io.PrintStream.println(PrintStream.java:807) -  locked java.io.PrintStream@7e577653 at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$1.run(ExecutorStepExecution.java:117) at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang. Thread .run( Thread .java:748) Number of locked synchronizers = 1 - java.util.concurrent.ThreadPoolExecutor$Worker@32819625 jenkins.util.Timer [#9] "jenkins.util.Timer [#9]" Id=80 Group=main RUNNABLE (in native ) at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:326) at java.io.PrintStream.write(PrintStream.java:480) -  locked java.io.PrintStream@208dc6f1 at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291) at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104) -  locked java.io.OutputStreamWriter@9b4ca4c at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185) at java.io.PrintStream.newLine(PrintStream.java:546) -  locked java.io.PrintStream@208dc6f1 at java.io.PrintStream.println(PrintStream.java:807) -  locked java.io.PrintStream@208dc6f1 at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$1.run(ExecutorStepExecution.java:117) at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang. Thread .run( Thread .java:748) Number of locked synchronizers = 1 - java.util.concurrent.ThreadPoolExecutor$Worker@718a3c9d This appears to be trying to write something to disk. Can someone tell what and where this would be trying to write?
          Hide
          bsnaresh Naresh BS added a comment -

          We are running into similar issues with our Jenkins setup on version 2.121.1.

          Has anyone been able to figure out the root cause or how to fix this? Has this been fixed in any version of Jenkins yet.

          Show
          bsnaresh Naresh BS added a comment - We are running into similar issues with our Jenkins setup on version 2.121.1. Has anyone been able to figure out the root cause or how to fix this? Has this been fixed in any version of Jenkins yet.

            People

            • Assignee:
              Unassigned
              Reporter:
              rg Russell Gallop
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated: