Jenkins / JENKINS-42937

Massive slowdown of pipeline execution when making massive use of parallel

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Duplicate
    • Component/s: pipeline
    • Labels:
      None
    • Environment:
      Jenkins 2.50 on Windows (master)

      Description

      I use the pipeline parallel feature to schedule all of our regression tests in parallel. Before the weekend we changed the job to use the full set of tests (roughly 650 test suites). To ensure good utilization of the test system, we set up a dedicated node with four executors and create one parallel branch per test suite. This worked well with smaller sets of tests; now we observe a tremendous slowdown of pipeline execution, to the point where a simple informative echo takes several minutes to be processed, according to the job log.
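      For reference, the job shape described above might look something like this scripted-pipeline sketch (the `regression` node label and suite names are hypothetical, not taken from the actual job):

      ```groovy
      // One parallel branch per test suite, all contending for a
      // dedicated node with four executors.
      def suites = (1..650).collect { "suite-${it}" }
      def branches = [:]
      for (s in suites) {
          def suite = s            // capture the loop variable for the closure
          branches[suite] = {
              node('regression') {             // the dedicated test node
                  echo "Running ${suite}"      // even this echo stalls for minutes
                  // ... check out and run the suite ...
              }
          }
      }
      parallel branches
      ```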

      Looking at system utilization, it turns out that the master runs CPU-bound on one of four cores (the Java process is constantly at 25% or more). Using the Jenkins Monitoring plugin we found that all the time is consumed by WorkflowRun.copyLogs in LinearBlockHoppingScanner.java:120-123; see the stack trace below.

      org.jenkinsci.plugins.workflow.graphanalysis.LinearBlockHoppingScanner.next(LinearBlockHoppingScanner.java:120)
      org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.next(AbstractFlowScanner.java:212)
      org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.next(AbstractFlowScanner.java:94)
      org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.findFirstMatch(AbstractFlowScanner.java:255)
      org.jenkinsci.plugins.workflow.graphanalysis.LinearScanner.findFirstMatch(LinearScanner.java:135)
      org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.findFirstMatch(AbstractFlowScanner.java:274)
      org.jenkinsci.plugins.workflow.support.actions.LogActionImpl.isRunning(LogActionImpl.java:153)
      org.jenkinsci.plugins.workflow.support.actions.LogActionImpl.getLogText(LogActionImpl.java:128)
      org.jenkinsci.plugins.workflow.job.WorkflowRun.copyLogs(WorkflowRun.java:441)
      org.jenkinsci.plugins.workflow.job.WorkflowRun.access$600(WorkflowRun.java:125)
      org.jenkinsci.plugins.workflow.job.WorkflowRun$3.run(WorkflowRun.java:313)
      java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
      java.util.concurrent.FutureTask.runAndReset(Unknown Source)
      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
      java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      java.lang.Thread.run(Unknown Source)

      Looking at the source code, the next method appears to loop over a list, calling .contains(...) on another collection on each iteration. Depending on the underlying collection types, this could introduce huge runtime complexity.
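      The cost of that pattern is easy to demonstrate in isolation. A minimal, self-contained Java sketch (our own names, not Jenkins code): calling `contains` on a `List` inside a loop costs O(m·k) comparisons for m nodes and k list entries, while a `HashSet` gives the same answer with expected O(1) membership tests.

      ```java
      import java.util.HashSet;
      import java.util.List;
      import java.util.Set;

      public class ScanDemo {
          // For each node, List.contains walks the list linearly:
          // O(m * k) comparisons for m nodes and a list of k entries.
          static int countHitsList(int[] nodes, List<Integer> seen) {
              int hits = 0;
              for (int n : nodes) {
                  if (seen.contains(n)) hits++;   // O(k) per call on a List
              }
              return hits;
          }

          // Same result with a HashSet: expected O(1) per lookup, O(m) total.
          static int countHitsSet(int[] nodes, Set<Integer> seen) {
              int hits = 0;
              for (int n : nodes) {
                  if (seen.contains(n)) hits++;   // expected O(1) per call
              }
              return hits;
          }

          public static void main(String[] args) {
              int[] nodes = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
              List<Integer> seenList = List.of(1, 3, 5, 7, 9);
              Set<Integer> seenSet = new HashSet<>(seenList);
              System.out.println(countHitsList(nodes, seenList)); // prints 5
              System.out.println(countHitsSet(nodes, seenSet));   // prints 5
          }
      }
      ```

      Whether this is the dominant cost here depends on the actual collection types in the scanner; the comment below points at a different, graph-shaped source of the quadratic behavior.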


            Activity

            Sam Van Oort (svanoort) added a comment (edited)

            It's a bit more complex than that, because it actually has to do with the graph heads structure vs. the iteration behavior.  Essentially we can walk back to the beginning of the directed-acyclic flow graph multiple times for parallels.  This gives ~O(m*n) performance for each BlockStartNode checked, where m is the number of nodes previously run, and n is the number of branches, assuming each branch is an equal number of nodes long.  

            This version removes that behavior, which should greatly improve performance with many parallels. In theory it is still far from optimal, because we could simply store the hierarchy of enclosing blocks in memory while running a flow, reducing an O(m) scan to O(d), where d is the depth (a linear check), or to O(1) if a hash set is used to store the enclosing blocks. However, in practice any of that would require some very significant internal changes, and copyLogs is slated to disappear in the future.
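            The in-memory hierarchy proposed above can be sketched in a few lines of Java. This is our illustration, not Jenkins internals: each node records its set of enclosing blocks as it is appended, so the question "is this node inside block X?", which otherwise needs an O(m) walk back through the flow graph, becomes a single set lookup.

            ```java
            import java.util.HashMap;
            import java.util.HashSet;
            import java.util.Map;
            import java.util.Set;

            public class EnclosingBlocks {
                // Parent pointer per node id; -1 marks the start of the flow.
                private final Map<Integer, Integer> parent = new HashMap<>();
                // Cached enclosing-block sets, filled as nodes are appended.
                private final Map<Integer, Set<Integer>> enclosing = new HashMap<>();

                /** Append a node while the flow runs; pass -1 for the flow start. */
                void addNode(int id, int parentId) {
                    parent.put(id, parentId);
                    Set<Integer> blocks = new HashSet<>();
                    if (parentId >= 0) {
                        blocks.addAll(enclosing.get(parentId)); // inherit ancestors
                        blocks.add(parentId);                   // plus direct parent
                    }
                    enclosing.put(id, blocks);
                }

                /** The slow way: walk parent pointers back toward the start, O(m). */
                boolean isEnclosedByScan(int id, int blockId) {
                    for (int p = parent.get(id); p >= 0; p = parent.get(p)) {
                        if (p == blockId) return true;
                    }
                    return false;
                }

                /** The proposed way: one hash lookup against the cached set, O(1). */
                boolean isEnclosedByCache(int id, int blockId) {
                    return enclosing.get(id).contains(blockId);
                }

                public static void main(String[] args) {
                    EnclosingBlocks g = new EnclosingBlocks();
                    g.addNode(0, -1);  // block start at the head of the flow
                    g.addNode(1, 0);   // nested block
                    g.addNode(2, 1);   // a node inside both blocks
                    System.out.println(g.isEnclosedByScan(2, 0));  // prints true
                    System.out.println(g.isEnclosedByCache(2, 0)); // prints true
                }
            }
            ```

            This models a simple chain of enclosing blocks; the real flow graph is a DAG with parallel branches, which is part of why the comment calls the change "very significant".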


    People

    • Assignee:
      Sam Van Oort (svanoort)
    • Reporter:
      Florian Manschwetus (manschwetus)
    • Votes:
      0
    • Watchers:
      2
