JENKINS-33624

Improve performance of Pipeline Stage View even with very long FlowGraphs

      Description

      The algorithms in Pipeline Stage View handle the normal use case well but perform poorly on excessively large FlowGraphs. Such graphs most commonly arise when someone runs a loop (with steps and/or blocks embedded in it) that executes many times.

      To reproduce, create and run a build with the following pipeline code, then view it in Stage View:

      for(int i=0; i<9999; i++) {
         echo "Cycle ${i}"
      }
      

      The result is a combination of high master CPU load, very long or incomplete UI requests, very large response datasets (on the order of megabytes), and/or high memory use.

          Activity

          svanoort Sam Van Oort added a comment -

          Released and verified as working by users.

          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Sam Van Oort
          Path:
          rest-api/pom.xml
          rest-api/src/main/java/com/cloudbees/workflow/flownode/FlowNodeExecutorNameCache.java
          rest-api/src/main/java/com/cloudbees/workflow/flownode/FlowNodeListCacheAction.java
          rest-api/src/main/java/com/cloudbees/workflow/flownode/FlowNodeNavigationListener.java
          rest-api/src/main/java/com/cloudbees/workflow/flownode/FlowNodeUtil.java
          rest-api/src/main/java/com/cloudbees/workflow/rest/AbstractAPIActionHandler.java
          rest-api/src/main/java/com/cloudbees/workflow/rest/endpoints/FlowNodeAPI.java
          rest-api/src/main/java/com/cloudbees/workflow/rest/endpoints/RunAPI.java
          rest-api/src/main/java/com/cloudbees/workflow/rest/endpoints/flownode/Describe.java
          rest-api/src/main/java/com/cloudbees/workflow/rest/external/AtomFlowNodeExt.java
          rest-api/src/main/java/com/cloudbees/workflow/rest/external/FlowNodeExt.java
          rest-api/src/main/java/com/cloudbees/workflow/rest/external/JobExt.java
          rest-api/src/main/java/com/cloudbees/workflow/rest/external/RunExt.java
          rest-api/src/main/java/com/cloudbees/workflow/rest/external/StageNodeExt.java
          rest-api/src/main/java/com/cloudbees/workflow/util/ModelUtil.java
          rest-api/src/test/java/com/cloudbees/workflow/flownode/FlowNodeUtilTest.java
          rest-api/src/test/java/com/cloudbees/workflow/rest/endpoints/JobAndRunAPITest.java
          ui/pom.xml
          http://jenkins-ci.org/commit/pipeline-stage-view-plugin/488fe8b0170f9affc49a8e4169f74e34a21a97c4
          Log:
          Merge pull request #4 from jenkinsci/optimize-flow-walking

          JENKINS-33624 Improve Perfomance with Large Pipelines

          Compare: https://github.com/jenkinsci/pipeline-stage-view-plugin/compare/360909b3b6b9...488fe8b0170f

          svanoort Sam Van Oort added a comment - - edited

          Benchmark results using the following optimizations, but not the optimized flow scanner (which stays in a separate PR until its refactoring and edge cases are resolved):

          • Cache the completely analyzed RunExt response object for each run, which includes fully realized Stage nodes
          • Use a standardized caching implementation
          • Use non-recursive method calls for all flow walking, to avoid stack overflow errors
          • Cap the number of returned nodes in a stage
          • Avoid materializing unneeded API return objects
          • When returning RunExt objects, use a wrapper to hide the child nodes in each stage until details of the stage are requested via the Describe API
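
The non-recursive flow walking mentioned above can be illustrated with a minimal sketch. It is not the plugin's code: `Node` and `IterativeWalker` are hypothetical stand-ins for the real flow graph classes, and the only point is that an explicit stack on the heap replaces call-stack recursion, so a chain of many thousands of nodes cannot overflow the thread stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a flow graph node; not the plugin's FlowNode class.
final class Node {
    final int id;
    final List<Node> parents = new ArrayList<>();

    Node(int id) {
        this.id = id;
    }
}

final class IterativeWalker {
    // Visits every node reachable from `head` through parent links.
    // Pending nodes live in a heap-allocated deque, so the depth of the
    // graph does not consume call-stack frames.
    static List<Integer> walk(Node head) {
        List<Integer> visited = new ArrayList<>();
        Set<Node> seen = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(head);
        seen.add(head);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            visited.add(current.id);
            for (Node parent : current.parents) {
                // Set.add returns false for nodes already queued, which
                // keeps merge points from being visited twice.
                if (seen.add(parent)) {
                    pending.push(parent);
                }
            }
        }
        return visited;
    }
}
```

Calling `IterativeWalker.walk(head)` on the last node of a long linear chain returns the IDs from newest to oldest; a recursive version of the same walk would need one stack frame per node.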

          Benchmark:

          • Old 1.0 plugin:
            • Initial runs data: 1220 ms, then
            • Getting stage descriptions (Describe API): 780 ms + 43 ms + 792 ms + 40 ms = 1655 ms
            • Total on first load: 2875 ms
            • On page refresh (hitting caches): 12 ms to get runs, descriptions of stages = 669 ms + 10 ms + 680 ms + 18 ms = 1377 ms, 1389 ms total (every load)
          • New plugin (snapshot):
            • Initial runs data: 1080 ms, then
            • Getting stage descriptions (Describe API): 65 ms + 54 ms + 64 ms + 52 ms = 235 ms total for stages
            • Total on first load: 1315 ms
            • On page refresh (hitting caches): 3 ms to get runs, descriptions of stages = 12 ms + 10 ms + 4 ms + 10 ms = 36 ms, 39 ms total

          Speedup:

          • First load is roughly 2x as fast, 35x as fast on refreshes. This scales even better as the number of stages and nodes increases.
          • No stack overflows on large stages.
          • Memory is bounded by cache sizes.
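
As a cross-check of the speedup figures, the arithmetic can be reproduced from the benchmark timings, with every figure taken as milliseconds. `StageViewBenchmark` is a hypothetical helper written only for this check.

```java
// Hypothetical helper for re-deriving the speedups quoted in this comment.
final class StageViewBenchmark {
    // Adds up individual request timings, in milliseconds.
    static int totalMs(int... parts) {
        int total = 0;
        for (int part : parts) {
            total += part;
        }
        return total;
    }

    // Ratio of old to new timing; values above 1 mean the new plugin is faster.
    static double speedup(int oldMs, int newMs) {
        return (double) oldMs / newMs;
    }
}
```

With the numbers above, `totalMs(1220, 780, 43, 792, 40)` is 2875 and `totalMs(1080, 65, 54, 64, 52)` is 1315, so the first-load speedup is about 2.2x. On refresh, `totalMs(12, 669, 10, 680, 18)` is 1389 and `totalMs(3, 12, 10, 4, 10)` is 39, giving about 35.6x.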

          Benchmark job:

          stage 'long stage'
          for(int i=0; i<999; i++) {
            echo "Output $i"
          }
          stage 'short stage'
          echo 'finished second stage'
          

          Run this job twice to generate runs data. On larger flow graphs (10k+ nodes), or with numerous complex stages, speedups from caching the stage data along with the run (and only analyzing a run a single time) are 100x or more.

          Unfortunately I can't benchmark that because it hits the stack overflow.

          svanoort Sam Van Oort added a comment -

          Current approaches being used:

          • Caching of post-analysis stage view results
          • Revise existing caching to use a standard caching library (Guava or possibly Caffeine) rather than ad-hoc Actions with transient cache fields and a WeakHashMap
          • Use a more efficient algorithm to scan the FlowGraph rapidly for key stats and information
          • Cap the number of returned child nodes in a stage
          • Avoid materializing API return objects unless needed
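
Two of the approaches above, a size-bounded cache and a cap on returned child nodes, can be sketched as follows. This is an assumption-laden illustration, not the plugin's implementation: it uses `java.util.LinkedHashMap` instead of Guava or Caffeine so that it stays self-contained, and `BoundedCache` and `StageNodes` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical size-bounded LRU cache; stands in for a Guava/Caffeine cache.
// Not thread-safe as written; a real cache library handles concurrency.
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true makes the least recently used entry the eldest.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict once the bound is exceeded, so memory use stays limited.
        return size() > maxEntries;
    }
}

final class StageNodes {
    // Returns a copy holding at most `limit` child nodes of a stage.
    static <T> List<T> capped(List<T> children, int limit) {
        int end = Math.min(limit, children.size());
        return new ArrayList<>(children.subList(0, end));
    }
}
```

A cache created with `new BoundedCache<String, String>(2)` keeps only the two most recently used entries, and `StageNodes.capped(children, 100)` returns at most 100 elements regardless of how many steps a loop produced.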
          svanoort Sam Van Oort added a comment -

          WIP PR in progress here: https://github.com/jenkinsci/pipeline-stage-view-plugin/pull/4

            People

            • Assignee: svanoort Sam Van Oort
            • Reporter: svanoort Sam Van Oort