Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-10520

clone-workspace-scm performance is poor

    Details

    • Type: Improvement
    • Status: Open (View Workflow)
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: clone-workspace-plugin
    • Labels:
      None
    • Environment:
      Solaris/x86, cluster of 5 nodes, all machines in cluster are identical
    • Similar Issues:

      Description

      Clone workspace performance is poor. Whilst compilation takes around 4 minutes, archiving the workspace afterwards takes the total job time out to 35 minutes! Similarly un-archiving it at the start of downstream jobs.

      We have 4 steps, Compile, quick test, package, full test.
      Compile uses Perforce SCM, we use clone workspace after that to ensure downstream builds use identical files but other compiles can continue in parallel. So we need almost the files in the entire workspace.
      The filesystem workspace size is about 1.6Gb
      The archived workspace.zip file size is about 1.4Gb

      An "exclude" filter may help a little but I think there is something slow going on.
      Note that we use slaves as well so the piped data may have an impact, but all the machines are very close together.
      When the compile runs on the master node it doesn't seem any quicker.

        Attachments

          Issue Links

            Activity

            Hide
            edrandall Ed Randall added a comment - - edited

            I did some tests, to archive a workspace from a slave to the master using "tar | ssh" takes around 2 minutes 15 seconds (2 min 10 with -z compression).

            clone-workspace plugin took 22 minutes to do it from the same workspace on the same node.

            Show
            edrandall Ed Randall added a comment - - edited I did some tests, to archive a workspace from a slave to the master using "tar | ssh" takes around 2 minutes 15 seconds (2 min 10 with -z compression). clone-workspace plugin took 22 minutes to do it from the same workspace on the same node.
            Hide
            edrandall Ed Randall added a comment - - edited

            This is our build pipeline:

            1) Compile - checks files out of Perforce, cleans, configures for the target architecture and compiles. Archives the workspace for susequent steps in the pipeline.

            2) Zip - builds the .ear file, verifies it, builds a complete shippable .zip of all project deliverables.

            3) Quick test - runs a relatively short test suite.

            2a) Zip is "promoted" if all tests in QuickTest pass. This delivers the Zip to the System test team (outside of Jenkins).

            Now we didn't want each step to sync the files down from Perforce and compile afresh, the clone-workspace plugin seemed ideal for us. Unfortunately there are a number of problems with it including:

            a) Only the latest workspace from step(1) is maintained. So if another change triggers a Compile whilst Zip is in progress, it could be that (3) Quick test actually tests that subsequent build and not the one it is supposed to be testing. This has regularly caused us considerable confusion as the jobs were out-of-step.

            b) It takes so long ... our project is somewhat monolithic, including all binaries required within the workspace. The workspace.zip archive is about 1.5Gb and creating it takes around 50 minutes using clone-workspace-plugin vs. only 10 minutes for the actual compile, that's unacceptable. Un-zipping at the start of the steps (2) and (3) is not quite so bad but still 10-20 minutes. It seems to be something to do with the way data is piped between master and slave.

            The solution we now have working is to replace this plugin by a shell script. This script takes 2 minutes to archive the workspace at the end of job (1) and 2 minutes at to un-archive it at the start of the other steps. It is attached to this report to illustrate our use case which we would like this plugin to support.

            Use the script as follows:

            At the end of job (1) invoke the script using "Execute Shell":

            ${JENKINS_BIN}/clone-workspace PACK "${BUILD_TAG}"

            Optional INCLUDE= and EXCLUDE= filters can be used to limit the contents of the archive. We actually create 2 archives, one is much smaller containing just the information required for the "promote" step (2a) to work.

            Also add a Post-Build action "Trigger parameterized build on other projects" to start job (2) using the parameter:

            UPSTREAM_BUILD_TAG=${BUILD_TAG}

            Jobs (2) and (3) are parameterized, accepting UPSTREAM_BUILD_TAG. The first task that each of these downstream jobs runs is:

            Execute Shell

            ${JENKINS_BIN}/clone-workspace UNPACK "${UPSTREAM_BUILD_TAG}"

            The archive dir must exist on the master node and be referenced by the JOBS_ARCHIVE environment variable.
            The last job in the pipeline removes the redundant archive by invoking:

            ${JENKINS_BIN}/clone-workspace CLEAN "${UPSTREAM_BUILD_TAG}"

            We also purge all archives from it overnight using a "cron" job, just to be sure.

            Now the downstream jobs don't show the change history; that was solved by re-instating the clone-workspace-plugin but configured to archive just one file.

            Show
            edrandall Ed Randall added a comment - - edited This is our build pipeline: 1) Compile - checks files out of Perforce, cleans, configures for the target architecture and compiles. Archives the workspace for susequent steps in the pipeline. 2) Zip - builds the .ear file, verifies it, builds a complete shippable .zip of all project deliverables. 3) Quick test - runs a relatively short test suite. 2a) Zip is "promoted" if all tests in QuickTest pass. This delivers the Zip to the System test team (outside of Jenkins). Now we didn't want each step to sync the files down from Perforce and compile afresh, the clone-workspace plugin seemed ideal for us. Unfortunately there are a number of problems with it including: a) Only the latest workspace from step(1) is maintained. So if another change triggers a Compile whilst Zip is in progress, it could be that (3) Quick test actually tests that subsequent build and not the one it is supposed to be testing. This has regularly caused us considerable confusion as the jobs were out-of-step. b) It takes so long ... our project is somewhat monolithic, including all binaries required within the workspace. The workspace.zip archive is about 1.5Gb and creating it takes around 50 minutes using clone-workspace-plugin vs. only 10 minutes for the actual compile, that's unacceptable. Un-zipping at the start of the steps (2) and (3) is not quite so bad but still 10-20 minutes. It seems to be something to do with the way data is piped between master and slave. The solution we now have working is to replace this plugin by a shell script. This script takes 2 minutes to archive the workspace at the end of job (1) and 2 minutes at to un-archive it at the start of the other steps. It is attached to this report to illustrate our use case which we would like this plugin to support. Use the script as follows: At the end of job (1) invoke the script using "Execute Shell": ${JENKINS_BIN}/clone-workspace PACK "${BUILD_TAG}" Optional INCLUDE= and EXCLUDE= filters can be used to limit the contents of the archive. We actually create 2 archives, one is much smaller containing just the information required for the "promote" step (2a) to work. Also add a Post-Build action "Trigger parameterized build on other projects" to start job (2) using the parameter: UPSTREAM_BUILD_TAG=${BUILD_TAG} Jobs (2) and (3) are parameterized, accepting UPSTREAM_BUILD_TAG. The first task that each of these downstream jobs runs is: Execute Shell ${JENKINS_BIN}/clone-workspace UNPACK "${UPSTREAM_BUILD_TAG}" The archive dir must exist on the master node and be referenced by the JOBS_ARCHIVE environment variable. The last job in the pipeline removes the redundant archive by invoking: ${JENKINS_BIN}/clone-workspace CLEAN "${UPSTREAM_BUILD_TAG}" We also purge all archives from it overnight using a "cron" job, just to be sure. Now the downstream jobs don't show the change history; that was solved by re-instating the clone-workspace-plugin but configured to archive just one file.
            Hide
            nacht Pedro Rodriguez added a comment -

            I'm getting this slow performance locally (unzipping and zipping locally, no slaves or anything) and it is still very slow (4min for something that takes 1s if done manually).

            Looking at the code, it looks like they are just using the FilePath.zip and tar methods. So not sure if this is an issue in the plugin or in the FilePath.

            Show
            nacht Pedro Rodriguez added a comment - I'm getting this slow performance locally (unzipping and zipping locally, no slaves or anything) and it is still very slow (4min for something that takes 1s if done manually). Looking at the code, it looks like they are just using the FilePath.zip and tar methods. So not sure if this is an issue in the plugin or in the FilePath.
            Hide
            davidschmitt David Schmitt added a comment -

            Here are two logs (with "Elapsed Time" timestamps):

            master log
            00:02:46.164 	 20 Warning(s)
            00:02:46.164 	 0 Error(s)
            00:02:46.164 
            00:02:46.166 Time Elapsed 00:00:11.2877140
            00:02:46.216 Archiving artifacts
            00:02:48.510 Archiving workspace
            00:03:20.968 Deleting old workspace snapshot from #197.
            00:03:22.846 Recording fingerprints
            00:03:54.751 Triggering a new build of zetbox.Parties-develop-Linux-Deploy #173
            00:03:54.751 Triggering a new build of zetbox.Parties-develop-Windows-Deploy #150
            00:03:54.755 Finished: SUCCESS
            
            slave log
            00:02:58.645     0 Warning(s)
            00:02:58.645     0 Error(s)
            00:02:58.645 
            00:02:58.645 Time Elapsed 00:00:09.07
            00:02:58.899 Archiving artifacts
            00:04:05.101 Archiving workspace
            00:05:38.241 Deleting old workspace snapshot from #197.
            00:05:38.261 Recording fingerprints
            00:05:44.987 Triggering a new build of zetbox.Parties-develop-Linux-Deploy #172
            00:05:45.014 Triggering a new build of zetbox.Parties-develop-Windows-Deploy #149
            00:05:45.018 Finished: SUCCESS
            

            master and slave are virtual machines running on the same host, communicating directly via the jnlp started service.

            The master is running Debian GNU/Linux 7.2 (wheezy) and the slave is a Windows 7 box. Jenkins version is 1.537

            As you can see from the first few lines, the compilation power of both machines is similar. The big difference is in the archiving steps: local on master it takes 2.2 seconds for Archiving artifacts and ~30s for "Archiving workspace" which are both around 800MB.

            Show
            davidschmitt David Schmitt added a comment - Here are two logs (with "Elapsed Time" timestamps): master log 00:02:46.164 20 Warning(s) 00:02:46.164 0 Error(s) 00:02:46.164 00:02:46.166 Time Elapsed 00:00:11.2877140 00:02:46.216 Archiving artifacts 00:02:48.510 Archiving workspace 00:03:20.968 Deleting old workspace snapshot from #197. 00:03:22.846 Recording fingerprints 00:03:54.751 Triggering a new build of zetbox.Parties-develop-Linux-Deploy #173 00:03:54.751 Triggering a new build of zetbox.Parties-develop-Windows-Deploy #150 00:03:54.755 Finished: SUCCESS slave log 00:02:58.645 0 Warning(s) 00:02:58.645 0 Error(s) 00:02:58.645 00:02:58.645 Time Elapsed 00:00:09.07 00:02:58.899 Archiving artifacts 00:04:05.101 Archiving workspace 00:05:38.241 Deleting old workspace snapshot from #197. 00:05:38.261 Recording fingerprints 00:05:44.987 Triggering a new build of zetbox.Parties-develop-Linux-Deploy #172 00:05:45.014 Triggering a new build of zetbox.Parties-develop-Windows-Deploy #149 00:05:45.018 Finished: SUCCESS master and slave are virtual machines running on the same host, communicating directly via the jnlp started service. The master is running Debian GNU/Linux 7.2 (wheezy) and the slave is a Windows 7 box. Jenkins version is 1.537 As you can see from the first few lines, the compilation power of both machines is similar. The big difference is in the archiving steps: local on master it takes 2.2 seconds for Archiving artifacts and ~30s for "Archiving workspace" which are both around 800MB.
            Hide
            sschuberth Sebastian Schuberth added a comment -

            A simple way to improve this would probably be to optionally specify native tar / gzip executables that the plugin would use instead of doing the compression in Java code.

            Another more complex idea would be to implement something like copy-on-write semantics.

            Show
            sschuberth Sebastian Schuberth added a comment - A simple way to improve this would probably be to optionally specify native tar / gzip executables that the plugin would use instead of doing the compression in Java code. Another more complex idea would be to implement something like copy-on-write semantics.
            Hide
            davidschmitt David Schmitt added a comment - - edited

            Can someone with insight into the code please check whether this is not caused by something trivial like kohsuke described in jenkins-ci.org/content/maven-job-type-performance-improvements-maven-plugin-22 ?

            Show
            davidschmitt David Schmitt added a comment - - edited Can someone with insight into the code please check whether this is not caused by something trivial like kohsuke described in jenkins-ci.org/content/maven-job-type-performance-improvements-maven-plugin-22 ?

              People

              • Assignee:
                abayer Andrew Bayer
                Reporter:
                edrandall Ed Randall
              • Votes:
                13 Vote for this issue
                Watchers:
                13 Start watching this issue

                Dates

                • Created:
                  Updated: