Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Component/s: mercurial-plugin
    • Labels:
      None
    • Environment:
      Platform: All, OS: All

      Description

      For Hudson installations that have many jobs all running off one Mercurial
      repository (or a small number of them), it is inefficient to have every job pull
      over the network, since they repeatedly pull the exact same changesets. The
      situation is even worse when you consider that polling effectively pulls in
      changesets as well (it just discards them after logging their metadata).

      I suggest a new special job type, Mercurial Cache, with the following attributes:

      1. List of repository URLs.

      2. Optional schedule, like a project.

      There is a corresponding workspace on the master and possibly on some or all
      slaves. Whenever the scheduler fires or the job is otherwise run (e.g.
      manually), the following actions will be taken:

      1. For each repo, if there is a matching cache in the master's workspace, run 'hg in
      --bundle incoming.hg && hg pull incoming.hg' to pull all changesets into it.

      2. For each repo and for each slave, if the slave's workspace also contains that
      repo, send incoming.hg to the slave (over the usual channel) and have the slave
      'hg unbundle' it.
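      A minimal Java sketch of the two steps above, assuming the job shells out to hg.
      The class and method names are hypothetical (not the plugin's actual API), and
      the methods only build argument lists; running them and copying the bundle file
      to each slave over the channel is left to the caller.

```java
import java.util.List;

// Hypothetical helper for the proposed Mercurial Cache job. It only builds hg
// argument lists; running them and copying the bundle to slaves is up to the caller.
final class CacheJobCommands {
    private CacheJobCommands() {}

    // Step 1 (master): save changesets not yet in the cache to a bundle file.
    // hg incoming exits with status 1 when there is nothing new, which a caller
    // should read as "cache already current" rather than as a failure.
    static List<String> incomingToBundle(String masterCache, String remoteUrl, String bundle) {
        return List.of("hg", "-R", masterCache, "incoming", "--bundle", bundle, remoteUrl);
    }

    // Step 1, continued (master): pull the saved changesets into the cache.
    static List<String> pullFromBundle(String masterCache, String bundle) {
        return List.of("hg", "-R", masterCache, "pull", bundle);
    }

    // Step 2 (each slave whose workspace has this repo): apply the copied bundle.
    static List<String> unbundleOnSlave(String slaveCache, String bundle) {
        return List.of("hg", "-R", slaveCache, "unbundle", bundle);
    }
}
```

      Passing the bundle as an absolute path avoids any doubt about whether a relative
      name is resolved against the current directory or the repository.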

      Whenever a project using Mercurial SCM with a matching repository location is
      run or does polling:

      1. If on the master, quietly swap in the local cache repo location for all Hg
      operations that would normally use the remote repo URL (I think this is always
      'hg incoming' in some variant). Note that local clones from the cache will share
      hardlinks in most cases. If the cache repo does not yet exist, 'hg clone -U' it
      and then proceed.

      2. If on a slave, swap in the local (slave) cache repo location. If it does not
      yet exist on the slave, run 'hg bundle --all' on the master, send to the slave
      over the channel, and 'hg init && hg unbundle ...' to create a clone. If it does
      not yet exist on the master, clone it as in #1.
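      The cache lookup and creation above could look roughly like the following Java
      sketch. The hash-based directory naming and the trailing-slash normalization
      (so that "matching repository location" tolerates a trailing slash) are
      assumptions made for illustration, as are all class and method names; only the
      hg commands themselves come from the proposal.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical cache lookup: maps a remote URL to a cache directory name and
// builds the hg commands used when that cache does not exist yet.
final class CacheLocator {
    private CacheLocator() {}

    // Assumed normalization: drop surrounding whitespace and trailing slashes so
    // "http://host/repo" and "http://host/repo/" share one cache.
    static String normalize(String url) {
        String u = url.trim();
        while (u.endsWith("/")) {
            u = u.substring(0, u.length() - 1);
        }
        return u;
    }

    // Assumed naming scheme: hex SHA-256 of the normalized URL (64 characters).
    static String cacheDirName(String url) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(normalize(url).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Master has no cache yet: clone without a working copy.
    static List<String> cloneOnMaster(String remoteUrl, String masterCache) {
        return List.of("hg", "clone", "-U", remoteUrl, masterCache);
    }

    // Slave has no cache yet: bundle everything on the master...
    static List<String> bundleAllOnMaster(String masterCache, String bundle) {
        return List.of("hg", "-R", masterCache, "bundle", "--all", bundle);
    }

    // ...then, after copying the bundle over, create the slave cache from it.
    static List<List<String>> initFromBundleOnSlave(String slaveCache, String bundle) {
        return List.of(
                List.of("hg", "init", slaveCache),
                List.of("hg", "-R", slaveCache, "unbundle", bundle));
    }
}
```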

      There needs to be some synchronization so that master and slave caches remain in
      lockstep.
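      One simple way to get that synchronization, assuming every cache update is
      driven from the master JVM, is a lock per repository URL that covers both the
      master pull and any follow-up slave unbundle. A hypothetical Java sketch (the
      names are illustrative, not the plugin's):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical per-repository locking on the master. A master cache update and
// the slave updates that follow it run inside one withLock call, so two jobs
// cannot interleave their pull/bundle/unbundle steps for the same repository.
final class CacheLocks {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // One lock object per repository URL, created on first use.
    ReentrantLock lockFor(String url) {
        return locks.computeIfAbsent(url, k -> new ReentrantLock());
    }

    // Runs action while holding the URL's lock and always releases it.
    <T> T withLock(String url, Supplier<T> action) {
        ReentrantLock lock = lockFor(url);
        lock.lock();
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }
}
```

      A JVM lock only serializes work that Hudson itself starts; it would not stop a
      manual hg command run against the same cache directory.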

      There is no configuration for named branches in the caches; only complete
      repositories are cached. Projects using branches will still pull only their
      branch from the cache. The cache does not keep a checkout ("working copy"), so
      no configuration is needed for that either.

      One possible side benefit of this setup is that the slave does not perform any
      network operations except over its channel to the master. Provided that the
      project build does not perform any network operations either, you could then
      have a slave with no internet connection: the master does all pulls from the
      remote repository.

          Activity

          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in hudson
          User: jglick
          Path:
          trunk/hudson/plugins/mercurial/src/main/java/hudson/plugins/mercurial/Cacher.java
          trunk/hudson/plugins/mercurial/src/main/java/hudson/plugins/mercurial/MercurialSCM.java
          http://jenkins-ci.org/commit/28905
          Log:
          [FIXED JENKINS-4794] Implemented caching on slaves as well as master.

          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in hudson
          User: jglick
          Path:
          trunk/hudson/plugins/mercurial/src/main/java/hudson/plugins/mercurial/Cacher.java
          trunk/hudson/plugins/mercurial/src/main/java/hudson/plugins/mercurial/MercurialInstallation.java
          trunk/hudson/plugins/mercurial/src/main/java/hudson/plugins/mercurial/MercurialSCM.java
          trunk/hudson/plugins/mercurial/src/main/resources/hudson/plugins/mercurial/MercurialInstallation/config.jelly
          trunk/hudson/plugins/mercurial/src/main/resources/hudson/plugins/mercurial/MercurialInstallation/help-useCaches.html
          trunk/hudson/plugins/mercurial/src/test/java/hudson/plugins/mercurial/CacherTest.java
          trunk/hudson/plugins/mercurial/src/test/java/hudson/plugins/mercurial/CachingSCMTest.java
          trunk/hudson/plugins/mercurial/src/test/java/hudson/plugins/mercurial/DebugFlagTest.java
          trunk/hudson/plugins/mercurial/src/test/java/hudson/plugins/mercurial/ForestTest.java
          http://jenkins-ci.org/commit/28846
          Log:
          JENKINS-4794 Started work on Mercurial repository cache.

          jglick Jesse Glick added a comment - http://mercurial.selenic.com/bts/issue1910 tracks the Hg bug.
          jglick Jesse Glick added a comment -

          While broadcasting incoming.hg to all slaves ought to be reliable in principle
          (since their caches should never be doing anything besides pulling from master),
          this might run into trouble if some slaves went offline and missed some earlier
          updates, etc. A more robust way to push changesets over a Hudson channel is
          to use file transfer:

          hg -R repo-a bundle `hg -R repo-b heads --template ' --base {node}'` /tmp/xfer.hg
          hg -R repo-b unbundle /tmp/xfer.hg

          This style has the advantage that slave caches can be updated lazily, since
          there is no requirement that all slaves have been updated to the same point:
          when running an Hg operation on a slave, simply pull into the master cache,
          then update that slave's cache, then continue.
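          The backtick substitution above can also be done without a shell, by running
          'hg heads' on repo-b first and passing each hash as a separate --base argument.
          A hypothetical Java sketch (names are illustrative). The fallback to --all when
          repo-b reports no heads is an added assumption: 'hg bundle' with no --base and
          no destination would instead compare against repo-a's default path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers for lazily bringing a slave cache (repo-b) up to date
// from the master cache (repo-a) with a bundle file.
final class LazySlaveSync {
    private LazySlaveSync() {}

    // Run on repo-b: print each head's full hash on its own line.
    // The template's "\n" is a literal backslash-n that hg itself expands.
    static List<String> headsCommand(String repoB) {
        return List.of("hg", "-R", repoB, "heads", "--template", "{node}\\n");
    }

    // Turn the output of headsCommand into a list of hashes.
    static List<String> parseHeads(String output) {
        String trimmed = output.trim();
        return trimmed.isEmpty() ? List.of() : List.of(trimmed.split("\\s+"));
    }

    // Run on repo-a: same effect as the backtick form, one --base per repo-b head.
    static List<String> bundleCommand(String repoA, List<String> repoBHeads, String bundle) {
        List<String> cmd = new ArrayList<>(List.of("hg", "-R", repoA, "bundle"));
        if (repoBHeads.isEmpty()) {
            cmd.add("--all"); // repo-b is empty: send everything
        } else {
            for (String node : repoBHeads) {
                cmd.add("--base");
                cmd.add(node);
            }
        }
        cmd.add(bundle);
        return cmd;
    }

    // Run on repo-b after the bundle has been copied to it.
    static List<String> unbundleCommand(String repoB, String bundle) {
        return List.of("hg", "-R", repoB, "unbundle", bundle);
    }
}
```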

          (There seems to be a bug in bundle: --base on a head revision does not prevent
          that revision from being included, though it excludes its ancestors. So we may
          also need to run heads on repo-a and filter repo-b's list to avoid transmitting
          extra changesets. This would be useful anyway: if the filtered list is empty,
          we can skip running any commands, since the repos are already in sync.)
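          One possible reading of the "filtered list is empty" check, sketched in Java
          under the assumption that repo-b only ever receives changesets from repo-a: then
          repo-b is in sync exactly when every head of repo-a is already a head of repo-b.
          The names are hypothetical, and the sketch does not try to work around the
          --base bug itself.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sync check between the master cache (repo-a) and a slave cache
// (repo-b). Assumes repo-b only ever receives changesets from repo-a, so every
// changeset in repo-b is also in repo-a.
final class HeadFilter {
    private HeadFilter() {}

    // Heads of repo-a that are not heads of repo-b, in repo-a's order.
    static List<String> missingHeads(List<String> headsA, List<String> headsB) {
        Set<String> missing = new LinkedHashSet<>(headsA);
        missing.removeAll(new HashSet<>(headsB));
        return List.copyOf(missing);
    }

    // Under the assumption above, if every head of repo-a is already in repo-b,
    // all of repo-a's changesets (the heads' ancestors) are too, so nothing needs
    // bundling and no hg command has to run on the slave.
    static boolean inSync(List<String> headsA, List<String> headsB) {
        return missingHeads(headsA, headsB).isEmpty();
    }
}
```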

          jglick Jesse Glick added a comment -

          Interaction with the Forest extension (issue #1143) may be problematic. For
          simplicity, we would probably just disable cache usage for projects using Forest.

          jglick Jesse Glick added a comment -

          Dianna DeCristo writes:
          "This performance improvement would really help us if it worked with the
          multi-config projects, so would it be possible to not make it a new project type
          but rather a configuration option for a project? The master Mercurial cache could
          be configured through the Hudson Master Configure System link first and then
          assigned to each project in its own configuration.

          Our single Mercurial repository has 150,000 files and we build off four
          different branches across 4 different platforms using the multi-config project.
          I have clusters of slaves so that we can build the branches in parallel. The
          time spent cloning and pulling kills us."

          The special job type would work equally well for this setup because the cache
          job would be separate from your regular project - freestyle, Maven 2, matrix,
          whatever. You have one cache job on the server, and any projects which use
          Mercurial as their SCM and request matching repository locations will
          automatically employ the cache.

          A refinement to my initial proposal would be to make the cache configuration
          just be part of global Hudson config, not a job at all, and with no schedule.
          Whenever any job, through its MercurialSCM, requested access to any of these
          repos - whether for 'hg incoming' or during a build - the cache would first be
          created or updated. To simplify administration, you could even omit the list of
          repositories to cache and simply cache any remote repository that was
          encountered by any job, turning the cache configuration into a single
          checkbox...though in this case some scheme for discarding cached repos not in
          use for a long time would be useful.
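          The "discard cached repos not in use for a long time" idea could be as simple
          as recording when each cached URL was last requested and sweeping old entries.
          A hypothetical Java sketch; the idle threshold and all names are assumptions,
          and deleting the cache directories themselves is left to the caller:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical bookkeeping for automatically created caches: remembers when each
// remote URL was last requested by any job and drops entries idle for too long.
final class CacheUsageTracker {
    private final Map<String, Instant> lastUsed = new ConcurrentHashMap<>();

    // Called whenever a job's MercurialSCM asks for the given repository.
    void touch(String url, Instant now) {
        lastUsed.merge(url, now, (old, latest) -> latest.isAfter(old) ? latest : old);
    }

    boolean isCached(String url) {
        return lastUsed.containsKey(url);
    }

    // Removes and returns (sorted) the URLs not used within maxIdle of now; the
    // caller would then delete the corresponding cache directories.
    List<String> evictIdle(Duration maxIdle, Instant now) {
        Instant cutoff = now.minus(maxIdle);
        List<String> evicted = new ArrayList<>();
        for (Map.Entry<String, Instant> e : lastUsed.entrySet()) {
            // Conditional remove: skip entries touched again since we read them.
            if (e.getValue().isBefore(cutoff) && lastUsed.remove(e.getKey(), e.getValue())) {
                evicted.add(e.getKey());
            }
        }
        evicted.sort(Comparator.naturalOrder());
        return evicted;
    }
}
```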


            People

            • Assignee:
              jglick Jesse Glick
              Reporter:
              jglick Jesse Glick
            • Votes:
              0
              Watchers:
              0

              Dates

              • Created:
                Updated:
                Resolved: