  Jenkins / JENKINS-32246

Is it absolutely necessary to keep a full .git repository in every sub-project?

      Description

      Original reporter: Ivan Anishchuk

      Can't we use a single .git directory for all of them, making them multiple working directories of one git repository? That would save a lot of disk space and network traffic.

      At the very least, could the sub-projects use the master project's cloned repository as their remote? A local git clone uses hardlinks, so disk space and traffic would be saved without changing what each working directory looks like.
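The hardlink behaviour of local clones can be checked directly. The bash sketch below assumes git and bash are installed and that both clones sit on the same filesystem; "master-clone" and "sub-project" are hypothetical directory names standing in for the master project's clone and one sub-project's workspace. It clones from a local path and tests whether the commit object in the new clone is the same inode as in the source.

```shell
#!/usr/bin/env bash
# Sketch: a clone made from a local path hardlinks files under .git/objects
# instead of copying them. Directory names are hypothetical stand-ins.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Stand-in for the master project's clone (normally cloned over the network).
git init -q master-clone
echo hello > master-clone/README
git -C master-clone add README
git -C master-clone -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"

# A sub-project workspace cloned from the local path rather than the remote.
git clone -q master-clone sub-project

# The new commit is stored as a loose object: .git/objects/<2 hex>/<rest>.
commit=$(git -C master-clone rev-parse HEAD)
obj=".git/objects/${commit:0:2}/${commit:2}"

# -ef is true when both paths name the same inode, i.e. they are hardlinks.
if [ "master-clone/$obj" -ef "sub-project/$obj" ]; then
    echo "object shared via hardlink"
else
    echo "object copied"
fi
```

Passing --no-hardlinks to git clone forces the files under .git/objects to be copied instead, and the same check then reports a copy.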

      More info on SO: http://stackoverflow.com/questions/6270193/multiple-working-directories-with-git

            Activity

            Jesse Glick (jglick) added a comment:

            Not clear to me what this issue is referring to. What are the “master project” and “subprojects” in this context? If there is a concrete problem, please provide steps to reproduce it from scratch.

            And why is this filed in scm-api-plugin and branch-api-plugin when the description seems specific to git-plugin?

            Sean Flanigan (seanf) added a comment:

            I think this is about the fact that a multibranch project (or GitHub Organization project) checks out the same repo multiple times, once for each branch or pull request, apparently without using git's ability to share objects via hardlinks between local git clones.

            I'm more familiar with GitHub Organization jobs, but I think it would be similar with Multibranch. Please forgive me if I use some of the wrong terms.

            1. User creates a GitHub Organization job folder
            2. Jenkins creates a GitHub repository folder for each repo
            3. Within that repository folder, Jenkins creates one pipeline job for each branch or pull request that contains a Jenkinsfile (subject to the settings of the Organization job folder)

            Every time a pipeline job allocates a node and runs checkout scm, it appears to clone a fresh copy of the repository directly from GitHub, so disk space (and bandwidth) grows in proportion to the number of branches.

            I think the idea is that the primary git clone on the node should be associated with the repository as a whole (#2), and then the individual branch jobs (#3) would use a local git clone (git clone ../primaryGitClone), thus taking advantage of hardlinks for the files under .git/objects/. It would also be possible for these local git clones to fetch from the primary git clone to save network traffic.
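The proposed layout can be sketched in bash as follows. The names upstream.git (a local stand-in for GitHub), primaryGitClone and job-feature are hypothetical, and making the primary clone a --mirror is an assumption of this sketch: a mirror keeps every upstream branch as a local ref, whereas branch jobs cloning from an ordinary clone would only see that clone's local branches.

```shell
#!/usr/bin/env bash
# Sketch of the proposed layout, using hypothetical names:
#   upstream.git    stands in for the GitHub repository
#   primaryGitClone is the per-repository clone on the node (#2)
#   job-feature     is the workspace of one branch job (#3)
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
id=(-c user.name=demo -c user.email=demo@example.com)

# Build a throwaway "GitHub" repository with a default branch and "feature".
git init -q seed
echo v1 > seed/file
git -C seed add file
git -C seed "${id[@]}" commit -q -m v1
git -C seed branch feature
git clone -q --bare seed upstream.git

# The remote is contacted once per repository: the primary (mirror) clone.
git clone -q --mirror upstream.git primaryGitClone

# Each branch job clones from the local path, so .git/objects is hardlinked.
git clone -q --branch feature primaryGitClone job-feature

# Upstream moves on: a new commit is pushed to "feature".
git -C seed checkout -q feature
echo v2 > seed/file
git -C seed "${id[@]}" commit -q -am v2
git -C seed push -q "$work/upstream.git" feature

# Only the primary clone talks to the remote...
git -C primaryGitClone fetch -q
# ...and the branch workspace updates from the primary clone on local disk.
git -C job-feature pull -q --ff-only

cat job-feature/file                       # v2
git -C job-feature remote get-url origin   # path of primaryGitClone
```

In this arrangement only the fetch in primaryGitClone crosses the network; every branch workspace on the node is refreshed over the local filesystem.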

            A more advanced option would be to use git worktree so that there is actually a single git clone with multiple working directories, although the limitations of git worktree could make this tricky.
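For the git worktree option, the bash sketch below (hypothetical names; assumes Git 2.17 or later for worktree add -q) shows one repository serving several branch directories from a single object store. It also demonstrates one limitation that could make this tricky for CI: by default git refuses to check out the same branch in two worktrees at once.

```shell
#!/usr/bin/env bash
# Sketch of the git worktree variant: one repository (one object store) with a
# separate working directory per branch. All names are hypothetical.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
id=(-c user.name=demo -c user.email=demo@example.com)

# The single clone that owns .git for every branch.
git init -q repo
echo base > repo/file
git -C repo add file
git -C repo "${id[@]}" commit -q -m base
git -C repo branch feature-a
git -C repo branch feature-b

# One extra working directory per branch, all backed by repo/.git.
git -C repo worktree add -q ../wt-feature-a feature-a
git -C repo worktree add -q ../wt-feature-b feature-b

# A commit made in one worktree is immediately visible from the main clone,
# because the objects are not duplicated and nothing has to be fetched.
echo a > wt-feature-a/file
git -C wt-feature-a "${id[@]}" commit -q -am "work on a"
git -C repo log -1 --format=%s feature-a     # work on a

# Limitation: a branch cannot be checked out in two worktrees at once, so two
# concurrent builds of the same branch would need --detach or another scheme.
if git -C repo worktree add -q ../wt-dup feature-a 2>/dev/null; then
    dup=allowed
else
    dup=refused
fi
echo "second checkout of feature-a: $dup"
git -C repo worktree list
```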

            Jesse Glick (jglick) added a comment:

            Not intending to solve it that way; see duplicate.


              People

              • Assignee: Unassigned
              • Reporter: Matthew DeTullio (mjdetullio)
              • Votes: 1
              • Watchers: 3
