Jenkins / JENKINS-56326

Git fetch in a later Pipeline stage fails when master received new revisions in the meantime


    Details

    • Type: Bug
    • Status: Fixed but Unreleased
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Component/s: git-plugin, pipeline
    • Labels:
      None
    • Environment:
      This _seems_ independent of configuration specifics, so please let me know what you need before I perform the marathon and involve our admins.

      Description

      Here is a timeline of events.

      1. Revision 08979f16 on branch master pushed.
      2. Revision 29f59aa1 on branch hsp-2917 pushed.
      3. Multi-branch pipeline starts build for hsp-2917 as expected:
        Cloning the remote Git repository
        Cloning with configured refspecs honoured and without tags
        Cloning repository <redacted>
         > git init /build/workspace/<redacted>_hsp-2917 # timeout=10
        Fetching upstream changes from <redacted>
         > git --version # timeout=10
        using GIT_SSH to set credentials 
         > git fetch --no-tags --progress <redacted> +refs/heads/*:refs/remotes/origin/*
        Fetching without tags
        Checking out Revision 29f59aa14a5f70d2977315e4ec18d7fa6ed5dc34 (hsp-2917)
         > git config remote.origin.url <redacted> # timeout=10
         > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
         > git config remote.origin.url <redacted> # timeout=10
        Fetching upstream changes from <redacted>
        using GIT_SSH to set credentials 
         > git fetch --no-tags --progress <redacted> +refs/heads/*:refs/remotes/origin/*
         > git config core.sparsecheckout # timeout=10
         > git checkout -f 29f59aa14a5f70d2977315e4ec18d7fa6ed5dc34
        First time build. Skipping changelog.

        The build runs as per usual.

      4. New revisions 1c65836c and 54154615 are pushed to branch master.

      Expected: The build started in step 3 is not affected by step 4.

      Actual: A later stage (after lengthy test stages) running on a different agent fails:

       > git fetch --no-tags --progress <redacted> +refs/heads/*:refs/remotes/origin/*
      ERROR: Error fetching remote repo 'origin'
      hudson.plugins.git.GitException: Failed to fetch from <redacted>
      	at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:894)
      	at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1161)
      	at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1192)
      	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:120)
      	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:90)
      	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:77)
      	at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      Caused by: hudson.plugins.git.GitException: Command "git fetch --no-tags --progress <redacted> +refs/heads/*:refs/remotes/origin/*" returned status code 1:
      stdout: 
      stderr: error: cannot lock ref 'refs/remotes/origin/master': ref refs/remotes/origin/master is at 54154615eb893f5f4f5e51107afd5aac296405af but expected 08979f16cf724f022852eeba8bbd8b3fad599cad
      From <redacted>
       ! 08979f1..5415461  master     -> origin/master  (unable to update local ref)
      

      We already know which revision we want and should just check it out. Newer revisions should not interfere with the build, no matter which branch they are on.
      (This didn't happen here, but the branch may even have been deleted, in which case the build should still work, assuming the revision under build is still there.)
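
      For illustration, a fetch scoped to the branch under build would behave the way I expect: unrelated refs such as origin/master would never be touched, so later pushes to master could not invalidate anything. This is only a sketch of the expected behaviour, not what the plugin currently does (branch name and SHA1 taken from the log above):

        # Sketch only: fetch just the branch under build, then check out the
        # already-known revision. origin/master is never updated, so new
        # commits on master cannot interfere with this build.
        git fetch --no-tags --progress <redacted> +refs/heads/hsp-2917:refs/remotes/origin/hsp-2917
        git checkout -f 29f59aa14a5f70d2977315e4ec18d7fa6ed5dc34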

        Attachments

          Activity

          markewaite Mark Waite added a comment -

          Thanks for the report. You're correct: repeated calls to checkout on the same repository within the same pipeline are expected to use the same SHA1 hash for that checkout.

          The error message reported in the stack trace is from a command line git failure, not from a specific failure inside the git plugin. It may be that the git plugin is passing incorrect arguments to command line git, or it may be that there is something specific related to the version of command line git being used on the agent.

          I won't be able to spend significant time trying to duplicate this bug in my environment today or probably even tomorrow. Weekend may allow me limited time to explore it.

          Like you, I assume that the issue is not related to any specific configuration, though it would help to know the version of command line git that is running on the agent and the version of command line git that is running on the master, just in case those versions become relevant.

          reitzig Raphael Reitzig added a comment - edited

          I see multiple git --version calls in the log, but unfortunately the output is not included (as you can see above). I'll ask a person with shell access to the respective machines. Thanks!

          reitzig Raphael Reitzig added a comment -

          We have the following potpourri of Git versions:

          • master: git version 2.17.1 (Ubuntu 18.04)
          • agent A: git version 2.11.0 (Debian stretch)
          • agent B: git version 2.7.4 (Ubuntu 16.04)
          • agent C: git version 2.7.4 (Ubuntu 16.04)
          • agent D: git version 2.7.4 (Ubuntu 16.04)
          • agent E: git version 2.17.1 (Ubuntu 18.04)

          The error occurred on agent D. Parallel stages on agents A, C, and E completed successfully.

          Actually, C and D are the same host, which appears in the executor list twice (with different names).

          markewaite Mark Waite added a comment -

          If the error occurs on agent D and it is the same host as agent C, are those two agents using a shared directory for their workspace? The command line git message that it cannot lock a ref might indicate that two different processes are operating in the same directory.
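
          For illustration, the failing check can be exercised directly with git update-ref, which performs the same compare-and-swap that fetch uses when it updates a remote-tracking ref. The SHA1 values below are taken from the build log above; this is only an illustration of the error, not a claim about what the plugin actually runs:

            # Run while refs/remotes/origin/master already points at 5415461...:
            # the compare against the expected old value (08979f16...) fails with
            # the same "cannot lock ref ... but expected ..." complaint seen above.
            git update-ref refs/remotes/origin/master \
                54154615eb893f5f4f5e51107afd5aac296405af \
                08979f16cf724f022852eeba8bbd8b3fad599cad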

          markewaite Mark Waite added a comment - edited

          I'm unable to duplicate the problem. Refer to the Jenkinsfile in my JENKINS-56326 bug repro branch to see the code that I used to attempt to duplicate it. The reproduction attempt performs the following steps:

          1. Commit a change to the JENKINS-56326 branch
          2. Launch a multibranch Pipeline job to build that change
          3. The job performs a checkout, reports the SHA1 of the checkout, sleeps 90 seconds, performs another checkout in a separate workspace, then reports the SHA1 of the checkout in the new workspace
          4. While the job is running (during the 90-second sleep), the "increment" target that launched the job sleeps 53 seconds, commits to the master branch, and pushes that commit (see the sketch after this list)
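
          A hypothetical sketch of that push-while-building driver (the real "increment" target lives in the repro branch and is not shown in this issue; file names and commit messages below are placeholders):

            # Trigger a build of the JENKINS-56326 branch, then push an unrelated
            # commit to master while the job is still inside its 90-second sleep.
            git checkout JENKINS-56326 && date >> counter.txt && git add counter.txt
            git commit -m "increment" && git push origin JENKINS-56326
            sleep 53
            git checkout master && date >> counter.txt && git add counter.txt
            git commit -m "unrelated change on master" && git push origin master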

          The check for duplication was performed with command line git in multiple environments, including Windows and several different Linux machines. The check was also performed with JGit.

          I am closing this as "cannot reproduce". If you can provide more details so that I can duplicate the problem, please reopen it. Refer to the preceding comment for one possible reason for that error message from command line git.

          reitzig Raphael Reitzig added a comment -

          "If the error occurs on agent D and it is the same host as agent C, are those two agents using a shared directory for their workspace?"

          I checked the workspaces list via the Jenkins UI. Agents C and D indeed use the same path (not conclusive) and the content seems to be identical, down to change dates of the files. So yes, this may indeed be the case. Not sure how to get perfect confirmation (again, no admin access on any of those machines) but it seems worth trying out for reproduction.

          That said, would that even be proper usage? Seems to me that using the same workspaces on two different agents is inviting trouble, even though you wouldn't notice unless parallel stages are used.
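
          One way to get that confirmation from inside a build, assuming a plain sh step can be pinned to each of the two agents in turn (workspace path taken from the clone log above), is to compare the device and inode of the workspace root as seen from each agent:

            # Same device:inode reported by both agents => the two "workspaces"
            # really are one and the same directory.
            stat -c '%d:%i' /build/workspace/<redacted>_hsp-2917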

          markewaite Mark Waite added a comment -

          Running two agents on the same computer or same file system with the same directory for both agents is a mistake. It creates many problems as the two agents each assume they have complete control of the workspace.

          If you're allowed to temporarily mark either agent C or agent D as offline, that should be enough to confirm that the problem is not visible when a single agent is controlling the workspace.
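
          If the node pages in the UI are also off limits, the Jenkins CLI can toggle an agent, assuming the CLI jar and sufficient permissions are available (the node name below is a placeholder):

            # Illustrative: take one of the two agents offline for the test ...
            java -jar jenkins-cli.jar -s https://<jenkins-url>/ offline-node agent-D -m "JENKINS-56326 shared-workspace check"
            # ... and bring it back afterwards.
            java -jar jenkins-cli.jar -s https://<jenkins-url>/ online-node agent-D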


            People

            • Assignee: Unassigned
            • Reporter: reitzig Raphael Reitzig
            • Votes: 0
            • Watchers: 2
