JENKINS-63428

fatal: index-pack failed during git fetch

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Defect
    • Component/s: git-client-plugin
    • Labels:
      None
    • Environment:
      Jenkins ver. 2.195
      Git client plugin ver. 3.2.1
      Git plugin ver. 4.2.2

      Description

      Folks,

      Please help me with advice on how to fix this issue in our environments. It is happening all the time.

      When we attempt a checkout using the Jenkins Git plugin, we see this error:

      [2020-08-17T21:25:43.615Z] Cloning repository http://irepo.eur.ad.sag/scm/tsm/um-test.git
      [2020-08-17T21:25:43.650Z] > git init /home/vmtest/workspace/10.7_git_release_pipeline_test_java_units # timeout=10
      [2020-08-17T21:25:43.839Z] Fetching upstream changes from http://irepo.eur.ad.sag/scm/tsm/um-test.git
      [2020-08-17T21:25:43.839Z] > git --version # timeout=10
      [2020-08-17T21:25:43.843Z] > git fetch --no-tags --force --progress --depth=1 -- http://irepo.eur.ad.sag/scm/tsm/um-test.git +refs/heads/*:refs/remotes/origin/* # timeout=20
      [2020-08-17T21:45:43.934Z] ERROR: Error cloning remote repo 'origin'
      [2020-08-17T21:45:43.934Z] hudson.plugins.git.GitException: Command "git fetch --no-tags --force --progress --depth=1 -- http://irepo.eur.ad.sag/scm/tsm/um-test.git +refs/heads/*:refs/remotes/origin/*" returned status code 128:
      [2020-08-17T21:45:43.934Z] stdout:
      [2020-08-17T21:45:43.934Z] stderr: remote: Enumerating objects: 15448, done.
      [2020-08-17T21:45:43.934Z] remote: Counting objects: 0% (1/15448)
      remote: Counting objects: 1% (155/15448)
      remote: Counting objects: 2% (309/15448)
      remote: Counting objects: 3% (464/15448)

       

      And then it resumes work, but fails in the end with:

      Receiving objects:  69% (10704/15448), 208.76 MiB | 168.00 KiB/s
      Receiving objects:  69% (10704/15448), 208.76 MiB | 168.00 KiB/s
      Receiving objects:  69% (10704/15448), 208.92 MiB | 172.00 KiB/s
      Receiving objects:  69% (10704/15448), 208.99 MiB | 169.00 KiB/s
      error: --shallow-file died of signal 15
      [2020-08-17T23:05:45.622Z] fatal: index-pack failed
      [2020-08-17T23:05:45.622Z]
      [2020-08-17T23:05:45.623Z]     at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2430)
      [2020-08-17T23:05:45.623Z]     at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:2044)
      [2020-08-17T23:05:45.623Z]     at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$500(CliGitAPIImpl.java:81)

      Attaching the full console output from a Linux test node in linux_console_output.txt.

      Attaching the full console output from a Windows node in windows_console_output.txt.

      The errors on Windows and Linux appear to be the same to me.

      When I manually perform the same commands on the same Jenkins nodes, everything works fine and quickly.

      Looking for suggestions on how to work around or resolve this issue.

        Attachments

          Activity

          Vassilena Treneva added a comment -

          (SIDE NOTE)

           I made some attempts to collect more logs by adding GIT_TRACE_PACKET=1, GIT_TRACE=1, GIT_CURL_VERBOSE=1 in the checkout step (as advised here: https://confluence.atlassian.com/bitbucketserverkb/git-clone-fails-fatal-the-remote-end-hung-up-unexpectedly-fatal-early-eof-fatal-index-pack-failed-779171803.html).
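           The wrapper looked roughly like this (a minimal sketch; "checkout scm" stands in for the actual checkout step of the job):

           // Minimal sketch of the wrapper described above; not the exact pipeline from this job.
           withEnv(['GIT_TRACE=1', 'GIT_TRACE_PACKET=1', 'GIT_CURL_VERBOSE=1']) {
               checkout scm
           }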

           But this did not help; no additional information was collected (I wrapped my checkout step in a withEnv block and also tried plain sh/bat commands). Since this did not work, I assumed the plugin requires a different logging configuration and enabled loggers at the Jenkins level under Manage Jenkins -> System Log -> Log Recorders. I added these:

          hudson.plugins.git

          hudson.plugins.git.GitChangeSet

          hudson.plugins.git.GitSCM

          hudson.plugins.git.GitStatus

          hudson.plugins.git.GitTool

          hudson.plugins.git.util

          hudson.plugins.git.util.BuildData

          hudson.plugins.git.util.GitUtils

          org.jenkinsci.plugins.gitclient.Git

           Unfortunately, these did not collect any information either. Probably not the right configuration.

          Mark Waite added a comment -

           The timestamps in the log file indicate that your repository is so large, or your network bandwidth so small, that the clone is reaching the 20-minute timeout defined in your job.

           Refer to Git in the Large for recommendations on managing large repositories more effectively. The techniques that have been most effective for me include the following (a combined sketch follows the list):

          1. Create a bare copy of the repository on the agent and use it as a reference repository in the job definition
          2. Use a narrow refspec to clone only the exact branch needed for the job
          3. Use a shallow clone to reduce the amount of history copied into the workspace
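           A minimal pipeline sketch combining those three options (the reference path, the master branch, and the other values are placeholders to adapt, not taken from this issue):

           // Sketch only: reference repository + narrow refspec + shallow clone.
           // The path /var/cache/git/um-test.git is a placeholder for a pre-populated cache on the agent.
           checkout([$class: 'GitSCM',
               branches: [[name: 'refs/heads/master']],
               userRemoteConfigs: [[url: 'http://irepo.eur.ad.sag/scm/tsm/um-test.git',
                                    refspec: '+refs/heads/master:refs/remotes/origin/master']],
               extensions: [[$class: 'CloneOption',
                             shallow: true, depth: 1, noTags: true, honorRefspec: true,
                             reference: '/var/cache/git/um-test.git']]])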
          Vassilena Treneva added a comment -

          Mark Waite,

           

          I am already using a shallow checkout and I am getting this error with the shallow setting in place.

           I see what you mean about the timeout, and this is exactly what I find wrong: executing the same command manually on the same machine gives a much faster response:

           

          [vmtest@sofumrhel11 test]$ git fetch --no-tags --force --progress --depth=1 -- http://irepo.eur.ad.sag/scm/tsm/um-test.git +refs/heads/*:refs/remotes/origin/*
          remote: Enumerating objects: 15448, done.
          remote: Counting objects: 100% (15448/15448), done.
          remote: Compressing objects: 100% (9947/9947), done.
          remote: Total 15448 (delta 6151), reused 9545 (delta 4799)
          Receiving objects: 100% (15448/15448), 331.94 MiB | 1.90 MiB/s, done.
          Resolving deltas: 100% (6151/6151), done.
          From http://irepo.eur.ad.sag/scm/tsm/um-test

           * [new branch]      NUM-13293-ConsumerManager -> origin/NUM-13293-ConsumerManager
           * [new branch]      NUM-13294-test            -> origin/NUM-13294-test
           * [new branch]      master                    -> origin/master
           * [new branch]      vasi-test                 -> origin/vasi-test

           

           This part, where "Counting objects" is listed, takes a long time when checking out with the plugin, while it takes a minute or two at most when done manually.

          Here is the output when doing it with the plugin:

           

          [2020-08-17T21:45:43.934Z] stdout:
          [2020-08-17T21:45:43.934Z] stderr: remote: Enumerating objects: 15448, done.
          [2020-08-17T21:45:43.934Z] remote: Counting objects: 0% (1/15448)
          remote: Counting objects: 1% (155/15448)
          remote: Counting objects: 2% (309/15448)
          remote: Counting objects: 3% (464/15448)
          remote: Counting objects: 4% (618/15448)
          remote: Counting objects: 5% (773/15448)
          remote: Counting objects: 6% (927/15448)
          remote: Counting objects: 7% (1082/15448)
          remote: Counting objects: 8% (1236/15448)
          remote: Counting objects: 9% (1391/15448)
          remote: Counting objects: 10% (1545/15448)
          remote: Counting objects: 11% (1700/15448)
          remote: Counting objects: 12% (1854/15448)
          remote: Counting objects: 13% (2009/15448)
          remote: Counting objects: 14% (2163/15448)

           

           If it were the repo size (although the repo is big, I admit), I would be getting the same error manually.

          Mark Waite added a comment -

          Are you sure that you're executing the exact same commands interactively on the exact agent that is showing the issue? The git commands that Jenkins uses are listed in the output.

          If you're not using a reference repository, you will see significant improvement in download bandwidth and download time by using a reference repository.
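           For reference, a one-time setup along these lines could create such a reference repository on the agent (the cache path is only an example, not from this issue):

           // Run once per agent; /var/cache/git/um-test.git is an example location.
           sh '''
             mkdir -p /var/cache/git
             git clone --mirror http://irepo.eur.ad.sag/scm/tsm/um-test.git /var/cache/git/um-test.git
           '''
           // The job then points the clone option's reference repository path at /var/cache/git/um-test.git.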

          Vassilena Treneva added a comment -

           I am certain that the commands are the same as the ones the plugin executes; I took them from the console output.

           

           After playing a bit with a reproduction test job, I think we can confirm that this only happens when we have multiple concurrent checkouts on different Jenkins nodes. If we execute one or two concurrent checkouts on a different node, we do not see the issue, but if we spin up more nodes, which is what we usually do in our infrastructure, we immediately hit the issue.

           

           Perhaps the Git server has a performance issue; not sure yet, still looking for logs on the server side.

          Vassilena Treneva added a comment - edited

           Once I increased the clone/checkout timeouts, my checkouts started working. They still take a long time, but they work.
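           In pipeline terms the change amounts to raising the timeouts in the checkout step, roughly like this (the 60-minute value is illustrative, not our exact number):

           // Sketch: raised clone and checkout timeouts (in minutes); values are illustrative.
           checkout([$class: 'GitSCM',
               userRemoteConfigs: [[url: 'http://irepo.eur.ad.sag/scm/tsm/um-test.git']],
               extensions: [[$class: 'CloneOption', timeout: 60, shallow: true, depth: 1],
                            [$class: 'CheckoutOption', timeout: 60]]])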

           After doing several different tests, it turns out this is network related; it looks like concurrent checkout clients are blocked somehow, and this slows down the operation.

           Our Bitbucket server is in another country, and when we attempted the same number of concurrent checkouts on infrastructure in that same country (thus avoiding the network element), we got quick checkouts.

          The solution for us would be to create a Bitbucket server mirror in our country.  

          Mark Waite added a comment -

          Thanks for providing the update!

           Bitbucket offers a server mirroring technology that looked very promising to me. If that's not workable, you might consider installing a local Gitea server that periodically refreshes from the upstream Bitbucket repository, and referencing it in job definitions that contain two repositories: the Gitea repository and the Bitbucket repository. Jenkins jobs can have multiple repositories defined in a single job. It is a rarely used feature because several problems can arise with that technique, but it might be a short-term solution while you get the budget for a Bitbucket mirror.
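           One way to handle the periodic refresh would be a small scheduled Jenkins job along these lines (the Gitea URL, the mirror path, and the schedule are placeholders, and credential handling is omitted):

           // Sketch of a scheduled mirror-refresh job; URLs, paths, and schedule are placeholders.
           pipeline {
               agent { label 'linux' }
               triggers { cron('H/30 * * * *') }   // roughly every 30 minutes
               stages {
                   stage('Refresh mirror') {
                       steps {
                           sh '''
                             cd /var/lib/git-mirrors/um-test.git
                             git fetch --prune http://irepo.eur.ad.sag/scm/tsm/um-test.git "+refs/*:refs/*"
                             git push --mirror http://gitea.example.local/tsm/um-test.git
                           '''
                       }
                   }
               }
           }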


            People

            • Assignee:
              Unassigned
              Reporter:
              Vassilena Treneva
            • Votes:
              0
            • Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: