  1. Jenkins
  2. JENKINS-54610

Race condition when polling for changes triggered by web hook


    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Jenkins ver. 2.138.2

      (Relevant) Plugins:
       * Pipeline: 2.5
       * Github Branch Source: 2.3.6
       * GitHub: 1.29.2
       * GitHub Pull Request Builder: 1.420
    • branch-api-plugin-2.6.0

      Scenario:

      (According to my understanding. Please correct me if I'm wrong.)

      We're using the GitHub Branch Source plugin to build master branches in our repositories, as well as any PR branches. We're sometimes experiencing this issue with our workflow.

      1. Developer merges a PR into a `master` branch.
      2. GitHub fires a web hook for the repository.
      3. Jenkins polls the repository for updated branches/PRs.
      4. (while #3 is still running) About 3 seconds later, GitHub sends another web hook for the same repository.
      5. Jenkins begins running another poll against the same repository.
      6. Both polls conclude that the master branch has been updated and therefore needs to be built.
      7. Two builds for the master branch are queued.
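
The interleaving in steps 3 to 6 is a check-then-act race. A minimal sketch of it, replayed deterministically in a single thread, might look like the following. All names here (`PollRaceSketch`, `needsBuild`, `scheduleBuild`, the revision strings) are hypothetical and are not taken from Jenkins or its plugins; the sketch only models the reporter's understanding of the scenario.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the scenario above; none of these names come from Jenkins.
public class PollRaceSketch {
    // Revision for which a build was last scheduled (made-up value).
    private String lastBuiltRevision = "aaa111";
    private final List<String> buildQueue = new ArrayList<>();

    // First half of a poll: compare the remote head with the recorded revision.
    boolean needsBuild(String remoteHead) {
        return !remoteHead.equals(lastBuiltRevision);
    }

    // Second half of a poll: queue a build and record the revision.
    void scheduleBuild(String remoteHead) {
        buildQueue.add(remoteHead);
        lastBuiltRevision = remoteHead;
    }

    // Replays the overlap: both polls run their check before either one records its result.
    static List<String> replayOverlappingPolls() {
        PollRaceSketch jenkins = new PollRaceSketch();
        String mergedHead = "bbb222"; // master after the PR merge (made-up value)

        boolean firstPollSeesChange = jenkins.needsBuild(mergedHead);  // step 3
        boolean secondPollSeesChange = jenkins.needsBuild(mergedHead); // step 5, while step 3 is still running

        if (firstPollSeesChange) {
            jenkins.scheduleBuild(mergedHead);
        }
        if (secondPollSeesChange) {
            jenkins.scheduleBuild(mergedHead); // duplicate build (step 7)
        }
        return jenkins.buildQueue;
    }

    public static void main(String[] args) {
        // prints "Queued builds: [bbb222, bbb222]"
        System.out.println("Queued builds: " + replayOverlappingPolls());
    }
}
```

Because the second check happens before the first poll records the new revision, both polls see a change and the same revision is queued twice.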

       

      The Bug

      Kicking off two builds eats a lot of resources. Our `master` branch is part of a much longer CI/CD pipeline, which itself kicks off several parallel jobs which all eat executor slots and builder resources.

      While we can serialize steps with `lock` and `milestone` (and have done so for the deploy pipeline), that still results in a lot of duplicate work. We would expect only one build to be triggered when merging to master.

      Theories

      A) We're getting one web hook regarding closing the PR branch, and another regarding the update to the `master` branch. Both are logged in the Polling Log with the same web hook URL, like this:

      Started by event from [IP][web hook URL]

      … and both cause the polling step to poll the same repo(s).  

      B) This problem seems to be exacerbated by the fact that this project uses parallel() to run some jobs in parallel.  Most of those jobs `checkout scm` the same repository to test against, and one checks out an additional repository that includes some more testing tools.

      Despite the fact that most of the jobs run against the same scm, in the Polling Log we see the poller run 'git ls-remote' on the same repo 7 times: once for the main job, and once for each of the 6 parallel jobs. This makes the polling step take 4-5 seconds, which is longer than the gap between steps #2 and #4 above.

      It looks like, in jobs where the polling step is short enough, this isn't an issue.  (Still. Race conditions are bad.)

      Avoiding making multiple `git ls-remote` calls, while not fixing the underlying race condition, may at least make this issue less severe.
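
One way to picture "avoiding multiple `git ls-remote` calls" is to remember the result per repository for the duration of a single poll. The sketch below assumes the poller could be handed such a cache; the class `LsRemoteCacheSketch`, the `headOf` method, the stand-in lookup function, and the example URLs are all hypothetical, and nothing here reflects how the plugins actually issue `git ls-remote`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-poll cache; not an API of Jenkins or the Git plugin.
public class LsRemoteCacheSketch {
    private final Map<String, String> headsByRepo = new HashMap<>();
    private final Function<String, String> lsRemote; // stand-in for running `git ls-remote`
    private int remoteCalls = 0;

    LsRemoteCacheSketch(Function<String, String> lsRemote) {
        this.lsRemote = lsRemote;
    }

    // Returns the head for a repository, contacting the remote at most once per repository.
    String headOf(String repoUrl) {
        return headsByRepo.computeIfAbsent(repoUrl, url -> {
            remoteCalls++;
            return lsRemote.apply(url);
        });
    }

    int remoteCalls() {
        return remoteCalls;
    }

    // Models one poll: the main job plus 6 parallel jobs on the same repo, one of which
    // also checks out a second repo. Returns how many remote lookups were actually made.
    static int simulatePoll() {
        String appRepo = "https://example.invalid/org/app.git";     // made-up URL
        String toolsRepo = "https://example.invalid/org/tools.git"; // made-up URL
        LsRemoteCacheSketch cache = new LsRemoteCacheSketch(url -> "head-of-" + url);

        for (int job = 0; job < 7; job++) {
            cache.headOf(appRepo);
        }
        cache.headOf(toolsRepo);
        return cache.remoteCalls();
    }

    public static void main(String[] args) {
        // prints "Remote lookups: 2"
        System.out.println("Remote lookups: " + simulatePoll());
    }
}
```

With the cache, eight lookups collapse into two remote calls, which would shorten the polling window but, as noted above, would not remove the race itself.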

      Possible fix?

      It seems like it would be fine to serialize polling per repository instead of letting polls run in parallel. That way, when the second poll comes along, it should find that nothing has changed since the previous one and not schedule another build.
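
A rough sketch of that idea, assuming a per-repository lock around the whole "check, then schedule" sequence, could look like this. The class `SerializedPollSketch`, its fields, and the example URL and revision are hypothetical; this is an illustration of the proposal, not the change that was made in the plugin.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical illustration of serializing polls per repository; not Jenkins code.
public class SerializedPollSketch {
    private final Map<String, Object> repoLocks = new ConcurrentHashMap<>();
    private final Map<String, String> lastBuiltRevision = new ConcurrentHashMap<>();
    private final List<String> buildQueue = new CopyOnWriteArrayList<>();

    // In a real poller the remote head would be fetched inside the lock;
    // it is passed in here to keep the sketch self-contained.
    void poll(String repoUrl, String remoteHead) {
        Object lock = repoLocks.computeIfAbsent(repoUrl, url -> new Object());
        synchronized (lock) {
            // The check and the update happen atomically for this repository.
            if (!remoteHead.equals(lastBuiltRevision.get(repoUrl))) {
                buildQueue.add(repoUrl + "@" + remoteHead);
                lastBuiltRevision.put(repoUrl, remoteHead);
            }
        }
    }

    // Two web hooks arrive close together and trigger two polls of the same repository.
    static List<String> runTwoOverlappingPolls() {
        SerializedPollSketch poller = new SerializedPollSketch();
        String repo = "https://example.invalid/org/app.git"; // made-up URL
        Thread first = new Thread(() -> poller.poll(repo, "bbb222"));
        Thread second = new Thread(() -> poller.poll(repo, "bbb222"));
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for polls", e);
        }
        return poller.buildQueue;
    }

    public static void main(String[] args) {
        // prints "Queued builds: [https://example.invalid/org/app.git@bbb222]"
        System.out.println("Queued builds: " + runTwoOverlappingPolls());
    }
}
```

Whichever poll takes the lock first queues the build; the other then sees the revision already recorded and does nothing. Polls for different repositories use different locks, so they can still run in parallel.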

       

            Assignee: Unassigned
            Reporter: Cody Casterline (fcodyc)
            Votes: 1
            Watchers: 4
