  2. JENKINS-24445

Retriggering builds via the "Manual Trigger" feature of the Gerrit Trigger Plugin causes wrong verification

      Description

      There is a serious issue with the Gerrit trigger plugin and its "retrigger" functionality.

      Problem

      This issue causes the plugin to get confused about how many and which builds it started for a particular changeset/patchset combination.

      This behaviour leads to inconsistent verification results in Gerrit, because it fails to wait for all tests to complete.

      Replication instructions

      1. Start a blank, vanilla Jenkins server of an arbitrary version on your local host (the jobs below assume that a BASH shell is available).
      2. Install the gerrit-trigger plugin (arbitrary version, as all are affected)
      3. Increase the executor count of the "master" node to at least 6.
      4. Restart the server and create a connection to a Gerrit service.
      5. Shut down the server and extract the two jobs (Test_1 and Test_2) into the "jobs" directory of the Jenkins server.
        • Test_1 will always fail, Test_2 will always succeed. Both will create a lock-file named "LOCKFILE-${JOB_NAME}-${BUILD_NUMBER}" in "${JENKINS_HOME}/workspace/Central" and wait for it to be deleted, before succeeding or failing.
      6. (Optional) Alter the Test_1 and Test_2 Gerrit Trigger settings to listen to a specific repository. By default, both repo and branch are set to "**".

      After this set-up is done, upload a patchset to the Gerrit service that you configured Jenkins to listen to.

      This will cause a build of "Test_1" and "Test_2" to be started.
      Now, while they are running (remember that they will wait until you delete their lock-file), go into the "manual trigger" interface:
      http://[jenkins]/gerrit_manual_trigger/?

      Search for the changeset and issue a retrigger. This will cause two additional builds to spawn, one for "Test_1" and one for "Test_2".

      Now that 4 tests are running, the correct behaviour of the Gerrit trigger plugin is to ignore the first two runs that have been started by the "Patchset Created" event.

      Instead, it should wait until BOTH of the retriggered builds have finished and then upload a "Verified -1" message to Gerrit.

      To test this, first delete the two original lock files:

      ${JENKINS_HOME}/workspace/Central/LOCKFILE-Test_1-1
      ${JENKINS_HOME}/workspace/Central/LOCKFILE-Test_2-1

      This tests the first half of the problem, and it works fine: after both original builds have finished, the Gerrit Trigger plugin still waits for the two retriggered builds to finish.

      Now, to expose the bug, delete ONLY the lock file of the retriggered Test_2 build:
      ${JENKINS_HOME}/workspace/Central/LOCKFILE-Test_2-2

      This will let Test_2 finish successfully (as Test_2 always succeeds).
      The bug is that the trigger plugin now uploads a "Verified +1" message, DESPITE "Test_1 #2" not having finished yet.

      If you then also let "Test_1" finish, it uploads yet another verification, this time "Verified -1".

      This behaviour is absolutely wrong, as it sends a verification state without waiting for all jobs to complete, thus leading to inconsistent results in Gerrit.
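The expected behaviour described above can be sketched as a small decision function. This is only an illustration under stated assumptions: `VerdictSketch`, `State` and `verdict` are hypothetical names and not part of the plugin's code. The function reports nothing while any retriggered build is still running, and only then derives a single "Verified" score.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of the expected reporting rule; not the plugin's real code. */
final class VerdictSketch {
    enum State { RUNNING, SUCCESS, FAILURE }

    /**
     * Returns the score to report, or empty while any tracked build is still running.
     * A single failed build yields -1; otherwise the score is +1.
     */
    static Optional<Integer> verdict(Map<String, State> builds) {
        if (builds.isEmpty() || builds.containsValue(State.RUNNING)) {
            return Optional.empty(); // keep waiting, report nothing yet
        }
        return Optional.of(builds.containsValue(State.FAILURE) ? -1 : 1);
    }

    public static void main(String[] args) {
        Map<String, State> retriggered = new LinkedHashMap<>();
        retriggered.put("Test_1 #2", State.RUNNING);
        retriggered.put("Test_2 #2", State.SUCCESS);
        System.out.println(verdict(retriggered)); // Optional.empty

        retriggered.put("Test_1 #2", State.FAILURE);
        System.out.println(verdict(retriggered)); // Optional[-1]
    }
}
```

In the scenario of this report, the premature "Verified +1" corresponds to a score being produced while "Test_1 #2" is still in the RUNNING state.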

      Example output in Gerrit, when error occurs

      Here's an excerpt of what this behaviour looks like in Gerrit:

      ##############################################################################
      Jenkins
      Patch Set 1: -Verified
      Build Started Test_2/1/ (1/2)
      -------------------------------------------------------
      Jenkins
      Patch Set 1:
      Build Started Test_1/1/ (2/2)
      -------------------------------------------------------
      Jenkins
      Patch Set 1:
      Build Started Test_2/2/ (2/2)
      -------------------------------------------------------
      Jenkins
      Patch Set 1:
      Build Started Test_1/2/ (2/2)
      -------------------------------------------------------
      Jenkins
      Patch Set 1: Verified+1
      Build Successful 
      Test_2/2/ : SUCCESS
      -------------------------------------------------------
      Jenkins
      Patch Set 1: Verified-1
      Build Failed 
      Test_1/2/ : FAILURE
      ##############################################################################
      

      You can see the two bugs outlined above at work here:

      1.) The plugin gets confused when counting the number of jobs started
      2.) The plugin does not wait for (Test_1 #2) after (Test_2 #2) has finished.

      [EDIT: Improved text formatting. Now the report is more than just a wall of text]

          Activity

          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: rinrinne
          Path:
          src/main/java/com/sonyericsson/hudson/plugins/gerrit/trigger/gerritnotifier/ToGerritRunListener.java
          src/main/java/com/sonyericsson/hudson/plugins/gerrit/trigger/hudsontrigger/GerritTrigger.java
          src/test/java/com/sonyericsson/hudson/plugins/gerrit/trigger/hudsontrigger/GerritTriggerTest.java
          src/test/java/com/sonyericsson/hudson/plugins/gerrit/trigger/spec/SpecGerritTriggerHudsonTest.java
          http://jenkins-ci.org/commit/gerrit-trigger-plugin/407fcae820ea9a3be7c8ed0f5f50545fb30bdff3
          Log:
          Don't trigger builds triggered by the same event

          Now any builds are triggered even if existing builds triggerd by the same
          event are running.

          This patch prevents to trigger build according to the below policy:

          • Project has triggered/running build triggered by the same event.
          • The event trigger builds has still running build.

          This would fix JENKINS-24445.

          Fix for JENKINS-24445

          Task-Url: https://issues.jenkins-ci.org/browse/JENKINS-24445
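One possible reading of the first policy bullet in this commit message can be sketched as follows. This is not the actual patch: `RetriggerGuardSketch`, `tryTrigger`, `finished` and the string event key are hypothetical names chosen for illustration, and the second bullet is not modelled here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical guard: at most one active build per project for equal events. */
final class RetriggerGuardSketch {
    // Assumption: events that compare equal are represented by the same key string.
    private final Map<String, Set<String>> activeProjectsByEvent = new HashMap<>();

    /** Returns true if a build may be started, false if one is already active. */
    boolean tryTrigger(String eventKey, String project) {
        return activeProjectsByEvent
                .computeIfAbsent(eventKey, k -> new HashSet<>())
                .add(project);
    }

    /** Marks the project's build for this event as finished. */
    void finished(String eventKey, String project) {
        Set<String> active = activeProjectsByEvent.get(eventKey);
        if (active != null) {
            active.remove(project);
            if (active.isEmpty()) {
                activeProjectsByEvent.remove(eventKey);
            }
        }
    }

    public static void main(String[] args) {
        RetriggerGuardSketch guard = new RetriggerGuardSketch();
        System.out.println(guard.tryTrigger("change-1/ps-1", "Test_1")); // true
        System.out.println(guard.tryTrigger("change-1/ps-1", "Test_1")); // false
        guard.finished("change-1/ps-1", "Test_1");
        System.out.println(guard.tryTrigger("change-1/ps-1", "Test_1")); // true
    }
}
```

Under this reading, a manual retrigger issued while "Test_1 #1" is still running would be refused for Test_1 instead of starting "Test_1 #2".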

          mhschroe Martin Schröder added a comment (edited)

          Hi Daniel.

          We originally set it to critical, because we consider corruption-of-data, by approving possibly unverified changes, to be pretty equivalent to loss-of-data.

          I've bumped it down to "major" anyway, per your (and Rin's) recommendation.

          We've simply disallowed the feature completely for all users, so it's not blocking us either.

          danielbeck Daniel Beck added a comment -

          "Critical" is used to indicate issues resulting in crashes, loss of data, severe memory leaks.

          mhschroe Martin Schröder added a comment -

          Hi Rin.

          I would consider it a critical issue, if

          • a feature that is linked-to from the main page's side-panel,
          • and activated by default for most users,
          • is breaking the fundamental assumption that the Gerrit Trigger will only update the review once all tests submitted for a given patch-set have passed.

          I am also not sure, if your assumptions about how many builds should be scheduled for events are entirely valid.
          After all, the Gerrit Trigger plugin has another way of retriggering all builds belonging to a given patch-set:

          1. Navigate to any build started by the patch-set in question, once it has finished.
          2. You will see, in the side-panel of that build, a "Retrigger All" button.

          This button does – from the user perspective – the same thing as the "manual trigger" and it works absolutely correctly – because it can only be run, if all builds on that patch-set have finished.
          That's the only reason it works. The manual trigger works, too, if you also wait until all builds for the patch-set have finished.
          Both features only break if you let them start concurrently with running tests.

          But the problem is, that the manual trigger allows you to do that while other builds for the same patch-set are running. Since it does not check for this, it must work correctly even when builds are still running.
          Otherwise, this feature is a direct way to break the entire purpose of the plugin.

          And the nastiest thing: It breaks it without feedback. Nothing indicates that this behaviour breaks stuff. There isn't even an easy way to ensure that you can use that feature safely, as it is not easy to find out which patch-sets are currently under test.

          rin_ne rin_ne added a comment (edited)

          This is not a critical issue, since it only occurs under rare conditions. But it would be critical once it happens.

          From the code, it is certain that gerrit-trigger runs under the following assumptions:

          1. A patchset-created event for a patchset in a change is generated only once
          2. Only one build is scheduled from a job triggered by an event.

          The event instance generated by the Gerrit event stream is different from the one generated by the manual trigger, but comparing the two objects with equals() returns true. This breaks assumption #1.

          If a manual trigger happens while the triggered job still has a running build triggered by the Gerrit stream event, the job has 2 builds generated by the same event. This breaks assumption #2.

          So when a manual trigger happens, the plugin's behavior is already unstable. Unfortunately, the result is undefined.

          Perhaps this is a design issue, so it cannot be fixed easily without a plan.
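The way two distinct but equal event objects can break both assumptions may be illustrated with a minimal sketch. All names here (`PatchsetEvent`, `BuildBook`, the `origin` field, the change id "I1234") are hypothetical and do not reflect the plugin's real classes. The sketch assumes bookkeeping keyed by event with one build slot per project.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for a patchset-created event; not the plugin's real class. */
final class PatchsetEvent {
    final String changeId;
    final int patchset;
    final String origin; // "stream" or "manual"; deliberately ignored by equals()

    PatchsetEvent(String changeId, int patchset, String origin) {
        this.changeId = changeId;
        this.patchset = patchset;
        this.origin = origin;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PatchsetEvent)) return false;
        PatchsetEvent other = (PatchsetEvent) o;
        return patchset == other.patchset && changeId.equals(other.changeId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(changeId, patchset);
    }
}

/** Hypothetical bookkeeping: one build slot per project per event. */
final class BuildBook {
    private final Map<PatchsetEvent, Map<String, Integer>> builds = new HashMap<>();

    /** Records a started build; returns the build number it displaced, or null. */
    Integer started(PatchsetEvent event, String project, int buildNumber) {
        return builds.computeIfAbsent(event, e -> new HashMap<>()).put(project, buildNumber);
    }

    int trackedEvents() {
        return builds.size();
    }
}

final class EqualsCollisionSketch {
    public static void main(String[] args) {
        BuildBook book = new BuildBook();
        PatchsetEvent stream = new PatchsetEvent("I1234", 1, "stream");
        PatchsetEvent manual = new PatchsetEvent("I1234", 1, "manual");

        book.started(stream, "Test_1", 1);
        // The manual event is a different object, yet it lands on the same key
        // and silently displaces the record of the still-running build #1.
        Integer displaced = book.started(manual, "Test_1", 2);

        System.out.println(stream.equals(manual)); // true
        System.out.println(displaced);             // 1
        System.out.println(book.trackedEvents());  // 1
    }
}
```

With this kind of bookkeeping, the original and the retriggered builds can no longer be told apart, which is consistent with the undefined results described in the comment above.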


            People

            • Assignee:
              rsandell rsandell
              Reporter:
              mhschroe Martin Schröder