Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-39498

Missed Events Playback Feature fails at Jenkins restart

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: gerrit-trigger-plugin
    • Labels:
      None
    • Environment:
      Jenkins 2.27
      Gerrit Trigger plugin 2.22.0
      Gerrit 2.12.2
      Jenkins and Gerrit on operating system SLES 11.3
    • Similar Issues:

      Description

      Our Jenkins has some 2500 jobs, the majority of them connected to Gerrit projects with Gerrit Trigger plugin.
      When restarting Jenkins, the Jenkins job queue is filled with some 500 jobs, which have already been run before! In other words: the Gerrit Trigger plugin seems to forget that Jenkins has run theses jobs before.
      We experienced the same behavior in the passed when we stopped the Gerrit server in the Jenkins Gerrit Trigger configuration and then restarted it.

        Attachments

          Issue Links

            Activity

            Hide
            tekkamanendless Douglas Manley added a comment -

            I'm seeing something similar.  I have a job that gets triggered when a new patch set is submitted, and it'll all work perfectly fine until it "misses" some.  I'll see these error messages:

            Jan 19, 2018 3:04:18 PM WARNING com.sonyericsson.hudson.plugins.gerrit.trigger.playback.GerritMissedEventsPlaybackManager persist
            Event CreatedOn is null...Gerrit Server might not support attribute eventCreatedOn. Will NOT persist this event and Missed Events will be disabled!
            Jan 19, 2018 3:04:18 PM WARNING com.sonyericsson.hudson.plugins.gerrit.trigger.playback.GerritMissedEventsPlaybackManager performCheck
            Missed Events Playback used to be NOT supported. now it IS!

            And then subsequent jobs will trigger normally, but the ones that happened at that moment must be manually triggered.

            This message seems to correlate to patch sets being submitted to Gerrit.  I suspect that the plugin receives the event but then has some issue processing it and skips it (and all temporally proximate events) until the "now it IS!" message.

            If I subscribe to the events stream, I don't see any events with a "eventCreatedOn" attribute of null; they all have it set appropriately.

            I haven't been able to pin down where in the source this gets lost; might be the event-log code from "com.sonymobile.tools.gerrit.gerritevents.dto.events.GerritTriggeredEvent"?

            Show
            tekkamanendless Douglas Manley added a comment - I'm seeing something similar.  I have a job that gets triggered when a new patch set is submitted, and it'll all work perfectly fine until it "misses" some.  I'll see these error messages: Jan 19, 2018 3:04:18 PM WARNING com.sonyericsson.hudson.plugins.gerrit.trigger.playback.GerritMissedEventsPlaybackManager persist Event CreatedOn is null...Gerrit Server might not support attribute eventCreatedOn. Will NOT persist this event and Missed Events will be disabled! Jan 19, 2018 3:04:18 PM WARNING com.sonyericsson.hudson.plugins.gerrit.trigger.playback.GerritMissedEventsPlaybackManager performCheck Missed Events Playback used to be NOT supported. now it IS! And then subsequent jobs will trigger normally, but the ones that happened at that moment must be manually triggered. This message seems to correlate to patch sets being submitted to Gerrit.  I suspect that the plugin receives the event but then has some issue processing it and skips it (and all temporally proximate events) until the "now it IS!" message. If I subscribe to the events stream, I don't see any events with a "eventCreatedOn" attribute of null; they all have it set appropriately. I haven't been able to pin down where in the source this gets lost; might be the event-log code from "com.sonymobile.tools.gerrit.gerritevents.dto.events.GerritTriggeredEvent"?
            Hide
            tekkamanendless Douglas Manley added a comment -

            From what I can tell, a Gerrit event comes in, but the plugin can't establish a connection due to a timeout:

            Jan 19, 2018 3:33:36 PM com.sonyericsson.hudson.plugins.gerrit.trigger.GerritServer startConnection
            WARNING: Already started!
            Jan 19, 2018 3:33:46 PM com.sonyericsson.hudson.plugins.gerrit.trigger.GerritServer doWakeup
            {{SEVERE: Could not start connection. }}
            java.lang.InterruptedException: time out.

            Shortly after that, "checkIfEventsLogPluginSupported" is called:

            {{ public void checkIfEventsLogPluginSupported() {}}
               GerritServer server = PluginImpl.getServer_(serverName);
               if (server != null && server.getConfig() != null) {
                  isSupported = GerritPluginChecker.isPluginEnabled(
                     server.getConfig(), EVENTS_LOG_PLUGIN_NAME, true);
               }
            {{ }}}

            They key thing here is that "isSupported" is always altered if we have a Gerrit server configured (and we do, because why else would we be here talking about  this?).  And "GerritPluginChecker.isPluginEnabled" will return "false" if there is any kind of error in the connection:

            {{ } catch (IOException e) {}}
               logger.warn(Messages.PluginHttpConnectionGeneralError(pluginName,
                  e.getMessage()), e);
               return false;
            {{ } finally {}}

            I am still trying to verify that those log messages are written; I don't see anything in my logs for "GerritPluginChecker", but it must be getting called because "performCheck" is definitely registering a change in "isSupported", and that only happens via "GerritPluginChecker.isPluginEnabled".

            After that, the timestamps are purged:

            Jan 19, 2018 3:34:16 PM com.sonyericsson.hudson.plugins.gerrit.trigger.playback.GerritMissedEventsPlaybackManager performCheck
            WARNING: Deleting /var/lib/jenkins/gerrit-server-event-data/gerrit.extreme-scale.com/gerrit-trigger-server-timestamps.xml

            And then almost immediately:

            Jan 19, 2018 3:34:35 PM com.sonyericsson.hudson.plugins.gerrit.trigger.playback.GerritMissedEventsPlaybackManager performCheck
            WARNING: Missed Events Playback used to be NOT supported. now it IS!

            At this point, my old timestamps are lost, so the original event that triggered this whole workflow is lost, causing it to never be built.

            What I think that we should do is update "isPluginEnabled" to throw an exception on I/O failure and then not make any changes to "isSupported" when this occurs.  "GerritPluginChecker.isPluginEnabled" is only called in "GerritMissedEventsPlaybackManager.java", so the amount of work would be minimal.  I may submit a pull request for this...

            Show
            tekkamanendless Douglas Manley added a comment - From what I can tell, a Gerrit event comes in, but the plugin can't establish a connection due to a timeout: Jan 19, 2018 3:33:36 PM com.sonyericsson.hudson.plugins.gerrit.trigger.GerritServer startConnection WARNING: Already started! Jan 19, 2018 3:33:46 PM com.sonyericsson.hudson.plugins.gerrit.trigger.GerritServer doWakeup {{SEVERE: Could not start connection. }} java.lang.InterruptedException: time out. Shortly after that, "checkIfEventsLogPluginSupported" is called: {{ public void checkIfEventsLogPluginSupported() {}}    GerritServer server = PluginImpl.getServer_(serverName);    if (server != null && server.getConfig() != null) {       isSupported = GerritPluginChecker.isPluginEnabled(          server.getConfig(), EVENTS_LOG_PLUGIN_NAME, true);     } {{ }}} They key thing here is that "isSupported" is always  altered if we have a Gerrit server configured (and we do, because why else would we be here talking about  this?).  And "GerritPluginChecker.isPluginEnabled" will return "false" if there is any kind of error in the connection: {{ } catch (IOException e) {}}    logger.warn(Messages.PluginHttpConnectionGeneralError(pluginName,       e.getMessage()), e);    return false; {{ } finally {}} I am still trying to verify that those log messages are written; I don't see anything in my logs for "GerritPluginChecker", but it must be getting called because "performCheck" is definitely registering a change in "isSupported", and that only happens via "GerritPluginChecker.isPluginEnabled". After that, the timestamps are purged: Jan 19, 2018 3:34:16 PM com.sonyericsson.hudson.plugins.gerrit.trigger.playback.GerritMissedEventsPlaybackManager performCheck WARNING: Deleting /var/lib/jenkins/gerrit-server-event-data/gerrit.extreme-scale.com/gerrit-trigger-server-timestamps.xml And then almost immediately: Jan 19, 2018 3:34:35 PM com.sonyericsson.hudson.plugins.gerrit.trigger.playback.GerritMissedEventsPlaybackManager performCheck WARNING: Missed Events Playback used to be NOT supported. now it IS! At this point, my old timestamps are lost, so the original event that triggered this whole workflow is lost, causing it to never be built. What I think that we should do is update "isPluginEnabled" to throw an exception on I/O failure and then  not  make any changes to "isSupported" when this occurs.  "GerritPluginChecker.isPluginEnabled" is only called in "GerritMissedEventsPlaybackManager.java", so the amount of work would be minimal.  I may submit a pull request for this...
            Hide
            tekkamanendless Douglas Manley added a comment -

            On my Gerrit server at this time, I see:
            {{[2018-01-19 15:26:03,468] [sshd-SshServer[3e89f5bc]-nio2-thread-2] WARN  org.apache.sshd.server.session.ServerSessionImpl : exceptionCaught(ServerSessionImpl[svc_jenkins@/52.203.209.121:32372])[state=Opened] IOException: Connection reset by peer }}
            {{[2018-01-19 15:26:10,333] [sshd-SshServer[3e89f5bc]-nio2-thread-2] WARN  org.apache.sshd.server.session.ServerSessionImpl : exceptionCaught(ServerSessionImpl[svc_jenkins@/52.203.209.121:32787])[state=Opened] IOException: Connection reset by peer }}
            [2018-01-19 15:26:11,224] [sshd-SshServer[3e89f5bc]-nio2-thread-1] WARN  org.apache.sshd.server.session.ServerSessionImpl : exceptionCaught(ServerSessionImpl[svc_jenkins@/52.203.209.121:28949])[state=Opened] IOException: Connection reset by peer

            This corresponds to the timeouts that I was seeing in Jenkins.

            Show
            tekkamanendless Douglas Manley added a comment - On my Gerrit server at this time, I see: {{ [2018-01-19 15:26:03,468] [sshd-SshServer [3e89f5bc] -nio2-thread-2] WARN  org.apache.sshd.server.session.ServerSessionImpl : exceptionCaught(ServerSessionImpl [svc_jenkins@/52.203.209.121:32372] ) [state=Opened] IOException: Connection reset by peer }} {{ [2018-01-19 15:26:10,333] [sshd-SshServer [3e89f5bc] -nio2-thread-2] WARN  org.apache.sshd.server.session.ServerSessionImpl : exceptionCaught(ServerSessionImpl [svc_jenkins@/52.203.209.121:32787] ) [state=Opened] IOException: Connection reset by peer }} [2018-01-19 15:26:11,224] [sshd-SshServer [3e89f5bc] -nio2-thread-1] WARN  org.apache.sshd.server.session.ServerSessionImpl : exceptionCaught(ServerSessionImpl [svc_jenkins@/52.203.209.121:28949] ) [state=Opened] IOException: Connection reset by peer This corresponds to the timeouts that I was seeing in Jenkins.
            Hide
            tekkamanendless Douglas Manley added a comment -

            Ah okay, there's one more place where "isSupported" is switched to false, and that's in the "persist" method, which attempts to add a Gerrit event to some list.  If the event lacks "eventCreatedOn" (or if it is set to zero), then that function unilaterally disables event logging.

            That seems a bit strange; I would understand if that function merely ignored events that it didn't support (namely those without an "eventCreatedOn" value), but disabling the whole thing based on a single event is a bit overboard.  In addition, there are numerous places where the "checkIfEventsLogPluginSupport" method is called, so there's ample opportunity for the state to be changed normally.

            This whole thing looks like it's been created to avoid some unspeakable disaster as if something terrible were to happen if "persist" were called and the "events-log" plugin wasn't there.  My reading of the code is that it wouldn't really do anything; "persist" wouldn't store the event, and the playback stuff wouldn't have anything to playback.  Most of the important logic around "isSupported" happens when the connection is started, which makes sense.  Changing that value mid-way based on a missing field from a streaming bus doesn't really help with making the plugin any more robust.

            Show
            tekkamanendless Douglas Manley added a comment - Ah okay, there's one more place where "isSupported" is switched to false, and that's in the "persist" method, which attempts to add a Gerrit event to some list.  If the event lacks "eventCreatedOn" (or if it is set to zero), then that function unilaterally disables event logging. That seems a bit strange; I would understand if that function merely ignored events that it didn't support (namely those without an "eventCreatedOn" value), but disabling the whole thing based on a single event is a bit overboard.  In addition, there are numerous places where the "checkIfEventsLogPluginSupport" method is called, so there's ample opportunity for the state to be changed normally. This whole thing looks like it's been created to avoid some unspeakable disaster as if something terrible were to happen if "persist" were called and the "events-log" plugin wasn't there.  My reading of the code is that it wouldn't really do anything; "persist" wouldn't store the event, and the playback stuff wouldn't have anything to playback.  Most of the important logic around "isSupported" happens when the connection is started, which makes sense.  Changing that value mid-way based on a missing field from a streaming bus doesn't really help with making the plugin any more robust.
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Robert Sandell
            Path:
            src/main/java/com/sonyericsson/hudson/plugins/gerrit/trigger/playback/GerritMissedEventsPlaybackManager.java
            src/main/java/com/sonyericsson/hudson/plugins/gerrit/trigger/utils/GerritPluginChecker.java
            src/test/java/com/sonyericsson/hudson/plugins/gerrit/trigger/playback/GerritMissedEventsPlaybackManagerTest.java
            http://jenkins-ci.org/commit/gerrit-trigger-plugin/e48c9b239c62d1c29204af528ae8ea55e24c7f13
            Log:
            Merge pull request #346 from tekkamanendless/master

            JENKINS-39498 Stop panicking about "eventCreatedOn" and losing my position

            Compare: https://github.com/jenkinsci/gerrit-trigger-plugin/compare/f6cdb2fcb3e6...e48c9b239c62

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Robert Sandell Path: src/main/java/com/sonyericsson/hudson/plugins/gerrit/trigger/playback/GerritMissedEventsPlaybackManager.java src/main/java/com/sonyericsson/hudson/plugins/gerrit/trigger/utils/GerritPluginChecker.java src/test/java/com/sonyericsson/hudson/plugins/gerrit/trigger/playback/GerritMissedEventsPlaybackManagerTest.java http://jenkins-ci.org/commit/gerrit-trigger-plugin/e48c9b239c62d1c29204af528ae8ea55e24c7f13 Log: Merge pull request #346 from tekkamanendless/master JENKINS-39498 Stop panicking about "eventCreatedOn" and losing my position Compare: https://github.com/jenkinsci/gerrit-trigger-plugin/compare/f6cdb2fcb3e6...e48c9b239c62

              People

              • Assignee:
                scoheb Scott Hebert
                Reporter:
                andreas_pelzer Andreas Pelzer
              • Votes:
                7 Vote for this issue
                Watchers:
                10 Start watching this issue

                Dates

                • Created:
                  Updated: