Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-17125

FingerprintAction deserialization leads to NPE

    XMLWordPrintable

    Details

    • Similar Issues:

      Description

      FingerprintAction has the following field:

      private final AbstractBuild build;

      This field links to an AbstractBuild, which in turn has the following field:

      protected transient final JobT project;

      Once Jenkin persists a run to a file, the build from the FingerprintAction will be saved, but the project itself will not get saved. An example fragment from one of the build files looks like this:

          <hudson.tasks.Fingerprinter_-FingerprintAction>
            <build class="build">
              <actions>
                <hudson.model.ParametersAction reference="../../../../hudson.model.ParametersAction"/>
                <hudson.model.CauseAction reference="../../../../hudson.model.CauseAction"/>
                <hudson.plugins.copyartifact.CopyArtifact_-EnvAction reference="../../../../hudson.plugins.copyartifact.CopyArtifact_-EnvAction"/>
                <hudson.tasks.Fingerprinter_-FingerprintAction reference="../../.."/>
              </actions>
              <number>17</number>
              <startTime>1358250655935</startTime>
              <result>SUCCESS</result>
              <duration>4488</duration>
              <charset>UTF-8</charset>
              <keepLog>false</keepLog>
              <builtOn></builtOn>
              <workspace>/opt/hudson/files/jobs/Deploy/workspace</workspace>
              <hudsonVersion>1.494</hudsonVersion>
              <scm class="hudson.scm.NullChangeLogParser"/>
              <culprits class="com.google.common.collect.EmptyImmutableSortedSet"/>
            </build>
            ...
      

      When the files are read and the objects are created, the project field remains empty. During intialization, the onLoad() method is called:

          @Override
          protected R retrieve(File d) throws IOException {
              if(new File(d,"build.xml").exists()) {
                  // if the build result file isn't in the directory, ignore it.
                  try {
                      R b = cons.create(d);
                      b.onLoad();
                      if (LOGGER.isLoggable(FINE))
                          LOGGER.log(FINE,"Loaded " + b.getFullDisplayName(),new ThisIsHowItsLoaded());
                      return b;
                  } catch (IOException e) {
                      LOGGER.log(Level.WARNING, "could not load " + d, e);
                  } catch (InstantiationError e) {
                      LOGGER.log(Level.WARNING, "could not load " + d, e);
                  }
              }
              return null;
          }
      

      This method invokes onLoad() on all actions that implement RunAction:

              for (Action a : getActions())
                  if (a instanceof RunAction)
                      ((RunAction) a).onLoad();
      

      The FingerprintAction does implement the RunAction interface, and the method is implemented as follows:

              public void onLoad() {
                  // share data structure with nearby builds, but to keep lazy loading efficient,
                  // don't go back the history forever.
                  if (rand.nextInt(2)!=0) {
                      Run pb = build.getPreviousBuild();
                      if (pb!=null) {
                          FingerprintAction a = pb.getAction(FingerprintAction.class);
                          if (a!=null)
                              compact(a);
                      }
                  }
              }
      

      Build is set, so getPreviousBuild() can be called, this will however fail because the project is null (due to the transient field) and will throw a NullPointerException.

      This causes very strange behavior as sometimes pages are working, sometimes they are not. There are a lot of stack traces around that actually look like as if they were caused by the same problem, for example: https://issues.jenkins-ci.org/browse/JENKINS-16845

      Stack Trace looks like this:

      Caused by: java.lang.NullPointerException
      	at hudson.model.AbstractBuild.getPreviousBuild(AbstractBuild.java:207)
      	at hudson.tasks.Fingerprinter$FingerprintAction.onLoad(Fingerprinter.java:349)
      	at hudson.model.Run.onLoad(Run.java:315)
      	at hudson.model.RunMap.retrieve(RunMap.java:221)
      	at hudson.model.RunMap.retrieve(RunMap.java:59)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:638)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:601)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:344)
      	at hudson.model.AbstractBuild.getPreviousBuild(AbstractBuild.java:207)
      	at hudson.model.AbstractBuild.getPreviousBuild(AbstractBuild.java:100)
      	at hudson.model.RunMap$1.next(RunMap.java:107)
      	at hudson.model.RunMap$1.next(RunMap.java:96)
      	at hudson.widgets.HistoryWidget.getRenderList(HistoryWidget.java:133)
      	... 122 more
      

        Attachments

          Issue Links

            Activity

            Hide
            homes2001 Dominik Bieringer added a comment - - edited

            The way I solved it for now is the following:

            diff --git a/core/src/main/java/hudson/tasks/Fingerprinter.java b/core/src/main/java/hudson/tasks/Fingerprinter.java
            index 6600bd4..d22ade7 100644
            --- a/core/src/main/java/hudson/tasks/Fingerprinter.java
            +++ b/core/src/main/java/hudson/tasks/Fingerprinter.java
            @@ -294,7 +294,7 @@
                  */
                 public static final class FingerprintAction implements RunAction {
                     
            -        private final AbstractBuild build;
            +        private AbstractBuild build;
             
                     private static final Random rand = new Random();
             
            @@ -356,6 +356,7 @@
                     }
             
                     public void onAttached(Run r) {
            +        	this.build = (AbstractBuild)r;
                     }
             
                     public void onBuildComplete() {
            
            diff --git a/core/src/main/java/hudson/model/RunMap.java b/core/src/main/java/hudson/model/RunMap.java
            index 9207cfc..fad5b9d 100644
            --- a/core/src/main/java/hudson/model/RunMap.java
            +++ b/core/src/main/java/hudson/model/RunMap.java
            @@ -218,6 +218,10 @@
                         // if the build result file isn't in the directory, ignore it.
                         try {
                             R b = cons.create(d);
            +                for (Action cur : b.getActions())
            +                	if (cur instanceof RunAction)
            +                		((RunAction) cur).onAttached(b);
            +                
                             b.onLoad();
                             if (LOGGER.isLoggable(FINE))
                                 LOGGER.log(FINE,"Loaded " + b.getFullDisplayName(),new ThisIsHowItsLoaded());
            
            Show
            homes2001 Dominik Bieringer added a comment - - edited The way I solved it for now is the following: diff --git a/core/src/main/java/hudson/tasks/Fingerprinter.java b/core/src/main/java/hudson/tasks/Fingerprinter.java index 6600bd4..d22ade7 100644 --- a/core/src/main/java/hudson/tasks/Fingerprinter.java +++ b/core/src/main/java/hudson/tasks/Fingerprinter.java @@ -294,7 +294,7 @@ */ public static final class FingerprintAction implements RunAction { - private final AbstractBuild build; + private AbstractBuild build; private static final Random rand = new Random(); @@ -356,6 +356,7 @@ } public void onAttached(Run r) { + this .build = (AbstractBuild)r; } public void onBuildComplete() { diff --git a/core/src/main/java/hudson/model/RunMap.java b/core/src/main/java/hudson/model/RunMap.java index 9207cfc..fad5b9d 100644 --- a/core/src/main/java/hudson/model/RunMap.java +++ b/core/src/main/java/hudson/model/RunMap.java @@ -218,6 +218,10 @@ // if the build result file isn't in the directory, ignore it. try { R b = cons.create(d); + for (Action cur : b.getActions()) + if (cur instanceof RunAction) + ((RunAction) cur).onAttached(b); + b.onLoad(); if (LOGGER.isLoggable(FINE)) LOGGER.log(FINE, "Loaded " + b.getFullDisplayName(), new ThisIsHowItsLoaded());
            Hide
            kutzi kutzi added a comment -

            Not related to the fingerprint plugin but to core.

            Show
            kutzi kutzi added a comment - Not related to the fingerprint plugin but to core.
            Hide
            josesa Jose Sa added a comment -

            In 1.502 some jobs fail to load due to the NullPointer exception in getPreviousBuild.

            Show
            josesa Jose Sa added a comment - In 1.502 some jobs fail to load due to the NullPointer exception in getPreviousBuild.
            Hide
            oldelvet Richard Mortimer added a comment -

            I suspect that this is an instance of JENKINS-15156. One of the issues with that is that AbstractProject#builds was not being properly re-initialized by the lazy-loading code.

            JENKINS-15156 is mostly fixed in 1.504 but there are still a few fixes that will be in the soon to be released 1.505.

            Show
            oldelvet Richard Mortimer added a comment - I suspect that this is an instance of JENKINS-15156 . One of the issues with that is that AbstractProject#builds was not being properly re-initialized by the lazy-loading code. JENKINS-15156 is mostly fixed in 1.504 but there are still a few fixes that will be in the soon to be released 1.505.
            Hide
            jglick Jesse Glick added a comment -

            JENKINS-16845 filed first so tracking there, though this issue has a far better explanation.

            Show
            jglick Jesse Glick added a comment - JENKINS-16845 filed first so tracking there, though this issue has a far better explanation.
            Hide
            jglick Jesse Glick added a comment -

            The patch seems wrong. onLoad, not onAttached, should be called when loading existing builds. Unfortunately onLoad does not pass the Run as a parameter; it is documented that the Run may be kept in a field.

            What I am wondering is how you got a build.xml like this to begin with. I do not get this format when I create a new job with fingerprints. I get

            <hudson.tasks.Fingerprinter_-FingerprintAction>
              <build class="build" reference="../../.."/>
              <record>
                <entry>
                  <string>x.txt</string>
                  <string>c614bcfdc7abb8eae48c5c820cd1f274</string>
                </entry>
              </record>
            </hudson.tasks.Fingerprinter_-FingerprintAction>
            

            as I would expect.

            Show
            jglick Jesse Glick added a comment - The patch seems wrong. onLoad , not onAttached , should be called when loading existing builds. Unfortunately onLoad does not pass the Run as a parameter; it is documented that the Run may be kept in a field. What I am wondering is how you got a build.xml like this to begin with. I do not get this format when I create a new job with fingerprints. I get <hudson.tasks.Fingerprinter_-FingerprintAction> <build class= "build" reference= "../../.." /> <record> <entry> <string> x.txt </string> <string> c614bcfdc7abb8eae48c5c820cd1f274 </string> </entry> </record> </hudson.tasks.Fingerprinter_-FingerprintAction> as I would expect.
            Hide
            jglick Jesse Glick added a comment -

            …and this is true even when I run on 1.494 as you seem to have done.

            Show
            jglick Jesse Glick added a comment - …and this is true even when I run on 1.494 as you seem to have done.
            Hide
            homes2001 Dominik Bieringer added a comment -

            We got or have XML files (build.xml) like the one you showed very often, however, there is a small number of build.xml files that looks like the one that I described. Isn't that related to how the whole memory object tree get's serialized to the file and also depends on whether the same objects are used or not to represent the builds?

            However, after we fixed the code the way I described, we never faced the issue again and Jenkins is working like charm now. We had the following issues before:

            • The build list on the left side when you go to a project was not showing correctly (sometimes it was empty, othertimes it was full, but when clicking "more builds", it got empty and did not load correctly)
            • The mails (success mails, failure mails, etc... for builds) were not sent correctly. We got the mentioned exception during the build.
            • The build pipeline plugin was not showing correctly and we saw the same exception in the log files.
            Show
            homes2001 Dominik Bieringer added a comment - We got or have XML files (build.xml) like the one you showed very often, however, there is a small number of build.xml files that looks like the one that I described. Isn't that related to how the whole memory object tree get's serialized to the file and also depends on whether the same objects are used or not to represent the builds? However, after we fixed the code the way I described, we never faced the issue again and Jenkins is working like charm now. We had the following issues before: The build list on the left side when you go to a project was not showing correctly (sometimes it was empty, othertimes it was full, but when clicking "more builds", it got empty and did not load correctly) The mails (success mails, failure mails, etc... for builds) were not sent correctly. We got the mentioned exception during the build. The build pipeline plugin was not showing correctly and we saw the same exception in the log files.
            Hide
            jglick Jesse Glick added a comment -

            Right, I think the busted serialization format is probably the result of a BuildReference being cleared and a different AbstractBuild being created than the FingerprintAction was originally produced with. So there are two problems:

            1. Making sure that does not happen.
            2. Recovering gracefully from the broken format.

            For the second part, your proposed patch is unacceptable (I think) because it breaks the general RunAction contract. I am working on a different patch that acknowledges that the FingerprintAction is incomplete and just limits the damage (no exceptions printed, most things with the fingerprints work).

            Not sure what to do about the first part. Seems like a fundamental flaw. (Though I am unable to reproduce it even by making AbstractLazyLoadRunMap.unwrap usually return null.) The best I can think of is to cache all the RunAction instances in a build using strong references so it is guaranteed they will not be collected.

            Show
            jglick Jesse Glick added a comment - Right, I think the busted serialization format is probably the result of a BuildReference being cleared and a different AbstractBuild being created than the FingerprintAction was originally produced with. So there are two problems: Making sure that does not happen. Recovering gracefully from the broken format. For the second part, your proposed patch is unacceptable (I think) because it breaks the general RunAction contract. I am working on a different patch that acknowledges that the FingerprintAction is incomplete and just limits the damage (no exceptions printed, most things with the fingerprints work). Not sure what to do about the first part. Seems like a fundamental flaw. (Though I am unable to reproduce it even by making AbstractLazyLoadRunMap.unwrap usually return null.) The best I can think of is to cache all the RunAction instances in a build using strong references so it is guaranteed they will not be collected.
            Hide
            jglick Jesse Glick added a comment -

            Another fix in FingerprintAction would be to make the current build field transient and introduce another field based on Run.getExternalizableId and .fromExternalizableId, which would avoid both XStream object graph trickery and problems with collection of build records. But this would be a change in the documented recommendations for persistence of a RunAction so I think Kohsuke should think about this.

            Show
            jglick Jesse Glick added a comment - Another fix in FingerprintAction would be to make the current build field transient and introduce another field based on Run.getExternalizableId and .fromExternalizableId , which would avoid both XStream object graph trickery and problems with collection of build records. But this would be a change in the documented recommendations for persistence of a RunAction so I think Kohsuke should think about this.
            Hide
            jglick Jesse Glick added a comment -

            Better description in this issue than in JENKINS-16845.

            Show
            jglick Jesse Glick added a comment - Better description in this issue than in JENKINS-16845 .
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Jesse Glick
            Path:
            test/src/test/java/hudson/tasks/FingerprinterTest.java
            test/src/test/resources/hudson/tasks/FingerprinterTest/actionSerialization.zip
            http://jenkins-ci.org/commit/jenkins/bd709b0631329f1abaf05de4a3499562a1606691
            Log:
            JENKINS-17125 Establishing baseline FingerprintAction behavior that should not be regressed.

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: test/src/test/java/hudson/tasks/FingerprinterTest.java test/src/test/resources/hudson/tasks/FingerprinterTest/actionSerialization.zip http://jenkins-ci.org/commit/jenkins/bd709b0631329f1abaf05de4a3499562a1606691 Log: JENKINS-17125 Establishing baseline FingerprintAction behavior that should not be regressed.
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Jesse Glick
            Path:
            changelog.html
            core/src/main/java/hudson/model/CauseAction.java
            core/src/main/java/hudson/model/Run.java
            core/src/main/java/hudson/model/RunAction.java
            core/src/main/java/hudson/tasks/Fingerprinter.java
            core/src/main/java/jenkins/model/RunAction2.java
            core/src/main/resources/hudson/tasks/Fingerprinter/FingerprintAction/index.jelly
            http://jenkins-ci.org/commit/jenkins/a614cd5b05b3c8cbcb8970ea439b2a1315252f58
            Log:
            [FIXED JENKINS-17125] FingerprintAction no longer need persist the build field thanks to new RunAction2.

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: changelog.html core/src/main/java/hudson/model/CauseAction.java core/src/main/java/hudson/model/Run.java core/src/main/java/hudson/model/RunAction.java core/src/main/java/hudson/tasks/Fingerprinter.java core/src/main/java/jenkins/model/RunAction2.java core/src/main/resources/hudson/tasks/Fingerprinter/FingerprintAction/index.jelly http://jenkins-ci.org/commit/jenkins/a614cd5b05b3c8cbcb8970ea439b2a1315252f58 Log: [FIXED JENKINS-17125] FingerprintAction no longer need persist the build field thanks to new RunAction2.
            Hide
            dogfood dogfood added a comment -

            Integrated in jenkins_main_trunk #2575
            JENKINS-17125 Establishing baseline FingerprintAction behavior that should not be regressed. (Revision bd709b0631329f1abaf05de4a3499562a1606691)
            [FIXED JENKINS-17125] FingerprintAction no longer need persist the build field thanks to new RunAction2. (Revision a614cd5b05b3c8cbcb8970ea439b2a1315252f58)

            Result = SUCCESS
            Jesse Glick : bd709b0631329f1abaf05de4a3499562a1606691
            Files :

            • test/src/test/java/hudson/tasks/FingerprinterTest.java
            • test/src/test/resources/hudson/tasks/FingerprinterTest/actionSerialization.zip

            Jesse Glick : a614cd5b05b3c8cbcb8970ea439b2a1315252f58
            Files :

            • core/src/main/java/hudson/model/RunAction.java
            • core/src/main/resources/hudson/tasks/Fingerprinter/FingerprintAction/index.jelly
            • core/src/main/java/hudson/model/Run.java
            • changelog.html
            • core/src/main/java/hudson/tasks/Fingerprinter.java
            • core/src/main/java/jenkins/model/RunAction2.java
            • core/src/main/java/hudson/model/CauseAction.java
            Show
            dogfood dogfood added a comment - Integrated in jenkins_main_trunk #2575 JENKINS-17125 Establishing baseline FingerprintAction behavior that should not be regressed. (Revision bd709b0631329f1abaf05de4a3499562a1606691) [FIXED JENKINS-17125] FingerprintAction no longer need persist the build field thanks to new RunAction2. (Revision a614cd5b05b3c8cbcb8970ea439b2a1315252f58) Result = SUCCESS Jesse Glick : bd709b0631329f1abaf05de4a3499562a1606691 Files : test/src/test/java/hudson/tasks/FingerprinterTest.java test/src/test/resources/hudson/tasks/FingerprinterTest/actionSerialization.zip Jesse Glick : a614cd5b05b3c8cbcb8970ea439b2a1315252f58 Files : core/src/main/java/hudson/model/RunAction.java core/src/main/resources/hudson/tasks/Fingerprinter/FingerprintAction/index.jelly core/src/main/java/hudson/model/Run.java changelog.html core/src/main/java/hudson/tasks/Fingerprinter.java core/src/main/java/jenkins/model/RunAction2.java core/src/main/java/hudson/model/CauseAction.java
            Hide
            hx_unbanned Linards L added a comment -

            Fix Targeting 1.519 ?

            Show
            hx_unbanned Linards L added a comment - Fix Targeting 1.519 ?
            Hide
            jglick Jesse Glick added a comment -

            Yes, should be in 1.519 I think. Note that this “true” fix changes the serial format of builds going forward: builds created in 1.519+ will not load well in 1.518-, but builds created in any version (even those suffering from the rare condition that this issue reports) should be loaded cleanly in 1.519+ (and their format upgraded if you somehow edit the build, e.g. setting a description).

            Show
            jglick Jesse Glick added a comment - Yes, should be in 1.519 I think. Note that this “true” fix changes the serial format of builds going forward: builds created in 1.519+ will not load well in 1.518-, but builds created in any version (even those suffering from the rare condition that this issue reports) should be loaded cleanly in 1.519+ (and their format upgraded if you somehow edit the build, e.g. setting a description).
            Hide
            jglick Jesse Glick added a comment -

            Compare https://github.com/jenkinsci/jobConfigHistory-plugin/pull/17 and probably others need to be fixed too: hudson.tasks.junit.TestResultAction, org.jenkinsci.plugins.envinject.EnvInjectPluginAction, etc.

            Show
            jglick Jesse Glick added a comment - Compare https://github.com/jenkinsci/jobConfigHistory-plugin/pull/17 and probably others need to be fixed too: hudson.tasks.junit.TestResultAction , org.jenkinsci.plugins.envinject.EnvInjectPluginAction , etc.

              People

              • Assignee:
                jglick Jesse Glick
                Reporter:
                homes2001 Dominik Bieringer
              • Votes:
                2 Vote for this issue
                Watchers:
                10 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: