Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-5393

Jenkins falsely fails builds due to aggressive up-to-date check on junit artifacts

    Details

    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: junit-plugin
    • Labels:
      None
    • Environment:
      RedHat Linux VMware
    • Similar Issues:

      Description

      I'm getting this error at the end of a build in Hudson that should have been successful. I don't know why this age check is failing but I would be satisfied if I could simply disable the check.

      Archiving artifacts
      Recording test results
      Test reports were found but none of them are new. Did tests run?
      For example, /opt/hudson-workspace/workspace/project/target/test/reports/unittest/Ant suite/Ant test.xml is 3 min 13 sec old

      These files are, without question, up to date and generated only by the current built.

      Here are my questions:
      1: The comparison is between the date of the file and... what other thing? What do "new" and "old" mean in this context?
      2: How can I disable or deceive this comparison? It's completely unwanted and unnecessary.

      I know there was a bug for this in earlier versions of Hudson, but apparently the error message was the only thing changed.

      The problem still occurs in 1.341. As for the environment, these are RedHat Linux virtual machines, and the dates on the system are synchronized to within 1 second. I cannot understand why Hudson thinks these files are too old.

        Attachments

          Issue Links

            Activity

            Hide
            mindless Alan Harder added a comment -

            Looking at the code, the comparison appears to be: is the timestamp of all test files more than 3 seconds before this build was started? (with some attempt to account for master/slave clock difference)

            Is your build running on a slave or the master?
            How long does your build take to run?

            Show
            mindless Alan Harder added a comment - Looking at the code, the comparison appears to be: is the timestamp of all test files more than 3 seconds before this build was started? (with some attempt to account for master/slave clock difference) Is your build running on a slave or the master? How long does your build take to run?
            Hide
            carltonb carltonb added a comment - - edited

            The build was running on an ssh slave. The build takes 3 to 5 minutes to run depending on activity on the box. Also, there is no time difference between the master and slave as tested by running the unix 'date' command in 2 separate terminal windows.

            I suspect the problem is the attempt to account for the master/slave clock difference. It seems to me that it's just asking for trouble to compare remote file timestamps to a clock on another machine.

            Could you point me to the code where you found this behavior? I would like to take a look and understand this a little better.

            Show
            carltonb carltonb added a comment - - edited The build was running on an ssh slave. The build takes 3 to 5 minutes to run depending on activity on the box. Also, there is no time difference between the master and slave as tested by running the unix 'date' command in 2 separate terminal windows. I suspect the problem is the attempt to account for the master/slave clock difference. It seems to me that it's just asking for trouble to compare remote file timestamps to a clock on another machine. Could you point me to the code where you found this behavior? I would like to take a look and understand this a little better.
            Hide
            carltonb carltonb added a comment -

            Just a sample of the time differentials:
            Build time: 4 minutes 22 seconds Ant test.xml is 2 min 31 sec old
            Build time: 3 minutes 38 seconds Ant test.xml is 3 min 13 sec old
            Build time: 5 minutes 38 seconds Ant test.xml is 1 min 26 sec old

            I could see maybe 10-20 seconds being too close, but this master/slave has never been 4 minutes apart.

            Show
            carltonb carltonb added a comment - Just a sample of the time differentials: Build time: 4 minutes 22 seconds Ant test.xml is 2 min 31 sec old Build time: 3 minutes 38 seconds Ant test.xml is 3 min 13 sec old Build time: 5 minutes 38 seconds Ant test.xml is 1 min 26 sec old I could see maybe 10-20 seconds being too close, but this master/slave has never been 4 minutes apart.
            Hide
            carltonb carltonb added a comment -

            After digging around in Fisheye I see where this is happening.

            http://fisheye.jenkins-ci.org/browse/Hudson/trunk/hudson/main/core/src/main/java/hudson/tasks/junit/TestResult.java?r=HEAD

            It looks like some refactoring has taken place, and there is a TODO to make a certain property 'checkTimestamps' configurable as a system property.

            I presume that the person who made that comment delayed this change for a reason and will finish that task when the time is right.

            Show
            carltonb carltonb added a comment - After digging around in Fisheye I see where this is happening. http://fisheye.jenkins-ci.org/browse/Hudson/trunk/hudson/main/core/src/main/java/hudson/tasks/junit/TestResult.java?r=HEAD It looks like some refactoring has taken place, and there is a TODO to make a certain property 'checkTimestamps' configurable as a system property. I presume that the person who made that comment delayed this change for a reason and will finish that task when the time is right.
            Hide
            carltonb carltonb added a comment -

            I finally discovered the reason for the apparent clock difference. It was not that the slave and master were out of sync. Rather, the slave and master both store all their data on an NFS mount, and the NFS server was 6 minutes behind. Fortunately this is an NFS server that my team controls, so we were able to correct the clock and pass builds.

            IMO this is just another illustration of why we need the option to disable this aggressive config check. There are just too many ways it can falsely fail for reasons that the Hudson admin may not be able to mitigate.

            Show
            carltonb carltonb added a comment - I finally discovered the reason for the apparent clock difference. It was not that the slave and master were out of sync. Rather, the slave and master both store all their data on an NFS mount, and the NFS server was 6 minutes behind. Fortunately this is an NFS server that my team controls, so we were able to correct the clock and pass builds. IMO this is just another illustration of why we need the option to disable this aggressive config check. There are just too many ways it can falsely fail for reasons that the Hudson admin may not be able to mitigate.
            Hide
            kohsuke Kohsuke Kawaguchi added a comment -

            Note that if this check is not in place, tests that are removed at one point will continue to be tabulated by Hudson, until you do a clean build.

            Maybe touching a file in the beginning of the build and reading its timestamp back to use as the basis might be the way to go.

            Show
            kohsuke Kohsuke Kawaguchi added a comment - Note that if this check is not in place, tests that are removed at one point will continue to be tabulated by Hudson, until you do a clean build. Maybe touching a file in the beginning of the build and reading its timestamp back to use as the basis might be the way to go.
            Hide
            carltonb carltonb added a comment -

            That sounds like a good plan, as long as the reference file is created in the same way the unit test files are created (on the same slave, in the same workspace) to avoid unnecessary differences.

            Show
            carltonb carltonb added a comment - That sounds like a good plan, as long as the reference file is created in the same way the unit test files are created (on the same slave, in the same workspace) to avoid unnecessary differences.
            Hide
            ebann ebann added a comment -

            Related to 6268.

            Aggressive time check should be an option.
            In my case Junit xml files are retrieved in other jobs artifacts, i just want to regroup and parse all tests results in a single job.

            Show
            ebann ebann added a comment - Related to 6268. Aggressive time check should be an option. In my case Junit xml files are retrieved in other jobs artifacts, i just want to regroup and parse all tests results in a single job.
            Hide
            kutzi kutzi added a comment -

            Regarding Kohsuke's comment from 15/Jun/10: I'd say that not doing a clean build and relying on some 'magic' timestamp comparison to calculate the test results, is asking for errors.

            I agree with others that this feature should be configurable - and ideally off by default, but that's probably not possible for backwards compatibility.

            Show
            kutzi kutzi added a comment - Regarding Kohsuke's comment from 15/Jun/10: I'd say that not doing a clean build and relying on some 'magic' timestamp comparison to calculate the test results, is asking for errors. I agree with others that this feature should be configurable - and ideally off by default, but that's probably not possible for backwards compatibility.
            Hide
            nirro Stanislav Tyurikov added a comment -

            This is a stopper for some cases of build execution time optimization. I even can't imagine the reasons why this feature exists.

            Show
            nirro Stanislav Tyurikov added a comment - This is a stopper for some cases of build execution time optimization. I even can't imagine the reasons why this feature exists.
            Hide
            revoltn Kent Bolton added a comment -

            Looks like the magic number is 1000 miliseconds. Code below in a patch here: http://git.eclipse.org/c/hudson/org.eclipse.hudson.core.git/commit?id=97587dde35e862fabbd8b5b4b0136a786cfcff37 .

            This is an issue when using VMs where time can drift quite quickly. Would be good to disable this test, or to be able to change the magic number of seconds in configuration.

            @@ -151,21 +144,22 @@ public final class TestResult extends MetaTabulatedResult {
            }
            }

            • if(!parsed) {
              + if (!parsed) {
              long localTime = System.currentTimeMillis();
            • if(localTime < buildTime-1000) /margin/
            • // build time is in the the future. clock on this slave must be running behind
              + if (localTime < buildTime - 1000) /margin/ // build time is in the the future. clock on this slave must be running behind
              + { throw new AbortException( - "Clock on this slave is out of sync with the master, and therefore \n" + - "I can't figure out what test results are new and what are old.\n" + - "Please keep the slave clock in sync with the master."); + "Clock on this slave is out of sync with the master, and therefore \n" + + "I can't figure out what test results are new and what are old.\n" + + "Please keep the slave clock in sync with the master."); + }
            Show
            revoltn Kent Bolton added a comment - Looks like the magic number is 1000 miliseconds. Code below in a patch here: http://git.eclipse.org/c/hudson/org.eclipse.hudson.core.git/commit?id=97587dde35e862fabbd8b5b4b0136a786cfcff37 . This is an issue when using VMs where time can drift quite quickly. Would be good to disable this test, or to be able to change the magic number of seconds in configuration. @@ -151,21 +144,22 @@ public final class TestResult extends MetaTabulatedResult { } } if(!parsed) { + if (!parsed) { long localTime = System.currentTimeMillis(); if(localTime < buildTime-1000) / margin / // build time is in the the future. clock on this slave must be running behind + if (localTime < buildTime - 1000) / margin / // build time is in the the future. clock on this slave must be running behind + { throw new AbortException( - "Clock on this slave is out of sync with the master, and therefore \n" + - "I can't figure out what test results are new and what are old.\n" + - "Please keep the slave clock in sync with the master."); + "Clock on this slave is out of sync with the master, and therefore \n" + + "I can't figure out what test results are new and what are old.\n" + + "Please keep the slave clock in sync with the master."); + }
            Hide
            carlosgarcia Carlos Garcia added a comment -

            I have a similar problem. Even though my clocks are in sync (agents and master), Jenkins complains about them being out of sync.

            I agree with you, making this "magic number" of 1000 configurable, or even disabling this unnecessary check, is necessary. Given that this is a 9-year-old bug I don't expect this to be fixed; I'll try to modify this behaviour myself and build Jenkins.

            Show
            carlosgarcia Carlos Garcia added a comment - I have a similar problem. Even though my clocks are in sync (agents and master), Jenkins complains about them being out of sync. I agree with you, making this "magic number" of 1000 configurable, or even disabling this unnecessary check, is necessary. Given that this is a 9-year-old bug I don't expect this to be fixed; I'll try to modify this behaviour myself and build Jenkins.

              People

              • Assignee:
                Unassigned
                Reporter:
                carltonb carltonb
              • Votes:
                14 Vote for this issue
                Watchers:
                14 Start watching this issue

                Dates

                • Created:
                  Updated: