JENKINS-31660

StackOverflowError when maximum number of builds archived

      Description

      We've been seeing a StackOverflowError when both Test Stability and Discard Old Builds are enabled:

      FATAL: null
      java.lang.StackOverflowError
      	at hudson.tasks.junit.TestResultAction.load(TestResultAction.java:197)
      	at hudson.tasks.junit.TestResultAction.getResult(TestResultAction.java:143)
      	at hudson.tasks.junit.TestResultAction.getResult(TestResultAction.java:62)
      	at hudson.tasks.test.AbstractTestResultAction.findCorrespondingResult(AbstractTestResultAction.java:247)
      	at hudson.tasks.test.TestResult.getPreviousResult(TestResult.java:142)
      	at hudson.tasks.junit.SuiteResult.getPreviousResult(SuiteResult.java:283)
      	at hudson.tasks.junit.CaseResult.getPreviousResult(CaseResult.java:446)
      	at hudson.tasks.junit.CaseResult.freeze(CaseResult.java:575)
      	at hudson.tasks.junit.SuiteResult.freeze(SuiteResult.java:325)
      	at hudson.tasks.junit.TestResult.freeze(TestResult.java:627)
      	at hudson.tasks.junit.TestResultAction.load(TestResultAction.java:200)
      	at hudson.tasks.junit.TestResultAction.getResult(TestResultAction.java:143)
              ... repeated ...
      

      Disabling Test Stability resolves the issue for us.

        Attachments

        1. console.log
          89 kB
        2. consoleText
          79 kB
        3. jenkins_stack_trace.txt
          75 kB
        4. jenkins.log
          74 kB
        5. jenkins-full-exception.log
          85 kB
        6. jenkins-SOE-20170426.log
          24 kB

            Activity

            davehunt Dave Hunt added a comment -

            Just seen this again after an upgrade to Jenkins 2.7.1. The first build failure after the upgrade was reported as expected; however, the next build passed but hit this stack overflow. Disabling Test Stability History in the configuration allowed the build to pass without this exception.

            I've attached the full console log including the exception: console.log

            stefanthurnherr Stefan Thurnherr added a comment (edited)

            Getting the same StackOverflowError (SOE) with Jenkins v2.55 and junit-plugin v1.20, with test-stability-plugin not installed. I'm attaching the full stack trace from jenkins.log: jenkins-SOE-20170426.log

            The build configuration (controlled by Jenkinsfile build properties) does not discard any old builds, so we have all ca. 1500 builds (oldest is from 2017-02-23) still left inside Jenkins.

            Because this is a multi-branch build pipeline, we have other branches with a much shorter build history, and they build without any problems. This further supports the guess from previous comments that the error is related to traversing the build history.

            Update: Configuring the build to discard builds older than 1 month has solved the problem in our case.

            seanf Sean Flanigan added a comment -

            The stack traces show that AbstractTestResultAction.findCorrespondingResult() indirectly calls itself recursively, and in these cases that recursion has caused a StackOverflowError.

            From Stefan Thurnherr's description, this sounds like it may have a similar cause to https://issues.jenkins-ci.org/browse/JENKINS-33168?focusedCommentId=285979&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-285979, where StabilityTestDataPublisher.buildUpInitialHistory() iterates through many failing tests, getting their previous results across multiple builds. When memory pressure is high (e.g., lots of build history), CaseResult.getPreviousResult() can't use its WeakReference cache and has to load results from disk, thus loading each build's results many times instead of once. And now it's apparent that this involves calling AbstractTestResultAction.findCorrespondingResult() recursively for every previous build.
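
            To make the caching behaviour concrete, here is a minimal sketch of a WeakReference-backed cache that falls back to reloading once the reference has been cleared. It is not the JUnit plugin's code: WeaklyCachedResult, its loader, and simulateCollection() are hypothetical names used only for illustration, and clearing the reference by hand stands in for the garbage collector doing so under memory pressure.

```java
import java.lang.ref.WeakReference;
import java.util.function.Supplier;

/** Hypothetical helper (not part of the JUnit plugin): caches one value behind a WeakReference. */
class WeaklyCachedResult<T> {
    private final Supplier<T> loader;                          // stands in for "parse results from disk"
    private WeakReference<T> ref = new WeakReference<>(null);  // starts empty
    private int loads;                                         // how often the loader had to run

    WeaklyCachedResult(Supplier<T> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, reloading it if the weak reference no longer holds one. */
    synchronized T get() {
        T value = ref.get();
        if (value == null) {
            value = loader.get();
            loads++;
            ref = new WeakReference<>(value);
        }
        return value;
    }

    /** Assumption for the demo: mimics the collector clearing the reference under memory pressure. */
    synchronized void simulateCollection() {
        ref.clear();
    }

    synchronized int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        WeaklyCachedResult<int[]> cache = new WeaklyCachedResult<>(() -> new int[] {1, 2, 3});
        int[] held = cache.get();        // first access loads the value
        cache.get();                     // served from the cache while 'held' keeps it reachable
        System.out.println("loads while cached: " + cache.loadCount());
        cache.simulateCollection();
        cache.get();                     // reference was cleared, so the loader runs again
        System.out.println("loads after clearing: " + cache.loadCount());
        System.out.println("held length: " + held.length);
    }
}
```

            If the reference is cleared between every access, each access pays the full load cost, which would match the "loading each build's results many times" behaviour described above.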

            So there are two related problems here:

            1. When loading results with lots of history, the recursion in findCorrespondingResult() causes a StackOverflowError unless (a) previous results were found in the WeakReference cache or (b) the number of previous results fits in the stack. (Will Harris's binary search from the earliest build mitigated this by preloading a limited number of results into the WeakReference.)

            2. Test Stability Plugin calls findCorrespondingResult() a lot when building initial history for a failing test. This produces a lot of memory pressure when there is a lot of build history, thus defeating the caching in 1(a) above. (In JENKINS-33168 the number of builds apparently hasn't been high enough to overflow the stack, but the number of test results is too much for the cache, thus killing performance.)

            So increasing stack size should certainly work around the StackOverflowError unless the number of builds gets too high, but if you use Test Stability Plugin you will probably encounter JENKINS-33168 if you have a lot of builds with a lot of tests in them.
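
            As a rough illustration of why a larger stack only postpones the error, the sketch below measures how deep a trivial recursion gets on threads created with two different requested stack sizes. StackDepthProbe and its methods are hypothetical names. The stackSize argument of the Thread constructor is only a hint that some platforms may ignore, and the measured depths vary between JVMs and runs, so no specific numbers are claimed here. The JVM's -Xss option plays the same role for the default thread stack size.

```java
/** Hypothetical demo class: measures recursion depth for a requested thread stack size. */
class StackDepthProbe {
    // Recurses without a termination condition; the counter records how deep we got.
    private static void dive(int[] counter) {
        counter[0]++;
        dive(counter);
    }

    /** Runs the recursion on a fresh thread with the requested stack size and returns the depth reached. */
    static int maxDepth(long stackBytes) {
        int[] counter = new int[1];
        Thread probe = new Thread(null, () -> {
            try {
                dive(counter);
            } catch (StackOverflowError expected) {
                // The counter already holds the depth that was reached before the overflow.
            }
        }, "stack-depth-probe", stackBytes);
        probe.start();
        try {
            probe.join();   // join() also makes the counter update visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println("512 KiB stack: depth " + maxDepth(512L * 1024));
        System.out.println("64 MiB stack:  depth " + maxDepth(64L * 1024 * 1024));
    }
}
```

            On JVMs that honour the hint, the second depth is much larger than the first, but it is still finite: a long enough build history would exhaust it as well.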

            I think AbstractTestResultAction.findCorrespondingResult() in the JUnit plugin (or something else in that recursive call stack) needs to be redesigned to avoid recursion, otherwise a StackOverflowError is unavoidable when there are a lot of previous builds. (Solving JENKINS-33168, on the other hand, will require iterating in such a way that each build's results are only loaded once.)
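
            A minimal sketch of what avoiding recursion could look like, using a simplified model rather than the JUnit plugin's real classes: HistoryWalk, BuildRecord, and both find methods are hypothetical. Each record links to the previous build. The recursive lookup uses one stack frame per build (Java does not eliminate tail calls), while the iterative lookup uses a constant amount of stack regardless of history length.

```java
/** Hypothetical model of a build history; not the JUnit plugin's actual data structures. */
class HistoryWalk {
    /** One build's record, linked to the build before it (null for the first build). */
    static final class BuildRecord {
        final int number;
        final BuildRecord previous;

        BuildRecord(int number, BuildRecord previous) {
            this.number = number;
            this.previous = previous;
        }
    }

    /** Builds a chain of records numbered 1..length and returns the newest one. */
    static BuildRecord chain(int length) {
        BuildRecord head = null;
        for (int n = 1; n <= length; n++) {
            head = new BuildRecord(n, head);
        }
        return head;
    }

    /** Recursive lookup: stack depth grows with the number of builds walked. */
    static BuildRecord findRecursive(BuildRecord from, int number) {
        if (from == null || from.number == number) {
            return from;
        }
        return findRecursive(from.previous, number);
    }

    /** Iterative lookup: same result, constant stack usage. */
    static BuildRecord findIterative(BuildRecord from, int number) {
        for (BuildRecord b = from; b != null; b = b.previous) {
            if (b.number == number) {
                return b;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        BuildRecord newest = chain(1_000_000);
        // Walks a million records without deepening the call stack.
        System.out.println(findIterative(newest, 1).number);
    }
}
```

            The real call chain is more involved, because freezing one build's results is what triggers loading the previous build's results, so an actual fix would have to break that cycle rather than just rewrite one loop.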

            zbynek Zbynek Konecny added a comment -

            This often happens when there are skipped tests, because of a bug in the JUnit plugin; see https://github.com/jenkinsci/junit-plugin/pull/117
            It's independent of the Test Stability plugin.

            fillermark filler mark added a comment -

            A StackOverflowError signals that a thread's stack has no more memory available. It is to the stack what an OutOfMemoryError is to the heap. The JVM allocates a fixed amount of memory for each thread's stack, and if a method call would exceed that memory, the JVM throws the error, much as it throws an exception if you try to write at index N of an array of length N. No memory corruption can happen: the stack cannot write into the heap.

            The most common cause of a stack overflow is a bad recursive call. Typically, the recursive function lacks a correct termination condition, so it ends up calling itself forever. Even when the termination condition is correct, the error can occur if too many recursive calls are needed before the condition is met.

            Here's an example:

            public class Overflow {
                // Calls itself unconditionally, so the stack eventually overflows.
                public static final void main(String[] args) {
                    main(args);
                }
            }
            That method calls itself repeatedly with no termination condition. The stack fills up because each call pushes a new frame (including its return address), and no frame is ever popped because the method never returns; it just keeps calling itself.

             


              People

              • Assignee:
                Unassigned
                Reporter:
                davehunt Dave Hunt
              • Votes:
                5
                Watchers:
                12
