Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-13814

java.lang.OutOfMemoryError exception when getting the remote log

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: cvs-plugin
    • Labels:
    • Environment:
      Jenkins 1.464 on Windows 7 64-bit, 12GB RAM; CVS Plug-in 2.4-SNAPSHOT
    • Similar Issues:

      Description

      We are getting a java.lang.OutOfMemoryError exception when getting the remote log.

      Started by an SCM change
      Building in workspace C:\x\workspace\project_name_here
      cvs checkout -r BRANCH_NAME_HERE -D 16 May 2012 23:27:38 -0700 -d path\projects path/projects
      cvs checkout: Updating path\projects
      cvs checkout: Updating path\projects/dir1
      cvs checkout: Updating path\projects/dir2/dirA
      ... (about 7,224 directories) ...
      cvs checkout: Updating path\projects/dirN
      cvs rlog: Logging path\projects
      ... (about 7,224 directories) ...
      cvs rlog: Logging path\projects/dirN
      FATAL: Java heap space
      java.lang.OutOfMemoryError: Java heap space
      at java.lang.StringCoding$StringDecoder.decode(Unknown Source)
      at java.lang.StringCoding.decode(Unknown Source)
      at java.lang.StringCoding.decode(Unknown Source)
      at java.lang.String.<init>(Unknown Source)
      at java.io.ByteArrayOutputStream.toString(Unknown Source)
      at hudson.scm.CVSSCM.getRemoteLogForModule(CVSSCM.java:540)
      at hudson.scm.CVSSCM.calculateChangeLog(CVSSCM.java:415)
      at hudson.scm.CVSSCM.checkout(CVSSCM.java:825)
      at hudson.model.AbstractProject.checkout(AbstractProject.java:1218)
      at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:586)
      at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:475)
      at hudson.model.Run.run(Run.java:1434)
      at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      at hudson.model.ResourceController.execute(ResourceController.java:88)
      at hudson.model.Executor.run(Executor.java:239)

      Some notes:

      • The path/ and path\ comes from the "Remote Name"/"Local Name" settings solution from Issue JENKINS-13264
      • We are using Jenkins 1.464 with the 2.4-SNAPSHOT version of the CVS plugin, but may not have been caused by this specific version
      • The arguments setting in the jenkins.xml file currently has -Xms1024m (anything larger and Jenkins service won't start)
      • The CVS module has many legacy projects spread out, and thus, contains more than 7,000 directories and many more files
      • Things had worked with the 1.6 plug-in & CVSNT client, but it also occasionally timed out getting the changelog, hence our testing of the 2.x plug-ins

        Attachments

          Activity

          Hide
          geneliu Gene Liu added a comment -

          Michael: I have sent you the cvs rlog output via an email.

          Show
          geneliu Gene Liu added a comment - Michael: I have sent you the cvs rlog output via an email.
          Hide
          geneliu Gene Liu added a comment -

          More inputs of cvs rlog stuck (for ever) –

          Executed method:
          java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3366)

          Executor #1 for sambstage : executing MAIN_DEV #13
          java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3366)
          java.util.regex.Pattern$Curly.match(Pattern.java:3737)
          java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
          java.util.regex.Pattern$Curly.match1(Pattern.java:3797)
          java.util.regex.Pattern$Curly.match(Pattern.java:3746)
          java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
          java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
          java.util.regex.Pattern$Curly.match(Pattern.java:3744)
          java.util.regex.Pattern$Start.match(Pattern.java:3055)
          java.util.regex.Matcher.search(Matcher.java:1105)
          java.util.regex.Matcher.find(Matcher.java:535)
          hudson.scm.CvsChangeLogHelper.mapCvsLog(CvsChangeLogHelper.java:169)
          hudson.scm.CVSSCM.calculateChangeLog(CVSSCM.java:419)
          hudson.scm.CVSSCM.checkout(CVSSCM.java:831)
          hudson.model.AbstractProject.checkout(AbstractProject.java:1193)
          hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:565)
          hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:453)
          hudson.model.Run.run(Run.java:1376)
          hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          hudson.model.ResourceController.execute(ResourceController.java:88)
          hudson.model.Executor.run(Executor.java:175)

          Show
          geneliu Gene Liu added a comment - More inputs of cvs rlog stuck (for ever) – Executed method: java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3366) Executor #1 for sambstage : executing MAIN_DEV #13 java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3366) java.util.regex.Pattern$Curly.match(Pattern.java:3737) java.util.regex.Pattern$GroupTail.match(Pattern.java:4227) java.util.regex.Pattern$Curly.match1(Pattern.java:3797) java.util.regex.Pattern$Curly.match(Pattern.java:3746) java.util.regex.Pattern$GroupHead.match(Pattern.java:4168) java.util.regex.Pattern$Curly.match0(Pattern.java:3782) java.util.regex.Pattern$Curly.match(Pattern.java:3744) java.util.regex.Pattern$Start.match(Pattern.java:3055) java.util.regex.Matcher.search(Matcher.java:1105) java.util.regex.Matcher.find(Matcher.java:535) hudson.scm.CvsChangeLogHelper.mapCvsLog(CvsChangeLogHelper.java:169) hudson.scm.CVSSCM.calculateChangeLog(CVSSCM.java:419) hudson.scm.CVSSCM.checkout(CVSSCM.java:831) hudson.model.AbstractProject.checkout(AbstractProject.java:1193) hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:565) hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:453) hudson.model.Run.run(Run.java:1376) hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) hudson.model.ResourceController.execute(ResourceController.java:88) hudson.model.Executor.run(Executor.java:175)
          Hide
          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          src/main/java/hudson/scm/CVSChangeLogSet.java
          src/main/java/hudson/scm/CVSSCM.java
          src/main/java/hudson/scm/CvsChangeLogHelper.java
          src/main/java/hudson/scm/CvsLog.java
          src/test/java/hudson/scm/CvsChangeLogHelperTest.java
          http://jenkins-ci.org/commit/cvs-plugin/2565a8f550f61bbca9311aff60293806f4d6f67b
          Log:
          JENKINS-13814 don't buffer the entire rlog output in memory.

          The size of the data "cvs rlog" produces is roughtly O(N*M) where
          N is the # of files in the directory and M is the amount of changes.

          So even when a delta is small, on a large repository this can produce
          significant amount of data. Use of ByteArrayOutputStream causes a large
          spike that can kill a VM, so spill the data over to disk if we are
          getting large output.

          Show
          scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: src/main/java/hudson/scm/CVSChangeLogSet.java src/main/java/hudson/scm/CVSSCM.java src/main/java/hudson/scm/CvsChangeLogHelper.java src/main/java/hudson/scm/CvsLog.java src/test/java/hudson/scm/CvsChangeLogHelperTest.java http://jenkins-ci.org/commit/cvs-plugin/2565a8f550f61bbca9311aff60293806f4d6f67b Log: JENKINS-13814 don't buffer the entire rlog output in memory. The size of the data "cvs rlog" produces is roughtly O(N*M) where N is the # of files in the directory and M is the amount of changes. So even when a delta is small, on a large repository this can produce significant amount of data. Use of ByteArrayOutputStream causes a large spike that can kill a VM, so spill the data over to disk if we are getting large output.
          Hide
          kohsuke Kohsuke Kawaguchi added a comment -

          I went ahead and made some performance improvements. I hope this didn't step on the toe of Michael.

          My CVS knowledge is rusty, but there's inherent problem in retaining the entire "cvs rlog" output in memory. CVS rlog returns some data for every file in the repository, so on a large repository, it'll produce huge data even if there haven't been many files that have changed. I fixed this.

          I'm also bit disappointed that the optimization we had in the CVS plugin 1.x appears to be gone, where we observed what files have changed during update and use that list to reduce the target of the log command. This was very useful for typical CI situation where your changes between builds are small. I suspect this also has a performance implication.

          Finally, I'd like to advise against relying regular expression matching on entire "cvs log" output block, as it is both error prone and fragile.

          When I look at https://github.com/jenkinsci/cvs-plugin/blob/master/src/main/java/hudson/scm/CvsLog.java#L273 I spot a number of problems right away — for example,

          [\r|\n]+

          is a mistake of

          [\r\n]+

          and there are various

          .+?

          that can incorrectly match multiple lines when the commit message contains stuff that looks suspiciously like cvs log output.

          And in some cases regular expression matching will result in very inefficient backtracking, which is what I suspect to be the cause when people report "cvs rlog stuck".

          I thought we used to bundle package-renamed CVS log parsing code taken from Ant, which I thought did the job better. I'm curious what the motivation is for replacing that with a custom parsing code.

          Show
          kohsuke Kohsuke Kawaguchi added a comment - I went ahead and made some performance improvements. I hope this didn't step on the toe of Michael. My CVS knowledge is rusty, but there's inherent problem in retaining the entire "cvs rlog" output in memory. CVS rlog returns some data for every file in the repository, so on a large repository, it'll produce huge data even if there haven't been many files that have changed. I fixed this. I'm also bit disappointed that the optimization we had in the CVS plugin 1.x appears to be gone, where we observed what files have changed during update and use that list to reduce the target of the log command. This was very useful for typical CI situation where your changes between builds are small. I suspect this also has a performance implication. Finally, I'd like to advise against relying regular expression matching on entire "cvs log" output block, as it is both error prone and fragile. When I look at https://github.com/jenkinsci/cvs-plugin/blob/master/src/main/java/hudson/scm/CvsLog.java#L273 I spot a number of problems right away — for example, [\r|\n]+ is a mistake of [\r\n]+ and there are various .+? that can incorrectly match multiple lines when the commit message contains stuff that looks suspiciously like cvs log output. And in some cases regular expression matching will result in very inefficient backtracking, which is what I suspect to be the cause when people report "cvs rlog stuck". I thought we used to bundle package-renamed CVS log parsing code taken from Ant, which I thought did the job better. I'm curious what the motivation is for replacing that with a custom parsing code.
          Hide
          geneliu Gene Liu added a comment -

          We expect an upgrade of cvs plugin with a shorter interval of changelog. However the result is that the upgraded version (2.x) takes much longer. If we want to use the upgraded one, we have to turn off the changelog feature.

          Show
          geneliu Gene Liu added a comment - We expect an upgrade of cvs plugin with a shorter interval of changelog. However the result is that the upgraded version (2.x) takes much longer. If we want to use the upgraded one, we have to turn off the changelog feature.

            People

            • Assignee:
              mc1arke Michael Clarke
              Reporter:
              jimg888 James Gustafson
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: