Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-16429

ComThread is leaking even when its peer is dead

    Details

    • Type: Bug
    • Status: Resolved (View Workflow)
    • Priority: Blocker
    • Resolution: Fixed
    • Environment:
    • Similar Issues:

      Description

      I get the following stack trace once a day:

      Exception:
      FATAL: unable to create new native thread
      java.lang.OutOfMemoryError: unable to create new native thread
          at java.lang.Thread.start0(Native Method)
          at java.lang.Thread.start(Thread.java:640)
          at hudson.Proc$LocalProc.<init>(Proc.java:264)
          at hudson.Proc$LocalProc.<init>(Proc.java:216)
          at hudson.Launcher$LocalLauncher.launch(Launcher.java:763)
          at hudson.Launcher$ProcStarter.start(Launcher.java:353)
          at hudson.plugins.perforce.HudsonP4DefaultExecutor.exec(HudsonP4DefaultExecutor.java:79)
          at com.tek42.perforce.parse.AbstractPerforceTemplate.getPerforceResponse(AbstractPerforceTemplate.java:321)
          at com.tek42.perforce.parse.AbstractPerforceTemplate.getPerforceResponse(AbstractPerforceTemplate.java:292)
          at com.tek42.perforce.parse.Workspaces.getWorkspace(Workspaces.java:61)
          at hudson.plugins.perforce.PerforceSCM.getPerforceWorkspace(PerforceSCM.java:1467)
          at hudson.plugins.perforce.PerforceSCM.getPerforceWorkspace(PerforceSCM.java:1428)
          at hudson.plugins.perforce.PerforceSCM.checkout(PerforceSCM.java:792)
          at hudson.model.AbstractProject.checkout(AbstractProject.java:1325)
          at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:676)
          at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
          at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:581)
          at hudson.model.Run.execute(Run.java:1516)
          at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          at hudson.model.ResourceController.execute(ResourceController.java:88)
          at hudson.model.Executor.run(Executor.java:236)
      

      The URL http:/jenkinsserver/threadDump shows that there are hundreds of ComThreads:

      ComThread for RequestHandlerThread[#301]
      "ComThread for RequestHandlerThread[#301]" Id=50942 Group=main RUNNABLE (in native)
          at com4j.Win32Lock.suspend0(Native Method)
          at com4j.Win32Lock.suspend(Win32Lock.java:37)
          at com4j.ComThread.run0(ComThread.java:146)
          at com4j.ComThread.run(ComThread.java:134)
      

      The Monitoring plugin (https://wiki.jenkins-ci.org/display/JENKINS/Monitoring) shows that about 40 threads are added every hour. I tried to kill the ComThreads with the plugin but this did not work.

      As far as I understand the threads belong to the Active Directory plugin but I do not understand why they are created / never finish.

      I have activated logging for the Active Directory plugin by enabling the following loggers (all with log level all):

      • hudson.plugins.active_directory
      • org.acegisecurity
      • com4j

      The log contains a lot of entries like this:

      Jan 16, 2013 12:20:27 PM org.acegisecurity.context.HttpSessionContextIntegrationFilter doFilter
      FINE: SecurityContextHolder now cleared, as request processing completed
      
      Jan 16, 2013 12:20:27 PM org.acegisecurity.providers.anonymous.AnonymousProcessingFilter doFilter
      FINE: Populated SecurityContextHolder with anonymous token: 'org.acegisecurity.providers.anonymous.AnonymousAuthenticationToken@7ceb964f: Username: anonymous; Password: [PROTECTED]; Authenticated: true; Details: org.acegisecurity.ui.WebAuthenticationDetails@1de6: RemoteIpAddress: ##.##.##.##; SessionId: null; Granted Authorities: '
      
      Jan 16, 2013 12:20:27 PM org.acegisecurity.context.HttpSessionContextIntegrationFilter storeSecurityContextInSession
      FINE: The HttpSession is currently null, and the HttpSessionContextIntegrationFilter is prohibited from creating an HttpSession (because the allowSessionCreation property is false) - SecurityContext thus not stored for next request
      
      Jan 16, 2013 12:20:27 PM org.acegisecurity.ui.basicauth.BasicProcessingFilter doFilter
      FINE: Authorization header: null
      
      Jan 16, 2013 12:20:27 PM org.acegisecurity.ui.ExceptionTranslationFilter doFilter
      FINE: Chain processed normally
      
      Jan 16, 2013 12:20:27 PM org.acegisecurity.context.HttpSessionContextIntegrationFilter doFilter
      FINE: New SecurityContext instance will be associated with SecurityContextHolder
      
      Jan 16, 2013 12:20:27 PM org.acegisecurity.context.HttpSessionContextIntegrationFilter readSecurityContextFromSession
      FINE: No HttpSession currently exists
      
      Jan 16, 2013 12:20:27 PM org.acegisecurity.context.HttpSessionContextIntegrationFilter doFilter
      FINE: SecurityContextHolder now cleared, as request processing completed
      
      Jan 16, 2013 12:20:27 PM org.acegisecurity.context.HttpSessionContextIntegrationFilter storeSecurityContextInSession
      FINE: The HttpSession is currently null, and the HttpSessionContextIntegrationFilter is prohibited from creating an HttpSession (because the allowSessionCreation property is false) - SecurityContext thus not stored for next request
      
      Jan 16, 2013 12:20:27 PM org.acegisecurity.ui.ExceptionTranslationFilter doFilter
      FINE: Chain processed normally
      

      A lot of traffic (about four requests per second) is caused by the desktop application Hudson Tray Tracker (https://code.google.com/p/hudson-tray-tracker/). This application does not log in (it uses anonymous access).

      I have also

      • enabled access logging, but found no unusual URL requests
      • connected with VisualVM (http://visualvm.java.net/) to the Jenkins JVM, but got no more informations
      • created a PowerShell script that logs the number of threads, handles, working set and the virtual size of the Java process, but it only shows that all numbers are constantly increasing:
      Date Time Pid Thread Handle
      Count
      WorkingSet VirtualSize
      2013-01-20 22:01:31 740 86 1380 474.285 MB 1,291.566 MB
      2013-01-20 23:01:31 740 111 1719 520.957 MB 1,309.094 MB
      2013-01-21 00:01:31 740 138 2094 576.602 MB 1,322.531 MB
      2013-01-21 01:01:30 740 161 2381 630.742 MB 1,334.031 MB
      2013-01-21 02:01:30 740 177 2563 635.363 MB 1,342.094 MB
      2013-01-21 03:01:30 740 203 2884 638.371 MB 1,355.031 MB
      2013-01-21 04:01:31 740 222 3192 660.984 MB 1,396.156 MB
      2013-01-21 05:01:30 740 243 3443 641.645 MB 1,406.609 MB
      2013-01-21 06:01:30 740 264 3719 648.918 MB 1,417.219 MB
      2013-01-21 07:01:31 740 294 4114 661.613 MB 1,432.156 MB
      2013-01-21 08:01:31 740 313 4357 647.117 MB 1,441.719 MB
      2013-01-21 09:01:30 740 344 4758 667.797 MB 1,457.219 MB
      2013-01-21 10:01:31 740 414 5701 742.559 MB 1,575.898 MB
      2013-01-21 11:01:31 740 502 6777 729.488 MB 1,620.813 MB
      2013-01-21 12:01:52 740 562 7533 740.715 MB 1,650.813 MB
      2013-01-21 13:01:31 740 616 8313 763.828 MB 1,677.922 MB

      The problem is related to system load: at weekend the numbers are increasing at a lower rate than at work days. Between Christmas and New Year it didn't happened at all as most developers have been on vacation.

      • created a Groovey script:
      def threads=com4j.ComThread.threads.toArray(new com4j.ComThread[com4j.ComThread.threads.size()]);
      def numberOfTerminatedPeerThreads=0
      
      println("The follwing ComThreads have a peer thread which has been terminated")
      for (t in threads) {
        if (t.getName().startsWith("ComThread for RequestHandlerThread[#") &&
            t.peer.getState().equals(Thread.State.TERMINATED) ) {
          println("ComThread: " + t.getName() + "  Peer: " + t.peer )
          numberOfTerminatedPeerThreads++
          //t.kill()
        }
      }
      
      println("\nnumber of terminated peer threads: " + numberOfTerminatedPeerThreads)
      

      result:

      The follwing ComThreads have a peer thread which has been terminated
      ComThread: ComThread for RequestHandlerThread[#357]  Peer: Thread[RequestHandlerThread[#357],5,]
      ComThread: ComThread for RequestHandlerThread[#258]  Peer: Thread[RequestHandlerThread[#258],5,]
      ...
      
      number of terminated peer threads: 506
      

        Attachments

          Issue Links

            Activity

            Hide
            bernhardb Bernhard Berbuir added a comment -

            I have found a workaround: I have increased the maximum number of idle handler threads. I added the following parameters at the <arguments>-tag in the jenkins.xml file:
            --handlerCountStartup=50 --handlerCountMax=100 --handlerCountMaxIdle=50

            Now I do not have a single ComThread with a terminated peer thread and the JVM is running smoothly.

            I suspect that the continuous creating and destroying of threads caused the thread leak. The winstone servlet engine or the ComThread library seems not be able to handle such a scenario successfully.

            Show
            bernhardb Bernhard Berbuir added a comment - I have found a workaround: I have increased the maximum number of idle handler threads. I added the following parameters at the <arguments> -tag in the jenkins.xml file: --handlerCountStartup=50 --handlerCountMax=100 --handlerCountMaxIdle=50 Now I do not have a single ComThread with a terminated peer thread and the JVM is running smoothly. I suspect that the continuous creating and destroying of threads caused the thread leak. The winstone servlet engine or the ComThread library seems not be able to handle such a scenario successfully.
            Hide
            porterhouse91 porterhouse91 added a comment -

            Berhard, how many Jenkins jobs do you have running? We started experiencing these issues due to an increased number of jobs (we think). Right now we are at ~200 jobs polling every minute.

            Show
            porterhouse91 porterhouse91 added a comment - Berhard, how many Jenkins jobs do you have running? We started experiencing these issues due to an increased number of jobs (we think). Right now we are at ~200 jobs polling every minute.
            Hide
            bernhardb Bernhard Berbuir added a comment -

            I have only a small number of jobs (18-24 jobs). I am using one master and do not have a slave.
            ==> all polling is done at the master

            I have set the option "Max # of concurrent polling" to 1 (Jenkins | Manage Jenkins | Configure System | SCM Polling)
            ==> Jenkins runs only one polling at a time (I have several jobs which are polling every minute).

            I assume that the Hudson Tray Tracker (HTT) is the problem as all ComThreads have a terminated RequestHandlerThread. HTT polls every 15 seconds for every job (its like a DDoS Tool )

            Show
            bernhardb Bernhard Berbuir added a comment - I have only a small number of jobs (18-24 jobs). I am using one master and do not have a slave. ==> all polling is done at the master I have set the option "Max # of concurrent polling" to 1 (Jenkins | Manage Jenkins | Configure System | SCM Polling) ==> Jenkins runs only one polling at a time (I have several jobs which are polling every minute). I assume that the Hudson Tray Tracker (HTT) is the problem as all ComThreads have a terminated RequestHandlerThread. HTT polls every 15 seconds for every job (its like a DDoS Tool )
            Hide
            porterhouse91 porterhouse91 added a comment -

            I am fairly certain that our issue is related to the Perforce plugin. I filed a ticket with the plugin providers: https://issues.jenkins-ci.org/browse/JENKINS-17630

            Do you happen to see a large number of active 'p4 client -i' processes just sitting around? This seems to be the core of our problem.

            Show
            porterhouse91 porterhouse91 added a comment - I am fairly certain that our issue is related to the Perforce plugin. I filed a ticket with the plugin providers: https://issues.jenkins-ci.org/browse/JENKINS-17630 Do you happen to see a large number of active 'p4 client -i' processes just sitting around? This seems to be the core of our problem.
            Hide
            bernhardb Bernhard Berbuir added a comment -

            You seem to have a different problem, because I never had problems with Perforce threads.

            Show
            bernhardb Bernhard Berbuir added a comment - You seem to have a different problem, because I never had problems with Perforce threads.
            Hide
            kohsuke Kohsuke Kawaguchi added a comment -

            Reassining to the perforce plugin

            Show
            kohsuke Kohsuke Kawaguchi added a comment - Reassining to the perforce plugin
            Hide
            rpetti Rob Petti added a comment -

            What is a ComThread? What's creating those? The perforce plugin doesn't use any such threads, so I don't believe this is an issue with the perforce plugin. The only reason the perforce plugin is throwing an exception is because the jenkins instance has run out of memory.

            Show
            rpetti Rob Petti added a comment - What is a ComThread? What's creating those? The perforce plugin doesn't use any such threads, so I don't believe this is an issue with the perforce plugin. The only reason the perforce plugin is throwing an exception is because the jenkins instance has run out of memory.
            Hide
            kohsuke Kohsuke Kawaguchi added a comment -

            After talking to Rob Petti and looking at this more carefully, the issue is that ComThread isn't terminating (even though presumably their peer thread is long dead.)

            Bringing this back to the active directory since that uses com4j, not perforce.

            Show
            kohsuke Kohsuke Kawaguchi added a comment - After talking to Rob Petti and looking at this more carefully, the issue is that ComThread isn't terminating (even though presumably their peer thread is long dead.) Bringing this back to the active directory since that uses com4j, not perforce.
            Hide
            walterk82 Walter Kacynski added a comment -

            I switched from LDAP to AD plugin and I am experiencing this problem. The number of ComThread entries are very large!

            Show
            walterk82 Walter Kacynski added a comment - I switched from LDAP to AD plugin and I am experiencing this problem. The number of ComThread entries are very large!
            Hide
            walterk82 Walter Kacynski added a comment -

            Attached is a screen print of the grown pool of ComThreads

            Show
            walterk82 Walter Kacynski added a comment - Attached is a screen print of the grown pool of ComThreads
            Hide
            walterk82 Walter Kacynski added a comment -

            Kohsuke, please take a look at this when you get a chance. In our current environment, the active-directory plugin is not usable because it forces us to restart the master on a scheduled basis.

            Show
            walterk82 Walter Kacynski added a comment - Kohsuke, please take a look at this when you get a chance. In our current environment, the active-directory plugin is not usable because it forces us to restart the master on a scheduled basis.
            Hide
            dennys Dennys Hsieh added a comment -

            We have this issue and need to restart Jenkins 1~2 times every day, is there any work around for it?

            Show
            dennys Dennys Hsieh added a comment - We have this issue and need to restart Jenkins 1~2 times every day, is there any work around for it?
            Hide
            dennys Dennys Hsieh added a comment -

            I try to run the Groovy script but it seems not work and when I run it again, I got the message

            Caused by: java.lang.UnsatisfiedLinkError: Native Library C:\Windows\System32\com4j-x86.dll already loaded in another classloader

            I put com4j-x86.dll in C:\Windows\System32, is it correct? And I didn't see the message output by Groovy's println, should I see it in Jenkins' console?

            Show
            dennys Dennys Hsieh added a comment - I try to run the Groovy script but it seems not work and when I run it again, I got the message Caused by: java.lang.UnsatisfiedLinkError: Native Library C:\Windows\System32\com4j-x86.dll already loaded in another classloader I put com4j-x86.dll in C:\Windows\System32, is it correct? And I didn't see the message output by Groovy's println, should I see it in Jenkins' console?
            Hide
            bernhardb Bernhard Berbuir added a comment -

            @Dennys: the com4j DLL is part of the active-directory plugin. It is included in its JPI file and automatically copied to %JENKINS_HOME%\plugins\active-directory\WEB-INF\lib. Check if the file exists there and try to remove any other occurrence of the DLL.

            I assume that due to the UnsatisfiedLinkError the script can not be executed and therefore no message is printed. You can add the following line at the begining of the script:

             println("### INFO - searching for ComThreads which have a terminated peer thread")
            Show
            bernhardb Bernhard Berbuir added a comment - @Dennys: the com4j DLL is part of the active-directory plugin. It is included in its JPI file and automatically copied to %JENKINS_HOME%\plugins\active-directory\WEB-INF\lib . Check if the file exists there and try to remove any other occurrence of the DLL. I assume that due to the UnsatisfiedLinkError the script can not be executed and therefore no message is printed. You can add the following line at the begining of the script: println( "### INFO - searching for ComThreads which have a terminated peer thread" )
            Hide
            dennys Dennys Hsieh added a comment -

            I found com4j-x86.dll is in %JENKINS_HOME%\plugins\active-directory\WEB-INF\lib, but it still doesn't work. And I try to put com4j-20120426-2.jar (extract from Activedirectory plugin) into jenkins.war (in WEB-INF\lib)

            After restart Jenkins, it works... is this a good solution ?

            Show
            dennys Dennys Hsieh added a comment - I found com4j-x86.dll is in %JENKINS_HOME%\plugins\active-directory\WEB-INF\lib, but it still doesn't work. And I try to put com4j-20120426-2.jar (extract from Activedirectory plugin) into jenkins.war (in WEB-INF\lib) After restart Jenkins, it works... is this a good solution ?
            Hide
            vanharen Jeremy Van Haren added a comment -

            Is @Dennys Hsieh's workaround on 19/Nov/13 9:24AM a valid solution? Not sure I understand it. We like(d) the AD plugin so much better than the LDAP plugin but this issue continues to be a reason why we can't use it any more.

            Show
            vanharen Jeremy Van Haren added a comment - Is @Dennys Hsieh's workaround on 19/Nov/13 9:24AM a valid solution? Not sure I understand it. We like(d) the AD plugin so much better than the LDAP plugin but this issue continues to be a reason why we can't use it any more.
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            pom.xml
            http://jenkins-ci.org/commit/active-directory-plugin/5fbb298d9e66f256a8549fa3e0ed81e8c1cb100b
            Log:
            [FIXED JENKINS-16429]

            Integrated a newer version of com4j that fixes the leak

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: pom.xml http://jenkins-ci.org/commit/active-directory-plugin/5fbb298d9e66f256a8549fa3e0ed81e8c1cb100b Log: [FIXED JENKINS-16429] Integrated a newer version of com4j that fixes the leak
            Hide
            dennys Dennys Hsieh added a comment -

            I just upgrade to active directory plugin 1.36 with com4j 2.1
            It seems the leak issue is not fixed yet.

            But my groovy script runs every 60mins, I'm not sure will the new com4j clear the leak thread in batch or any dependency?

            Show
            dennys Dennys Hsieh added a comment - I just upgrade to active directory plugin 1.36 with com4j 2.1 It seems the leak issue is not fixed yet. But my groovy script runs every 60mins, I'm not sure will the new com4j clear the leak thread in batch or any dependency?
            Hide
            franks Frank Seifart added a comment - - edited

            It seems that the issue is not fixed completely.
            Now the number of threads are ok, but java memory usage is excessively high and rises constantly as before.
            I think there is a remaining memory leak.

            Show
            franks Frank Seifart added a comment - - edited It seems that the issue is not fixed completely. Now the number of threads are ok, but java memory usage is excessively high and rises constantly as before. I think there is a remaining memory leak.
            Hide
            dennys Dennys Hsieh added a comment -

            Sorry, I think the thread issue is fixed. Because I forget to remove the old com4j in WEB-INF\lib of jenkins.war (in order to run the Groovy script). After remove it, the thread is normal now.

            Show
            dennys Dennys Hsieh added a comment - Sorry, I think the thread issue is fixed. Because I forget to remove the old com4j in WEB-INF\lib of jenkins.war (in order to run the Groovy script). After remove it, the thread is normal now.
            Hide
            jglick Jesse Glick added a comment -

            @franks then you may be seeing an unrelated issue; file it separately (and use for example jmap -histo:live $pid to diagnose).

            Show
            jglick Jesse Glick added a comment - @franks then you may be seeing an unrelated issue; file it separately (and use for example jmap -histo:live $pid to diagnose).
            Hide
            walterk82 Walter Kacynski added a comment -

            I was looking at how this plugin is packaged and there is a copy of com4j-amd64.dll in Jenkins\plugins\active-directory\WEB-INF\lib and there is also a copy inside com4j-2.1.jar\com4j\

            Is this expected?

            Show
            walterk82 Walter Kacynski added a comment - I was looking at how this plugin is packaged and there is a copy of com4j-amd64.dll in Jenkins\plugins\active-directory\WEB-INF\lib and there is also a copy inside com4j-2.1.jar\com4j\ Is this expected?
            Hide
            walterk82 Walter Kacynski added a comment -

            The files are of identical size.

            Show
            walterk82 Walter Kacynski added a comment - The files are of identical size.
            Hide
            jglick Jesse Glick added a comment -

            @walterk82 probably a mistake but better to file it separately rather than confusing the discussion in this issue.

            Show
            jglick Jesse Glick added a comment - @walterk82 probably a mistake but better to file it separately rather than confusing the discussion in this issue.

              People

              • Assignee:
                kohsuke Kohsuke Kawaguchi
                Reporter:
                bernhardb Bernhard Berbuir
              • Votes:
                9 Vote for this issue
                Watchers:
                14 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: