Jenkins / JENKINS-52374

Issue with unclosed LDAP connections

      Description

      In our environment we are observing a high number of threads waiting with the following stack:

      "Thread-169" Id=277 Group=main WAITING on java.lang.Object@64e59be7
      	at java.base@10.0.1/java.lang.Object.wait(Native Method)
      	-  waiting on java.lang.Object@64e59be7
      	at java.base@10.0.1/java.lang.Object.wait(Object.java:328)
      	at java.naming@10.0.1/com.sun.jndi.ldap.Connection.pauseReader(Connection.java:771)
      	at java.naming@10.0.1/com.sun.jndi.ldap.Connection.run(Connection.java:911)
      	at java.base@10.0.1/java.lang.Thread.run(Thread.java:844)
      
      "Thread-175" Id=283 Group=main WAITING on java.lang.Object@156aef27
      	at java.base@10.0.1/java.lang.Object.wait(Native Method)
      	-  waiting on java.lang.Object@156aef27
      	at java.base@10.0.1/java.lang.Object.wait(Object.java:328)
      	at java.naming@10.0.1/com.sun.jndi.ldap.Connection.pauseReader(Connection.java:771)
      	at java.naming@10.0.1/com.sun.jndi.ldap.Connection.run(Connection.java:911)
      	at java.base@10.0.1/java.lang.Thread.run(Thread.java:844)
      
      "Thread-177" Id=285 Group=main WAITING on java.lang.Object@3df5e68a
      	at java.base@10.0.1/java.lang.Object.wait(Native Method)
      	-  waiting on java.lang.Object@3df5e68a
      	at java.base@10.0.1/java.lang.Object.wait(Object.java:328)
      	at java.naming@10.0.1/com.sun.jndi.ldap.Connection.pauseReader(Connection.java:771)
      	at java.naming@10.0.1/com.sun.jndi.ldap.Connection.run(Connection.java:911)
      	at java.base@10.0.1/java.lang.Thread.run(Thread.java:844)
      

      (dump taken from http://jenkinsurl/threadDump)

      The number of similar waiting threads increases by 2 on every login, and these threads are never closed. After a couple of days this leads to a "Too many open files" error; at that point, 378 threads were waiting with the stack above.
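To track this growth between logins, here is a minimal sketch that counts the threads parked in `com.sun.jndi.ldap.Connection.pauseReader`, the frame shared by every stack above. It uses only the standard `java.lang.management` API. The class and method names are hypothetical. It only sees threads of the JVM it runs in, so it would need to run inside the Jenkins JVM to be useful.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class LdapReaderThreadCounter {

    // Frame shared by the leaked reader threads in the dump above.
    // The "java.naming@10.0.1/" prefix in the dump is module info, not part of the class name.
    static final String LDAP_CONNECTION_CLASS = "com.sun.jndi.ldap.Connection";
    static final String PAUSE_READER_METHOD = "pauseReader";

    /** True if any frame of the stack is Connection.pauseReader. */
    static boolean isPausedLdapReader(StackTraceElement[] stack) {
        for (StackTraceElement frame : stack) {
            if (LDAP_CONNECTION_CLASS.equals(frame.getClassName())
                    && PAUSE_READER_METHOD.equals(frame.getMethodName())) {
                return true;
            }
        }
        return false;
    }

    /** Counts live threads in this JVM whose stack matches isPausedLdapReader. */
    static int countPausedLdapReaders() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        int count = 0;
        // (false, false): skip lock details; only the stack traces are needed.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info != null && isPausedLdapReader(info.getStackTrace())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("Paused LDAP reader threads: " + countPausedLdapReaders());
    }
}
```

Sampling this count before and after a login should show whether the "+2 per login" pattern described above is occurring.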

      Please let me know if any further information is required to help solve/reproduce this issue.

       

            Activity

            felipebrnd Felipe Nascimento added a comment -

            Important hint: disabling the StartTLS option makes the problem go away.

            jeremycornett Jeremy Cornett added a comment - - edited

            I am able to reproduce this issue running Jenkins via Docker (2.176.1-jdk11) with active-directory:2.16. This issue brings my Jenkins instance to a standstill within 8 to 48 hours: the UI becomes unresponsive and eventually fails with an out-of-memory error because of the number of threads and open files. Unless the plugin can be fixed, I am exploring three options:

            1. Switch to LDAPS.
            2. Just live with having insecure LDAP connections for AD authentication on our Jenkins servers (i.e. disable StartTLS permanently).
            3. Downgrade the Jenkins master and build nodes to use Java 8 instead of Java 11.
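At the JNDI level, which is where the leaked `com.sun.jndi.ldap` threads come from, options 1 and 2 differ from the current StartTLS setup mainly in the connection URL. This is a minimal sketch of the three transports. It assumes a hypothetical domain controller host `dc.example.com` and the conventional ports 389 and 636. The enum and method names are illustrative, not the plugin's code.

```java
import java.util.Hashtable;
import javax.naming.Context;

public class LdapTransportSketch {

    /** The transport choices discussed above (names are illustrative). */
    enum Transport { LDAPS, START_TLS, PLAIN }

    /** Builds a JNDI environment for the given transport; no connection is opened here. */
    static Hashtable<String, Object> environment(Transport transport, String host) {
        Hashtable<String, Object> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        switch (transport) {
            case LDAPS:
                // Option 1: TLS is set up before any LDAP traffic; conventional port 636.
                env.put(Context.PROVIDER_URL, "ldaps://" + host + ":636");
                break;
            case START_TLS:
                // Current setup: connect in clear text on 389, then upgrade the same connection
                // with ctx.extendedOperation(new StartTlsRequest()) and StartTlsResponse.negotiate().
                env.put(Context.PROVIDER_URL, "ldap://" + host + ":389");
                break;
            case PLAIN:
                // Option 2: no encryption; traffic, including simple-bind credentials, is unencrypted.
                env.put(Context.PROVIDER_URL, "ldap://" + host + ":389");
                break;
        }
        return env;
    }

    public static void main(String[] args) {
        for (Transport t : Transport.values()) {
            System.out.println(t + " -> " + environment(t, "dc.example.com").get(Context.PROVIDER_URL));
        }
    }
}
```

The StartTLS and plain environments are identical because the upgrade happens after connecting. In the plugin, these choices are made in its configuration rather than in code.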
            batmat Baptiste Mathus added a comment -

            Jeremy Cornett, are you saying you confirmed this issue does not happen on Java 8?
            Could you please provide the memory settings you are using (or, if you use the defaults, the image in use)? Thanks!

            jeremycornett Jeremy Cornett added a comment -

            Yes, I can confirm this was working on Java 8. Specifically, on 6/4/2019, I upgraded our Jenkins instance from docker jenkins/jenkins:2.164.1 to jenkins/jenkins:2.164.1-jdk11, and from active-directory:2.8 to active-directory:2.13. Immediately thereafter, our Jenkins instance started crashing. It took a number of weeks for me to diagnose this problem properly, and I subsequently tried upgrading to newer versions of Jenkins and the active-directory plugin. We are now using docker jenkins/jenkins:2.176.1-jdk11 and active-directory:2.16. I finally resolved the issue in our instance by abandoning StartTLS and using LDAPS, as outlined in the plugin documentation.

            Regarding memory settings: the VM I was using initially had 2 cores and 12 GB of RAM. I thought the problem was a memory issue, so on 6/11/2019 I changed the VM to 4 cores and 16 GB of RAM, but that didn't make a discernible difference. When I monitored top on the VM (CentOS 7.6), memory usage never went above 4 GB of RAM, but virtual memory kept growing over time. The highest VIRT I saw was about 28 GB.

            Eventually, I installed the plugin monitoring:1.77.0. It showed the number of open files and threads, and revealed threads in the WAITING state with stacks similar to those in this ticket. When I then disabled StartTLS, the symptom went away completely.
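The two numbers that exposed the leak, open files and thread count, can also be read from inside any JVM through the JDK's management beans. This is a minimal sketch that assumes a HotSpot-based JDK on a Unix-like OS, where `com.sun.management.UnixOperatingSystemMXBean` is available. It is not necessarily how the monitoring plugin gathers its figures, and the class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class ResourceSnapshot {

    /** Open file descriptors of this JVM, or -1 if the platform bean does not expose the count. */
    static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    /** Maximum descriptors this process may open ("Too many open files" once it is hit), or -1. */
    static long maxFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1;
    }

    /** Current number of live threads (daemon and non-daemon) in this JVM. */
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        System.out.printf("open fds: %d / %d, live threads: %d%n",
                openFileDescriptors(), maxFileDescriptors(), liveThreads());
    }
}
```

With the leak active, both the descriptor count and the thread count would be expected to climb with each login and never drop back.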

            nridgway Nick Ridgway added a comment -

            I am also seeing this issue. Switching StartTLS off isn't an option for me; is there any information I can provide to help diagnose or fix the issue?

            gradol Oliver Grad added a comment -

            We discovered the same issue. On 3/10/2020 we upgraded our production Jenkins (2.204.5) from Java 8 to Java 11, and the misbehaviour started immediately.

            We use AD-Plugin version 2.16.
            As soon as we disabled StartTLS the problem disappeared (on the right side of the graph).

            mramonleon Ramon Leon added a comment - - edited

            The issue is related to the JDK bug "LdapContext#reconnect always opens a new connection": https://bugs.openjdk.java.net/browse/JDK-8217606. We call reconnect at https://github.com/jenkinsci/active-directory-plugin/blob/d3a94592176108701e72ad8726f462c6f0e8b606/src/main/java/hudson/plugins/active_directory/ActiveDirectorySecurityRealm.java#L721

            It's worth checking whether this is fixed in Java 11.0.8 (2020-07-14); it seems to be fixed there.

            Versions where the fix was backported:

            Issue         Fix Version
            JDK-8245802   13.0.4
            JDK-8237876   11.0.8-oracle
            JDK-8240434   11.0.8
            JDK-8249807   openjdk8u272
            JDK-8248118   8u271
            JDK-8248718   8u261
            JDK-8251719   emb-8u271
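To check a runtime against the 11.x and 13.x rows of the backport table, here is a minimal sketch that classifies a JDK version. It assumes Java 10 or later, for `Runtime.Version.feature()` and `update()`. It deliberately returns `UNKNOWN` for feature releases the table does not list. The 8u rows are out of scope because `Runtime.Version` does not exist on Java 8. It also does not distinguish vendor builds such as 11.0.8-oracle. The class name is hypothetical.

```java
public class LdapReconnectFixCheck {

    /** Outcome relative to the backport table; UNKNOWN means the table does not cover it. */
    enum Status { FIXED, NOT_FIXED, UNKNOWN }

    static Status classify(Runtime.Version v) {
        switch (v.feature()) {
            case 11:
                // Backport JDK-8240434 is listed for 11.0.8.
                return v.update() >= 8 ? Status.FIXED : Status.NOT_FIXED;
            case 13:
                // Backport JDK-8245802 is listed for 13.0.4.
                return v.update() >= 4 ? Status.FIXED : Status.NOT_FIXED;
            default:
                // Feature releases not in the table are not classified here.
                return Status.UNKNOWN;
        }
    }

    /** Convenience overload for version strings such as "11.0.8+10". */
    static Status classify(String version) {
        return classify(Runtime.Version.parse(version));
    }

    public static void main(String[] args) {
        Runtime.Version running = Runtime.version();
        System.out.println("Java " + running + ": " + classify(running));
    }
}
```

Running it on the Jenkins controller's JVM would indicate whether that JVM is at or above the listed backport for its release line.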
            mramonleon Ramon Leon added a comment -

            I have moved this issue to review to gather feedback from affected users on whether the Java update fixes it.


              People

              • Assignee: mramonleon Ramon Leon
              • Reporter: felipebrnd Felipe Nascimento
              • Votes: 3
              • Watchers: 8
