Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-20051

Winstone goes down on 1.509.4 (due to LDAP issues?)

    XMLWordPrintable

    Details

    • Similar Issues:

      Description

      We have experienced a strange issue with Jenkins 1.509.4. Web interface has gone offline after several days.

      • "RMI TCP Connection" thread is not present in the thread dump
      • "RMI Scheduler" and "RMI accept" exist in the thread
      • There is no info in jenkins.log
      • There's no BLOCKED threads
      • Jenkins internal jobs work well, only the web interface is not available

      After analysis of logs, seems that issue has been caused by "Too many open files" error:

      Oct 14, 2013 5:27:48 PM winstone.Logger logInternal
      SEVERE: Error during HTTP listener init or shutdown
      java.net.SocketException: Too many open files
      at java.net.PlainSocketImpl.socketAccept(Native Method)
      at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
      at java.net.ServerSocket.implAccept(ServerSocket.java:530)
      at java.net.ServerSocket.accept(ServerSocket.java:498)
      at winstone.HttpListener.run(HttpListener.java:139)
      at java.lang.Thread.run(Thread.java:724)

      According to logs, there is a huge number of LDAP errors

      WARNING: Failed to search LDAP for username=USER_NAME
      org.acegisecurity.ldap.LdapDataAccessException: Failed to obtain InitialDirContext due to unexpected exception; nested exception is javax.naming.ServiceUnavailableException: ru20waddc.internal.synopsys.com:389; socket closed
      at org.acegisecurity.ldap.DefaultInitialDirContextFactory.connect(DefaultInitialDirContextFactory.java:193)
      at org.acegisecurity.ldap.DefaultInitialDirContextFactory.newInitialDirContext(DefaultInitialDirContextFactory.java:261)
      at org.acegisecurity.ldap.DefaultInitialDirContextFactory.newInitialDirContext(DefaultInitialDirContextFactory.java:241)
      at org.acegisecurity.ldap.LdapTemplate.execute(LdapTemplate.java:123)
      at org.acegisecurity.ldap.LdapTemplate.searchForSingleEntry(LdapTemplate.java:246)
      at org.acegisecurity.ldap.search.FilterBasedLdapUserSearch.searchForUser(FilterBasedLdapUserSearch.java:119)
      at hudson.security.LDAPSecurityRealm$LDAPUserDetailsService.loadUserByUsername(LDAPSecurityRealm.java:591)
      at hudson.security.LDAPSecurityRealm$LDAPUserDetailsService.loadUserByUsername(LDAPSecurityRealm.java:553)
      at org.acegisecurity.ui.rememberme.TokenBasedRememberMeServices.loadUserDetails(TokenBasedRememberMeServices.java:308)
      at org.acegisecurity.ui.rememberme.TokenBasedRememberMeServices.autoLogin(TokenBasedRememberMeServices.java:218)
      at org.acegisecurity.ui.rememberme.RememberMeProcessingFilter.doFilter(RememberMeProcessingFilter.java:104)
      ...
      at org.kohsuke.stapler.compression.CompressionFilter.doFilter(CompressionFilter.java:46)
      at winstone.FilterConfiguration.execute(FilterConfiguration.java:194)
      at winstone.RequestDispatcher.doFilter(RequestDispatcher.java:366)
      at hudson.util.CharacterEncodingFilter.doFilter(CharacterEncodingFilter.java:81)
      at winstone.FilterConfiguration.execute(FilterConfiguration.java:194)
      at winstone.RequestDispatcher.doFilter(RequestDispatcher.java:366)
      at winstone.RequestDispatcher.forward(RequestDispatcher.java:331)
      at winstone.RequestHandlerThread.processRequest(RequestHandlerThread.java:227)
      at winstone.RequestHandlerThread.run(RequestHandlerThread.java:150)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:724)
      Caused by: javax.naming.ServiceUnavailableException: LDAP_ADDRESS; socket closed
      at com.sun.jndi.ldap.Connection.readReply(Connection.java:454)
      at com.sun.jndi.ldap.LdapClient.ldapBind(LdapClient.java:364)
      at com.sun.jndi.ldap.LdapClient.authenticate(LdapClient.java:213)
      at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2740)
      at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:316)
      at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:193)
      at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:211)
      at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:154)
      at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:84)
      at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:684)
      at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:307)
      at javax.naming.InitialContext.init(InitialContext.java:242)
      at javax.naming.InitialContext.<init>(InitialContext.java:216)
      at javax.naming.directory.InitialDirContext.<init>(InitialDirContext.java:101)
      at org.acegisecurity.ldap.DefaultInitialDirContextFactory.connect(DefaultInitialDirContextFactory.java:180)
      ... 39 more

        Attachments

        1. JENKINS-20051_HTTP_Sessions.png
          82 kB
          Oleg Nenashev
        2. JENKINS-20051_OpennedFiles.png
          101 kB
          Oleg Nenashev

          Issue Links

            Activity

            Hide
            oleg_nenashev Oleg Nenashev added a comment -

            Fixed logs

            Show
            oleg_nenashev Oleg Nenashev added a comment - Fixed logs
            Hide
            oleg_nenashev Oleg Nenashev added a comment -

            Added files from monitoring

            Show
            oleg_nenashev Oleg Nenashev added a comment - Added files from monitoring
            Hide
            oleg_nenashev Oleg Nenashev added a comment -

            Unfortunately, issue has been reproduced on our server.
            As I see, the issue's origin was in LDAP server, which somehow caused hanging of LDAP requests till the timeout ()

            • Seems that each request spawns an additional unnamed thread, which opens a new connection to LDAP reads info from it
            • Due to LDAP service, this IO hangs
            • In case of huge number of requests to LDAP, number of opened files and threads goes beyond limits
            • No surprise that some threads in Jenkins collapse in such situations

            Partial workaround:

            • always use caching in LDAP

            BTW, it would be great to limit a number of parallel requests to LDAP

            Spawned LDAP authentication thread:

            "Thread-30370" daemon prio=10 tid=0x00002aaae0001000 nid=0x2849 runnable [0x000000005fb1c000]
            java.lang.Thread.State: RUNNABLE
            at java.net.SocketInputStream.socketRead0(Native Method)
            at java.net.SocketInputStream.read(SocketInputStream.java:152)
            at java.net.SocketInputStream.read(SocketInputStream.java:122)
            at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
            at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
            at java.io.BufferedInputStream.read(BufferedInputStream.java:334)

            • locked <0x000000072b1de300> (a java.io.BufferedInputStream)
              at com.sun.jndi.ldap.Connection.run(Connection.java:853)
              at java.lang.Thread.run(Thread.java:724)
            Show
            oleg_nenashev Oleg Nenashev added a comment - Unfortunately, issue has been reproduced on our server. As I see, the issue's origin was in LDAP server, which somehow caused hanging of LDAP requests till the timeout () Seems that each request spawns an additional unnamed thread, which opens a new connection to LDAP reads info from it Due to LDAP service, this IO hangs In case of huge number of requests to LDAP, number of opened files and threads goes beyond limits No surprise that some threads in Jenkins collapse in such situations Partial workaround: always use caching in LDAP BTW, it would be great to limit a number of parallel requests to LDAP Spawned LDAP authentication thread: "Thread-30370" daemon prio=10 tid=0x00002aaae0001000 nid=0x2849 runnable [0x000000005fb1c000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:152) at java.net.SocketInputStream.read(SocketInputStream.java:122) at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) locked <0x000000072b1de300> (a java.io.BufferedInputStream) at com.sun.jndi.ldap.Connection.run(Connection.java:853) at java.lang.Thread.run(Thread.java:724)
            Hide
            oleg_nenashev Oleg Nenashev added a comment -

            Reassigned issue to Stephen (issue has been caused by LDAP)

            Show
            oleg_nenashev Oleg Nenashev added a comment - Reassigned issue to Stephen (issue has been caused by LDAP)
            Hide
            oleg_nenashev Oleg Nenashev added a comment -

            Workaround exists, so issue is not a blocker

            Show
            oleg_nenashev Oleg Nenashev added a comment - Workaround exists, so issue is not a blocker
            Hide
            danielbeck Daniel Beck added a comment -

            Seems to be LDAP alone; assuming core only due to use of former 'www' component.

            Show
            danielbeck Daniel Beck added a comment - Seems to be LDAP alone; assuming core only due to use of former 'www' component.
            Hide
            stephenconnolly Stephen Connolly added a comment -

            I have not heard of this issue for some time... anyway no point in me being the assignee

            Show
            stephenconnolly Stephen Connolly added a comment - I have not heard of this issue for some time... anyway no point in me being the assignee

              People

              • Assignee:
                Unassigned
                Reporter:
                oleg_nenashev Oleg Nenashev
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: