JENKINS-9882: Jenkins runs out of file descriptors (winstone problem)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: core
    • Labels:
      None
    • Environment:
      Debian 5, sun-java-jdk (1.6.0_22)
      Jenkins version: 1.414-SNAPSHOT

      Description

      Running Jenkins with the embedded Winstone server for a long time
      under constant load conditions causes file descriptor and thread
      leakage.

      What happens:

      After running for about a day, the following appears in the Jenkins
      log file:

      [Winstone 2011/05/27 07:35:03] - WARNING: Request handler pool limit exceeded - waiting for retry

      and a bit later (this starts repeating):

      [Winstone 2011/05/27 07:43:25] - WARNING: Request handler pool limit exceeded - waiting for retry
      [Winstone 2011/05/27 07:43:26] - ERROR: Request ignored because there were no more request handlers available in the pool
      [Winstone 2011/05/27 07:43:36] - WARNING: Request handler pool limit exceeded - waiting for retry
      [Winstone 2011/05/27 07:43:37] - ERROR: Request ignored because there were no more request handlers available in the pool

      Jenkins then stops handling requests successfully: intermittently at
      first, but eventually it fails almost every request.

      Using VisualVM I can see that there are a thousand RequestHandlerThread
      threads in the wait state, and that over 1200 file descriptors are
      currently in use.

      I think the requests start failing because Winstone has this limit:

      private int MAX_REQUEST_HANDLERS_IN_POOL = 1000;

      since it doesn't seem to be running out of available fds (apparently
      8192 is the maximum in this setup).

      When I restart Jenkins I can observe a slow buildup of threads and
      used file descriptors (a small monitoring sketch follows the list):

      • 10 minutes after restart: 136 live threads, 256 fds used
      • 20 minutes: 150 threads, 271 fds
      • 30 minutes: 161 threads, 280 fds
      • 110 minutes: 255 threads, 376 fds
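
      As a hypothetical way to collect numbers like those above (this helper
      is not part of the report; the class name LeakWatcher is made up, and
      the fd count relies on the com.sun.management extension available in
      Sun/Oracle JDKs on Unix, which matches this environment):

          import java.lang.management.ManagementFactory;
          import java.lang.management.OperatingSystemMXBean;
          import java.lang.management.ThreadMXBean;

          public class LeakWatcher {
              public static void logCounts() {
                  ThreadMXBean threads = ManagementFactory.getThreadMXBean();
                  OperatingSystemMXBean os =
                          ManagementFactory.getOperatingSystemMXBean();
                  long fds = -1; // -1 if the platform bean is unavailable
                  if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
                      fds = ((com.sun.management.UnixOperatingSystemMXBean) os)
                              .getOpenFileDescriptorCount();
                  }
                  System.out.printf("live threads=%d, open fds=%d%n",
                          threads.getThreadCount(), fds);
              }

              public static void main(String[] args) throws InterruptedException {
                  while (true) {
                      logCounts();
                      Thread.sleep(10 * 60 * 1000L); // sample every 10 minutes
                  }
              }
          }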

      I've looked at the repository version of Winstone, and the code
      appears to have a race condition in the handling of the request
      handler pool.

      When a request is received by ObjectPool.handleRequest, it looks for
      an available request handler from unusedRequestHandlerThreads and
      calls commenceRequestHandling on the available thread.

      commenceRequestHandling in turn does this.notifyAll() to wake up the
      thread. So far so good. However, when the thread has finished
      processing the request, it calls
      this.objectPool.releaseRequestHandler(this) and then waits. I think
      there is a race condition here, since the object pool caller (CALL)
      and the request handler thread (RH) can interleave like this:

      1. RH (in RequestHandler.run): this.objectPool.releaseRequestHandler(this)
      2. RH (in ObjectPool.releaseRequestHandler): this.unusedRequestHandlerThreads.add(rh)
      3. CALL (in ObjectPool.handleRequest): take RH from unusedRequestHandlerThreads
      4. CALL (in ObjectPool.handleRequest): rh.commenceRequestHandling(socket, listener);
      5. CALL (in RequestHandler.commenceRequestHandling): this.notifyAll()
      6. RH (in RequestHandler.run): this.wait()

      Since the notify is lost (there were no waiters yet), this.wait() in
      the last step will hang forever. This leaks a file descriptor, since
      the socket handed over for processing is never reclaimed, and threads
      are effectively lost, as Winstone will then create more RequestHandlers.
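
      A minimal sketch of the hand-off described above (the class and method
      names HandlerHandoff and awaitNextRequest are hypothetical, not
      Winstone's actual source). The buggy version calls wait() with no
      condition, so a notifyAll() that fires between releaseRequestHandler()
      and wait() is lost. The standard fix is to record the pending work in
      a field guarded by the same monitor and re-check it in a loop:

          public class HandlerHandoff {
              private Object pendingSocket; // guarded by 'this'

              // Called by the pool thread (CALL) to hand a socket to this handler.
              public synchronized void commenceRequestHandling(Object socket) {
                  this.pendingSocket = socket; // record the work *before* waking
                  notifyAll();
              }

              // Called by the handler thread (RH) after finishing a request.
              public synchronized Object awaitNextRequest() throws InterruptedException {
                  // Buggy version: a bare wait() here sleeps forever if the
                  // notifyAll() above already ran. Fixed version: loop on a
                  // condition, so an early notify is observed via the field.
                  while (pendingSocket == null) {
                      wait();
                  }
                  Object socket = pendingSocket;
                  pendingSocket = null;
                  return socket;
              }
          }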

      Now, this is of course a Winstone problem, but its development seems
      to be d-e-a-d, at least judging by its bug tracker. As long as this
      problem affects Jenkins, I'd still classify it as a Jenkins problem too.

      I've put this into the winstone tracker too: https://sourceforge.net/tracker/?func=detail&aid=3308285&group_id=98922&atid=622497

      Workaround: Use Tomcat, not embedded Winstone (that's what I'm doing now).


          Activity

          Uwe Stuehler added a comment (edited):

          It seems to me that this bug might not have been solved completely,
          or we're seeing a different but similar issue.

          After about a day of normal operation, we see HTTP and AJP
          connections being accept()ed, and the accepting thread then goes
          back to poll()...

          And then nothing happens with the new connection, not a single read() or anything. New threads aren't created either. We're at exactly 200 "RequestHandlerThread" threads in state RUNNABLE.

          Is 200 a fixed limit on the number of request handler threads?

          Edit: added stack trace

          All RequestHandlerThread threads have this exact same backtrace:

          "RequestHandlerThread[#871]" daemon prio=10 tid=0x00007fee8527a800 nid=0xf2c runnable [0x00007fee7d493000]
             java.lang.Thread.State: RUNNABLE
                  at java.net.SocketInputStream.socketRead0(Native Method)
                  at java.net.SocketInputStream.read(SocketInputStream.java:129)
                  at java.io.DataInputStream.readFully(DataInputStream.java:178)
                  at java.io.DataInputStream.readFully(DataInputStream.java:152)
                  at winstone.ajp13.Ajp13IncomingPacket.<init>(Ajp13IncomingPacket.java:60)
                  at winstone.ajp13.Ajp13Listener.allocateRequestResponse(Ajp13Listener.java:170)
                  at winstone.RequestHandlerThread.run(RequestHandlerThread.java:67)
                  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
                  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
                  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
                  at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                  at java.lang.Thread.run(Thread.java:662)
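
          The trace shows every handler parked in a blocking read() with no
          timeout, so a peer that connects but never sends a complete AJP
          packet can pin a handler forever. A sketch of the usual defense
          (illustrative only; the class name TimedReadSketch and the
          30-second value are assumptions, and this says nothing about how
          Winstone itself is structured): set SO_TIMEOUT on the accepted
          socket so a stalled read fails instead of blocking indefinitely.

              import java.io.DataInputStream;
              import java.io.IOException;
              import java.net.Socket;
              import java.net.SocketTimeoutException;

              public class TimedReadSketch {
                  public static void handle(Socket socket) throws IOException {
                      socket.setSoTimeout(30000); // fail a stalled read after 30s
                      try {
                          DataInputStream in =
                                  new DataInputStream(socket.getInputStream());
                          byte[] header = new byte[4]; // e.g. an AJP13 packet header
                          in.readFully(header);
                          // ... parse and dispatch the request ...
                      } catch (SocketTimeoutException e) {
                          // Slow or dead peer: fall through and close the socket,
                          // reclaiming the fd and freeing this handler thread.
                      } finally {
                          socket.close();
                      }
                  }
              }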
          
          Uwe Stuehler added a comment:

          FWIW: The number 200 is indeed the thread pool size limit.

          Marc Günther added a comment:

          Uwe Stuehler wrote:
          > Is 200 a fixed limit on the number of request handler threads?

          It is. From winstone.ObjectPool.java:35:

              private int MAX_REQUEST_HANDLERS_IN_POOL = 200;
          Uwe Stuehler added a comment:

          New issue opened: JENKINS-13275

          SCM/JIRA link daemon added a comment:

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          changelog.html
          war/pom.xml
          http://jenkins-ci.org/commit/jenkins/5a28f6a7f7b3debf3ea2ac459603af0d8086395e
          Log:
          [FIXED JENKINS-9882] integrated a newer version of Winstone with the fix.
          (cherry picked from commit 8eb264ceb1bfdea332009499a06b9faa89dc03e9)

          Conflicts:

          changelog.html


            People

            • Assignee: Unassigned
            • Reporter: Santeri Paavolainen
            • Votes: 2
            • Watchers: 5
