Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-4618

Read timeout during "groovysh" session with Hudson CLI leads to heap exhaustion

XMLWordPrintable

    • Icon: Patch Patch
    • Resolution: Done
    • Icon: Major Major
    • core
    • None
    • Platform: All, OS: All

      After upgrading recently to a version of Hudson that includes the Hudson CLI, we
      started seeing catastrophic heap exhaustion when the "groovysh" command was
      used. Our Hudson servers are configured to use only the HTTP-based communication
      method for the CLI, not the ad hoc TCP port method.

      Digging around a bit, it turns out to be related to read timeouts on the Channel
      reader. In our case, the HTTP connection timeout was set to 20 seconds, so after
      that time we would see the CLI get disconnected from Hudson. The server logs
      show an exception like this:

      2009-10-02 12:50:59,786 ERROR [hudson.remoting.Channel] I/O error in channel
      HTTP full-duplex channel 552692ad-f7ab-4ed2-8eb6-5be614fdcefb
      java.net.SocketTimeoutException: Read timed out
      at java.net.SocketInputStream.socketRead0(Native Method)
      at java.net.SocketInputStream.read(SocketInputStream.java:129)
      at
      org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:746)
      at
      org.apache.coyote.http11.InternalInputBuffer$InputStreamInputBuffer.doRead(InternalInputBuffer.java:776)
      at
      org.apache.coyote.http11.filters.ChunkedInputFilter.readBytes(ChunkedInputFilter.java:243)
      at
      org.apache.coyote.http11.filters.ChunkedInputFilter.parseChunkHeader(ChunkedInputFilter.java:273)
      at
      org.apache.coyote.http11.filters.ChunkedInputFilter.doRead(ChunkedInputFilter.java:132)
      at
      org.apache.coyote.http11.InternalInputBuffer.doRead(InternalInputBuffer.java:705)
      at org.apache.coyote.Request.doRead(Request.java:428)
      at
      org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:304)
      at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:372)
      at org.apache.catalina.connector.InputBuffer.readByte(InputBuffer.java:317)
      at
      org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:105)
      at
      java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
      at
      java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
      at
      java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:802)

      This happens in the Channel reader thread, but in the pool thread that is
      serving the HTTP request, the Groovysh repl loop continues, continuing to look
      for new commands from the user. Each time it does this, it ends up getting down
      to Channel.send where an exception is thrown to indicate that the channel is closed.

      In the case of writes/flush, this exception is swallowed by the PrintStream and
      PrintWriter wrappers at various levels of the output chain (because they swallow
      ALL IOExceptions, per their contract). In the case of reads, the IOException
      ends up being throw up to Groovy's ShellRunner.work where it is caught as a
      Throwable and passed to the configured errorHook.

      The problem is that the errorHook then tries to write the error message out to
      stderr. This fails, but because of the behavior of PrintStream and PrintWriter,
      the IOException from Channel.send doesn't get thrown. This allows the loop in
      ShellRunner.work to continue, calling readLine again to repeat the cycle.

      While this is happening, each attempt to read from the ProxyInputStream creates
      a Chunk request that gets put into Channel.pendingCalls. Since these requests
      will never be serviced they just sit there, chewing up the heap until it is all
      gone.

      I have a patch that fixes this, inasmuch as it can be fixed. Instead of relying
      on the default errorHook, it uses a custom errorHook that watches for
      ChannelClosedException. If it sees it, it rethrows it, otherwise it calls the
      default error hook. This causes the exception to propagate out of
      ShellRunner.work and eventually causes the GroovyshCommand to terminate with a
      failure.

      Once I fixed that, however, I noticed that FullDuplexHttpChannel and
      ProxyInputStream$Chunk instances were still being left on the heap. I tracked
      that down to a shutdown hook being added in Groovy's RecordCommand. Since that
      hook object has a reference to the Groovysh instance, it transitively holds onto
      the Channel via IO.in, IO.out, and IO.err.

      My fix for this second problem may be more controversial; I decided to remove
      the "load", "save", and "record" commands. Besides the shutdown hook, the reason
      for this is that these three commands are meant to interact with local files,
      but in the case of the Hudson/Groovy CLI these files will actually be looked for
      on the server, not the client. I could see having usable variants of "load",
      "save", and "record" in the Hudson/Groovy CLI, but I don't think the stock ones
      fit the bill.

      And last but not least, my patch adds a predefined binding for the "hudson"
      variable to hudson.model.Hudson.getInstance() and adds a predefined import for
      "hudson.model.*".

      I know this potentially three separate issues, so please let me know if I need
      to split up the patch.

            Unassigned Unassigned
            md5 Mike Dillon
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

              Created:
              Updated:
              Resolved: