Jenkins / JENKINS-44634

JNLP Rejected Agent Connections Leak Socket Handles

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Component/s: windows-slaves-plugin

      Description

      If a JNLP agent repeatedly tries to connect while there is a valid connection already established, Jenkins will open a socket but not close it after rejecting the connection. This pattern can continue to repeat until the Jenkins process hits an open file limit imposed by the operating system, leading to the disconnection of all agents.
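The pattern described above, where a duplicate connection is rejected without releasing its handle, can be illustrated with a minimal sketch. This is not the Jenkins or Remoting code: the class `RejectAndClose`, the `connected` registry, and the `handle` method are hypothetical names introduced for illustration. The only point is that the rejection path closes the socket before returning.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not taken from Jenkins or Remoting.
final class RejectAndClose {
    // Assumed registry of agent names that already hold a live connection.
    private final Set<String> connected = ConcurrentHashMap.newKeySet();

    /**
     * Returns true if the connection is accepted. When the agent is already
     * connected, the new socket is closed before the method returns, so the
     * rejection path does not leave an open handle behind.
     */
    boolean handle(Socket socket, String agentName) {
        if (!connected.add(agentName)) {
            try {
                socket.close(); // release the handle on rejection
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return false;
        }
        return true;
    }
}
```

If the `socket.close()` call on the rejection branch were missing, each repeated connection attempt would leave one more descriptor open, which matches the growth described in this report.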

      The leaked sockets show up in lsof output:

      $ lsof -u $JENKINS_USER
      ...
      java 29334 jenkins-ci 12u sock 0,6 0t0 284498333 can't identify protocol
      

      This issue can be resolved by restarting Jenkins, restarting the OS networking subsystem, or possibly by attaching gdb and manually closing the open sockets.
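To track how quickly the handles accumulate before a restart becomes necessary, the lsof output can be counted over time. The sketch below assumes that entries marked "can't identify protocol", as in the listing above, are a reasonable approximation of the leaked sockets. The class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical helper; counts lsof lines that look like the leaked sockets above.
final class LeakedSocketCounter {
    // Text lsof prints for a socket whose protocol it cannot determine.
    private static final String MARKER = "can't identify protocol";

    /** Returns the number of lines containing the marker text. */
    static long countLeaked(List<String> lsofLines) {
        return lsofLines.stream()
                .filter(line -> line.contains(MARKER))
                .count();
    }
}
```

Feeding it the lines of `lsof -u $JENKINS_USER` at intervals and comparing the counts with the process's open file limit gives a rough idea of how long the instance can run before agents start disconnecting.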

          Activity

          oleg_nenashev Oleg Nenashev added a comment -

          The issue is valid. I need to double-check if there are things to fix in the current baseline. In 2.46.1 we had JENKINS-42371, but obviously it was not enough.

          Just in case, could you please provide the output of the File Leak Detector (http://file-leak-detector.kohsuke.org/)?

          oleg_nenashev Oleg Nenashev added a comment -

          Trevor Bramwell ping

          dave_pierce Dave Pierce added a comment - - edited

          Hi!

          We are experiencing a very similar problem (identical symptoms, though maybe not exactly the same root cause) with Jenkins 2.85, Ubuntu 14.04.5 and OpenJDK 1.8.0_141.

          Am doing more research...

          File Leak Detector is unable to connect to the Jenkins JVM.

          oleg_nenashev Oleg Nenashev added a comment -

          You need to run it as a Java agent.

          dave_pierce Dave Pierce added a comment - - edited

          Thanks; I will have to try that after hours. (It's production.)

          Is the file leak detector plugin likely to be of any use? https://wiki.jenkins.io/display/JENKINS/File+Leak+Detector+Plugin

          dave_pierce Dave Pierce added a comment -

          I ran it. I get a lot of these:

           

          #2 socket channel by thread:Computer.threadPoolForRemoting 356 on Wed Dec 13 15:08:38 CST 2017
              at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:108)
              at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60)
              at java.nio.channels.SocketChannel.open(SocketChannel.java:145)
              at java.nio.channels.SocketChannel.open(SocketChannel.java:187)
              at org.jinterop.dcom.transport.JIComTransport.attach(JIComTransport.java:98)
              at rpc.Stub.attach(Stub.java:104)
              at org.jinterop.dcom.core.JIComServer.call(JIComServer.java:860)
              at org.jinterop.dcom.core.JIComServer.call(JIComServer.java:825)
              at org.jinterop.dcom.core.JIComServer.addRef_ReleaseRef(JIComServer.java:909)
              at org.jinterop.dcom.core.JISession.releaseRef(JISession.java:730)
              at org.jinterop.dcom.core.JIComServer.createInstance(JIComServer.java:746)
              at org.jvnet.hudson.wmi.WMI.connect(WMI.java:61)
              at hudson.os.windows.ManagedWindowsServiceLauncher.launch(ManagedWindowsServiceLauncher.java:208)
              at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:285)
              at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)

          oleg_nenashev Oleg Nenashev added a comment -

          Code in the Windows Slaves plugin is definitely prone to connection leaks during launch() and afterDisconnect(). I am not sure this is the originally reported defect, but Trevor Bramwell also has the Windows Slaves plugin installed.

          Let's try to fix this defect in this ticket.
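One possible shape for such a fix is to close the connection whenever a launch step fails after the connection was opened. The sketch below is not the plugin's code: `Session`, `SessionFactory`, and `LaunchCleanup` are hypothetical stand-ins for whatever the launcher opens during launch(), and the sketch only shows the cleanup-on-failure pattern.

```java
// Hypothetical illustration of cleanup on a failed launch; not the plugin's code.
final class LaunchCleanup {
    /** Assumed stand-in for a remote session that holds a socket. */
    interface Session extends AutoCloseable {
        void start() throws Exception;

        @Override
        void close();
    }

    /** Assumed factory that opens the underlying connection. */
    interface SessionFactory {
        Session open() throws Exception;
    }

    /**
     * Opens a session and runs the launch step. On failure the session is
     * closed before the exception propagates, so no socket is left behind.
     * On success the caller owns the session and must close it on disconnect.
     */
    static Session launch(SessionFactory factory) throws Exception {
        Session session = factory.open();
        try {
            session.start();
            return session;
        } catch (Exception e) {
            session.close(); // release the connection before rethrowing
            throw e;
        }
    }
}
```

The same ownership rule would apply on the afterDisconnect() side: whoever receives the session from a successful launch is responsible for closing it exactly once.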


            People

            • Assignee: Unassigned
            • Reporter: bramwelt Trevor Bramwell
            • Votes: 0
            • Watchers: 6
