Jenkins / JENKINS-71259

Agents don't appear to send TCP keepalive


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: core

      Our team provides Jenkins as a managed service to internal teams. We have previously been running the following setup:

      • Single VM per controller, with the controller in a container
      • HTTPS reverse proxy
      • JNLP agents connecting directly to the controller VM using "tunnel connection through", because the HTTPS reverse proxy didn't serve JNLP connections

      We recently switched to running all of our Jenkins controllers on Kubernetes, and they all share a common Azure load balancer. This load balancer listens for HTTP connections and performs host-based routing, and also listens on a unique JNLP port for each controller instance.

      CI teams have reported node disconnections. We looked into the reports and noticed that the disconnections happen on nodes running long-running processes that print nothing to stdout for long periods of time. We were able to solve it for many users by increasing the TCP idle connection timeout setting on the load balancer from 4 to 30 minutes, but we still have builds that run longer than this without any output. This points to faulty (or missing) TCP keepalive functionality in the agent.

      I would expect the agent to send TCP keepalive packets to the controller every n seconds (configurable) in order to assure the load balancer that the connection is active.
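
      To illustrate the behaviour described above, here is a minimal sketch of a Java client socket that enables TCP keepalive with a configurable idle time. It is not taken from Jenkins or its remoting library: the class KeepAliveSketch, the method openWithKeepAlive and the idleSeconds parameter are hypothetical names. It assumes Java 11 or later, and it only sets the idle time where the platform supports TCP_KEEPIDLE. Note that enabling SO_KEEPALIVE alone may not be enough: on Linux the default idle time before the first probe is 7200 seconds, which is longer than both load balancer timeouts mentioned above.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.StandardSocketOptions;
import jdk.net.ExtendedSocketOptions;

/** Hypothetical helper for illustration; not part of Jenkins or its remoting library. */
final class KeepAliveSketch {

    /**
     * Opens a TCP connection with keepalive probes enabled.
     * idleSeconds is how long the connection may stay idle before the first probe is sent.
     */
    static Socket openWithKeepAlive(InetAddress host, int port, int idleSeconds) throws IOException {
        Socket socket = new Socket(host, port);
        try {
            // Ask the OS to send keepalive probes on this connection.
            socket.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
            // The idle time is a platform-specific option, so check before setting it.
            if (socket.supportedOptions().contains(ExtendedSocketOptions.TCP_KEEPIDLE)) {
                socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, idleSeconds);
            }
            return socket;
        } catch (IOException | RuntimeException e) {
            socket.close(); // do not leak the socket if configuring it fails
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Connect to a throwaway local listener just to show the options being applied.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket = openWithKeepAlive(server.getInetAddress(), server.getLocalPort(), 60)) {
            System.out.println("SO_KEEPALIVE enabled: "
                    + socket.getOption(StandardSocketOptions.SO_KEEPALIVE));
        }
    }
}
```

      For the probes to keep the connection alive through the load balancer, idleSeconds would need to be shorter than the load balancer's idle timeout (for example, 60 seconds against a 4 minute timeout).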

      The errors we see in the controller console log look like this:

       

      03:19:33  Cannot contact NODE_NAME: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@3095990c:JNLP4-connect connection from NODE_IP:54165": Remote call on JNLP4-connect connection from NODE_IP:54165 failed. The channel is closing down or has closed down 

       

       

            Assignee: Unassigned
            Reporter: brovoca Emil
            Votes: 0
            Watchers: 2
