Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-33412

Jenkins locks when started in HTTPS mode on a host with 37+ processors

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Blocker
    • Resolution: Fixed
    • Component/s: winstone-jetty
    • Environment:
      Jenkins 1.652
      org.jenkins-ci:winstone 2.9
      Testing in the Linux JDK 1.7 and 1.8 as well as the Solaris JDK 1.7 1.8 (both OpenJDK and OracleJDK). Reproduces in Ubuntu, Debian, CentOS and SmartOS.
    • Similar Issues:

      Description

      Summary
      Using Winstone 2.9 (i.e. the embedded Jetty wrapper) or below it will not run in HTTPS mode on hosts with 37 cores/processors or more. This problem replicates regardless of the JDK or the operating system.

      Reproduction
      The easiest way to reproduce the error is to use qemu to virtualize a 37 core system. You can do that with the -smp <cores> parameter. For example, for testing I run:

      qemu-system-x86_64 -hda ubuntu.img -m 4096 -smp 48
      

      Once you have a VM with more than 37 cores setup, install Jenkins 1.652 and configure it to use HTTPS. Attempt to start it and connect to either the HTTP or the HTTPS port. The connection will time out for either port with the server effectively locked until you send it a SIGTERM. Please refer to the attached log file to see its start process.

      Why is this important?
      You may ask - who is running Jenkins on that big of a server? Well, with containerization technologies (e.g. Docker) taking a center stage, we are seeing more and more deployments where there is no VM involved and hence a container gets a slice of CPU but has visibility to all of the processors on a system. The official Docker image of Jenkins suffers from this defect. Sure a user can set up their own reverse proxy to run TLS through, but it adds unneeded complexity for users looking to containerize their Jenkins environment.

      Solution
      I've done the work of reducing the surface area of root cause analysis. I removed the entire jenkins war from the jenkins winstone runner (https://github.com/jenkinsci/winstone) and ran a simple hello world war instead. With HTTPS enabled, the issue still reproduced. It took a lot of fiddling to determine that it was exactly at 37 cores in which the hang occurred.

      Lastly, I tried the exact same reproduction steps with winstone-3.1. Luckily, with the upgrade to embedded Jetty in the 3.1 version the issue is resolved.

      Can we upgrade the next Jenkins release to use the winstone-3.1 component?

      This would be the easiest and the best fix. I would be happy to contribute to any efforts that would allow for us to get this into a release.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                olamy Olivier Lamy
                Reporter:
                elijah Elijah Zupancic
              • Votes:
                0 Vote for this issue
                Watchers:
                12 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: