Jenkins / JENKINS-33412

Jenkins locks when started in HTTPS mode on a host with 37+ processors

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Component/s: winstone-jetty
    • Environment:
      Jenkins 1.652
      org.jenkins-ci:winstone 2.9
      Testing with the Linux JDK 1.7 and 1.8 as well as the Solaris JDK 1.7 and 1.8 (both OpenJDK and OracleJDK). Reproduces on Ubuntu, Debian, CentOS and SmartOS.

      Description

      Summary
      Using Winstone 2.9 (i.e. the embedded Jetty wrapper) or below, Jenkins will not run in HTTPS mode on hosts with 37 or more cores/processors. The problem reproduces regardless of JDK or operating system.

      Reproduction
      The easiest way to reproduce the error is to use qemu to virtualize a 37-core system. You can do that with the -smp <cores> parameter. For example, for testing I run:

      {code:none}
      qemu-system-x86_64 -hda ubuntu.img -m 4096 -smp 48
      {code}

      Once you have a VM with more than 37 cores set up, install Jenkins 1.652 and configure it to use HTTPS. Attempt to start it and connect to either the HTTP or the HTTPS port. The connection will time out on either port, with the server effectively locked until you send it a SIGTERM. Please refer to the attached log file to see its startup process.
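For anyone setting up the reproduction, HTTPS in the embedded Winstone is enabled with launcher flags along these lines (flag names per Winstone's parameter list; the keystore path and password are placeholders):

```shell
# Launch Jenkins with Winstone/Jetty serving HTTPS on port 8443.
# The keystore path and password below are placeholders for your own certificate store.
java -jar jenkins.war \
  --httpsPort=8443 \
  --httpsKeyStore=/var/lib/jenkins/jenkins.jks \
  --httpsKeyStorePassword=changeit
```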

      Why is this important?
      You may ask - who is running Jenkins on that big of a server? Well, with containerization technologies (e.g. Docker) taking center stage, we are seeing more and more deployments where there is no VM involved, and hence a container gets a slice of CPU but has visibility to all of the processors on a system. The official Docker image of Jenkins suffers from this defect. Sure, a user can set up their own reverse proxy to run TLS through, but that adds unneeded complexity for users looking to containerize their Jenkins environment.
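The visibility point is easy to verify from inside a CPU-limited container: the core count the JVM sees is the host's, while the actual budget lives in the cgroup (the path below is the cgroup v1 location; on cgroup v2 it is cpu.max):

```shell
# nproc reports the host's core count -- the same value the JVM's
# Runtime.getRuntime().availableProcessors() sees inside the container.
cores=$(nproc)
echo "visible cores: $cores"
# The container's real CPU budget, if limited, is in the cgroup (v1 path;
# -1 means unlimited). This file may not exist outside a container.
cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us 2>/dev/null || true
```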

      Solution
      I've done the work of reducing the surface area for root cause analysis. I removed the entire Jenkins war from the Jenkins winstone runner (https://github.com/jenkinsci/winstone) and ran a simple hello-world war instead. With HTTPS enabled, the issue still reproduced. It took a lot of fiddling to determine that it is exactly at 37 cores that the hang occurs.

      Lastly, I tried the exact same reproduction steps with winstone-3.1. With the upgraded embedded Jetty in 3.1, the issue no longer reproduces.

      Can we upgrade the next Jenkins release to use the winstone-3.1 component?

      This would be the easiest and the best fix. I would be happy to contribute to any efforts that would allow for us to get this into a release.

            Activity

            elijah Elijah Zupancic created issue -
            elijah Elijah Zupancic made changes - Description edited
            elijah Elijah Zupancic made changes - Description edited
            danielbeck Daniel Beck added a comment -

            Lastly, I tried the exact same reproduction steps with winstone-3.2-SNAPSHOT. Luckily, with the upgrade to embedded Jetty in the 3.2 version the issue is resolved.

            It's not clear to me what you did. There are no open PRs, and master… well… https://github.com/jenkinsci/winstone/compare/winstone-3.1...master

            ydubreuil Yoann Dubreuil added a comment -

            With that many CPUs, glibc can do crazy memory allocations, like reported here: https://issues.apache.org/jira/browse/HADOOP-7154

            I wonder if switching to the latest Jetty works only by luck, as memory arena creation depends on thread contention. Do you have the /proc/PID/status content of the hanging JVM?
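If glibc's arenas were the cause, the usual mitigation (also discussed on the HADOOP-7154 thread) is to cap them before launching the JVM. MALLOC_ARENA_MAX is a real glibc tunable; the value 4 is just a common choice, not something verified against this bug:

```shell
# 64-bit glibc creates up to 8 * cores malloc arenas under thread contention;
# capping them bounds virtual-memory growth on many-core hosts.
export MALLOC_ARENA_MAX=4
java -jar jenkins.war
```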

            elijah Elijah Zupancic made changes - Description edited (winstone-3.2-SNAPSHOT corrected to winstone-3.1)
            elijah Elijah Zupancic made changes -
            Environment Original Value: Jenkins 1.652 / org.jenkins-ci:winstone 2.9, 3.0, 3.1 / Testing in the Linux JDK 1.7 and 1.8 as well as the Solaris JDK 1.7 1.8 (both OpenJDK and OracleJDK). Reproduces in Ubuntu, Debian, CentOS and SmartOS.
            New Value: Jenkins 1.652 / org.jenkins-ci:winstone 2.9 / (testing environment otherwise unchanged)
            elijah Elijah Zupancic added a comment - - edited

            Daniel Beck You are completely correct. I did the bulk of my testing with winstone-2.9 and I lightly tested with winstone-3.1. I made a mistake with my network setup. I just validated that winstone-3.1 also works correctly and updated the bug to that effect.

            I would still recommend upgrading the next version of Jenkins to winstone-3.1 in order to fix this bug; when a fix is available simply by upgrading a core library, it is usually better to take it that way.

            That said, read below if we want to go down the root cause analysis route:

            I work at Joyent, and if any developers want an environment in which this can be reproduced, please email me your public key (elijah.zupancic@joyent.com) and I will create an instance.

            Yoann Dubreuil The behavior is present across different operating systems, including ones that do not use glibc as part of their JVM implementation (e.g. SmartOS). When I was inspecting the application with a debugger, the best I could tell was that even though a request had come in, epoll would not wake and a processing thread would not be dispatched. We can see that Winstone receives the request (when FINE logging is enabled):

            {code:none}
            FINE: created SCEP@3c108ef8{l(/97.113.3.1:54175)<->r(/165.225.168.215:8443),s=0,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=0}-{SslConnection@56a764e3 SSL NOT_HANDSHAKING i/o/u=-1/-1/-1 ishut=false oshut=false {AsyncHttpConnection@23fe1e75,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-14,l=0,c=0},r=0}}
            {code}

            However, the process just continues to sleep after this.

            Here's the output from /proc/PID/status when it is in a hung state:

            {code:none}
            Name:   java
            State:  S (sleeping)
            Tgid:   1673
            Ngid:   0
            Pid:    1673
            PPid:   1672
            TracerPid:      0
            Uid:    1000    1000    1000    1000
            Gid:    1000    1000    1000    1000
            FDSize: 256
            Groups: 4 24 27 30 46 110 111 1000
            NStgid: 1673
            NSpid:  1673
            NSpgid: 1672
            NSsid:  1551
            VmPeak:  8983212 kB
            VmSize:  8983208 kB
            VmLck:         0 kB
            VmPin:         0 kB
            VmHWM:    199776 kB
            VmRSS:    199276 kB
            VmData:  8922448 kB
            VmStk:       136 kB
            VmExe:         4 kB
            VmLib:     17216 kB
            VmPTE:      1080 kB
            VmPMD:        48 kB
            VmSwap:        0 kB
            Threads:        98
            SigQ:   0/15699
            SigPnd: 0000000000000000
            ShdPnd: 0000000000000000
            SigBlk: 0000000000000000
            SigIgn: 0000000000000000
            SigCgt: 2000000181005ccf
            CapInh: 0000000000000000
            CapPrm: 0000000000000000
            CapEff: 0000000000000000
            CapBnd: 0000003fffffffff
            Seccomp:        0
            Cpus_allowed:   ffff,ffffffff
            Cpus_allowed_list:      0-47
            Mems_allowed:   00000000,00000001
            Mems_allowed_list:      0
            voluntary_ctxt_switches:        3
            nonvoluntary_ctxt_switches:     1
            {code}
            ydubreuil Yoann Dubreuil added a comment -

            Thanks for the attachments. It's pretty clear that glibc is not to blame here, with only 200 MB of RSS for the process.

            The Dropwizard team hit the same issue, which was fixed in Jetty after their report.

            Daniel Beck I think we should consider upgrading Jetty to 9.2.2.v20140723 to get this fix.

            elijah Elijah Zupancic added a comment -

            Thanks for your prompt action.

            That fix looks like it will solve the problem, but it makes me sad: it derives the thread count from the number of cores available. In the container world, we are seeing things like CPU shares or fair-share scheduling algorithms used to dice up CPU while exposing all of the cores to the OS. This leads to a bunch of weird performance problems, especially if CPU-to-thread affinity is set. This way of configuring applications is not going to be sustainable long-term given where OS containerization is heading.
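To make the concern concrete, here is a purely illustrative sketch of the failure mode: thread counts derived from the visible core count can consume a fixed-size pool outright. The formulas and pool size below are hypothetical stand-ins, not Jetty's actual defaults:

```shell
cores=48                      # what availableProcessors() reports, even in a small container
selectors=$(( cores / 2 ))    # hypothetical: selector threads scale with cores
acceptors=$(( cores / 4 ))    # hypothetical: acceptor threads scale with cores
max_threads=40                # hypothetical fixed pool size
reserved=$(( selectors + acceptors ))
echo "reserved $reserved of $max_threads threads; $(( max_threads - reserved )) left for requests"
```

With these stand-in numbers, 48 cores reserve 36 of 40 threads; a few more cores and the pool is fully consumed by acceptors and selectors, so accepted connections are never serviced, which matches the "listens but never responds" symptom.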

            I'll add a feature request after this issue is fixed to allow for the manual configuration of the number of threads for acceptors, selectors, etc. Once again - thanks for your work.

            danielbeck Daniel Beck added a comment -

            Yoann Dubreuil Winstone 3.x is on Jetty 9.2.15: https://github.com/jenkinsci/winstone/blob/d016188767386d8b9f64e728b6a98e39cab695a8/pom.xml#L250

            So https://github.com/jenkinsci/jenkins/pull/2108 should do the trick.
            rtyler R. Tyler Croy made changes -
            Workflow JNJira [ 169328 ] JNJira + In-Review [ 183450 ]
            rodrigc Craig Rodrigues added a comment -

            Daniel Beck This problem has been reproduced and confirmed by multiple people here:

            https://github.com/jenkinsci/docker/issues/702

            skeenan Shaun Keenan added a comment - - edited

            I'm seeing this happen again on the latest Jenkins (well, every LTS above 2.107.3, as well as the latest), which are running Winstone 4.

            The issue that I'm seeing is that Jenkins fails to completely initialize the HTTP listener (HTTPS isn't enabled; k8s ingress takes care of HTTPS, so no need) whenever it's running on a host with >36 CPUs. If I run Jenkins on a host with 36 or fewer CPUs, it starts up just fine. I took the same Kubernetes pod (tried with both the latest LTS and the latest non-LTS), pinned it to a 72-CPU node and it initialized, said everything was good, but never responded to HTTP requests; re-pinned it to a 20-CPU node, and it started correctly.

            Nothing in the logs indicates that there's an issue at all. The same thing happens both with a completely fresh jenkins_home and one that has been in use for years and has a bunch of plugins installed.

            For me, rolling back to 2.107.3 makes everything work again. Moving to any newer LTS version breaks everything.

            This is a rather frustrating issue. I need to update to pick up a bunch of vulnerability fixes. At least I can pin to nodes with fewer CPUs, but this isn't exactly the best workaround.

            mikescholze Mike Scholze added a comment - - edited

            Same here. Our Docker hosts have 80 cores; with version 2.117 we could not reach Jenkins via HTTP, while 2.116 works fine. On my local test system with 4 cores everything works fine.

            Maybe that's the reason:
            --> Update Winstone from 4.1.2 to 4.2 to update Jetty from 9.4.5 to 9.4.8 for various bugfixes and improvements. (full changelog, Jetty 9.4.6 changelog, Jetty 9.4.7 changelog, Jetty 9.4.8 changelog)

            thomaswerner Thomas Werner made changes -
            Priority Major [ 3 ] Blocker [ 1 ]
            rodrigc Craig Rodrigues added a comment -

            @olamy do you know if your change https://github.com/jenkinsci/winstone/pull/44 has impacts on this ticket?

            olamy Olivier Lamy added a comment -

            is it possible to get a thread dump?

            olamy Olivier Lamy added a comment -

            Do you have the same issue with 2.128? (This one includes a new Jetty version.)

            mikescholze Mike Scholze added a comment -

            same issue with 2.128 and 2.138

            olamy Olivier Lamy added a comment -

            Please provide a Jetty debug log and a thread dump.

            olamy Olivier Lamy added a comment -

            Mike Scholze I definitely need some logs from you, otherwise I cannot do much:

            JVM version? OS?

            Thread dump.

            Jetty debug logs.

            Create a file called jul.properties with the following content:

            handlers=java.util.logging.ConsoleHandler
            .level=INFO
            org.eclipse.jetty.level=ALL
            java.util.logging.ConsoleHandler.level=FINEST

            Then start jenkins with the sys prop:

            -Djava.util.logging.config.file=jul.properties 
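For anyone who prefers configuring this in code rather than via jul.properties, the same Jetty debug logging can be set up programmatically with plain java.util.logging (a sketch; the class name is mine, not part of Jenkins or Winstone):

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class JettyDebugLogging {
    public static void main(String[] args) {
        // Equivalent of org.eclipse.jetty.level=ALL in jul.properties
        Logger jetty = Logger.getLogger("org.eclipse.jetty");
        jetty.setLevel(Level.ALL);

        // Equivalent of java.util.logging.ConsoleHandler.level=FINEST
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINEST);
        jetty.addHandler(handler);

        System.out.println(jetty.getLevel()); // prints "ALL"
    }
}
```

The properties-file approach via -Djava.util.logging.config.file is still preferable for Jenkins itself, since it takes effect before any Jetty classes load.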
            olamy Olivier Lamy added a comment - edited

            Mike Scholze are those logs generated with 2.128 or 2.138?

            because I can read

            Running from: /data/apps/jenkins/app/jenkins-2.121.3.war 
            mikescholze Mike Scholze made changes -
            mikescholze Mike Scholze added a comment - edited

            Olivier Lamy these are not my log files.

            But now my test scenario:

            Host system with 80 cores, ~500G RAM
            Ubuntu 16.04.4 LTS
            Docker 17.09.0-ce

            Test-Docker-Image based on official docker jenkins image + jul.properties file:

            // jul.properties
            
            handlers=java.util.logging.ConsoleHandler
            .level=INFO
            org.eclipse.jetty.level=ALL
            java.util.logging.ConsoleHandler.level=FINEST
            
            
            // Dockerfile
            
            FROM jenkins/jenkins:2.121.2
            ENV JAVA_OPTS="${JAVA_OPTS} -Djava.util.logging.config.file=/usr/share/jenkins/jul.properties"
            COPY jul.properties /usr/share/jenkins/
            ENTRYPOINT ["/sbin/tini", "--", "/usr/local/bin/jenkins.sh"]

            Logfile: jenkins_001_mikescholze_2.121.2.log

            olamy Olivier Lamy added a comment -

            jstack?

            What happens with the env var or sys prop JETTY_AVAILABLE_PROCESSORS=32 (or a different value)?

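For context on what that setting does: Jetty lets JETTY_AVAILABLE_PROCESSORS (as a system property or environment variable) override the core count the JVM reports, which is the number its thread-pool and selector sizing is derived from. A rough sketch of that lookup order (not Jetty's actual code; the class name is mine):

```java
public class AvailableProcessors {
    // Sketch of a JETTY_AVAILABLE_PROCESSORS-style override:
    // system property first, then environment variable, then the JVM's count.
    public static int availableProcessors() {
        String value = System.getProperty("JETTY_AVAILABLE_PROCESSORS");
        if (value == null) {
            value = System.getenv("JETTY_AVAILABLE_PROCESSORS");
        }
        if (value != null) {
            try {
                return Integer.parseInt(value);
            } catch (NumberFormatException ignored) {
                // fall through to the detected count
            }
        }
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println(availableProcessors());
    }
}
```

This is why the override was a plausible workaround to try: on an 80-core host it would make Jetty size itself as if the machine had 32 cores.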
            mikescholze Mike Scholze added a comment -

            No change with that env var.

            olamy Olivier Lamy added a comment - edited

            PR here: https://github.com/jenkinsci/winstone/pull/54

            I'd like you to test that, but you need to rebuild Winstone and Jenkins as well.

            I can do it for you, but I'm not sure how to share it with you.

            olamy Olivier Lamy added a comment -

            Mike Scholze or anyone having the issue, can you please test the war here: http://home.apache.org/~olamy/jenkins/

            Start it with the option

            --useQTP

            Use --help for the other new Jetty options.

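The --useQTP switch makes Winstone use Jetty's QueuedThreadPool, whose size is bounded by explicitly configured min/max thread counts rather than scaling with the detected number of cores. As a loose analogy in plain java.util.concurrent (not Jetty's implementation; the sizing numbers are made up):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class QueuedPoolSketch {
    // A pool capped by an explicit maxThreads, independent of CPU count
    static ExecutorService newBoundedPool(int maxThreads) {
        return new ThreadPoolExecutor(
                8, maxThreads,             // min and max threads (made-up values)
                60, TimeUnit.SECONDS,      // idle threads above the minimum die off
                new SynchronousQueue<>()); // hand-off queue, so the pool grows under load
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = newBoundedPool(200);
        Future<Integer> result = pool.submit(() -> 40 + 2);
        System.out.println(result.get()); // prints "42"
        pool.shutdown();
    }
}
```

The point of the analogy: a 37-core host behaves the same as a 4-core host when the pool's bounds are explicit instead of derived from the processor count.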
            olamy Olivier Lamy made changes -
            Assignee Olivier Lamy [ olamy ]
            olamy Olivier Lamy made changes -
            Labels jenkins winstone jenkins jetty winstone
            mikescholze Mike Scholze added a comment - edited

            Yes, that worked. I have access to the Jenkins UI now and the JNLP slaves connected successfully.

            Tested with the following Dockerfile. I overwrote the jenkins.war with yours (2.139-SNAPSHOT) and added the parameter.

            // Dockerfile
            
            FROM jenkins/jenkins:2.121.2
            
            COPY jenkins_winstone_pr_54.war /usr/share/jenkins/jenkins.war
            
            ENTRYPOINT ["/sbin/tini", "--", "/usr/local/bin/jenkins.sh", "--useQTP"]
            
            
            olamy Olivier Lamy made changes -
            Link This issue is related to JENKINS-53239 [ JENKINS-53239 ]
            olamy Olivier Lamy added a comment - edited

            Mike Scholze thanks, that's great. I just uploaded a new version of the war. You no longer need to use the --useQTP option, as the idea is to make it the default now.

            olamy Olivier Lamy added a comment -

            Good to hear. Feel free to vote/comment on JENKINS-53239 or https://github.com/jenkinsci/winstone/pull/54

            csanchez Carlos Sanchez made changes -
            Remote Link This issue links to "jenkins-docker#702 (Web Link)" [ 21409 ]
            skeenan Shaun Keenan added a comment - edited

            I just tested this on a 72-core k8s node and it worked perfectly.  Published a docker image built off of this at

            skeenan947/jenkinstest
            skeenan Shaun Keenan added a comment -

            Would love to know when we can expect to see this change in LTS - until this is in, I'll be stuck on 2.107.3, which has quite a few vulnerabilities we'd like to mitigate.

            The nodes this runs on are 72-core amd64 kubernetes nodes.

            FWIW, here's an example node:


            Capacity:
             cpu: 72
             memory: 528332604Ki
             pods: 110
            System Info:
             Machine ID: 833e0926ee21aed71ec075d726cbcfe0
             System UUID: 00000000-0000-0000-0000-0CC47AC64A64
             Boot ID: 786d3795-f026-4556-9047-923f95a9a331
             Kernel Version: 4.14.44-coreos-r1
             OS Image: Container Linux by CoreOS 1745.5.0 (Rhyolite)
             Operating System: linux
             Architecture: amd64
             Container Runtime Version: docker://18.3.1
             Kubelet Version: v1.8.10+coreos.0
             Kube-Proxy Version: v1.8.10+coreos.0
            


            rodrigc Craig Rodrigues added a comment -

            Shaun Keenan any idea of how many processes and threads were used by Jenkins when you started your test container?

            skeenan Shaun Keenan made changes -
            Attachment lwps.txt [ 43927 ]
            skeenan Shaun Keenan added a comment -

            I see 1 parent with 112 LWPs.  Attached lwps.txt

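On Linux, each JVM thread appears to the OS as one LWP, so the 112 LWPs in lwps.txt are essentially the JVM's live thread count plus the VM's own service threads. A quick way to read that count from inside the JVM (a sketch; jstack or the lwps.txt view remains the authoritative OS-level picture):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCount {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Live threads in this JVM, roughly matching the LWPs that ps/jstack report
        System.out.println("live threads: " + threads.getThreadCount());
    }
}
```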
            olamy Olivier Lamy added a comment -

            https://github.com/jenkinsci/winstone/commit/74775cc02ef92feaf247e45a32b193e45800805a
            olamy Olivier Lamy made changes -
            Status Open [ 1 ] Resolved [ 5 ]
            Resolution Fixed [ 1 ]
            olamy Olivier Lamy made changes -
            Labels jenkins jetty winstone jenkins jetty winstone winstone-5.0
            olamy Olivier Lamy made changes -
            Status Resolved [ 5 ] Fixed but Unreleased [ 10203 ]
            olamy Olivier Lamy made changes -
            Status Fixed but Unreleased [ 10203 ] Closed [ 6 ]
            matthias_schmalz Matthias Schmalz added a comment -

            I would also love to see this as part of an LTS fix.

            I am waiting for some fixes which are in 2.138, but that doesn't run on our machines either.

            skeenan Shaun Keenan added a comment -

            Another ping - when will this make it into LTS?

            danielbeck Daniel Beck added a comment -

            As this was not nominated as an LTS candidate, it will not be in 2.138.3 next week. I expect it'll be in the next LTS 2.1xx.1 scheduled for December 5.

            For reference https://jenkins.io/download/lts/#backporting-process

            skeenan Shaun Keenan added a comment -

            thank you!

            mikescholze Mike Scholze added a comment -

            It is fixed with 2.138.2!

            https://jenkins.io/changelog-stable/

            Update Winstone-Jetty from 4.4 to 5.0 to fix HTTP/2 support and threading problems on hosts with 30+ cores. (issue 53239, issue 52804, issue 51136, issue 52358)

            danielbeck Daniel Beck added a comment -

            Sorry about that. Because the duplicates of this issue were not collapsed into one, only one of them got the label and the others did not.


              People

              • Assignee:
                olamy Olivier Lamy
                Reporter:
                elijah Elijah Zupancic
              • Votes:
                0
                Watchers:
                11
