JENKINS-42187

Docker plugin causes queue hanging in the case of hanging requests

      Description

      Stacktrace explanation: When DockerOnceRetentionStrategy runs, it locks the Jenkins Queue. If it decides to terminate a cloud agent, DockerSlave#_terminate() gets invoked. This method makes a REST API call through docker-java, and that call has no timeout: https://github.com/jenkinsci/docker-plugin/blob/master/docker-plugin/src/main/java/com/nirima/jenkins/plugins/docker/DockerSlave.java#L168 .

      If the REST API call hangs for any reason, the entire Queue stays blocked until the request gets interrupted somehow. We see this on one of the instances, where containers sometimes cannot be terminated.

      A hung Queue causes a major outage of Jenkins functionality, including build scheduling and some UI widgets.

      IMHO all calls to the Docker Java REST API in the plugin should specify a timeout. For example, Yet Another Docker Plugin does this: https://github.com/KostyaSha/yet-another-docker-plugin/blob/6853301c885447ca31648d6cfa4861e6e272bf16/yet-another-docker-plugin/src/main/java/com/github/kostyasha/yad/commons/DockerStopContainer.java#L37-L41
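
      A minimal sketch of bounding a blocking call with a timeout, using only java.util.concurrent. This is not the plugin's code and not the docker-java API: the class name BoundedCall and the method callWithTimeout are hypothetical, and the Callable stands in for whatever REST call needs to be bounded.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not part of docker-plugin or docker-java).
final class BoundedCall {
    // Daemon threads so a stuck worker cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "bounded-call");
        t.setDaemon(true);
        return t;
    });

    private BoundedCall() {
    }

    // Runs the call on a worker thread and waits at most `timeout` for its result.
    static <T> T callWithTimeout(Callable<T> call, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> future = POOL.submit(call);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Best effort only: interrupting a thread does not unblock a plain
            // socket read such as the one in the stack trace below.
            future.cancel(true);
            throw e;
        }
    }
}
```

      With this approach the caller (and therefore the Queue lock) is released after the timeout, but the worker thread may stay blocked in the socket read. A read timeout configured on the HTTP client itself, where the client supports one, avoids that leftover thread.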

      java.lang.Thread.State: RUNNABLE
      at java.net.SocketInputStream.socketRead0(Native Method)
      at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
      at java.net.SocketInputStream.read(SocketInputStream.java:170)
      at java.net.SocketInputStream.read(SocketInputStream.java:141)
      at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
      at sun.security.ssl.InputRecord.read(InputRecord.java:503)
      at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
      - locked <0x0000000704c04ba8> (a java.lang.Object)
      at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
      at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
      - locked <0x0000000704c06bc8> (a sun.security.ssl.AppInputStream)
      at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
      at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
      at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
      at org.apache.http.impl.io.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:129)
      at org.apache.http.impl.io.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:53)
      at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
      at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
      at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:167)
      at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
      at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
      at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
      at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
      at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
      at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
      at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
      at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:71)
      at org.glassfish.jersey.apache.connector.ApacheConnector.apply(ApacheConnector.java:435)
      at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:252)
      at org.glassfish.jersey.client.JerseyInvocation$1.call(JerseyInvocation.java:684)
      at org.glassfish.jersey.client.JerseyInvocation$1.call(JerseyInvocation.java:681)
      at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
      at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
      at org.glassfish.jersey.internal.Errors.process(Errors.java:228)
      at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:444)
      at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:681)
      at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:437)
      at org.glassfish.jersey.client.JerseyInvocation$Builder.post(JerseyInvocation.java:343)
      at com.github.dockerjava.jaxrs.StopContainerCmdExec.execute(StopContainerCmdExec.java:31)
      at com.github.dockerjava.jaxrs.StopContainerCmdExec.execute(StopContainerCmdExec.java:12)
      at com.github.dockerjava.jaxrs.AbstrSyncDockerCmdExec.exec(AbstrSyncDockerCmdExec.java:23)
      at com.github.dockerjava.core.command.AbstrDockerCmd.exec(AbstrDockerCmd.java:35)
      at com.github.dockerjava.core.command.StopContainerCmdImpl.exec(StopContainerCmdImpl.java:63)
      at com.nirima.jenkins.plugins.docker.DockerSlave._terminate(DockerSlave.java:168)
      at hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:67)
      at com.nirima.jenkins.plugins.docker.strategy.DockerOnceRetentionStrategy$1$1.run(DockerOnceRetentionStrategy.java:112)
      at hudson.model.Queue._withLock(Queue.java:1306)
      at hudson.model.Queue.withLock(Queue.java:1189)
      at com.nirima.jenkins.plugins.docker.strategy.DockerOnceRetentionStrategy$1.run(DockerOnceRetentionStrategy.java:106)
      

            Activity

            jglick Jesse Glick added a comment -

            Could StopContainerCmdImpl.exec be run in a background thread, and DockerSlave._terminate return immediately?

            oleg_nenashev Oleg Nenashev added a comment -

            Jesse Glick it could. That is what Yet Another Docker Plugin does.
            A timeout is just a minimal patch, of course.
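
            A rough sketch of the background-thread idea discussed above: the terminate path only schedules the slow stop call and returns immediately, so the thread holding the Queue lock never waits on the network. The class AsyncTerminator and the method terminateAsync are hypothetical names for illustration; the Runnable stands in for the actual stop-container call.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper (not part of docker-plugin or Jenkins core).
final class AsyncTerminator {
    // Cached pool of daemon threads: one hung stop request does not block later ones.
    private final ExecutorService cleanupPool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "agent-cleanup");
        t.setDaemon(true);
        return t;
    });

    // Schedules the slow stop call on a background thread and returns at once.
    Future<?> terminateAsync(Runnable stopContainer) {
        return cleanupPool.submit(() -> {
            try {
                stopContainer.run();
            } catch (RuntimeException e) {
                // Nobody is waiting for the result, so report the failure here.
                System.err.println("Container stop failed: " + e);
            }
        });
    }
}
```

            A hung request would still occupy a background thread indefinitely in this sketch, so combining it with a timeout on the call itself seems worthwhile.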

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Peter Darton
            Path:
            src/main/java/org/jenkinsci/plugins/vSphereCloud.java
            http://jenkins-ci.org/commit/vsphere-cloud-plugin/b9594afef5b74a99a65d0b9f426854e835346af8
            Log:
            Slave termination now deletes VMs asynchronously.

            JENKINS-42187 applies to us too; same cause, same fix.
            So we avoid trying to delete VMs in-line with the slave's
            terminate method and instead schedule deletion for later.


              People

              • Assignee:
                ndeloof Nicolas De Loof
                Reporter:
                oleg_nenashev Oleg Nenashev