Jenkins / JENKINS-58390

Docker cloud provisioning 1 slave behind

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: docker-plugin
    • Labels:
      None
    • Environment:
      Jenkins 2.164.3
      Docker Plugin 1.1.6

      Description

      We have multiple Docker cloud hosts set up, with multiple templates per host. Some templates are duplicated across clouds; others are unique to a host. I'm not yet sure how to reproduce this state, but I can help diagnose it on my end with some guidance.

      After a period of successfully provisioning Docker agents, we get into a state in which a job waits for a container that never arrives. If another job is then launched that requests a container matching the criteria of the already-queued job, the new container is provisioned and the first job takes it. This leaves the new job waiting until yet another job requests a matching container.

      If another job requests a unique container, it too will wait indefinitely until another job requests the same container.
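
      The "one behind" pattern described above can be illustrated with a small toy model. This is only a sketch of the observed symptom, not of the plugin's internals: the function name `simulate`, the job names, and the assumption that the lag is tracked per label are all hypothetical.

```python
from collections import defaultdict, deque


def simulate(requests):
    """Toy model of the symptom: per label, provisioning lags one request behind.

    requests: list of (job, label) tuples in arrival order.
    Returns (assignments, still_waiting), where each assignment is
    (job_that_got_the_container, job_whose_request_triggered_it).
    """
    waiting = defaultdict(deque)   # label -> jobs queued for that label
    seen = defaultdict(int)        # label -> number of requests so far
    assignments = []

    for job, label in requests:
        waiting[label].append(job)
        seen[label] += 1
        # Assumption: the first request for a label produces no container;
        # every later request produces one, which the oldest queued job takes.
        if seen[label] > 1:
            assignments.append((waiting[label].popleft(), job))

    still_waiting = [job for queue in waiting.values() for job in queue]
    return assignments, still_waiting


if __name__ == "__main__":
    done, stuck = simulate([
        ("job-1", "merge"),
        ("job-2", "merge"),
        ("job-3", "unique"),
    ])
    print("ran:", done)
    print("stuck:", stuck)
```

      In this model, job-1 only runs once job-2 asks for the same label, job-2 is left waiting in its place, and job-3 (the only job asking for its label) waits indefinitely.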

      Restarting the master resolves the issue temporarily (for a few days).

      Once we have hit this state, if I launch a job requesting container-A, the log shows the following, and the container does not start until a second job requests one.

      Jul 08, 2019 12:05:24 PM INFO io.jenkins.docker.DockerTransientNode$1 println
      Disconnected computer for node 'docker-003jf7ab2t864'.
      Jul 08, 2019 12:05:24 PM INFO hudson.remoting.Request$2 run
      Failed to send back a reply to the request hudson.remoting.Request$2@17f6e1b9: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@3448814f:docker-003jf7ab2t864": channel is already closed
      Jul 08, 2019 12:05:24 PM INFO io.jenkins.docker.DockerTransientNode$1 println
      Removed Node for node 'docker-003jf7ab2t864'.
      Jul 08, 2019 12:05:24 PM INFO io.jenkins.docker.DockerTransientNode$1 println
      Stopped container '736b8e5ffa4e2a0a06965fc5768bbf241654efe3193e72a54a9f72fcf400e417' for node 'docker-003jf7ab2t864'.
      Jul 08, 2019 12:05:24 PM INFO io.jenkins.docker.DockerTransientNode$1 println
      Removed container '736b8e5ffa4e2a0a06965fc5768bbf241654efe3193e72a54a9f72fcf400e417' for node 'docker-003jf7ab2t864'.
      

      If I launch a second job requesting container-A, the log looks more normal, but now the second job is stuck waiting.

      Jul 08, 2019 12:07:20 PM INFO hudson.slaves.NodeProvisioner$2 run
      Image of CONTAINER-A:latest provisioning successfully completed. We have now 146 computer(s)
      Jul 08, 2019 12:07:20 PM INFO io.jenkins.docker.DockerTransientNode$1 println
      Disconnected computer for node 'docker-003jf7ab2t864'.
      Jul 08, 2019 12:07:20 PM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
      Asked to provision 1 slave(s) for: merge
      Jul 08, 2019 12:07:20 PM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
      Provisioning 'CONTAINER-A:latest' on 'docker-cloud-1'
      Jul 08, 2019 12:07:20 PM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
      Will provision 'CONTAINER-A:latest', for label: 'merge', in cloud: 'docker-cloud-1'
      Jul 08, 2019 12:07:20 PM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      Started provisioning Image of CONTAINER-A:latest from docker-cloud-1 with 1 executors. Remaining excess workload: 0
      Jul 08, 2019 12:07:20 PM INFO com.nirima.jenkins.plugins.docker.DockerTemplate pullImage
      Pulling image 'CONTAINER-A:latest'. This may take awhile...
      Jul 08, 2019 12:07:21 PM INFO io.jenkins.docker.DockerTransientNode$1 println
      Removed Node for node 'docker-003jf7ab2t864'.
      Jul 08, 2019 12:07:22 PM INFO com.nirima.jenkins.plugins.docker.DockerTemplate pullImage
      Finished pulling image 'CONTAINER-A:latest', took 2002 ms
      Jul 08, 2019 12:07:22 PM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
      Trying to run container for CONTAINER-A:latest
      Jul 08, 2019 12:07:22 PM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
      Trying to run container for node docker-003jf9zomwkv9 from image: CONTAINER-A:latest
      Jul 08, 2019 12:07:22 PM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
      Started container ID fd2202c7c9db3549e82351926694a9e2965d9c9d83f133049af7f99f4f6e94da for node docker-003jf9zomwkv9 from image: CONTAINER-A:latest
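
      To help with diagnosis, log excerpts like the two above could be grouped per node and per provisioning request with a short script. This is a minimal sketch that assumes the two-line layout shown here (a timestamp/level/source header followed by a message line); the function names and regular expressions are my own and are not part of Jenkins or the Docker plugin.

```python
import re
from collections import defaultdict

# Header line, e.g. "Jul 08, 2019 12:05:24 PM INFO hudson.remoting.Request$2 run"
HEADER = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} \d{2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) "
    r"(?P<level>[A-Z]+) (?P<source>.+)$"
)
# Node names as they appear in the messages, quoted or unquoted.
NODE = re.compile(r"for node '?(?P<node>docker-[0-9a-z]+)'?")
# Provisioning requests, e.g. "Asked to provision 1 slave(s) for: merge"
ASKED = re.compile(r"Asked to provision (?P<count>\d+) slave\(s\) for: (?P<label>.+)")


def parse_records(text):
    """Group each header line with the message line(s) that follow it."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            current = {**match.groupdict(), "message": ""}
            records.append(current)
        elif current is not None:
            current["message"] = (current["message"] + " " + line).strip()
    return records


def events_by_node(records):
    """Map each node name to the (timestamp, message) events that mention it."""
    by_node = defaultdict(list)
    for rec in records:
        match = NODE.search(rec["message"])
        if match:
            by_node[match.group("node")].append((rec["ts"], rec["message"]))
    return dict(by_node)


def provision_requests(records):
    """Return (timestamp, count, label) for every 'Asked to provision' message."""
    found = []
    for rec in records:
        match = ASKED.search(rec["message"])
        if match:
            found.append((rec["ts"], int(match.group("count")), match.group("label")))
    return found
```

      Comparing the output of `provision_requests` with the nodes that have a "Started container ID" event in `events_by_node` shows whether a queued job ever triggered a provisioning request. In the first excerpt above there is none.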
      

          Activity

          broussar Adam Brousseau added a comment -

          As a workaround, we wrote a separate job that runs after the first job has started, asks for the same container, and then times out.


            People

            • Assignee:
              ndeloof Nicolas De Loof
              Reporter:
              broussar Adam Brousseau
            • Votes:
              0
              Watchers:
              1
