  1. Jenkins
  2. JENKINS-54155

Jobs using docker agents disrupt each other if triggered simultaneously

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Component/s: docker-plugin, remoting
    • Labels:
      None
    • Environment:
      Docker 18.06.1-ce, macOS and Linux
      Jenkins 2.141
      Docker plugin 1.1.5

      Description

      In a Jenkins environment which has one Docker agent template defined, create two simple Pipeline jobs which will both use that template. One job will run an 'echo' step and exit quickly, the other will sleep for 60 seconds. If you trigger these two jobs simultaneously, the short-running one will finish successfully and its container will then be stopped. When that happens, the remoting connection to the remaining job gets disrupted. It throws this exception in its console log:

      Cannot contact dockeragent-0002sinrcnyoa: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on dockeragent-0002sinrcnyoa failed. The channel is closing down or has closed down
      

      And at this point the job is hung and must be manually aborted. The workaround for this problem is either to put a five second sleep in between triggering the two jobs, or to create an additional Docker agent template with a different label, and configure each job to use a separate label.
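
A minimal sketch of how the simultaneous trigger and the sleep workaround might be scripted, assuming the two jobs are started through Jenkins' remote build endpoint (a POST to `/job/<name>/build`) rather than by clicking in the UI. The function names, the job names, and the credentials mentioned afterwards are hypothetical and do not come from this report.

```python
import base64
import threading
import time
import urllib.parse
import urllib.request


def build_trigger_url(base_url: str, job_name: str) -> str:
    """Return the URL that queues a build of job_name when POSTed to."""
    return f"{base_url.rstrip('/')}/job/{urllib.parse.quote(job_name)}/build"


def post_build(url: str, user: str, api_token: str) -> int:
    """POST to a build URL with basic auth and return the HTTP status code."""
    credentials = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    request = urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Authorization": f"Basic {credentials}"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def trigger_jobs(urls, post, stagger_seconds=0.0):
    """Start one thread per URL, optionally pausing between starts.

    stagger_seconds=0 approximates triggering the jobs at the same time;
    stagger_seconds=5 corresponds to the five second workaround.
    """
    threads = []
    for index, url in enumerate(urls):
        if index > 0 and stagger_seconds > 0:
            time.sleep(stagger_seconds)
        thread = threading.Thread(target=post, args=(url,))
        thread.start()
        threads.append(thread)
    # Wait until every trigger request has completed.
    for thread in threads:
        thread.join()
```

With hypothetical jobs named `quick-echo` and `sleep-60` on a controller at `http://localhost:8080`, calling `trigger_jobs(urls, lambda url: post_build(url, "admin", "API_TOKEN"))` would fire both requests at once, and adding `stagger_seconds=5` would apply the workaround. Whether this reproduces the disruption depends on the cloud and template configuration, which the report does not include.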

      I'm not sure if the problem here is with the Docker plugin or with Remoting.

      Attached is a complete log from the Jenkins service, from the time the two jobs were triggered. The 'SEVERE: Error during callback' errors happened while the broken job was still trying to run before it was aborted.

        Attachments

          Activity

          pjdarton added a comment -

          This is probably because docker containers are specifically designed to do one and only one job and then be destroyed.  If you're having more than one bit of work run on a container then that's not going to go well - you need to ensure that your pipeline code takes over one node exclusively and uses it exclusively until it's done with it - no sharing.

          If you've got multiple pipelines all after the same template then Jenkins should spin up multiple containers for that workload - we shouldn't have any jobs sharing containers...

           

          I'm sorry I don't have a specific answer for this one - if it's still a problem with the latest release then please do see what else you can find out; if not, please close the issue.

          Owen Mehegan (owenmehegan) added a comment - edited

          pjdarton what I'm describing is two separate jobs that just refer to the same container template - not the same actual instance of the container. If I run these two jobs simultaneously (just click "run now" on each within a second or two of each other), they spin up two containers as I expect, but when the first job finishes, both containers are killed. The first job shows a normal successful completion, and the second job shows a remoting exception. It just seems like the plugin mistakenly kills both containers when it should only kill one. If I wait about 10 seconds between triggering the two jobs, I don't have this problem. But on a busy system, this causes erroneous build failures relatively often.

          What else can I provide that would help you investigate further?

          pjdarton added a comment -

          Re: What else?
          A nice short & simple way of reproducing the issue would be ideal
          See https://github.com/jenkinsci/docker-plugin/blob/master/CONTRIBUTING.md#reporting-a-new-issue
          In this case, I'd want to look at the clouds/templates defined in config.xml and the pipeline code you're running but, if there's nothing obvious there, a full repro case using just the official default slave images would be needed.

          pjdarton added a comment -

          Closing due to lack of response.


            People

            • Assignee: Unassigned
            • Reporter: Owen Mehegan (owenmehegan)
            • Votes: 0
            • Watchers: 2
