JENKINS-61153

Agent hangs after startup when two jobs start simultaneously


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: ec2-plugin
    • Labels: None
    • Environment: Fresh Jenkins container (jenkins/jenkins:lts - version 2.204.2)
      Suggested plugins installed through wizard + ec2-plugin (specifics in description)

      I'm seeing an intermittent issue where two jobs start at the same time on an agent that was just spun up through the ec2-plugin, and both get stuck. With my reproduction steps, I can get it to happen 10%-30% of the time. I can reproduce it with a fresh install of Jenkins, using an off-the-shelf AMI with no custom changes. Once this happens the agent is unusable, so the only workaround is to cancel the jobs, restart them, and kill the bad agent.

      Current plugin list:

      Jenkins Version: 2.204.2

      ----------
      INSTALLED PLUGINS (sorted):
      ----------
      ace-editor (1.1)
      ant (1.11)
      antisamy-markup-formatter (1.8)
      apache-httpcomponents-client-4-api (4.5.10-2.0)
      authentication-tokens (1.3)
      aws-credentials (1.28)
      aws-java-sdk (1.11.723)
      bouncycastle-api (2.18)
      branch-api (2.5.5)
      build-timeout (1.19.1)
      cloudbees-folder (6.11.1)
      command-launcher (1.4)
      credentials (2.3.1)
      credentials-binding (1.21)
      display-url-api (2.3.2)
      docker-commons (1.16)
      docker-workflow (1.21)
      durable-task (1.33)
      ec2 (1.49.1)
      email-ext (2.68)
      git (4.1.1)
      git-client (3.1.1)
      git-server (1.9)
      github (1.29.5)
      github-api (1.106)
      github-branch-source (2.6.0)
      gradle (1.36)
      handlebars (1.1.1)
      jackson2-api (2.10.2)
      jdk-tool (1.4)
      jquery-detached (1.2.1)
      jsch (0.1.55.2)
      junit (1.28)
      ldap (1.21)
      lockable-resources (2.7)
      mailer (1.30)
      mapdb-api (1.0.9.0)
      matrix-auth (2.5)
      matrix-project (1.14)
      momentjs (1.1.1)
      node-iterator-api (1.5.0)
      pam-auth (1.6)
      pipeline-build-step (2.11)
      pipeline-github-lib (1.0)
      pipeline-graph-analysis (1.10)
      pipeline-input-step (2.11)
      pipeline-milestone-step (1.3.1)
      pipeline-model-api (1.5.1)
      pipeline-model-declarative-agent (1.1.1)
      pipeline-model-definition (1.5.1)
      pipeline-model-extensions (1.5.1)
      pipeline-rest-api (2.13)
      pipeline-stage-step (2.3)
      pipeline-stage-tags-metadata (1.5.1)
      pipeline-stage-view (2.13)
      plain-credentials (1.7)
      resource-disposer (0.14)
      scm-api (2.6.3)
      script-security (1.70)
      ssh-credentials (1.18.1)
      ssh-slaves (1.31.1)
      structs (1.20)
      subversion (2.13.1)
      timestamper (1.11)
      token-macro (2.11)
      trilead-api (1.0.5)
      variant (1.3)
      workflow-aggregator (2.6)
      workflow-api (2.39)
      workflow-basic-steps (2.19)
      workflow-cps (2.80)
      workflow-cps-global-lib (2.15)
      workflow-durable-task-step (2.35)
      workflow-job (2.36)
      workflow-multibranch (2.21)
      workflow-scm-step (2.10)
      workflow-step-api (2.22)
      workflow-support (3.4)
      ws-cleanup (0.38)
      

       

      Steps to reproduce:

      1. Start the Docker container for jenkins/jenkins:2.204.2:
        1. docker run -it -p 8080:8080 -p 50000:50000 jenkins/jenkins:2.204.2
      2. Install the recommended plugins through the wizard.
      3. Install "Amazon EC2 Plugin" version 1.49.1.
      4. Configure Amazon EC2 to connect to AWS:
        1. AMI: I saw this with different agents, but to reproduce you can use "amazon/amzn2-ami-hvm-2.0.20200207.1-x86_64-gp2" by amazon, which is "ami-0a887e401f7654935"
        2. Size: T2Medium (also tried m4.large)
        3. FS root: /tmp
        4. Remote user: ec2-user
        5. Label: 'parallel_bug'
        6. Number of executors: 2
        7. Max uses: 2 (this forces many new instances to come up, which increases the chances of the bug showing up)
        8. Any other settings are specific to your AWS account.
      5. Create 3 jobs:
        1. "freestyle-job"
          1. Job type: freestyle
          2. String parameter "VERSION"
          3. Allow concurrent builds
          4. Restrict to label 'parallel_bug'
          5. Build step "Execute shell": echo 'freestyle job'
        2. "pipeline-job"
          1. Job type: pipeline
          2. String parameter "VERSION"
          3. Pipeline script:
            pipeline {
                agent {
                    label 'parallel_bug'
                }
                stages {
                    stage('execute shell stage') {
                        steps {
                            sh 'echo "pipeline job"'
                        }
                    }
                }
            }
        3. "kickoff-job"
          1. Job type: pipeline
          2. Pipeline script:
            // Queue one build of each job per iteration, without waiting for them to finish.
            for (int i = 0; i < 10; i++) {
              def s = "subjob_${i}"
              stage("${s}") {
                sleep 2
                build(
                  job: 'freestyle-job',
                  wait: false,
                  parameters: [
                    string(name: 'VERSION', value: "${i}")
                  ]
                )
                build(
                  job: 'pipeline-job',
                  wait: false,
                  parameters: [
                    string(name: 'VERSION', value: "${i}")
                  ]
                )
              }
            }
      6. Execute "kickoff-job". One run starts 10 builds each of "freestyle-job" and "pipeline-job". This is usually enough to see the error on at least 1 agent.
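
      Because the hang is intermittent, it can help to script the final step rather than click "Build Now" repeatedly. The following is only a sketch and not part of the original report: it assumes the Jenkins remote access API is reachable and that an API token is used for authentication. The variable names JENKINS_URL, JENKINS_USER and JENKINS_TOKEN, the helper functions, and the 60-second pause are all hypothetical choices made for illustration.

```shell
#!/bin/sh
# Sketch: queue "kickoff-job" several times through the Jenkins remote API.
# Assumptions (not from the report): JENKINS_URL, JENKINS_USER and
# JENKINS_TOKEN are set by the caller, and the job is named "kickoff-job".

# Compose the URL that queues a build of an unparameterized job.
job_build_url() {
    base_url=${1%/}   # strip one trailing slash, if present
    job_name=$2
    printf '%s/job/%s/build\n' "$base_url" "$job_name"
}

# Queue the kickoff job the given number of times, pausing between runs.
trigger_kickoff() {
    runs=$1
    i=1
    while [ "$i" -le "$runs" ]; do
        curl --silent --show-error --fail -X POST \
            --user "$JENKINS_USER:$JENKINS_TOKEN" \
            "$(job_build_url "$JENKINS_URL" kickoff-job)" || return 1
        i=$((i + 1))
        sleep 60   # arbitrary pause so that runs do not pile up
    done
}

# Example invocation (nothing is triggered until this is called):
#   JENKINS_URL=http://localhost:8080 trigger_kickoff 3
```

      The functions only define behavior; nothing is sent to Jenkins until trigger_kickoff is called with the three variables set.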

      I've also seen this issue when a single agent starts up and jobs begin running on it. These reproduction steps are only meant to reduce waiting and make the case show up as quickly as possible. I've attached screenshots of what the jobs look like in the UI and a thread dump of the pipeline job.
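
      As a rough check on why one kickoff run is usually enough, the arithmetic can be sketched as follows. This is an illustration under assumptions the report does not state: that the 10%-30% figure is a per-agent rate, that agents hang independently of each other, and that 20 builds at "Max uses: 2" occupy about 10 agents. The function name p_at_least_one is hypothetical.

```shell
#!/bin/sh
# Sketch: chance that at least one agent in a kickoff run shows the hang.
# Assumptions (not from the report): each agent hangs independently with
# probability p, and one run occupies about 10 agents (20 builds / 2 uses).

# P(at least one of n agents hangs) = 1 - (1 - p)^n
p_at_least_one() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.2f\n", 1 - (1 - p) ^ n }'
}

p_at_least_one 0.10 10   # prints 0.65
p_at_least_one 0.30 10   # prints 0.97
```

      Under those assumptions, a single run would show the hang on at least one agent roughly 65% to 97% of the time.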

            thoulen FABRIZIO MANFREDI
            bryananderson Bryan Anderson
            Votes: 7
            Watchers: 7