JENKINS-41569

Pipeline hangs waiting for resume on an agent which never was

      Description

      We have a Pipeline run which has been blocked for a number of days with:

      Waiting to resume part of Core » jenkins » PR-2726 #2: There are no nodes with the label ‘win2012-b19510’

      We're using the Azure VM Agents plugin for dynamic provisioning. While I haven't had the time to fully reproduce this, here's what I believe is happening:

      • Pipeline requests node labelled "windows".
      • Azure VM Agents plugin begins dynamically provisioning a VM matching that "windows" label.
      • The Azure cloud allocates and bootstraps a VM, meaning that there is an instance which exists, has an IP address, etc.
      • The Azure VM Agents plugin creates a Node in Jenkins with a generated name ("win2012-b19510"); the Node is in a suspended state.
      • Pipeline says "great, I have win2012-b19510, that's where I am going to execute"
      • Azure VM Agents plugin runs its defined "Init Script" to actually bootstrap the Jenkins agent software on the VM
      • The "Init script" fails to complete successfully
      • Time elapses and the Azure VM Agents thread sees the "win2012-b19510" instance as stale, and reaps the VM accordingly.
      • Poor little Pipeline sits forever awaiting a VM which will never come back
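
      The hypothesized sequence above can be pictured with a toy model. This is only a sketch under stated assumptions: ResumeHangSketch, PinnedStep, and registeredNodes are made-up names for illustration, not Jenkins or Azure VM Agents APIs, and the real scheduling logic is more involved.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Toy model only: none of these types are Jenkins or Azure VM Agents APIs.
class ResumeHangSketch {
    /** Names of nodes currently registered with the (pretend) controller. */
    static final Set<String> registeredNodes = new HashSet<>();

    /** A pipeline step that is pinned to one specific node name once scheduled. */
    static final class PinnedStep {
        final String nodeName;

        PinnedStep(String nodeName) {
            this.nodeName = nodeName;
        }

        /** Empty if the step could resume; otherwise the reason it is still waiting. */
        Optional<String> waitingReason() {
            if (registeredNodes.contains(nodeName)) {
                return Optional.empty();
            }
            return Optional.of("There are no nodes with the label \u2018" + nodeName + "\u2019");
        }
    }

    public static void main(String[] args) {
        // The cloud registers the node before the init script has run.
        registeredNodes.add("win2012-b19510");
        // The Pipeline is handed that node and pins itself to the exact name.
        PinnedStep step = new PinnedStep("win2012-b19510");
        // The init script fails; the VM is later reaped and the node removed.
        registeredNodes.remove("win2012-b19510");
        // The generated name is never reused, so the step keeps waiting.
        System.out.println(step.waitingReason().orElse("resumed"));
    }
}
```

      In this model the step can only resume on the exact generated name, which is why removing the node leaves it waiting indefinitely.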

      I won't have time to reproduce this today, but will try to at my next available free moment (ha!).

      I'm not sure if it's possible, but if my hypothesis is correct, the fix would be to only pin a "node() { }" in Pipeline to an agent which has actually come online and is able to perform work.

            Activity

            jglick Jesse Glick added a comment -

            Not a Pipeline issue that I can see. The cloud provider should wait to add a Node to Jenkins until it has actually started the agent. If that is not feasible, it should at least block the node from accepting tasks.
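
            A minimal sketch of the first suggestion, assuming the init script can be modeled as a call that reports success or failure. ProvisionOrderSketch, provision, and the "example-agent" name are hypothetical and are not taken from the Azure VM Agents plugin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: these names are illustrative, not the plugin's real code.
class ProvisionOrderSketch {
    /** Nodes visible to the (pretend) scheduler. */
    static final List<String> registeredNodes = new ArrayList<>();

    /**
     * Runs the init script first and registers the node only if it succeeded.
     * Returns true if the node was registered.
     */
    static boolean provision(String nodeName, BooleanSupplier initScript) {
        boolean agentStarted = initScript.getAsBoolean();
        if (!agentStarted) {
            // Nothing was registered, so no build can have been pinned to this name.
            return false;
        }
        registeredNodes.add(nodeName);
        return true;
    }

    public static void main(String[] args) {
        // Failing init script: the node never becomes visible to the scheduler.
        System.out.println(provision("win2012-b19510", () -> false));
        // Succeeding init script: the node is registered after the agent is up.
        System.out.println(provision("example-agent", () -> true));
        System.out.println(registeredNodes);
    }
}
```

            With this ordering a failed init script leaves nothing behind for a build to be assigned to.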

            clguiman Claudiu Guiman added a comment -

            What R. Tyler Croy is describing makes sense. Please check why the init script is failing.
            Jesse Glick, you are right. I'll update the plugin so we don't add the node until the init script has completed successfully.


              People

              • Assignee:
                zackliu Chenyang Liu
                Reporter:
                rtyler R. Tyler Croy