  1. Jenkins
  2. JENKINS-53603

VMs are created without being added to the Jenkins nodes list

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Labels:
      None
    • Environment:
      Jenkins ver. 2.141
      Google compute engine plugin ver. 1.0.4

      Description

      I set up the cloud with an instance cap of 20 and a retention time of 10 minutes.

      I made sure that the list of nodes contained only master and that there were no VMs in Google Cloud.

      I then launched a job that fans out into many different jobs, which forced the plugin to provision VM instances.

      The plugin ended up creating 20 instances in Google Cloud, but only 9 were listed on the Jenkins nodes page.

      I then waited 10 minutes to check whether the plugin had kept track of those VMs. Unsurprisingly, the 11 unlisted VMs in Google Cloud were not terminated.

      This seems like quite a nasty bug because it leads to unnecessary costs if those orphaned VMs are not spotted and terminated manually.
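
      Spotting the orphaned VMs comes down to a set difference between the instance names in Google Cloud and the node names in Jenkins. The following is a minimal sketch of that comparison, not part of the plugin. The class name OrphanFinder, the method findOrphans, and the sample names are hypothetical, and it assumes each Jenkins node carries the same name as its VM instance. The two name lists would have to be collected separately.

      ```java
      import java.util.List;
      import java.util.Set;
      import java.util.TreeSet;

      public class OrphanFinder {
          /**
           * Returns the cloud instance names that have no Jenkins node of the same name.
           * Assumption: node names and instance names are identical strings.
           */
          public static Set<String> findOrphans(List<String> cloudInstances, List<String> jenkinsNodes) {
              Set<String> orphans = new TreeSet<>(cloudInstances); // sorted copy of all instances
              orphans.removeAll(jenkinsNodes);                      // drop the ones Jenkins knows about
              return orphans;
          }

          public static void main(String[] args) {
              // Hypothetical sample data, for illustration only.
              List<String> cloud = List.of("vm-a", "vm-b", "vm-c");
              List<String> nodes = List.of("master", "vm-a");
              System.out.println(findOrphans(cloud, nodes)); // prints [vm-b, vm-c]
          }
      }
      ```

      Any name this returns is a candidate for manual termination, subject to checking that the VM really belongs to this Jenkins instance.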


          Activity

          lucanaldini Luca Naldini added a comment -

          It just happened again and looking at the Jenkins logs I see the following:

          Sep 21, 2018 11:13:32 AM com.google.jenkins.plugins.computeengine.ComputeEngineCloud provision
          INFO: Provisioning node from config com.google.jenkins.plugins.computeengine.InstanceConfiguration@5555ca9e for excess workload of 1 units of label 'buildbox&&!buildboxtoexclude'
          Sep 21, 2018 11:13:33 AM com.google.jenkins.plugins.computeengine.ComputeEngineCloud availableNodeCapacity
          INFO: Found capacity for 2 nodes in cloud gce
          Sent insert request
          Sep 21, 2018 11:13:34 AM hudson.triggers.SafeTimerTask run
          SEVERE: Timer task hudson.slaves.NodeProvisioner$NodeProvisionerInvoker@1251ecc0 failed
          java.lang.ClassCastException
          

          Evan Brown, any chance this can be looked at?

           

           

          lucanaldini Luca Naldini added a comment - - edited

          The scenario was that there was only 1 job in the queue, which required a node matching the label expression buildbox&&!buildboxtoexclude.

          buildbox is the label I set up in your plugin config.

          I canceled the job, manually deleted all the VMs created in Google Cloud, and then queued a job that requires a node with the label buildbox.

          That triggered the creation of the VMs, and they were successfully added to the Jenkins nodes list.

          Could the label expression of the job that triggers your plugin have anything to do with it? Could that explain the ClassCastException?

          zombiemoose Rachel Yen added a comment -

          Currently looking into this issue. Any additional information provided so that I can replicate the bug is greatly appreciated.

          zombiemoose Rachel Yen added a comment -

          Are you trying to define a label expression? The labels field of the plugin doesn't seem to support expressions.

           

          From the help file of labels in the UI: 
          Labels may contain any non-space characters, but you should avoid special characters such as any of these: !&|<>(), as other Jenkins features allow for defining label expressions, where these characters may be used.

          francisbolduc Francis Bolduc added a comment - - edited

          Environment: Jenkins 2.155, google-compute-engine-plugin 1.0.7

          I have the same issue here.

          On the Jenkins configuration page, I set up an "Instance Configuration"; in "labels", I typed "gce-label".

          In my Jenkinsfile or pipeline configuration, I specified something like this:

          pipeline {
              agent {
                  label 'gce-label||vmware-label'
              }
              
              stages {
                  stage('Test') {
                      steps {
                          echo 'Yeah'
                      }
                  }
              }
          } 

          If I run a build with the pipeline above, the Google Compute Engine plugin provisions VMs until it hits the instance cap or quota and then waits indefinitely; the nodes are never added to Jenkins. It can wait for more than 30 minutes.

          But if I replay this build with the label in the pipeline changed to the one below (and kill all the VMs):

          label 'gce-label'
          

          Build 2 then provisions the VM in GCE, the VM is added to Jenkins correctly, and the pipeline executes.
          After that, Build 1 executes as well.

          I did get a stack trace, but I'm not sure which Jenkins version I was on at the time (between 2.153 and 2.155):

          Dec 11, 2018 10:41:52 AM com.google.jenkins.plugins.computeengine.ComputeEngineCloud provision
          INFO: Provisioning node from config com.google.jenkins.plugins.computeengine.InstanceConfiguration@322cd0be for excess workload of 1 units of label 'gce-label||vmware-label'
          Dec 11, 2018 10:41:52 AM com.google.jenkins.plugins.computeengine.ComputeEngineCloud availableNodeCapacity
          INFO: Found capacity for 10 nodes in cloud GCE Slaves
          Dec 11, 2018 10:41:54 AM hudson.triggers.SafeTimerTask run
          SEVERE: Timer task hudson.slaves.NodeProvisioner$NodeProvisionerInvoker@2cc40d7e failed
          java.lang.ClassCastException: hudson.model.labels.LabelExpression$Or cannot be cast to hudson.model.labels.LabelAtom
          	at jenkins.model.Jenkins.getLabelAtom(Jenkins.java:1947)
          	at hudson.model.Label.parse(Label.java:612)
          	at hudson.model.Node.getAssignedLabels(Node.java:304)
          	at hudson.model.Slave.<init>(Slave.java:196)
          	at hudson.slaves.AbstractCloudSlave.<init>(AbstractCloudSlave.java:51)
          	at com.google.jenkins.plugins.computeengine.ComputeEngineInstance.<init>(ComputeEngineInstance.java:56)
          	at com.google.jenkins.plugins.computeengine.InstanceConfiguration.provision(InstanceConfiguration.java:308)
          	at com.google.jenkins.plugins.computeengine.ComputeEngineCloud.provision(ComputeEngineCloud.java:159)
          	at hudson.slaves.NodeProvisioner$StandardStrategyImpl.apply(NodeProvisioner.java:715)
          	at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:320)
          	at hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:61)
          	at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:809)
          	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
          	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
          	at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
          	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
          	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
          	at java.lang.Thread.run(Unknown Source)
          

          I've seen something similar to this:

          SEVERE: Timer task hudson.slaves.NodeProvisioner$NodeProvisionerInvoker@1251ecc0 failed
          java.lang.ClassCastException
          

          Might be related to JENKINS-51277
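
          The stack trace above suggests that the job's whole label expression ('gce-label||vmware-label') reached code that expects a single label atom, so the cast failed during node construction. The following is a minimal, self-contained sketch of that failure mode and of a defensive check. Label, Atom, Or, and the toy parser are hypothetical stand-ins written for illustration; they are not the Jenkins classes and not the plugin's actual fix.

          ```java
          public class LabelCastSketch {
              // Hypothetical stand-ins for label types; these are NOT the real Jenkins classes.
              interface Label {}

              static final class Atom implements Label {
                  final String name;
                  Atom(String name) { this.name = name; }
              }

              static final class Or implements Label {
                  final Label left;
                  final Label right;
                  Or(Label left, Label right) { this.left = left; this.right = right; }
              }

              // Toy parser that only understands "||"; a real label grammar is richer.
              static Label parse(String text) {
                  int i = text.indexOf("||");
                  if (i < 0) {
                      return new Atom(text.trim());
                  }
                  return new Or(new Atom(text.substring(0, i).trim()), parse(text.substring(i + 2)));
              }

              // Mimics code that assumes every label string is a single atom.
              public static Atom asAtomUnchecked(String text) {
                  return (Atom) parse(text); // throws ClassCastException for "a||b"
              }

              // Defensive check: true only when the text parses to a single atom.
              public static boolean isSingleAtom(String text) {
                  return parse(text) instanceof Atom;
              }

              public static void main(String[] args) {
                  System.out.println(isSingleAtom("gce-label"));               // prints true
                  System.out.println(isSingleAtom("gce-label||vmware-label")); // prints false
                  try {
                      asAtomUnchecked("gce-label||vmware-label");
                  } catch (ClassCastException e) {
                      System.out.println("ClassCastException, as in the log above");
                  }
              }
          }
          ```

          Under these assumptions, a node would be created with the labels from the instance configuration ("gce-label") and not with the expression requested by the job.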

          zombiemoose Rachel Yen added a comment -

          Interesting. I'm about to fix JENKINS-51277 soon, and I would like to see if this still occurs after the fix.

          zombiemoose Rachel Yen added a comment -

          Closing as there are no further inquiries.

           

          If there are any more issues, please open an issue here as we are migrating off JIRA:

          https://github.com/jenkinsci/google-compute-engine-plugin/issues


            People

            • Assignee:
              zombiemoose Rachel Yen
              Reporter:
              lucanaldini Luca Naldini
            • Votes:
              0
              Watchers:
              3
