Jenkins / JENKINS-42811

Vsphere Cloud plugin stops working after a number of builds

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Component/s: vsphere-cloud-plugin
    • Labels:
      None

      Description

      I am using the vSphere Cloud plugin to spin up Jenkins agents (slaves) on demand in new virtual machines, using linked clones and the keep-until-idle retention strategy. The Jenkins instance has a mix of agents on "static" virtual machines and agents created on demand.

      The plugin works correctly for a while, possibly up to a limited number of created agents. After roughly 70 builds or 12 hours, the virtual machines are still created correctly and connect to Jenkins, but they are shut down before any build is allocated to the agent (virtual machine).

      The following is an excerpt of the main Jenkins.log file when this situation occurs; it happens over and over.
       

      INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Starting Virtual Machine...
      Feb 21, 2017 3:49:19 PM org.jenkinsci.plugins.vSphereCloud$VSpherePlannedNode$1 call
      INFO: Provisioned new slave ess-lin_baulim19krk8qcnwn1k5yloml
      Feb 21, 2017 3:49:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] VM already powered on
      Feb 21, 2017 3:49:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Waiting for VMTools
      Feb 21, 2017 3:49:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] VM Tools are running
      Feb 21, 2017 3:49:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Finished wait for VMTools
      Feb 21, 2017 3:49:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Waiting for 60 seconds before asking hudson.plugins.sshslaves.SSHLauncher@8c91844 to launch slave.
      Feb 21, 2017 3:49:29 PM hudson.slaves.NodeProvisioner$2 run
      INFO: ess-lin_baulim19krk8qcnwn1k5yloml provisioning successfully completed. We have now 5 computer(s)
      [02/21/17 15:49:58] SSH Launch of unknown on X.X.X.X failed in 63,139 ms
      Feb 21, 2017 3:49:58 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: [ess-lin_6c1eramupwzkjyu9n3t4cdl5] Slave online
      Feb 21, 2017 3:50:05 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
      [ess-lin_baulim19krk8qcnwn1k5yloml] Disconnected computer
      Feb 21, 2017 3:50:15 PM hudson.slaves.CloudRetentionStrategy check
      INFO: Disconnecting ess-lin_baulim19krk8qcnwn1k5yloml
      Feb 21, 2017 3:50:15 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Disconnected computer
      Feb 21, 2017 3:50:15 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Ignoring disconnect attempt because a connect attempt is in progress.
      Feb 21, 2017 3:50:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Asking SSHLauncher to launch slave.
      Feb 21, 2017 3:50:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Ignoring disconnect attempt because a connect attempt is in progress.
      Feb 21, 2017 3:50:21 PM org.jenkinsci.plugins.vSphereCloud calculateMaxAdditionalSlavesPermitted
      INFO: There are 0 VMs in this cloud. The instance cap for the cloud is 300, so we have room for more
      Feb 21, 2017 3:50:21 PM org.jenkinsci.plugins.vSphereCloud provision
      INFO: provision(xxx,1): 0 existing slaves (=0 executors), templates available are [Template[prefix=ess-lin, provisioned=[], planned=[], max=2147483647, fullness=0.000%]]

      The agent's log in the Jenkins UI shows the following:

      [ess-lin_8hbmdsbdw82ekvakhf1667bk] Starting Virtual Machine...
      [ess-lin_8hbmdsbdw82ekvakhf1667bk] VM already powered on
      [ess-lin_8hbmdsbdw82ekvakhf1667bk] Waiting for VMTools
      [ess-lin_8hbmdsbdw82ekvakhf1667bk] VM Tools are running
      [ess-lin_8hbmdsbdw82ekvakhf1667bk] Finished wait for VMTools
      [ess-lin_8hbmdsbdw82ekvakhf1667bk] Waiting for 60 seconds before asking hudson.plugins.sshslaves.SSHLauncher@c8db95d to launch slave.
      [ess-lin_8hbmdsbdw82ekvakhf1667bk] Ignoring disconnect attempt because a connect attempt is in progress.
      [ess-lin_8hbmdsbdw82ekvakhf1667bk] Asking SSHLauncher to launch slave.
      [02/21/17 17:55:45] [SSH] Opening SSH connection to X.X.X.X:22.
      [02/21/17 17:55:46] [SSH] Authentication successful.
      [02/21/17 17:55:46] [SSH] The remote users environment is:
      BASH=/bin/bash
      BASHOPTS=cmdhist:complete_fullquote:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
      BASH_ALIASES=()
      BASH_ARGC=()
      BASH_ARGV=()
      BASH_CMDS=()
      BASH_EXECUTION_STRING=set
      BASH_LINENO=()
      BASH_SOURCE=()
      BASH_VERSINFO=([0]="4" [1]="3" [2]="11" [3]="1" [4]="release" [5]="x86_64-pc-linux-gnu")
      BASH_VERSION='4.3.11(1)-release'
      DIRSTACK=()
      EUID=1000
      GROUPS=()
      HOME=/home/jenkins
      HOSTNAME=xxx
      HOSTTYPE=x86_64
      IFS=$' \t\n'
      LANG=en_US.UTF-8
      LD_LIBRARY_PATH=/usr/local/lib
      LOGNAME=jenkins
      MACHTYPE=x86_64-pc-linux-gnu
      MAIL=/var/mail/jenkins
      MAKEFLAGS='-j 8'
      OPTERR=1
      OPTIND=1
      OSTYPE=linux-gnu
      PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
      PIPESTATUS=([0]="0")
      PPID=1916
      PS4='+ '
      PWD=/home/jenkins
      SHELL=/bin/bash
      SHELLOPTS=braceexpand:hashall:interactive-comments
      SHLVL=1
      SSH_CLIENT='X.X.X.X 36503 22'
      SSH_CONNECTION='X.X.X.X 36503 X.X.X.X 22'
      TERM=dumb
      UID=1000
      USER=jenkins
      XDG_RUNTIME_DIR=/run/user/1000
      XDG_SESSION_ID=1
      _=MAKEFLAGS
      [02/21/17 17:55:46] [SSH] Checking java version of java
      [02/21/17 17:55:46] [SSH] java -version returned 1.7.0_79.
      [02/21/17 17:55:46] [SSH] Starting sftp client.
      [02/21/17 17:55:47] [SSH] Copying latest slave.jar...
      [02/21/17 17:55:48] [SSH] Copied 506,667 bytes.
      Expanded the channel window size to 4MB
      [02/21/17 17:55:48] [SSH] Starting slave process: cd "/home/jenkins" && java  -jar slave.jar
      <===[JENKINS REMOTING CAPACITY]===>channel started
      Slave.jar version: 2.53.3
      This is a Unix slave
      Evacuated stdout
      Slave successfully connected and online
      [ess-lin_8hbmdsbdw82ekvakhf1667bk] Slave online

      HTTP ERROR 404
      Problem accessing /computer/ess-lin_8hbmdsbdw82ekvakhf1667bk/logText/progressiveHtml. Reason:
          Not Found
      Powered by Jetty://

      The situation is fixed when I set the instance cap back to 0 (for some reason the Jenkins UI then shows 2147483647 instead of 0): the plugin immediately starts working fine again.

      When I set it to 0, the following error appears in the main Jenkins.log file:
       

      Mar 15, 2017 5:14:06 PM hudson.model.Run execute
      INFO: ess/ess_monitorSlaves #2662 main build action completed: SUCCESS
      Mar 15, 2017 5:14:09 PM hudson.ivy.IvyBuildTrigger$DescriptorImpl configure
      INFO: IvyConfigurations: 0
      Mar 15, 2017 5:14:09 PM hudson.plugins.jabber.im.transport.JabberPublisherDescriptor configure
      INFO: No hostname specified.
      Mar 15, 2017 5:14:09 PM hudson.model.Descriptor$NewInstanceBindInterceptor instantiate
      WARNING: Descriptor not found. Falling back to default instantiation org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy {"stapler-class":"org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy","idleMinutes":"120"}
      Mar 15, 2017 5:14:09 PM hudson.model.Descriptor$NewInstanceBindInterceptor instantiate
      WARNING: Descriptor not found. Falling back to default instantiation org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy {"stapler-class":"org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy","idleMinutes":"120"}
      Mar 15, 2017 5:14:09 PM hudson.model.Descriptor$NewInstanceBindInterceptor instantiate
      WARNING: Descriptor not found. Falling back to default instantiation org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy {"stapler-class":"org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy","idleMinutes":"120"}
      Mar 15, 2017 5:14:09 PM hudson.model.Descriptor$NewInstanceBindInterceptor instantiate
      WARNING: Descriptor not found. Falling back to default instantiation org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy {"stapler-class":"org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy","idleMinutes":"120"}
      Mar 15, 2017 5:14:09 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: STARTING VSPHERE CLOUD
      

      Is there something I can do to fix this behaviour?

      Thank you in advance


          Activity

          jorgepena jpena added a comment -

          As a workaround, the behaviour returns to normal when the vSphere cloud configuration in Jenkins is re-created automatically with the following Groovy script:

          import org.jenkinsci.plugins.vSphereCloud
          import org.jenkinsci.plugins.vsphere.VSphereConnectionConfig
          import org.jenkinsci.plugins.vSphereCloudLauncher
          import org.jenkinsci.plugins.vSphereCloudSlaveTemplate
          import org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy
          import hudson.slaves.ComputerLauncher
          import hudson.plugins.sshslaves.SSHLauncher
          import hudson.model.Node
          import hudson.plugins.sshslaves.SSHConnector
          import hudson.slaves.RetentionStrategy
          import net.sf.json.JSONArray
          import net.sf.json.JSONObject
          import org.apache.commons.io.IOUtils
          import jenkins.model.Jenkins
          
          def newvSphereCloud(JSONObject obj) {
            new vSphereCloud(
                  new VSphereConnectionConfig(obj.optString('vsHost'),obj.optString('credentialsId')),
                  obj.optString('vsDescription'),
                  obj.optInt('maxOnlineSlaves'),
                  obj.optInt('instanceCap', 0),
                  bindJSONToList(vSphereCloudSlaveTemplate.class, obj.opt('templates')),
                )
          }
          
          def newvSphereCloudSlaveTemplate(JSONObject obj) {
            SSHLauncher sshLauncher = new SSHLauncher(
              obj.optString('launcher_host'),
              obj.optInt('launcher_port', 22),
              obj.optString('launcher_credentials_id'),
              obj.optString('launcher_jvm_options'),
              obj.optString('launcher_java_path'),
              obj.optString('launcher_prefix_start_slave_cmd'),
              obj.optString('launcher_suffix_start_slave_cmd'),
              obj.optInt('launcher_connection_timeout_seconds'),
              obj.optInt('launcher_max_num_retries'),
              obj.optInt('launcher_retry_wait_time'),
              new hudson.plugins.sshslaves.verifiers.NonVerifyingKeyVerificationStrategy(),
            )
          
            String node_usage = (obj.optString('mode', 'NORMAL').toUpperCase().equals('EXCLUSIVE'))? 'EXCLUSIVE' : 'NORMAL'
            // For now the launcher_method will always be "ssh".
            // Note: `launcher` below is built but never used; the template is given sshLauncher directly.
            ComputerLauncher launcher = new vSphereCloudLauncher(
              sshLauncher,
              obj.optString('vsDescription'),
              obj.optString('vmName'),
              obj.optBoolean('overrideLaunchSupported'),
              obj.optBoolean('waitForVMTools'),
              obj.optString('snapshotName'),
              obj.optString('launchDelay'),
              obj.optString('idleOption'),
              obj.optString('LimitedTestRunCount'))
          
            RetentionStrategy retentionStrategy = new VSphereCloudRetentionStrategy(obj.optInt('retentionStrategy_idleMinutes', 2))
            return new vSphereCloudSlaveTemplate (
              obj.optString('cloneNamePrefix'),
              obj.optString('masterImageName'),
              obj.optBoolean('useSnapshot',false),
              obj.optString('snapshotName'),
              obj.optBoolean('linkedClone',false),
              obj.optString('cluster'),
              obj.optString('resourcePool'),
              obj.optString('datastore'),
              obj.optString('folder'),
              obj.optString('customizationSpec'),
              obj.optString('templateDescription'),
              obj.optInt('templateInstanceCap', 0),
              obj.optInt('numberOfExecutors', 1),
              obj.optString('remoteFS'),
              obj.optString('labelString'),
              Node.Mode."${node_usage}",
              obj.optBoolean('forceVMLaunch',false),
              obj.optBoolean('waitForVMTools',false),
              obj.optInt('launchDelay', 60),
              obj.optInt('limitedRunCount', 1),
              obj.optBoolean('saveFailure',false),
              null, //obj.optString('targetResourcePool'),
              null, //obj.optString('targetHost'),
              obj.optString('credentialsId'),
              sshLauncher,
              retentionStrategy,
              new ArrayList<hudson.slaves.NodeProperty>(), // List<? extends NodeProperty<?>>
              null  // List<? extends VSphereGuestInfoProperty>
            )
          }
          
          def bindJSONToList(Class type, Object src) {
            if(!(type == vSphereCloud) && !(type == vSphereCloudSlaveTemplate)) {
              throw new Exception("Must use vSphereCloud or vSphereCloudSlaveTemplate class.")
            }
            ArrayList<?> vsphere_array
            if(type == vSphereCloud){
              vsphere_array = new ArrayList<vSphereCloud>()
            }
            else {
              vsphere_array = new ArrayList<vSphereCloudSlaveTemplate>()
            }
            //cast the configuration object to a VSphere instance which Jenkins will use in configuration
            if (src instanceof JSONObject) {
              // uses string interpolation to call newvSphereCloud() or newvSphereCloudSlaveTemplate()
              vsphere_array.add("new${type.getSimpleName()}"(src))
            }
            else if (src instanceof JSONArray) {
              for (Object o : src) {
                if (o instanceof JSONObject) {
                  vsphere_array.add("new${type.getSimpleName()}"(o))
                }
              }
            }
            return vsphere_array
          }
          
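          // `template` (URL of the JSON template) and `dryRun` are not defined in this script;
          // they are assumed to be supplied by the job that runs it (e.g. as build parameters).
          def templateURL = new URL(template)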
          JSONArray clouds_vsphere = JSONArray.fromObject(templateURL.getText())
          clouds = bindJSONToList(vSphereCloud.class, clouds_vsphere)
          if ("false".equals(dryRun)){
            println 'Dry run - OFF'
            Jenkins.instance.clouds.removeAll(vSphereCloud)
            Jenkins.instance.clouds.addAll(clouds)
            Jenkins.instance.save()
          } else {
            println 'Dry run - ON'
          }
          
          clouds.each {
            println "Configuring VSphere cloud ${it.vsDescription}"
          }
          
          println 'done'
          

          It needs a JSON template file to work, and the job needs to be scheduled to run at least every 30 minutes:

          [ {
           "vsDescription": "xxx",
           "maxOnlineSlaves": 0,
           "vsHost": "https://xxx",
           "credentialsId": "xxx",
           "instanceCap": 0,
           "templates":
           [
               {
                 "cloneNamePrefix": "xxx",
                 "masterImageName": "xxx",
                 "useSnapshot": true,
                 "snapshotName": "xxx",
                 "linkedClone": true,
                 "cluster": "xxx",
                 "resourcePool": "xxx",
                 "datastore": "xxx",
                 "folder": "",
                 "customizationSpec": "",
                 "templateDescription": "Created automatically",
                 "templateInstanceCap": xxx,
                 "numberOfExecutors": xxx,
                 "remoteFS": "xxx",
                 "labelString": "xxx",
                 "mode": "EXCLUSIVE",
                 "forceVMLaunch": false,
                 "waitForVMTools": true,
                 "launchDelay": 0,
                 "limitedRunCount": 0,
                 "saveFailure": false,
                 "launcher_method": "ssh",
                 "launcher_host": "",
                 "launcher_port": 22,
                 "launcher_credentials_id": "xxx",
                 "launcher_jvm_options": "",
                 "launcher_java_path": "",
                 "launcher_prefix_start_slave_cmd": "",
                 "launcher_suffix_start_slave_cmd": "",
                 "launcher_connection_timeout_seconds": 0,
                 "launcher_max_num_retries": 0,
                 "launcher_retry_wait_time": 0,
                 "retentionStrategy_idleMinutes": xxx
               }
           ]
           }
          ]
          
          
          pjdarton pjdarton added a comment -

          First, is this still a problem with the current version (2.20) of the plugin? If not, please close the issue.

          If it is... when you say "the following error appears in the main Jenkins.log file", what error do you mean? The log shows no errors, only "INFO" and "WARNING".

          As for your ingenious workaround...

          • creating a new vSphereCloud instance and passing in an instanceCap of 0 will result in an instanceCap of Integer.MAX_VALUE (2147483647); that's expected. It's ugly, sure, but it is expected. There's a lot of code in this plugin that's evolved over time and is "just about good enough" - if folks were to rewrite it today, coding against Jenkins as it is right now, it could look a lot neater, but this is old code, so it's got a few wrinkles.
          • By repeatedly re-creating the templates, you're defeating the plugin's ability to track what VMs exist, which may well result in duplicate slave/VM names, and maybe causing yourself a memory leak; that shouldn't be necessary and, as far as I am aware (as "it works for me"), is not necessary.
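To illustrate the first point, here is a minimal Groovy sketch of the mapping described above, where an instance cap of 0 ends up stored as Integer.MAX_VALUE. The helper name effectiveInstanceCap is hypothetical, and treating negative values the same way as 0 is an assumption; this is not the plugin's own code.

```groovy
// Hypothetical helper (not the plugin's code): models "instanceCap 0 means no limit".
// Assumption: negative values are treated like 0.
int effectiveInstanceCap(int configuredCap) {
    return configuredCap <= 0 ? Integer.MAX_VALUE : configuredCap
}

println effectiveInstanceCap(0)   // 2147483647, the number shown in the UI
println effectiveInstanceCap(50)  // 50
```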
          jorgepena jpena added a comment -

          Hi pjdarton, thank you for replying.

          Yes, when I wrote "error" I was referring to the warnings printed in the post above. After creating the issue I realised that the value 2147483647 is just cosmetic.

          We rely on this plugin to create and destroy Jenkins agents on demand. The issue keeps happening with newer versions of the plugin, so we run a recurring job in our Jenkins instance to work around it. We haven't experienced any memory issues. We did have issues with duplicated VM names, but I am not sure they were related to this workaround. We developed a small script that destroys the virtual machines in VMware that are not present in Jenkins; after all, these are linked clones and get re-created anyway.
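The cleanup script mentioned above is not shown in the thread. As a rough illustration of the idea only, here is a minimal Groovy sketch that lists cloned VMs whose names Jenkins no longer knows about. The function name findOrphanVms, the example VM names and the prefix matching are assumptions; fetching the VM list from vSphere is deliberately left out, and the sketch only reports candidates rather than deleting anything.

```groovy
// Hypothetical sketch: list VMs created from a clone prefix that Jenkins no longer knows about.
// Both name lists are plain inputs; in a real job the Jenkins side could come from
// Jenkins.instance.nodes*.nodeName, while the vSphere side must be fetched separately (not shown).
List<String> findOrphanVms(Collection<String> vsphereVmNames,
                           Collection<String> jenkinsNodeNames,
                           String clonePrefix) {
    Set<String> known = new HashSet<String>(jenkinsNodeNames)
    return vsphereVmNames.findAll { String name ->
        // only VMs cloned from this template's prefix that have no matching Jenkins node
        name.startsWith(clonePrefix) && !known.contains(name)
    }.sort()
}

def vmNames   = ['ess-lin_aaa', 'ess-lin_bbb', 'ess-lin_ccc', 'static-build-01']
def nodeNames = ['ess-lin_bbb', 'static-build-01']
// Only report candidates; deleting VMs is left to a separate, deliberate step.
println findOrphanVms(vmNames, nodeNames, 'ess-lin_')  // [ess-lin_aaa, ess-lin_ccc]
```

Matching on the clone prefix keeps the static agents (and any unrelated VMs in the same folder) out of the candidate list.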

          Currently we are running

          Jenkins version: 2.150.3
          vsphere-cloud plugin: 2.21

          This is an excerpt of the log where it should have created a new Jenkins agent (the last one, since we set a hard limit of 50 agents) but did not:

          Jan 10, 2020 3:11:32 PM INFO org.jenkinsci.plugins.vSphereCloud provision
          provision(xcode-10.1,20): Provisioning 0 new =[]
          Jan 10, 2020 3:11:32 PM INFO org.jenkinsci.plugins.vSphereCloud calculateMaxAdditionalSlavesPermitted
          There are 49 VMs in this cloud. The instance cap for the cloud is 50, so we have room for more
          Jan 10, 2020 3:11:32 PM INFO org.jenkinsci.plugins.vSphereCloud provision
          provision(xcode-11.1,6): 0 existing slaves (=0 executors), templates available are [Template[prefix=sod-xxx-01, provisioned=[sod-xxx-011, sod-xxx-0110, sod-xxx-012, sod-xxx-013, sod-xxx-014, sod-xxx-015, sod-xxx-016, sod-xxx-017, sod-xxx-018, sod-xxx-019], planned=[], unwanted={}, max=10, fullness=100.000%]]
          

          That log excerpt keeps repeating over and over. After running the job with the workaround described above, it starts creating agents again.

          Regards,

          pjdarton pjdarton added a comment -

          According to that log message, you've set a max=10 for instances from that template.
          So, unless you've got other templates defined on that cloud, you're never going to reach 50 as that template is capped at 10 - that's why it's saying it's 100% full when it's got 10 instances provisioned with max=10.

          You either need to allow that template to spawn an unlimited number of instances (capped only by the cloud instance cap), or to spawn a larger number of instances, or to define other templates such that the sum of all templates' max fields comes to at least the cloud's instance cap.
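The arithmetic described above can be sketched in Groovy as a simplified model: a new VM needs room both under the cloud's instance cap and under the template's max, and fullness is provisioned divided by max. The class TemplateUsage and the functions additionalAllowed and templatesCanReachCloudCap are hypothetical and only approximate what the log reports; they are not the plugin's implementation. The numbers come from the log in the previous comment (49 of 50 VMs in the cloud, 10 of 10 for the sod-xxx-01 template).

```groovy
import java.util.Locale

// Hypothetical, simplified model of the two limits (not the plugin's code).
class TemplateUsage {
    String prefix
    int max          // per-template cap ("max" in the log)
    int provisioned  // VMs currently provisioned from this template

    double fullnessPercent() { 100.0d * provisioned / max }
}

// A new VM needs room both in the cloud and in the template.
int additionalAllowed(int cloudCap, int vmsInCloud, TemplateUsage t) {
    int cloudRoom = Math.max(0, cloudCap - vmsInCloud)
    int templateRoom = Math.max(0, t.max - t.provisioned)
    return Math.min(cloudRoom, templateRoom)
}

// The cloud cap is only reachable if the templates' caps add up to at least that much.
// Summed as long so an "unlimited" (Integer.MAX_VALUE) template cap cannot overflow.
boolean templatesCanReachCloudCap(int cloudCap, List<TemplateUsage> templates) {
    return templates.sum(0L) { it.max as long } >= cloudCap
}

def sod = new TemplateUsage(prefix: 'sod-xxx-01', max: 10, provisioned: 10)
println additionalAllowed(50, 49, sod)                              // 0
println String.format(Locale.ROOT, '%.3f%%', sod.fullnessPercent()) // 100.000%
println templatesCanReachCloudCap(50, [sod])                        // false
```

With only this template, the cloud reports room for one more VM while the template itself is full, which matches the repeated "Provisioning 0 new" lines.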

          jorgepena jpena added a comment -

          We have other templates defined on the cloud.

          However, even with no cap defined globally or in the template, it sometimes still stops creating new instances until the script is executed.

          pjdarton pjdarton added a comment -

          Well, for the scenario you've given logs for, the behaviour is working as designed: it was told not to provision more than 10 instances of that template, so that's what it did. It only broke that limit after you killed off the old cloud and replaced it with one that had no memory of the old instances.

          If you can reproduce the scenario where it's "not creating more instances" when there is "no cap" (and provide logs & other information) then that'd make the problem more solvable.
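The point about the replacement cloud having "no memory of the old instances" can be illustrated with another simplified Groovy model. TrackedTemplate is hypothetical and assumes, for illustration only, that a template remembers its provisioned VMs purely in its own in-memory state; it is not how the plugin is actually implemented. The VM names follow the sod-xxx-01 pattern from the log above.

```groovy
// Hypothetical model (not the plugin's code): a template that remembers the
// VMs it provisioned only in its own in-memory state.
class TrackedTemplate {
    final int max
    final Set<String> provisioned = new LinkedHashSet<String>()

    TrackedTemplate(int max) { this.max = max }

    boolean canProvision() { provisioned.size() < max }
}

def original = new TrackedTemplate(10)
(1..10).each { original.provisioned << "sod-xxx-01${it}".toString() }
println original.canProvision()   // false: 10 of 10 provisioned

// Re-creating the template (as the workaround script does) starts from an empty set,
// even though the 10 VMs above still exist, so the cap check passes again.
def recreated = new TrackedTemplate(10)
println recreated.canProvision()  // true
```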


            People

            • Assignee:
              Unassigned
              Reporter:
              jorgepena jpena
            • Votes:
              2
              Watchers:
              4
