JENKINS-71632

Multi-cloud load distribution

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major
    • Component/s: kubernetes-plugin
    • Labels: None

      The Kubernetes plugin (https://plugins.jenkins.io/kubernetes/) supports defining multiple clouds (Kubernetes clusters) simultaneously.

      However, there doesn't appear to be any logic or mechanism for distributing load among the clouds.

      Assuming that none of the defined clouds uses the "Restrict pipeline support to authorized folder" option, all clouds are basically equivalent (when no specific "cloud" is targeted by name in the pipeline). The only way to discriminate between the clouds is the order in which they are listed in the Jenkins UI.

      In practice, I am seeing all load directed to the first (top) cloud as defined in ./manage/configureClouds/.

      This seems logical at first, but it means that the second and any later clouds are never touched. Some mechanism to distribute agents among the clouds (actively balanced or random, for instance) would be nice.

      But what is far more pressing (and the reason I'm submitting this ticket) is that the Kubernetes plugin will not switch to the next defined cloud when the first/primary cloud has reached its resource quota.

      It will saturate the entire first cloud and then start spamming:

      ERROR: Failed to launch project-task-os-test-1-wd2ww-ks1tv-p3nfg
      io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://my.corp.internal:6443/api/v1/namespaces/my-project/pods. Message: pods "project-task-os-test-1-r01cm-x525n-fppjr" is forbidden: exceeded quota: 

      Some help text reads:

      The Kubernetes cloud to use to schedule the pod.
      If unset, the first available Kubernetes cloud will be used.

      The key word here being "available". The Kubernetes plugin seems to think that since the cloud API is available, the cloud itself is available for work, which might not be the case when resource limits are hit.
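      To illustrate the distinction drawn above, here is a minimal sketch of an availability check that treats a quota rejection differently from an unreachable API. The function names (`is_quota_failure`, `cloud_is_available`) are hypothetical and are not part of the Kubernetes plugin; the only detail taken from this ticket is the "is forbidden: exceeded quota" wording in the error message.

```python
from typing import Optional

# Substrings taken from the error message quoted in this ticket.
FORBIDDEN_MARKER = "is forbidden"
QUOTA_MARKER = "exceeded quota"


def is_quota_failure(message: str) -> bool:
    """Return True if a launch error looks like a resource quota rejection."""
    return FORBIDDEN_MARKER in message and QUOTA_MARKER in message


def cloud_is_available(api_reachable: bool, last_launch_error: Optional[str]) -> bool:
    """Hypothetical check: a cloud counts as available only if its API answers
    AND its most recent launch attempt was not rejected for quota reasons."""
    if not api_reachable:
        return False
    if last_launch_error is not None and is_quota_failure(last_launch_error):
        return False
    return True
```

      Under this (assumed) definition, a cloud whose API responds but whose last pod creation was rejected with "exceeded quota" would be skipped rather than retried.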

      I suppose one way around this would be to set a "Concurrency Limit" per cloud, but this is a rather blunt approach, since it doesn't take into account the different resource profiles of different types of Jenkins agents. I have some agents with 1 CPU + 2 GB of resources, and some with 4 CPU + 12 GB. Setting an arbitrary concurrency limit could leave valuable resources unused and introduce unnecessary queuing, depending on how conservatively the limit is set. It also seems inefficient when resource quotas exist and feedback on their status is readily available through the API.
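      The arithmetic behind that concern can be sketched as follows. The two agent sizes come from this ticket; the quota of 16 CPU and 48 GB is a made-up figure chosen only for illustration, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu: int        # whole CPU cores
    memory_gb: int  # gigabytes


SMALL_AGENT = Resources(cpu=1, memory_gb=2)    # from the ticket
LARGE_AGENT = Resources(cpu=4, memory_gb=12)   # from the ticket
QUOTA = Resources(cpu=16, memory_gb=48)        # ASSUMPTION: illustrative quota


def max_agents_by_quota(quota: Resources, agent: Resources) -> int:
    """How many agents of one type fit in the quota (tightest resource wins)."""
    return min(quota.cpu // agent.cpu, quota.memory_gb // agent.memory_gb)


def safe_concurrency_limit(quota: Resources, agent_types) -> int:
    """A single limit that never exceeds the quota must assume the largest agent."""
    return min(max_agents_by_quota(quota, agent) for agent in agent_types)


def idle_cpu_at_limit(quota: Resources, agent: Resources, limit: int) -> int:
    """CPU cores left unused when `limit` agents of one type are running."""
    return quota.cpu - limit * agent.cpu
```

      With these assumed numbers, the quota fits 16 small agents or 4 large ones, so a limit that is safe for large agents is 4. If the 4 running agents all happen to be small, 12 of the 16 CPU cores sit idle while further builds queue.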

      Long story short: 2 requests:

      1. major: When the resource quota of cloud 1 is reached, try cloud 2, 3, 4, etc.
      2. minor: Allow some intelligent distribution among equivalent (non-restricted) clouds, such as "random", "balanced", etc.
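      The two requests above could be sketched roughly like this. This is not the plugin's code or API; `Cloud`, `pick_cloud`, and the strategy names are hypothetical, and quota is reduced to CPU only to keep the example short.

```python
import random
from dataclasses import dataclass


@dataclass
class Cloud:
    name: str
    cpu_quota: int      # total CPU allowed by the resource quota
    cpu_used: int = 0   # CPU currently consumed by running agents

    def has_room(self, cpu_request: int) -> bool:
        return self.cpu_used + cpu_request <= self.cpu_quota

    def free_cpu(self) -> int:
        return self.cpu_quota - self.cpu_used


def pick_cloud(clouds, cpu_request, strategy="ordered", rng=random):
    """Pick a cloud for a new agent, skipping clouds whose quota is exhausted.

    clouds: list of Cloud in the order configured in the UI.
    strategy: "ordered"  -> first cloud with room (request 1: fail over to 2, 3, ...)
              "random"   -> any cloud with room, chosen at random (request 2)
              "balanced" -> cloud with the most free CPU (request 2)
    Returns None when no cloud can take the agent, so the build stays queued.
    """
    candidates = [c for c in clouds if c.has_room(cpu_request)]
    if not candidates:
        return None
    if strategy == "ordered":
        return candidates[0]
    if strategy == "random":
        return rng.choice(candidates)
    if strategy == "balanced":
        return max(candidates, key=Cloud.free_cpu)
    raise ValueError(f"unknown strategy: {strategy}")
```

      For example, with clouds `a` (4 of 4 CPU used), `b` (2 of 8 used) and `c` (0 of 16 used), a 1-CPU agent would go to `b` under "ordered" and to `c` under "balanced", instead of failing repeatedly on `a`.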

            Assignee: Unassigned
            Reporter: Pay Bas (paybas)
            Votes: 1
            Watchers: 5