JENKINS-52736

Occasionally the plugin leaves orphaned, stopped VMs

      Description

      Occasionally this plugin leaves orphaned VMs behind after they are terminated or no longer used by the plugin. They are left in a stopped state.

      This wastes compute resources and money.

      Ideally the plugin would not do this. In addition, it would help to have a periodic check (every 5 minutes) that goes through the current VMs in the project, finds the ones tagged with "jenkins", and automatically terminates any that are tagged but not known to Jenkins. This would make the plugin resilient against unexpected Jenkins restarts, etc. (though it should be optional, in case multiple Jenkins instances share the same GCE project).
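
      The selection step of such a periodic check could look roughly like the sketch below. It is only an illustration: the `Vm` record, `findOrphans`, and the sample instance names are hypothetical and are not part of the plugin. A real version would list instances through the Compute Engine API and pass in the node names Jenkins currently tracks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class OrphanSweep {
    /** Hypothetical stand-in for one VM from a project-wide instance listing. */
    record Vm(String name, Set<String> tags) {}

    /**
     * Returns the names of VMs that carry the given tag but are not in the
     * set of instance names Jenkins currently knows about.
     */
    static List<String> findOrphans(List<Vm> projectVms, Set<String> knownToJenkins, String tag) {
        List<String> orphans = new ArrayList<>();
        for (Vm vm : projectVms) {
            boolean createdByPlugin = vm.tags().contains(tag);
            if (createdByPlugin && !knownToJenkins.contains(vm.name())) {
                orphans.add(vm.name());
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        // Sample data only; the names are made up for the example.
        List<Vm> vms = List.of(
            new Vm("agent-1", Set.of("jenkins")),
            new Vm("agent-2", Set.of("jenkins")),
            new Vm("database", Set.of()));
        // Jenkins only knows agent-1, so agent-2 is the orphan.
        System.out.println(findOrphans(vms, Set.of("agent-1"), "jenkins")); // prints [agent-2]
    }
}
```

      Untagged VMs are never selected, which keeps unrelated machines in the project safe. The 5-minute interval could be driven by something like `ScheduledExecutorService.scheduleAtFixedRate`, and the whole sweep would sit behind the opt-in setting mentioned above.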

          Activity

          hachque June Rhodes added a comment (edited)

          Oh, that's very weird and not at all how similar functionality works on other platforms like AWS. The idea that the machine will be automatically terminated but Compute Engine won't automatically delete it or clean up its storage doesn't make a lot of sense to me (I guess it would if you wanted to keep data when using a preemptible instance, but I'd argue a normal instance makes more sense in that case).

          I don't think there's a way to change the behavior to delete instead of terminate on preemption here, but we definitely want to delete storage resources as soon as they're no longer in use. So we're going to have to do something janky here:

          • Create a Pub/Sub topic and subscription in the project, which connects to a Google Cloud Function that deletes the preemptible VM
          • Add a Stackdriver logging export specifically for preemption notices and configure that export to point at the Pub/Sub topic

          I thought the Pub/Sub topic could point at Jenkins, but that won't work for Jenkins instances that are not reachable from the Internet, so we have to deploy a handler on GCF instead.
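
          As a rough illustration of the parsing half of such a handler, the sketch below decodes a base64 Pub/Sub payload and pulls the project, zone, and instance name out of it. The `PreemptionHandler` class, the `InstanceRef` record, the sample log entry, and the assumption that the exported entry contains a path of the form `projects/<project>/zones/<zone>/instances/<name>` are all mine and not taken from this thread; the actual delete call to Compute Engine is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PreemptionHandler {
    /** Hypothetical holder for the three parts needed to address an instance. */
    record InstanceRef(String project, String zone, String name) {}

    // Assumed shape of the instance path inside the exported log entry.
    private static final Pattern INSTANCE_PATH =
        Pattern.compile("projects/([^/\"]+)/zones/([^/\"]+)/instances/([^/\"]+)");

    /** Decodes the base64 message data and extracts the first instance path found in it. */
    static InstanceRef parse(String base64Data) {
        String json = new String(Base64.getDecoder().decode(base64Data), StandardCharsets.UTF_8);
        Matcher m = INSTANCE_PATH.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no instance path in log entry");
        }
        return new InstanceRef(m.group(1), m.group(2), m.group(3));
    }

    public static void main(String[] args) {
        // Made-up log entry, encoded the way Pub/Sub message data arrives.
        String entry = "{\"protoPayload\":{\"resourceName\":"
            + "\"projects/my-project/zones/us-central1-a/instances/agent-2\"}}";
        String data = Base64.getEncoder().encodeToString(entry.getBytes(StandardCharsets.UTF_8));
        InstanceRef ref = parse(data);
        System.out.println(ref.zone() + "/" + ref.name()); // prints us-central1-a/agent-2
        // A real function would now ask the Compute Engine API to delete this instance.
    }
}
```

          A production handler would use a proper JSON parser and check that the entry really is a preemption notice before deleting anything.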

          If anyone has any better ideas, let me know.

          zombiemoose Rachel Yen added a comment -

          Hi June,

          Were you using preemptible instances?
          Also, thanks for pointing this out. I will have to research this and perhaps change how we're terminating instances.

          hachque June Rhodes added a comment -

          Yup, we are using preemptible instances.

          ingwar Karol Lassak added a comment -

          I think I have found the root cause.

          When GCP preempts an instance, it stops the instance and leaves it in that stopped state. The plugin then tries to delete only "RUNNING" instances:

          cloud.client.terminateInstanceWithStatus(cloud.projectId, zone, name, "RUNNING");

          I think that this line should be changed to:

          cloud.client.terminateInstance(cloud.projectId, zone, name);
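
          To make the difference concrete, here is a toy model of the two calls. `FakeClient` is an in-memory stand-in written for this example, not the plugin's real client, and its behavior (delete only when the status matches) is an assumption based on the description above. Compute Engine reports a stopped instance, including a preempted one, with the status TERMINATED.

```java
import java.util.HashMap;
import java.util.Map;

class TerminateSketch {
    /** In-memory stand-in for the Compute Engine client; not the plugin's real class. */
    static class FakeClient {
        final Map<String, String> statusByName = new HashMap<>();

        /** Deletes the instance only when its current status equals desiredStatus (assumed behavior). */
        boolean terminateInstanceWithStatus(String name, String desiredStatus) {
            if (!desiredStatus.equals(statusByName.get(name))) {
                return false; // status mismatch: the instance is left behind
            }
            return terminateInstance(name);
        }

        /** Deletes the instance regardless of its status. */
        boolean terminateInstance(String name) {
            return statusByName.remove(name) != null;
        }
    }

    public static void main(String[] args) {
        FakeClient client = new FakeClient();
        client.statusByName.put("agent-2", "TERMINATED"); // a stopped, preempted VM

        // The status filter skips the stopped VM, so it stays orphaned.
        System.out.println(client.terminateInstanceWithStatus("agent-2", "RUNNING")); // prints false
        // The unconditional call removes it.
        System.out.println(client.terminateInstance("agent-2")); // prints true
    }
}
```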

          craigbarber Craig Barber added a comment -

          https://github.com/jenkinsci/google-compute-engine-plugin/issues/77

            People

            • Assignee:
              zombiemoose Rachel Yen
              Reporter:
              hachque June Rhodes