  1. Jenkins
  2. JENKINS-57215

Plugin starts a worker and might immediately stop it because of cached EC2Computer.getUptime()

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Component/s: ec2-plugin
    • Labels:
      None
    • Environment:
      Jenkins ver. 2.164.2
      Amazon EC2 plugin 1.42
    • Released As:
      1.45

      Description

      AFAIU there's a race condition in EC2RetentionStrategy.java#L99

       

      A few lines below, the call to `computer.getUptime()` returns a cached value, whereas `computer.getState()` does not. If the worker was just started, this can lead to a race condition in which uptime is calculated from the previous launch time rather than the current one, while state correctly reports running.

      As a result of this inconsistency, the plugin ends up stopping the instance, because it computes uptime from the previous launch time rather than the current one (the time elapsed since the previous launch is most likely longer than the idle timeout, which for us is 60 minutes).
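To make the failure mode concrete, here is a minimal, self-contained sketch. It is not the plugin's code: `FakeInstance`, `CachingComputer`, `shouldStop`, and the timestamps are all hypothetical stand-ins, and the stop rule is deliberately simplified to "running and uptime at least the idle timeout".

```java
import java.time.Duration;
import java.time.Instant;

public class StaleUptimeSketch {
    enum State { RUNNING, STOPPED }

    /** Hypothetical stand-in for the cloud side: always holds the live values. */
    static final class FakeInstance {
        State state = State.STOPPED;
        Instant launchTime;
        void start(Instant now) { state = State.RUNNING; launchTime = now; }
        void stop() { state = State.STOPPED; }
    }

    /** Mimics the reported inconsistency: state is read live, launch time is cached. */
    static final class CachingComputer {
        private final FakeInstance instance;
        private Instant cachedLaunchTime;
        CachingComputer(FakeInstance instance) { this.instance = instance; }

        State getState() { return instance.state; }

        /** Uses the first launch time ever seen, even after a restart. */
        Duration getCachedUptime(Instant now) {
            if (cachedLaunchTime == null) {
                cachedLaunchTime = instance.launchTime;
            }
            return Duration.between(cachedLaunchTime, now);
        }

        /** Re-reads the launch time before computing uptime. */
        Duration getFreshUptime(Instant now) {
            cachedLaunchTime = instance.launchTime;
            return Duration.between(cachedLaunchTime, now);
        }
    }

    /** Simplified stop rule (an assumption, not the plugin's actual rule). */
    static boolean shouldStop(State state, Duration uptime, Duration idleTimeout) {
        return state == State.RUNNING && uptime.compareTo(idleTimeout) >= 0;
    }

    public static void main(String[] args) {
        Duration idleTimeout = Duration.ofMinutes(60);
        Instant firstLaunch = Instant.parse("2019-04-29T08:00:00Z"); // made-up time
        FakeInstance instance = new FakeInstance();
        CachingComputer computer = new CachingComputer(instance);

        instance.start(firstLaunch);
        computer.getCachedUptime(firstLaunch);      // populates the cache
        instance.stop();

        Instant secondLaunch = firstLaunch.plus(Duration.ofHours(3));
        instance.start(secondLaunch);               // worker just restarted
        Instant check = secondLaunch.plus(Duration.ofMinutes(1));

        System.out.println("stale decision: "
                + shouldStop(computer.getState(), computer.getCachedUptime(check), idleTimeout));
        System.out.println("fresh decision: "
                + shouldStop(computer.getState(), computer.getFreshUptime(check), idleTimeout));
    }
}
```

In this sketch the stale path sees a running instance with roughly three hours of uptime and decides to stop it, while the fresh path sees one minute of uptime and leaves it alone.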

      It does not happen often. Perhaps we can just change `computer.getUptime()` to return the actual value rather than a cached one? Ideally, the `computer` getters should all return a consistent view.
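One possible shape for the "consistent view" idea is to read state and launch time in a single fetch and derive everything from that immutable snapshot. The sketch below is only an illustration under that assumption: `Snapshot`, `Computer`, `refresh()`, and the `describe` supplier are hypothetical names, not the plugin's API, and this is not necessarily how the issue was eventually fixed.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

public class ConsistentViewSketch {
    enum State { RUNNING, STOPPED }

    /** Immutable snapshot: state and launch time come from the same fetch. */
    static final class Snapshot {
        final State state;
        final Instant launchTime;

        Snapshot(State state, Instant launchTime) {
            this.state = state;
            this.launchTime = launchTime;
        }

        Duration uptime(Instant now) {
            return Duration.between(launchTime, now);
        }
    }

    /** Hypothetical wrapper; 'describe' stands in for one remote describe call. */
    static final class Computer {
        private final Supplier<Snapshot> describe;

        Computer(Supplier<Snapshot> describe) {
            this.describe = describe;
        }

        /** Every caller gets state and launch time that belong together. */
        Snapshot refresh() {
            return describe.get();
        }
    }

    public static void main(String[] args) {
        Instant launch = Instant.parse("2019-04-29T11:00:00Z"); // made-up time
        Computer computer = new Computer(() -> new Snapshot(State.RUNNING, launch));

        Snapshot view = computer.refresh();
        Instant now = launch.plus(Duration.ofMinutes(1));
        System.out.println(view.state + " for " + view.uptime(now).toMinutes() + " min");
    }
}
```

The trade-off is one remote call per check instead of a cheap cached read, which may matter when many nodes are checked at once.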

      I'm willing to provide a PR, if someone could provide guidance on the suggested solution. Thanks!

       

          Activity

          datallah Daniel Atallah added a comment -

          I also discovered this same issue and created https://github.com/jenkinsci/ec2-plugin/pull/359 to fix it.

          johnlengeling John Lengeling added a comment -

          I ran into this issue using versions 1.42 and 1.43 of the plugin when running a large job that wants to provision 100+ nodes. I see the PR is approved now; if someone can get me a snapshot build, I will test it out.

          thoulen FABRIZIO MANFREDI added a comment -

          Fixed in 1.45.


            People

            • Assignee:
              unicolet Umberto Nicoletti
              Reporter:
              unicolet Umberto Nicoletti
            • Votes:
              3
              Watchers:
              5
