Jenkins / JENKINS-53879

EC2 workers terminated before connection can be established, only on v1.40


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • ec2-plugin
    • None
    • Jenkins ver. 2.138.1
      Ubuntu 14.04

      I just did my monthly round of plugin upgrades and found that EC2 workers were failing to connect.

      Downgrading this plugin, and only this plugin, to v1.39 resolved the issue, so I'm fairly confident that v1.40 is the source of the behavior.

      Watching the activity, I saw worker nodes spin up correctly, based on our various labels and job requirements. They reach the running state in EC2, but shortly thereafter (< 1 minute) they are terminated. I believe this happens while the guest OS is still booting: I polled for connections on port 22 from the master node and never got a success, even though all of our workers are configured to run sshd open to the master on port 22.
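For anyone who wants to repeat the port 22 check, here is a minimal Python sketch of that kind of polling loop. The exact check used above was not posted, so the function name `wait_for_port`, the example address, and the interval values are all assumptions made for illustration.

```python
import socket
import time


def wait_for_port(host, port=22, timeout_s=300.0, interval_s=5.0,
                  connect_timeout_s=3.0):
    """Poll host:port until a TCP connect succeeds.

    Returns the elapsed time in seconds on success, or None if no
    connection succeeded within timeout_s.
    """
    start = time.monotonic()
    while True:
        try:
            # A successful TCP connect only shows that something is listening;
            # it does not prove that sshd accepts logins yet.
            with socket.create_connection((host, port),
                                          timeout=connect_timeout_s):
                return time.monotonic() - start
        except OSError:
            # Refused, unreachable, or timed out: the worker is not ready.
            pass
        if time.monotonic() - start >= timeout_s:
            return None
        time.sleep(interval_s)


if __name__ == "__main__":
    # Hypothetical private IP of a freshly launched worker.
    elapsed = wait_for_port("10.0.0.12", port=22, timeout_s=300.0)
    if elapsed is None:
        print("port 22 never opened within the timeout")
    else:
        print(f"port 22 opened after {elapsed:.1f} s")
```

Run from the master against a new worker's address, a `None` result while the instance is already being terminated would match the behavior described here: the node goes away before sshd ever becomes reachable.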

      All of our nodes are configured with Launch Timeout in Seconds = 300. The failure was very consistent, even though our launch times were around 3.5 minutes (about 210 seconds) the last time I measured them, which is within that timeout.

      If I had to venture a guess, I would say that v1.40 changed how leniently nodes are treated while they start, so that their boot times are counted differently.

      If someone can confirm that this is what changed, I'll probably just raise my timeouts to 10 minutes and walk away, but I'm not confident that would work.

            thoulen FABRIZIO MANFREDI
            sirosen Stephen Rosen
            Votes: 1
            Watchers: 3
