JENKINS-49753

EC2 cloud Windows nodes terminate as soon as connected

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Component/s: ec2-plugin
    • Labels:
    • Environment:
      Jenkins 2.108, EC2 plugin 1.38, Windows 2016, Windows 10

      Description

      We have recently been having a strange issue with our on-demand Windows nodes on AWS: they terminate as soon as the slave agent is brought online.

      I create a job that has the tag aws, which is associated with a Windows AMI in our EC2 console.

      I start the job.

      Jenkins starts the EC2 instance. When it comes up, the master connects to the slave and tries to launch slave.jar. The instance is then immediately disconnected and terminated, and Jenkins launches a new instance. This repeats until the job is stopped.

      Here's an example from the Jenkins log:

      INFO: Authenticating as asdc-jenkins
      Feb 26, 2018 6:13:14 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Connecting to 10.248.9.120 on port 22, with timeout 10000.
      Feb 26, 2018 6:13:14 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Connected via SSH.
      Feb 26, 2018 6:13:14 PM hudson.plugins.ec2.EC2Cloud log
      INFO: connect fresh as root
      Feb 26, 2018 6:13:14 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Connecting to 10.248.9.120 on port 22, with timeout 10000.
      Feb 26, 2018 6:13:14 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Connected via SSH.
      Feb 26, 2018 6:13:15 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Creating tmp directory (/tmp) if it does not exist
      Feb 26, 2018 6:13:15 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Verifying that java exists
      Feb 26, 2018 6:13:15 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Copying slave.jar
      Feb 26, 2018 6:13:16 PM hudson.plugins.ec2.EC2Cloud log
      INFO: Launching slave agent (via Trilead SSH2 Connection): java -jar /tmp/slave.jar
      Feb 26, 2018 6:13:16 PM hudson.plugins.ec2.EC2OndemandSlave terminate
      INFO: Terminated EC2 instance (terminated): i-0e6028c6c76826300
      Feb 26, 2018 6:13:16 PM hudson.plugins.ec2.EC2OndemandSlave terminate
      INFO: Removed EC2 instance from jenkins master: i-0e6028c6c76826300
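
The telling part of the log above is the end: the agent launch and the instance termination carry the same timestamp. A rough Python sketch for spotting that pattern in a longer log follows. The function names, the 5-second window, and the matched substrings are assumptions for illustration, taken only from the log text above and not from the plugin's code.

```python
from datetime import datetime, timedelta

# Matches stamps like "Feb 26, 2018 6:13:16 PM" (assumes an English locale).
STAMP_FORMAT = "%b %d, %Y %I:%M:%S %p"

SAMPLE = """\
Feb 26, 2018 6:13:16 PM hudson.plugins.ec2.EC2Cloud log
INFO: Launching slave agent (via Trilead SSH2 Connection): java -jar /tmp/slave.jar
Feb 26, 2018 6:13:16 PM hudson.plugins.ec2.EC2OndemandSlave terminate
INFO: Terminated EC2 instance (terminated): i-0e6028c6c76826300
"""


def parse_entries(log_text):
    """Pair each timestamp line with the message line that follows it."""
    entries = []
    stamp = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        try:
            # The first five space-separated tokens form the timestamp.
            stamp = datetime.strptime(" ".join(line.split(" ")[:5]), STAMP_FORMAT)
            continue
        except ValueError:
            pass  # Not a timestamp line, so treat it as a message.
        if stamp is not None:
            entries.append((stamp, line))
            stamp = None
    return entries


def launch_then_terminate(entries, window=timedelta(seconds=5)):
    """Return (launch, terminate) time pairs that are at most `window` apart."""
    hits = []
    last_launch = None
    for stamp, message in entries:
        if "Launching slave agent" in message:
            last_launch = stamp
        elif "Terminated EC2 instance" in message and last_launch is not None:
            if stamp - last_launch <= window:
                hits.append((last_launch, stamp))
            last_launch = None
    return hits


hits = launch_then_terminate(parse_entries(SAMPLE))
for launched, terminated in hits:
    print(f"launched {launched}, terminated {terminated}")
```

Message lines with no preceding timestamp, such as the first line of the excerpt above, are skipped by this sketch.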

      This is only happening on our Windows nodes (connecting via Cygwin). Mac and Linux nodes launch fine.

      To make things more confusing, if I launch the instance in Jenkins, and then manually add the node to the master via the web interface, it works fine.

      I've tried this on the latest version of the Windows 2016 AMI on Amazon, with only Cygwin and Java installed, and still have this problem.


          Activity

          angegar laurent gil added a comment -

          Can you please share your configuration for accessing the Windows slave through Cygwin?

          On my side, I installed Cygwin and can reach the Windows box over SSH from a bash shell, but the EC2 plugin seems to fail after the connection. I guess this code always returns false: https://github.com/jenkinsci/ec2-plugin/blob/495adac020384e8601b0ec40a2fa52b8123d8647/src/main/java/hudson/plugins/ec2/ssh/EC2UnixLauncher.java#L284 whereas I can start an Ubuntu box with the same SSH private key.

          Any ideas?

          malberghini Mike Alberghini added a comment -

          We managed to get things working in two different ways:

          The first was in the cloud config for the Windows AMI: we had to set "Override temporary dir location" to $(cygpath -w /cygdrive/c/jenkins)

          The other was to give up on having the job launch the AMI. We built a custom AMI with a script that runs the swarm connect on boot, and have a Pipeline launch the AMI instead of the Jenkins job.
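
For reference, `cygpath -w /cygdrive/c/jenkins` in the first workaround prints the Windows form of the path, `C:\jenkins`. The sketch below approximates that conversion in Python for the `/cygdrive/<letter>/...` case only. `cygdrive_to_windows` is a hypothetical helper, not part of Cygwin or the EC2 plugin, and the real `cygpath` also consults mount tables and handles cases not modeled here.

```python
import re
from pathlib import PureWindowsPath


def cygdrive_to_windows(posix_path: str) -> str:
    """Approximate `cygpath -w` for /cygdrive/<letter>/... paths only."""
    match = re.fullmatch(r"/cygdrive/([A-Za-z])(/.*)?", posix_path)
    if match is None:
        raise ValueError(f"not a /cygdrive path: {posix_path!r}")
    drive = match.group(1).upper()
    rest = match.group(2) or "/"  # bare "/cygdrive/c" maps to the drive root
    # PureWindowsPath renders with backslashes regardless of the host OS.
    return str(PureWindowsPath(f"{drive}:{rest}"))


print(cygdrive_to_windows("/cygdrive/c/jenkins"))  # C:\jenkins
```

Why a Windows-style temporary directory helps is not stated in this thread. One plausible reading, offered only as a guess, is that the default launch command `java -jar /tmp/slave.jar` hands a Cygwin-style path to a native Windows Java that cannot resolve it.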


            People

            • Assignee:
              francisu Francis Upton
              Reporter:
              malberghini Mike Alberghini
            • Votes:
              1
              Watchers:
              4
