Jenkins / JENKINS-30284

EC2 plugin too aggressive in timing in contacting new AWS instance over SSH

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Component/s: ec2-plugin
    • Labels:
      None
    • Environment:
      EC2 Plugin v 1.29
      Jenkins 1.624

      Description

      In roughly one in every 10 instance launches, the EC2 plugin opens an SSH connection to the new instance before, I believe, the AWS infrastructure has put the public key for the corresponding key pair in place on the instance during launch. As a result, the node launch fails with authentication errors. The output for the launch is below.

      just before slave CentOS (i-f295e51b) gets launched ...
      executing pre-launch scripts ...
      Node CentOS (i-f295e51b)(i-f295e51b) is still pending/launching, waiting 5s
      Node CentOS (i-f295e51b)(i-f295e51b) is still pending/launching, waiting 5s
      Node CentOS (i-f295e51b)(i-f295e51b) is still pending/launching, waiting 5s
      Node CentOS (i-f295e51b)(i-f295e51b) is still pending/launching, waiting 5s
      Node CentOS (i-f295e51b)(i-f295e51b) is still pending/launching, waiting 5s
      Node CentOS (i-f295e51b)(i-f295e51b) is still pending/launching, waiting 5s
      Node CentOS (i-f295e51b)(i-f295e51b) is ready
      Connecting to 10.240.1.146 on port 22, with timeout 10000.
      Failed to connect via ssh: The kexTimeout (10000 ms) expired.
      Waiting for SSH to come up. Sleeping 5.
      Connecting to 10.240.1.146 on port 22, with timeout 10000.
      Failed to connect via ssh: The kexTimeout (10000 ms) expired.
      Waiting for SSH to come up. Sleeping 5.
      Connecting to 10.240.1.146 on port 22, with timeout 10000.
      Failed to connect via ssh: The kexTimeout (10000 ms) expired.
      Waiting for SSH to come up. Sleeping 5.
      Connecting to 10.240.1.146 on port 22, with timeout 10000.
      Connected via SSH.
      bootstrap()
      Getting keypair...
      Using key: jenkins
      dc:xx:xx:xx
      ----BEGIN RSA PRIVATE KEY----
      MIIEow<private key info>
      Authenticating as centos
      Authentication failed. Trying again...
      Authenticating as centos
      Authentication failed. Trying again...
      Authenticating as centos
      Authentication failed. Trying again...
      Authenticating as centos
      Authentication failed. Trying again...
      Authenticating as centos
      Authentication failed. Trying again...
      Authenticating as centos
      ERROR: Publickey authentication failed.
      java.io.IOException: Publickey authentication failed.
      at com.trilead.ssh2.auth.AuthenticationManager.authenticatePublicKey(AuthenticationManager.java:315)
      at com.trilead.ssh2.Connection.authenticateWithPublicKey(Connection.java:467)
      at hudson.plugins.ec2.ssh.EC2UnixLauncher.bootstrap(EC2UnixLauncher.java:260)
      at hudson.plugins.ec2.ssh.EC2UnixLauncher.launch(EC2UnixLauncher.java:91)
      at hudson.plugins.ec2.EC2ComputerLauncher.launch(EC2ComputerLauncher.java:107)
      at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:238)
      at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.io.IOException: The connection is closed.
      at com.trilead.ssh2.auth.AuthenticationManager.deQueue(AuthenticationManager.java:63)
      at com.trilead.ssh2.auth.AuthenticationManager.getNextMessage(AuthenticationManager.java:86)
      at com.trilead.ssh2.auth.AuthenticationManager.authenticatePublicKey(AuthenticationManager.java:290)
      ... 10 more
      Caused by: java.io.IOException: Peer sent DISCONNECT message (reason code 2): Too many authentication failures for centos
      at com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:766)
      at com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:489)
      ... 1 more
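
The failure pattern in the log above (several rapid "Authentication failed. Trying again..." lines until the server disconnects) is what a retry loop with too short a pause between attempts tends to produce. The following is a minimal sketch of a retry loop with a configurable sleep between attempts. It is not the plugin's actual code: the class name `AuthRetrySketch`, the method `retryWithSleep`, and the `BooleanSupplier` standing in for the SSH public-key authentication call are all hypothetical names chosen for illustration.

```java
import java.util.function.BooleanSupplier;

public class AuthRetrySketch {

    /**
     * Runs the attempt up to maxTries times, sleeping sleepMs between failed
     * attempts. Returns true as soon as one attempt succeeds, and false if
     * every attempt fails or the thread is interrupted while sleeping.
     */
    public static boolean retryWithSleep(BooleanSupplier attempt, int maxTries, long sleepMs) {
        for (int i = 1; i <= maxTries; i++) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (i < maxTries) {
                try {
                    // Give the instance time to finish installing the key
                    // before the next authentication attempt.
                    Thread.sleep(sleepMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical stand-in for an authentication call:
        // fails on the first two attempts, then succeeds.
        boolean ok = retryWithSleep(() -> ++calls[0] >= 3, 5, 10L);
        System.out.println("authenticated=" + ok + " after " + calls[0] + " attempts");
    }
}
```

With a longer sleep and enough tries, the total wait can cover the time the instance needs before it accepts the key, while keeping the number of failed attempts per connection low enough that the server does not disconnect with "Too many authentication failures".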

            Activity

            fishnix E Camden Fisher added a comment -

            Hi again -
            It's been a few days now and this has completely resolved the issues we were seeing. I would :+1: some UI around these options and allowing them to be set.
            Cheers,
            -Camden

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Francis Upton IV
            Path:
            src/main/java/hudson/plugins/ec2/ssh/EC2UnixLauncher.java
            http://jenkins-ci.org/commit/ec2-plugin/06c7369977c01520b1d89a3a868d779cdd2600c9
            Log:
            JENKINS-30284 EC2 plugin too aggressive in timing in contacting new AWS instance over SSH (changed default timeouts)

            francisu Francis Upton added a comment -

            I think I'm going to change the defaults to 30000/30 and leave the properties as they are. There is already a lot of obscure UI around various things, and since this can be tweaked through the properties if necessary (and maybe it won't even be necessary with better defaults), I'm reluctant to expose it in the UI. If people really want them, I can do it, but let's wait and see on that. I will document the properties on the Wiki.

            francisu Francis Upton added a comment -

            Released in 1.30

            lovelycuppatea Alan added a comment - edited

            We're having an issue with the SSH client connecting before user data is done.

            How do we edit bootstrapAuthSleepMs?

            UPDATE: In case anyone else needs to know, add -Djenkins.ec2.bootstrapAuthSleepMs=60000 to your Jenkins launch command, placed before -jar so the JVM treats it as a system property (java -D... -jar jenkins.war), to add a 60-second wait.
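
For readers unfamiliar with how a `-D` option reaches code running in the JVM, here is a minimal sketch of reading such a tunable with a fallback default. It is not taken from the plugin source. The property name `jenkins.ec2.bootstrapAuthSleepMs` comes from the comment above; the second property name `jenkins.ec2.bootstrapAuthTries`, the class name `BootstrapSettingsSketch`, and the reading of the "30000/30" defaults as 30000 ms sleep and 30 tries are assumptions made for illustration.

```java
public class BootstrapSettingsSketch {

    // Property name as given in the comment above.
    public static final String SLEEP_PROP = "jenkins.ec2.bootstrapAuthSleepMs";

    // ASSUMPTION: this name is a guess for illustration, not confirmed by the issue.
    public static final String TRIES_PROP = "jenkins.ec2.bootstrapAuthTries";

    /** Sleep in milliseconds; falls back to 30000 if the property is unset or not a number. */
    public static int sleepMs() {
        return Integer.getInteger(SLEEP_PROP, 30000);
    }

    /** Number of tries; falls back to 30 if the property is unset or not a number. */
    public static int tries() {
        return Integer.getInteger(TRIES_PROP, 30);
    }

    public static void main(String[] args) {
        System.out.println(SLEEP_PROP + "=" + sleepMs());
        System.out.println(TRIES_PROP + "=" + tries());
    }
}
```

Running the sketch as `java -Djenkins.ec2.bootstrapAuthSleepMs=60000 BootstrapSettingsSketch` shows the override taking effect. The option has to appear before the class name (or before `-jar`), because anything after it is passed to the program as an ordinary argument rather than set as a system property.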


              People

              • Assignee:
                francisu Francis Upton
                Reporter:
                mkingsbury Mike Kingsbury
              • Votes:
                2
                Watchers:
                6
