Jenkins / JENKINS-62158

Bad performance on EC2 instance for first build

    Details

      Description

      I have a pipeline project which should run on an EC2 instance node.

      I have configured an EC2 connection that starts t3.medium Windows 10 instances automatically. This all works fine.

      However, the first build on an EC2 instance always performs very badly (slow!!). The next build on the same instance (without a reboot etc.) is much faster.

       

      @Library('BMS-Libraries')
      import static bms.mail.Email.*
      import static bms.nexus.Nexus.*
      import static bms.utils.Utils.*

      node('AWS_VS2017') {
          stage('Cleanup Build Machine') {
              // delete the current workspace directory
              deleteDir()
          }

          stage('Preparing Build machine...') {
              retrieveAndExtractBuildTools(this)
          }

          // Do some more .......
      }


      I attached a screenshot of the runtime of the different pipeline steps.

       

      I connected via RDP to the instance during the first build, and Task Manager didn't display high CPU or memory consumption.

        Attachments

          Activity

          edthorne Ed Thorne added a comment -

          I don't know that this is limited to the EC2 plugin. I'm seeing a similar issue with a simple Linux JNLP agent. The first job that runs on the agent takes considerably longer than it normally should. Here's an image that shows my results.

          Builds 31 and 36 are after the agent has been rebooted. Each step is doing essentially the same operations:

          • sh 'env'
          • sh w/simple multi-line command (pwd, ls -al, for loop with print/sleep)
          • writeFile the multi-line command to disk to be used as input for sshScript
          • sshScript to a remote instance and execute the same multi-line command

          The main difference is that the first two steps run on the master node while the third runs on a remote JNLP agent.

          For builds 31 and 36 the execution timings show that it takes almost 20 seconds for a 'sh' step to be loaded and started. The 'sshScript' that follows takes about three minutes from the end of the prior 'sh' step completing until output is logged. Under normal circumstances these operations take about two seconds or less to log some form of activity.
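
          The four steps above can be sketched as a scripted pipeline (the agent label, remote host, and key path below are hypothetical placeholders, not taken from the original report):

          ```groovy
          // Sketch of the reproduction described above; the first run of this
          // job on a freshly booted agent is where the slow 'sh' start-up and
          // the sshScript delay were observed. 'linux-jnlp',
          // 'remote.example.com' and the key path are placeholders.
          node('linux-jnlp') {
              def cmds = '''pwd
          ls -al
          for i in 1 2 3; do echo "tick $i"; sleep 1; done'''

              stage('env') {
                  sh 'env'                               // step 1: dump the environment
              }
              stage('multi-line sh') {
                  sh cmds                                // step 2: simple multi-line command
              }
              stage('sshScript') {
                  writeFile file: 'cmds.sh', text: cmds  // step 3: write the commands to disk
                  // step 4: run the same commands on a remote instance via the
                  // SSH Pipeline Steps plugin
                  def remote = [name: 'remote', host: 'remote.example.com',
                                user: 'jenkins',
                                identityFile: '/var/jenkins_home/.ssh/id_rsa',
                                allowAnyHosts: true]
                  sshScript remote: remote, script: 'cmds.sh'
              }
          }
          ```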

          Observing the output of 'top' and checking CloudWatch metrics for the instance I don't see high resource usage or anything that would explain why this first job after reboot is suffering from such horrible performance. 

          edthorne Ed Thorne added a comment -

          I forgot to mention. This is Jenkins 2.234 with Pipeline 2.6 and SSH Pipeline Steps 2.0.0.

          jmkgreen James Green added a comment -

          I'm not sure we are seeing the same bug, but recently (last couple of weeks) our ec2 builds are taking a lot longer too. Always the first build of an ec2 instance, never subsequent builds.

          The big change is upgrading this plugin. According to the agent logs (accessible from the Jenkins web console), the Jenkins master is now awaiting the EC2 instance console output to print the ssh fingerprints to verify the expected keys ahead of connecting. This is acknowledged to take potentially minutes to wait on.

          We'd love to know if there is a workaround for this but we're not familiar with the authentication system in use.

          One way or another, I'm being approached by staff members using Jenkins complaining that this is now far too slow. I'm open to suggestions.

          mramonleon Ramon Leon added a comment -

          The first time Jenkins builds a job on an EC2 instance, there is a process that doesn't happen on subsequent connections:

          • the instance has to be created by AWS
          • the instance initializes
          • Jenkins creates an init script
          • Jenkins installs the JVM
          • Jenkins installs the OpenSSH clients
          • Jenkins copies the remoting client library
          • Jenkins launches the client on the instance

          None of these steps are repeated on subsequent builds.

          In the latest releases of the EC2 plugin we've included a new security step to avoid MitM attacks. This step waits for the instance's console output (on Linux instances) to be ready, and the plugin reads the SSH key from it to guarantee that the machine it is connecting to is the expected one. This step adds some more time to the initial setup. It depends on how long the console takes to become ready, but it is usually around 5 minutes.

          You can avoid this new delay by lowering the security level to Accept New or Off. Neither of these strategies waits for the console to be ready, but they have security implications. We've provided a wide range of strategies so that every administrator can decide which one best fits their environment. All of this is documented in the plugin documentation: https://github.com/jenkinsci/ec2-plugin/#security
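
          For reference, the host key strategy described above is set per AMI template; a Configuration-as-Code sketch might look like the following (the cloud, region, and AMI values are placeholders, and the exact key names may vary across EC2 plugin versions):

          ```yaml
          # Hypothetical JCasC fragment: selects the "Accept New" strategy so the
          # plugin does not wait for the instance console on first connection.
          # Trade-off: no MitM protection on that first connection.
          jenkins:
            clouds:
              - amazonEC2:
                  cloudName: "aws-cloud"            # placeholder
                  region: "eu-west-1"               # placeholder
                  templates:
                    - description: "AWS_VS2017"
                      ami: "ami-0123456789abcdef0"  # placeholder
                      hostKeyVerificationStrategy: ACCEPT_NEW
          ```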

          dhoerner Daniel Hoerner added a comment - edited

          Ramon Leon, the issue is not the startup of the AWS instance. The build is slow after the instance has started (see screenshot 2020-05-04%2016_31_11-Window.jpg). The first step already runs on the slave AWS instance; it's a little slower, but that is OK. But the second step (Preparing Build machine) is much slower, and by that time all the steps you described had already been performed!


            People

            • Assignee:
              mramonleon Ramon Leon
            • Reporter:
              dhoerner Daniel Hoerner
            • Votes:
              0
            • Watchers:
              4