JENKINS-59579

EC2 Plugin stops slave when build is running

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Not A Defect
    • Component/s: ec2-plugin, swarm-plugin
    • Labels:
      None
    • Environment:
      Jenkins 2.187
      Amazon EC2 1.44.1
      Swarm 3.13
      Description

      I have set up the connection between Jenkins and AWS via the Amazon EC2 plugin. Jenkins master cloud config: 

       

      The node connects via the Amazon EC2 plugin and then creates a second connection via the Swarm plugin, and the job ends up running on the connection made through Swarm. (This is because my jobs include TestComplete and FlaUI, and WinRM is not well suited to their requirements.)

       

      Jobs that take under 25 minutes run successfully. Anything that runs longer than 25-26 minutes fails with the following:

       Slave log:

      12:49:46 java.io.IOException: Backing channel 'JNLP4-connect connection from 10.230.0.101/10.230.0.101:49724' is disconnected.
      12:49:46 	at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:214)
      12:49:46 	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:283)
      12:49:46 	at com.sun.proxy.$Proxy89.isAlive(Unknown Source)
      12:49:46 	at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1172)
      12:49:46 	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1164)
      12:49:46 	at hudson.Launcher$ProcStarter.join(Launcher.java:492)
      12:49:46 	at hudson.plugins.gradle.Gradle.performTask(Gradle.java:333)
      12:49:46 	at hudson.plugins.gradle.Gradle.perform(Gradle.java:225)
      12:49:46 	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
      12:49:46 	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:741)
      12:49:46 	at hudson.model.Build$BuildExecution.build(Build.java:206)
      12:49:46 	at hudson.model.Build$BuildExecution.doRun(Build.java:163)
      12:49:46 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
      12:49:46 	at hudson.model.Run.execute(Run.java:1815)
      12:49:46 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
      12:49:46 	at hudson.model.ResourceController.execute(ResourceController.java:97)
      12:49:46 	at hudson.model.Executor.run(Executor.java:429)
      12:49:46 Caused by: java.nio.channels.ClosedChannelException
      12:49:46 	at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
      12:49:46 	at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:179)
      12:49:46 	at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:795)
      12:49:46 	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      12:49:46 	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
      12:49:46 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      12:49:46 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      12:49:46 	at java.lang.Thread.run(Thread.java:748)
      

      On the master's log I can see:

      Idle timeout of EC2 (Itiviti AWS) - Windows Jenkins node autoconnecting to deb-jenkins-prd using Swarm plugin (i-000908b57bb5d82a7) after 30 idle minutes, instance statusRUNNING
      Sep 30, 2019 8:40:45 AM INFO hudson.plugins.ec2.EC2AbstractSlave idleTimeout
      EC2 instance idle time expired: i-000908b57bb5d82a7
      Sep 30, 2019 8:40:46 AM INFO hudson.plugins.ec2.EC2OndemandSlave terminate
      Terminated EC2 instance (terminated): i-000908b57bb5d82a7
      Sep 30, 2019 8:40:46 AM INFO jenkins.slaves.DefaultJnlpSlaveReceiver channelClosed
      IOHub#1: Worker[channel:java.nio.channels.SocketChannel[connected local=/172.17.0.2:40440 remote=10.230.0.71/10.230.0.71:49735]] / Computer.threadPoolForRemoting [#85772] for ec2amaz-glc1084 terminated: java.nio.channels.ClosedChannelException
      Sep 30, 2019 8:40:46 AM INFO hudson.model.Run execute
      aws-ul-trader-extension-master-desk-uitests-listorders #22 main build action completed: FAILURE
      Sep 30, 2019 8:40:46 AM INFO hudson.plugins.ec2.EC2OndemandSlave terminate
      Removed EC2 instance from jenkins master: i-000908b57bb5d82a7
      

      After that period of time the slave is disconnected even though the build was running on it. Any help in tracking down the problem is much appreciated!
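
One way to check whether the failures line up with the EC2 idle timeout is to pair the "idle time expired" entries in the master log with build failures logged shortly afterwards. The following is a minimal sketch, assuming the two-line log layout shown above (a header line with timestamp, level, logger, and method, followed by a message line) and an English locale for the date parsing. The function names and the 60-second window are illustrative choices, not part of Jenkins or the EC2 plugin.

```python
import re
from datetime import datetime, timedelta

# Header lines look like:
#   Sep 30, 2019 8:40:45 AM INFO hudson.plugins.ec2.EC2AbstractSlave idleTimeout
HEADER = re.compile(
    r"^(\w{3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) (\w+) (\S+) (\S+)$"
)

SAMPLE_LOG = """\
Sep 30, 2019 8:40:45 AM INFO hudson.plugins.ec2.EC2AbstractSlave idleTimeout
EC2 instance idle time expired: i-000908b57bb5d82a7
Sep 30, 2019 8:40:46 AM INFO hudson.plugins.ec2.EC2OndemandSlave terminate
Terminated EC2 instance (terminated): i-000908b57bb5d82a7
Sep 30, 2019 8:40:46 AM INFO hudson.model.Run execute
aws-ul-trader-extension-master-desk-uitests-listorders #22 main build action completed: FAILURE
"""


def parse_records(log_text):
    """Pair each header line with the message line that follows it."""
    lines = [line.strip() for line in log_text.splitlines()]
    records = []
    for i, line in enumerate(lines[:-1]):
        match = HEADER.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%b %d, %Y %I:%M:%S %p")
            records.append((when, match.group(4), lines[i + 1]))
    return records


def failures_after_idle_timeout(log_text, window_seconds=60):
    """Return (instance id, build, delay in seconds) for each build failure
    logged within window_seconds after an EC2 idle-timeout expiry."""
    records = parse_records(log_text)
    expirations = [
        (when, message.rsplit(": ", 1)[-1])
        for when, method, message in records
        if method == "idleTimeout" and "idle time expired" in message
    ]
    failures = [
        (when, message.split(" main build action")[0])
        for when, _, message in records
        if message.endswith("main build action completed: FAILURE")
    ]
    window = timedelta(seconds=window_seconds)
    hits = []
    for expired_at, instance_id in expirations:
        for failed_at, build in failures:
            delay = failed_at - expired_at
            if timedelta(0) <= delay <= window:
                hits.append((instance_id, build, delay.total_seconds()))
    return hits


for hit in failures_after_idle_timeout(SAMPLE_LOG):
    print(hit)
```

If every failed build shows up in this list, the disconnects are probably caused by the idle timeout rather than by a network or remoting problem.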

        Attachments

        1. Capture.PNG (56 kB)
        2. Capture2.PNG (44 kB)
        3. Capture3.PNG (53 kB)

          Activity

          ayambarshev Alexey Yambarshev added a comment -

          I have the same issue with Jenkins 2.222.4 and EC2 plugin 1.50.3.
          Could anyone please explain why it's not a defect?

          gcimpoies George Cimpoies added a comment -

          Alexey Yambarshev, in my case it was because I had both the EC2 plugin and the Swarm plugin installed, so Jenkins would spawn two connections (one through the EC2 plugin and one through the Swarm plugin) and execute the build on the Swarm one while the EC2 one sat idle. Due to the timeout set on my jobs, the EC2 connection would be terminated, effectively removing the node (and breaking the working Swarm connection as well). I'd investigate whether your builds are always interrupted after a fixed period of time.
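
The interaction described above can be illustrated with a small model. This is a sketch only, not the EC2 plugin's actual code: `NodeRecord`, `ec2_idle_check`, and `terminate_instance` are hypothetical names, and the 30-minute limit is taken from the master log in the description. The point it shows is that an idle check that looks only at its own connection cannot see work running on a second connection to the same machine.

```python
from dataclasses import dataclass


@dataclass
class NodeRecord:
    """One Jenkins-side connection to a machine (hypothetical model)."""
    name: str
    instance_id: str
    busy_executors: int
    idle_minutes: int


def ec2_idle_check(node, idle_limit_minutes):
    """Decide idleness from this connection's own state only."""
    return node.busy_executors == 0 and node.idle_minutes >= idle_limit_minutes


def terminate_instance(instance_id, nodes):
    """Terminating the machine drops every connection hosted on it.
    Returns the names of connections that were still running builds."""
    return [
        node.name
        for node in nodes
        if node.instance_id == instance_id and node.busy_executors > 0
    ]


# Two connections to the same EC2 instance, as in this issue.
ec2_node = NodeRecord("ec2-connection", "i-000908b57bb5d82a7", 0, 30)
swarm_node = NodeRecord("swarm-connection", "i-000908b57bb5d82a7", 1, 0)

interrupted = []
if ec2_idle_check(ec2_node, idle_limit_minutes=30):
    interrupted = terminate_instance(ec2_node.instance_id, [ec2_node, swarm_node])

print(interrupted)
```

In this model the EC2 connection passes the idle check even though the Swarm connection on the same instance is busy, so terminating the instance interrupts the running build.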

          gcimpoies George Cimpoies added a comment -

          The solution in my case was to use only EC2 via SSH (NOT WinRM) and get rid of Swarm.


            People

            • Assignee:
              thoulen FABRIZIO MANFREDI
              Reporter:
              gcimpoies George Cimpoies
            • Votes:
              0
              Watchers:
              2
