Jenkins / JENKINS-23705

EC2 slave disconnects on longer running builds

    • Type: Bug
    • Resolution: Not A Defect
    • Priority: Major
    • ec2-plugin
    • Jenkins Host: EC2 Ubuntu 14.04
      Slave: EC2 Ubuntu 14.04

      We installed the ec2 plugin today and have used it successfully. However, we now see that while smaller, quicker builds work fine (as do larger builds that fail early on a compiler error or a JUnit test), larger, longer-running builds fail consistently.

      The build proceeds as expected for an exact amount of time that differs per project (one project always fails after exactly 6 minutes, another always fails after 12 minutes). At that point the build console output stops, and about 10 minutes later it shows a socket timeout exception. No specific task seems to be running when the build hangs: I've seen it occur during Cobertura report creation, during the upload of a jar file to Artifactory, and so on. On the agent machine itself, CPU usage drops to zero when the build hangs, though the agent is definitely still running.

      To see if it was a memory issue, I increased the PermGen and heap sizes for both the Java agent and Maven, but it made no difference. I also reduced the executors to 1 to see if that helped, but again there was no change.

      I have searched the slave for any log files generated by the agent jar but haven't found any yet. If there is anything I can gather to help debug this, let me know.
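
      One piece of data that could be gathered is whether an idle TCP connection between the Jenkins host and the slave survives for as long as the failing builds run. The sketch below is only an illustration of that check, not a diagnosis: the function name, the host, the port, and the presence of an echo service on the far end are all assumptions made for the example and are not part of the ec2 plugin or of this report.

```python
import socket
import time


def connection_survives_idle(host: str, port: int, idle_seconds: float,
                             timeout: float = 5.0) -> bool:
    """Open a TCP connection, leave it idle, then check it still round-trips.

    Assumption: the peer echoes back whatever it receives (a hypothetical
    echo service started only for this test).
    """
    payload = b"ping"
    with socket.create_connection((host, port), timeout=timeout) as sock:
        time.sleep(idle_seconds)  # no traffic is sent during this window
        try:
            sock.sendall(payload)
            reply = b""
            while len(reply) < len(payload):
                chunk = sock.recv(len(payload) - len(reply))
                if not chunk:
                    # Peer (or something in between) closed the connection.
                    return False
                reply += chunk
            return reply == payload
        except OSError:
            # Covers connection resets and socket timeouts.
            return False
```

      Running it with increasing idle times (for example 300, 360, and 720 seconds) against a hypothetical echo service on the slave would show whether the connection stops working after a fixed interval similar to the ones described above.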

      Sean Smith
      Senior Development Consultant
      Stella Technology

            Assignee: Francis Upton (francisu)
            Reporter: Sean Smith (smsmithee)
            Votes: 5
            Watchers: 8
