Jenkins / JENKINS-16824

Builds aborting randomly


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component/s: core
    • Labels: None
    • Environment: Master: Windows Server 2008 R2, Java 1.6.0_31, Java HotSpot(TM) 64-Bit Server VM. Slaves with different OS versions, slave.jar running as a service.

      This issue was discussed on the users mailing list last summer:

      https://groups.google.com/forum/#!msg/jenkinsci-users/GN0N4mqaCa4/kPbSal5xc4YJ

      We started having this issue last week on a system with one job named "thejob" and about 50 slaves, many of them executing the job in parallel at any given time. The aborts appear to be random and usually occur in the time-consuming part of the build, while an executable is running on the slave. In that case the stack trace looks like this:

      12.2.2013 6:34:32 hudson.model.Run run
      INFO: thejob #443821 aborted
      java.lang.InterruptedException
      at java.lang.Object.wait(Native Method)
      at hudson.remoting.Request.call(Request.java:146)
      at hudson.remoting.Channel.call(Channel.java:664)
      at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
      at $Proxy39.join(Unknown Source)
      at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:861)
      at hudson.Proc.joinWithTimeout(Proc.java:168)
      ... N more

      Source code of the wait that gets interrupted, in the corresponding version of slave.jar: https://github.com/jenkinsci/remoting/blob/remoting-2.17/src/main/java/hudson/remoting/Request.java#L146
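
The top frames of the trace show a thread blocked in Object.wait() receiving an InterruptedException. As a minimal sketch of that mechanism only (not Jenkins code; the class, method, and thread names below are hypothetical, and nothing here says where the interrupt in the reported case comes from), the following Java program parks a thread in wait() and interrupts it from another thread:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-alone demo; not part of Jenkins or the remoting library.
class InterruptedWaitDemo {

    /** Blocks a worker thread in Object.wait(), interrupts it, and returns what the worker caught. */
    static Throwable interruptWaitingThread() throws InterruptedException {
        final Object lock = new Object();
        final CountDownLatch started = new CountDownLatch(1);
        final AtomicReference<Throwable> caught = new AtomicReference<>();

        Thread worker = new Thread(() -> {
            synchronized (lock) {
                started.countDown();
                try {
                    // Stands in for a thread waiting on a remote response:
                    // nobody ever notifies this lock, so only an interrupt ends the wait.
                    while (true) {
                        lock.wait();
                    }
                } catch (InterruptedException e) {
                    caught.set(e);
                }
            }
        }, "waiting-thread-stand-in");

        worker.start();
        started.await();     // the worker holds the lock and is about to wait (or already waiting)
        worker.interrupt();  // wait() throws InterruptedException, now or as soon as it is entered
        worker.join();
        return caught.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Prints a trace whose first frames resemble the ones in the report.
        interruptWaitingThread().printStackTrace();
    }
}
```

The point of the sketch is that the exception by itself only says that some other thread called interrupt() on the waiting thread; identifying the caller needs additional evidence, such as thread dumps or logging around the time of the abort.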

      This started to happen with no apparent cause other than increasing load and slave count. Updating to 1.482.2 did not seem to have any effect. The Jenkins master and the slaves all run Windows, but the mailing list thread suggests that this can happen on Linux too. We have not been able to identify anything specific that triggers it, and because it does not happen in a small-scale test setup, only in production, investigation is difficult.

      We tried the -Xrs JVM parameter, as suggested in that thread. It seems to have reduced the frequency of the aborts but has not stopped them completely.
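
When the JVM is started by a service wrapper, it can be worth confirming that -Xrs actually reached the running process. The sketch below is one way to check, assuming it is run with the same launch configuration as the master or slave; the class and method names are hypothetical. It uses the standard RuntimeMXBean.getInputArguments() call, which on HotSpot is expected to list options such as -Xrs.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper; not part of Jenkins.
class XrsCheck {

    /** Returns true if the given JVM argument list contains the exact flag. */
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // Options passed to the JVM itself (not the arguments passed to main).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("-Xrs present: " + hasFlag(jvmArgs, "-Xrs"));
    }
}
```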

            Assignee: Unassigned
            Reporter: Ari Hyttinen (ari_hyttinen)
            Votes: 1
            Watchers: 6
