Jenkins / JENKINS-27046

Restarted the master and all slaves went offline


    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Component/s: remoting

      Description

      We restarted the master and all the slaves (330 of them) went offline. We were able to bring several slaves back, some manually and some through a job that restarts slaves remotely. Please review the attached error logs (master and slave).

      [Posting parts of the error logs here to make it easier for others searching for the same problem.]

      On the master:
      Caused by: java.lang.OutOfMemoryError: unable to create new native thread

      On the slave:
      WARNING hudson.remoting.AbstractByteArrayCommandTransport
      Failed to construct Command
      java.io.EOFException
      at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)


          Activity

          Aditya Inapurapu added a comment:

          We restarted the master again after removing a few plugins, and now we get a new error message. The previous EOF-related error no longer appears in the log.

          Aditya Inapurapu added a comment:

          The default physical memory allocated by Red Hat to a user profile is 1 GB, which is quite low for our needs. We increased it to 3 GB and the issue did not recur. We have changed the priority of this defect to Minor.

          We will leave the ticket open so that Jenkins can be improved using this information. Perhaps Jenkins could check the physical memory allocation on startup or restart.
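A minimal sketch of the kind of startup check suggested above, assuming a Linux host. The reporter describes the limit as a memory allocation; on Linux, "unable to create new native thread" is also commonly triggered by the per-user process limit (ulimit -u, which counts threads as well as processes), so this sketch reads the soft "Max processes" value from /proc/self/limits and logs a warning if it is low. The class name StartupLimitCheck, its methods, and the 4096 threshold are hypothetical and not part of Jenkins.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.logging.Logger;

// Hypothetical startup check; not part of Jenkins.
public class StartupLimitCheck {
    private static final Logger LOG = Logger.getLogger(StartupLimitCheck.class.getName());

    // Assumed minimum; a sensible value depends on how many agents connect.
    static final long MIN_PROCESSES = 4096;

    /**
     * Parses the soft "Max processes" limit from the text of /proc/self/limits.
     * Returns Long.MAX_VALUE for "unlimited" and -1 if the line is missing.
     */
    static long parseMaxProcesses(String limitsText) {
        for (String line : limitsText.split("\n")) {
            if (line.startsWith("Max processes")) {
                // Remaining columns: soft limit, hard limit, units.
                String[] cols = line.substring("Max processes".length()).trim().split("\\s+");
                return cols[0].equals("unlimited") ? Long.MAX_VALUE : Long.parseLong(cols[0]);
            }
        }
        return -1;
    }

    /** Returns a warning message if the limit looks too low, or null otherwise. */
    static String checkLimit(long maxProcesses) {
        if (maxProcesses >= 0 && maxProcesses < MIN_PROCESSES) {
            return "Max user processes is " + maxProcesses + "; the JVM may fail with "
                    + "'unable to create new native thread' when many agents connect.";
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        Path limits = Paths.get("/proc/self/limits");
        if (!Files.exists(limits)) {
            return; // Not Linux, or /proc is unavailable.
        }
        String text = new String(Files.readAllBytes(limits), StandardCharsets.US_ASCII);
        String warning = checkLimit(parseMaxProcesses(text));
        if (warning != null) {
            LOG.warning(warning);
        }
        // Report the JVM heap ceiling too, for comparison with the OS limits.
        LOG.info("JVM max heap: " + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
    }
}
```

Parsing is kept separate from file access so the check can be exercised on sample text without a Linux host.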

          Daniel Beck added a comment:

          The default physical memory allocated in Redhat to a user profile is 1gb which is quite low for our needs.

          Disk space or RAM? This looks more like a limitation of your specific environment than any sensible vendor default.

          Aditya Inapurapu added a comment (edited):

          Daniel,
          Yes, this was caused by a limitation in the environment. At the same time, could Jenkins add a more helpful log entry, such as 'Cannot start more slaves. Memory allocation of user is too low.'? Errors such as 'java.io.EOFException', 'Trying to unexport an object that's already unexported', and 'ERROR: Connection terminated' did not point us in the right direction.

          Daniel Beck added a comment:

          The slave log says that the following is the original error:

          Caused by: java.lang.OutOfMemoryError: unable to create new native thread

          (Java exceptions are always read bottom to top: the last 'Caused by' entry is the root cause. This is weird, but a well-known convention.)

          And in that situation, everything's horribly broken. Doing nicer error reporting is probably not worth the effort.
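To illustrate reading the chain "bottom to top" in code: the root cause is the throwable reached by following Throwable.getCause() until it returns null. The helper below is a hypothetical sketch (not a Jenkins or remoting API) that walks that chain, guards against cyclic causes, and returns a hint when the root cause is the native-thread OutOfMemoryError from this report. It only shows the kind of message requested earlier in the thread; the exception messages in main are simulated.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical helper; not part of Jenkins or remoting.
public class RootCause {

    /** Follows getCause() to the innermost throwable (the last "Caused by" in a trace). */
    static Throwable rootCause(Throwable t) {
        // Track visited throwables so a cyclic cause chain cannot loop forever.
        Map<Throwable, Boolean> seen = new IdentityHashMap<>();
        Throwable current = t;
        while (current.getCause() != null && seen.put(current, Boolean.TRUE) == null) {
            current = current.getCause();
        }
        return current;
    }

    /** Returns a human-readable hint for a well-known root cause, or null. */
    static String hintFor(Throwable t) {
        Throwable root = rootCause(t);
        if (root instanceof OutOfMemoryError
                && String.valueOf(root.getMessage()).contains("unable to create new native thread")) {
            return "The JVM could not start a new thread; check the OS limits "
                    + "(for example ulimit -u and available memory) for the Jenkins user.";
        }
        return null;
    }

    public static void main(String[] args) {
        // Simulated wrapping, similar in shape to the traces in this report.
        Exception wrapped = new java.io.IOException("simulated outer failure",
                new RuntimeException("simulated middle failure",
                        new OutOfMemoryError("unable to create new native thread")));
        System.out.println(rootCause(wrapped).getMessage()); // unable to create new native thread
        System.out.println(hintFor(wrapped));
    }
}
```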


            People

            • Assignee: Unassigned
            • Reporter: Aditya Inapurapu
            • Votes: 0
            • Watchers: 2
