JENKINS-5703

Exception leaves zombie processes for slaves started by command on master

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: remoting
    • Labels:
      None

      Description

      I have several slaves that are started via the "Launch slave via execution of command on the Master" option. The command is a bash script that acquires Kerberos credentials and then ssh's over to the slave. Periodically, something kills all the connections launched this way. Whatever the cause, the original bash process is never reaped by Hudson and is left as a zombie (defunct) process.

      The relevant excerpt from the hudson log appears to be:

      Feb 17, 2010 12:43:14 PM hudson.remoting.Channel$ReaderThread run
      SEVERE: I/O error in channel <host>
      java.io.EOFException
              at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
              at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
              at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
              at hudson.remoting.Channel$ReaderThread.run(Channel.java:852)
      

      I wonder if the fix is something as simple as adding a proc.destroy() call in CommandLauncher.java:

      @Override
      public void onClosed(Channel channel, IOException cause) {
          if (cause != null) {
              cause.printStackTrace(
                  listener.error(hudson.model.Messages.Slave_Terminated(getTimestamp())));
          }
          ProcessTree.get().killAll(proc, cookie);
          proc.destroy();
      }
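
On POSIX systems a terminated child stays "defunct" until its parent waits on it, so a destroy call is usually paired with a wait. The following is only a minimal sketch of that general destroy-then-wait pattern using java.lang.Process (Java 8+). The class name ReapChild, the helper terminateAndReap, and the timeout values are hypothetical and not part of Hudson; whether this pattern actually resolves the issue in CommandLauncher has not been established here.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class ReapChild {
    /**
     * Asks the child to terminate, then waits for it to exit.
     * Escalates to a forcible kill if the first wait times out.
     * Returns true if the child has exited by the time this method returns.
     */
    public static boolean terminateAndReap(Process proc, long timeoutSeconds)
            throws InterruptedException {
        proc.destroy(); // polite termination request
        if (proc.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            return true;
        }
        proc.destroyForcibly(); // escalate if the child ignored the request
        return proc.waitFor(timeoutSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // "sleep 60" stands in for the launcher script; assumes a Unix-like host.
        Process proc = new ProcessBuilder("sleep", "60").start();
        boolean exited = terminateAndReap(proc, 5);
        System.out.println("exited=" + exited + " alive=" + proc.isAlive());
    }
}
```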
      

          Activity

          jsiirola jsiirola added a comment -

          Steps to reproduce (verified on 1.347 running on RHEL 5.4 within the embedded Winstone server):

          1. Configure a slave to start by running a command on the master
            • set the command to
              bash -c 'ssh localhost "cd ~/localhost; java -jar ~/localhost/slave.jar"'
          2. Allow the slave to connect
          3. Restart Hudson using /safeRestart


          This orphans the ssh process, leaving it "defunct" until the entire Java server process is restarted.
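
To check the result of the steps above, one option is to list the processes whose parent is the server JVM and whose state starts with "Z". The sketch below is only an illustration: the class DefunctFinder and its method are hypothetical, and it assumes a ps that supports `-eo pid,ppid,stat,comm` (as procps on Linux does). The server PID is passed in as the first argument.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class DefunctFinder {
    /**
     * Given lines in "pid ppid stat comm" form, returns the PIDs of
     * processes that are children of parentPid and in zombie state.
     * Header lines and malformed lines are skipped.
     */
    public static List<Long> findDefunct(List<String> psLines, long parentPid) {
        List<Long> zombies = new ArrayList<>();
        for (String line : psLines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 3) {
                continue;
            }
            long pid;
            long ppid;
            try {
                pid = Long.parseLong(fields[0]);
                ppid = Long.parseLong(fields[1]);
            } catch (NumberFormatException e) {
                continue; // header row or unexpected format
            }
            if (ppid == parentPid && fields[2].startsWith("Z")) {
                zombies.add(pid);
            }
        }
        return zombies;
    }

    public static void main(String[] args) throws Exception {
        long serverPid = Long.parseLong(args[0]); // PID of the Hudson JVM
        Process ps = new ProcessBuilder("ps", "-eo", "pid,ppid,stat,comm").start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(ps.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        ps.waitFor();
        System.out.println("Defunct children: " + findDefunct(lines, serverPid));
    }
}
```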

          josesa Jose Sa added a comment -

          This is also happening to us with 1.432. All our slaves are "SSH slaves", and if a communication problem breaks the channel, the slaves hang and sometimes leave "defunct" processes.

          josesa Jose Sa added a comment -

          Here is a more recent stack trace, with updated line numbers:

          Sep 30, 2011 4:04:56 AM hudson.remoting.Channel$ReaderThread run
          SEVERE: I/O error in channel slave11
          java.io.IOException: Unexpected termination of the channel
          	at hudson.remoting.Channel$ReaderThread.run(Channel.java:1093)
          Caused by: java.io.EOFException
          	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
          	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
          	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
          	at hudson.remoting.Channel$ReaderThread.run(Channel.java:1087)
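
For context on what the trace means: ObjectInputStream.readObject() throws EOFException when the underlying stream ends where the next object was expected, which is what a reader sees when the other end of the connection goes away. The standalone sketch below reproduces that with an in-memory stream containing only the serialization header. EofDemo and its methods are hypothetical names for illustration; this is not Hudson's remoting code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class EofDemo {
    /** Builds a serialization stream that holds the header but no objects. */
    public static byte[] headerOnlyStream() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        new ObjectOutputStream(buffer).close(); // close() flushes the header
        return buffer.toByteArray();
    }

    /** Returns true if reading the next object hits end-of-stream. */
    public static boolean readHitsEof(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return false;
        } catch (EOFException e) {
            return true; // the stream ended before another object arrived
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("EOF on read: " + readHitsEof(headerOnlyStream()));
    }
}
```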
          
          marcomiller Marco Miller added a comment -

          Hi folks; has anyone noticed this issue again lately, i.e., with more recent Jenkins releases? In other words, is this still an issue, in your opinion? Big thanks for letting me know =)

          bwalding Ben Walding added a comment - edited

          Even though I've done zero investigation into this - my initial thought would be that the slave should actually be launched with an exec in there:

          bash -c 'ssh localhost "cd ~/localhost; exec java -jar ~/localhost/slave.jar"'
          

          This can be tested by setting the advanced "JavaPath" setting to "exec java" (or whatever java executable you are using).

          Process list without exec
          ubuntu   22426 bash -c cd "/home/jenkins-slave" && java  -jar slave.jar
          ubuntu   22427 java -jar slave.jar
          
          Process list with exec
          ubuntu   22905 java -jar slave.jar
          

          Will this fix it? No idea, but extra processes in the process chain are often a cause of failed signal handling and zombie processes.


            People

            • Assignee: Unassigned
            • Reporter: jsiirola
            • Votes: 5
            • Watchers: 5
