Jenkins / JENKINS-5703

Exception leaves zombie processes for slaves started by command on master

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: remoting
    • Labels:
      None

      Description

      I have several slaves that are started with the "Launch slave via execution of command on the Master" option. The command is a bash script that acquires Kerberos credentials and then uses ssh to connect to the slave. Periodically, something happens that kills all the connections launched this way. Whatever the cause, the original bash process is never reaped by Hudson and is left behind as a zombie (defunct) process.
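A zombie (defunct) process is one that has exited but whose parent has not yet collected its exit status, so it keeps its PID and process-table entry until that happens. On Linux such a process reports state `Z` in `/proc/<pid>/status`. The following is a minimal sketch for spotting them on the master, assuming a Linux `/proc` filesystem; `ZombieScanner`, `isZombie` and `findZombies` are hypothetical names, not part of Hudson or Jenkins.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Linux-only sketch (hypothetical helper): find zombie ("defunct") processes via /proc.
final class ZombieScanner {

    // True if the text of a /proc/<pid>/status file reports state Z, e.g. "State:\tZ (zombie)".
    static boolean isZombie(String statusText) {
        for (String line : statusText.split("\n")) {
            if (line.startsWith("State:")) {
                return line.substring("State:".length()).trim().startsWith("Z");
            }
        }
        return false; // no State line: not a status file we understand
    }

    // Returns the PIDs under procRoot (normally /proc) whose status file reports a zombie.
    static List<Long> findZombies(Path procRoot) throws IOException {
        List<Long> zombies = new ArrayList<>();
        try (Stream<Path> entries = Files.list(procRoot)) {
            for (Path dir : (Iterable<Path>) entries::iterator) {
                String name = dir.getFileName().toString();
                if (!name.chars().allMatch(Character::isDigit)) {
                    continue; // skip non-process entries such as /proc/meminfo
                }
                try {
                    byte[] raw = Files.readAllBytes(dir.resolve("status"));
                    if (isZombie(new String(raw, StandardCharsets.UTF_8))) {
                        zombies.add(Long.parseLong(name));
                    }
                } catch (IOException e) {
                    // The process disappeared between listing and reading; skip it.
                }
            }
        }
        return zombies;
    }
}
```

Calling `ZombieScanner.findZombies(Paths.get("/proc"))` on the master after the connections drop should list the PIDs of any leftover defunct processes.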

      The relevant excerpt from the Hudson log appears to be:

      Feb 17, 2010 12:43:14 PM hudson.remoting.Channel$ReaderThread run
      SEVERE: I/O error in channel <host>
      java.io.EOFException
              at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
              at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
              at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
              at hudson.remoting.Channel$ReaderThread.run(Channel.java:852)
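The `EOFException` at the bottom of this trace is what `ObjectInputStream.readObject()` throws when the underlying stream ends before another object arrives, which presumably means the other end of the channel (here, the ssh connection to the slave) went away. The sketch below should follow the same `readObject` → `peekByte` path using in-memory streams; `EofDemo` and `readFromEndedStream` are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical demo: read from a serialization stream whose writer sent no objects.
final class EofDemo {

    // Returns the exception readObject() throws, or null if an object was read.
    static Exception readFromEndedStream() {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(wire)) {
                out.flush(); // only the stream header is written; then the "peer" closes
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(wire.toByteArray()))) {
                in.readObject(); // nothing follows the header
                return null;
            }
        } catch (IOException | ClassNotFoundException e) {
            return e; // expected: java.io.EOFException, as in the log
        }
    }
}
```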
      

      I wonder whether the fix is as simple as adding a proc.destroy() call to CommandLauncher.java:

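      @Override
      public void onClosed(Channel channel, IOException cause) {
          if (cause != null) {
              // Log why the channel went down (e.g. the EOFException above).
              cause.printStackTrace(
                  listener.error(hudson.model.Messages.Slave_Terminated(getTimestamp())));
          }
          // Kill the launched process tree, plus processes whose environment matches the cookie.
          ProcessTree.get().killAll(proc, cookie);
          // Proposed addition: also call Process.destroy() on the launched command itself.
          proc.destroy();
      }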
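For comparison, a more defensive teardown could follow `destroy()` with a bounded wait and escalate to a forced kill if the process ignores the first signal. This is a minimal sketch assuming Java 8 or later (for `Process.waitFor(long, TimeUnit)` and `Process.destroyForcibly()`); `ProcessCleanup`, `terminateAndReap` and the grace period are hypothetical, not part of Hudson or Jenkins, and whether it would prevent the zombies described here is untested.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Hudson/Jenkins: terminate a launched process
// and wait for it, so its exit status is collected rather than left pending.
final class ProcessCleanup {

    // Returns the exit code, or -1 if the process is still alive after the forced kill.
    static int terminateAndReap(Process proc, long graceMillis) throws InterruptedException {
        proc.destroy(); // polite request first (SIGTERM on Unix)
        if (!proc.waitFor(graceMillis, TimeUnit.MILLISECONDS)) {
            proc.destroyForcibly(); // escalate (SIGKILL on Unix)
            if (!proc.waitFor(graceMillis, TimeUnit.MILLISECONDS)) {
                return -1; // give up; the caller may log this
            }
        }
        return proc.exitValue();
    }
}
```

In the onClosed method above, `proc.destroy()` could be replaced by something like `terminateAndReap(proc, 5000)`, with the InterruptedException handled appropriately.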
      

        Activity

        jsiirola added a comment:

        Steps to reproduce (verified on 1.347 running on RHEL 5.4 within the embedded Winstone server):

        1. Configure a slave to start by running a command on the master
          • Set the command to:
            bash -c 'ssh localhost "cd ~/localhost; java -jar ~/localhost/slave.jar"'
        2. Allow the slave to connect
        3. Restart Hudson using /safeRestart


        This leaves the ssh process behind as "defunct" (a zombie) until the entire Java server process is restarted.

        Jose Sa (josesa) added a comment:

        This is also happening to us with 1.432. All our slaves are "SSH slaves", and if a communication problem breaks the channel, the slaves hang and we sometimes get "defunct" processes.

        Jose Sa (josesa) added a comment:

        Here is a more recent stack trace, with updated line numbers:

        Sep 30, 2011 4:04:56 AM hudson.remoting.Channel$ReaderThread run
        SEVERE: I/O error in channel slave11
        java.io.IOException: Unexpected termination of the channel
        	at hudson.remoting.Channel$ReaderThread.run(Channel.java:1093)
        Caused by: java.io.EOFException
        	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
        	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
        	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
        	at hudson.remoting.Channel$ReaderThread.run(Channel.java:1087)
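In this later trace the same EOFException arrives wrapped in an IOException reading "Unexpected termination of the channel". The sketch below is a hypothetical, simplified reader loop (not the real hudson.remoting.Channel code) illustrating that pattern: it reads objects until the stream fails, wraps a bare end-of-stream in that IOException, and hands the result to a close callback, which plays the role of the onClosed listener in the description. `ChannelReaderSketch`, `readLoop`, `onCommand` and `onClosed` are invented names.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.util.function.Consumer;

// Hypothetical, simplified reader loop; not Hudson's actual Channel implementation.
final class ChannelReaderSketch {

    // Reads objects until the stream fails, then reports the cause to onClosed exactly once.
    static void readLoop(InputStream wire, Consumer<Object> onCommand, Consumer<IOException> onClosed) {
        try (ObjectInputStream in = new ObjectInputStream(wire)) {
            while (true) {
                onCommand.accept(in.readObject());
            }
        } catch (EOFException e) {
            // End of stream mid-session: the other side went away.
            onClosed.accept(new IOException("Unexpected termination of the channel", e));
        } catch (IOException e) {
            onClosed.accept(e); // other I/O failures are passed through unchanged
        } catch (ClassNotFoundException e) {
            onClosed.accept(new IOException(e));
        }
    }
}
```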
        
        Marco Miller (marcomiller) added a comment:

        Hi folks, has anyone seen this issue lately with more recent Jenkins releases? In other words, is this still an issue in your opinion? Big thanks for letting me know =)


          People

          • Assignee:
            Unassigned
            Reporter:
            jsiirola
          • Votes:
            4
            Watchers:
            4

            Dates

            • Created:
              Updated: