JENKINS-15199

Slaves turn to "Dead" state after connecting to remote machine

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Component/s: core, remoting
    • Environment:
      Linux 2.6.27.42-0.1-default #1 SMP 2010-01-06 16:07:25 +0100 x86_64 x86_64 x86_64 GNU/Linux
      Distributor ID: SUSE LINUX
      Description: SUSE Linux Enterprise Desktop 11 (x86_64)
      Release: 11
      Codename: n/a

      Description

      Hi,

      I'm trying to run a slave on a remote computer, but it isn't working. For some reason the slave fails after the link has been established between the master and remote nodes.

      The error message looks like this:
      java.lang.NoClassDefFoundError: hudson/model/Run$RunExecution
      at java.lang.Class.getDeclaredConstructors0(Native Method)
      at java.lang.Class.privateGetDeclaredConstructors(Class.java:2389)
      at java.lang.Class.getConstructor0(Class.java:2699)
      at java.lang.Class.getConstructor(Class.java:1657)
      at hudson.model.AbstractProject.newBuild(AbstractProject.java:944)
      at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1159)
      at hudson.model.AbstractProject.createExecutable(AbstractProject.java:129)
      at hudson.model.Executor.run(Executor.java:214)
      Caused by: java.lang.ClassNotFoundException: hudson.model.Run$RunExecution
      at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
      at java.security.AccessController.doPrivileged(Native Method)
      at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
      ... 8 more
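The trace above is a NoClassDefFoundError caused by a ClassNotFoundException: the class loader could not find the nested class hudson.model.Run$RunExecution at run time. The following is a minimal sketch in plain Java, not Jenkins code, of how such a lookup fails; the class name MissingClassDemo and the helper isLoadable are hypothetical names used only for illustration.

```java
// Illustrative only: this is not Jenkins code.
final class MissingClassDemo {

    /** Returns true if the named class can be located by this class's loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate and load the class,
            // do not run its static initializers.
            Class.forName(className, false, MissingClassDemo.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: no class file with that name is visible.
            // LinkageError covers NoClassDefFoundError and related failures.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.lang.String loadable: "
                + isLoadable("java.lang.String"));
        System.out.println("hudson.model.Run$RunExecution loadable: "
                + isLoadable("hudson.model.Run$RunExecution"));
    }
}
```

Run on a JVM that does not have the matching Jenkins classes on its class path, the second lookup fails in the same way as the "Caused by" part of the trace.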

      Any idea what might be causing this malfunction?

      Best Regards,
      Carlos

        Attachments

        1. error.jpg (111 kB)
        2. slave-configuration.png (48 kB)
        3. slave-startup.txt (2 kB)


            Activity

            stewart Stewart Smith added a comment -

            I've gotten this too (on most of our ssh launched slaves):

            java.lang.NullPointerException
            at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:220)
            at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:66)
            at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1197)
            at hudson.model.AbstractProject.createExecutable(AbstractProject.java:136)
            at hudson.model.Executor.run(Executor.java:211)

            more info

            stewart Stewart Smith added a comment -

            Problem disappeared when downgrading to 1.482

            carlosandre Carlos André added a comment -

            I'll try this as workaround.

            Thanks a lot.

            steven_aerts Steven Aerts added a comment -

            Set assignee to Automatic so some of the core developers can take this up.

            steven_aerts Steven Aerts added a comment - - edited

            We are seeing this problem on Jenkins version 1.486.
            We see exactly the same stack trace as Stewart.

            We see this only on nodes coupled with matrix jobs.
            Some of our matrix jobs can no longer run: whenever one of them is started, its node crashes with this NullPointerException.

            It is unclear to us what distinguishes a faulty matrix job from a working one.

            Does anyone have an idea what could cause these matrix jobs to crash?

            steven_aerts Steven Aerts added a comment -

            I have found a reproduction scenario for this bug.

            1. Define a matrix job which runs a few jobs (which take some time) on a specific node.
            2. Run this job, which will spawn a few matrix jobs.
            3. Do a safeRestart of Jenkins; this will persist some of the matrix jobs which are still waiting in the queue.

            When Jenkins comes back up, it will try to start a matrix job from the persisted queue, and this will fail with the above NullPointerException.

            A quick workaround is to delete the matrix jobs from the queue after Jenkins is restarted. This allows you to restart the dead clients again.

            brian Brian Murrell added a comment - - edited

            Still no resolution to this issue?

            It would be really nice to fix this issue for all of the people who are going to be upgrading and run into this issue and have to apply the workaround in the previous comment.

            Even a mention in https://wiki.jenkins-ci.org/display/SECURITY/Jenkins+Security+Advisory+2013-01-04 of the issue and work-around would be useful instead of people wasting time as I did trying to figure out what the problem is.

            raada Magnus Larsson added a comment -

            I also get the error message in 1.498:
            Thread has died

            java.lang.NullPointerException
            at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:220)
            at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:66)
            at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1266)
            at hudson.model.AbstractProject.createExecutable(AbstractProject.java:138)
            at hudson.model.Executor.run(Executor.java:211)

            I can reproduce this by starting a slave on a node (works) and then starting a second slave on the same node (fails).
            I'm using SSH.
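The "Thread has died" message followed by a NullPointerException is what an uncaught exception on a worker thread looks like in general: the exception escapes run() and the thread terminates. Below is a minimal sketch in plain Java, not Jenkins code, assuming nothing about how Executor is implemented; DeadThreadDemo and runAndCapture are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: models a worker thread killed by an uncaught exception.
final class DeadThreadDemo {

    /**
     * Runs the task on a new thread and returns the uncaught exception
     * that terminated it, or null if the task finished normally.
     */
    static Throwable runAndCapture(Runnable task) {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(task, "demo-executor");
        // The handler runs on the dying thread just before it terminates.
        worker.setUncaughtExceptionHandler((thread, error) -> failure.set(error));
        worker.start();
        try {
            worker.join(); // wait until the worker has fully terminated
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return failure.get();
    }

    public static void main(String[] args) {
        Throwable cause = runAndCapture(() -> {
            String missingState = null; // stands in for state that was never set
            missingState.length();      // throws NullPointerException
        });
        System.out.println("Thread died with: " + cause);
    }
}
```

Once a thread has ended this way it cannot be restarted; a new thread has to be created, which fits the need to restart the dead slave thread after applying the workaround.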

            raada Magnus Larsson added a comment -

            Additional data:
            Slave successfully connected and online
            ERROR: Connection terminated
            java.io.IOException: Unexpected termination of the channel
            at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
            Caused by: java.io.EOFException
            at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
            at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
            at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
            at hudson.remoting.Command.readFrom(Command.java:92)
            at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
            at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
            [01/16/13 09:05:06] [SSH] Connection closed.
            ERROR: [01/16/13 09:05:06] slave agent was terminated
            java.io.IOException: Unexpected termination of the channel
            at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
            Caused by: java.io.EOFException
            at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
            at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
            at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
            at hudson.remoting.Command.readFrom(Command.java:92)
            at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
            at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
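The EOFException in ObjectInputStream.readObject is what Java serialization reports when the stream ends before another object arrives, so in the log above it is probably a symptom of the other end closing the connection rather than the root cause. Here is a minimal sketch in plain Java that reproduces the same exception with in-memory streams; ChannelEofDemo and readPastEnd are hypothetical names, and no remoting classes are involved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Illustrative only: reading one object more than the writer sent.
final class ChannelEofDemo {

    /**
     * Writes `count` objects, closes the writer, then tries to read
     * count + 1 objects. Returns true if the extra read hit EOFException.
     */
    static boolean readPastEnd(int count) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                for (int i = 0; i < count; i++) {
                    out.writeObject("command-" + i);
                }
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                for (int i = 0; i < count; i++) {
                    in.readObject(); // these reads succeed
                }
                in.readObject();     // nothing left: the stream has ended
                return false;
            } catch (EOFException eof) {
                return true;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("EOF on extra read: " + readPastEnd(2));
    }
}
```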

            raada Magnus Larsson added a comment - - edited

            It seems to work for me now.
            My "SSH Slave Plugin" was out-of-date (v2.1). I upgraded to 2.2 and my Dead slave nodes went away. I'm now running v1.499 with no problem.

            taksan taksan added a comment - - edited

            We are having this problem as well on v1.499. I already installed the latest ssh slave plugin, but it didn't work.
            At least, the workaround proposed by Aerts (removing matrix jobs from the queue and restarting the slave thread) worked.

            ochedru Olivier Chédru added a comment -

            I have the same issue without matrix jobs in the queue, and the slave is a Windows machine started through JNLP.

            michaelgang David Gang added a comment -

            I have the same problem with jenkins 1.513 and slave plugin 2.17

            oleg_nenashev Oleg Nenashev added a comment -

            Most probably, the issue has been fixed by the classloading fix (see JENKINS-19453).
            Please re-open if it appears on the newest versions.

            dam321 sam sa added a comment -

            Hi Carlos, did you find a solution to Jenkins pages refreshing every few seconds?

            carlosandre Carlos André added a comment -

            Hi Sam,

            I haven't been using Jenkins for a couple of years, but as Oleg mentioned before, it is most likely solved by the classloading fix (see JENKINS-19453).


              People

              • Assignee: Kohsuke Kawaguchi (kohsuke)
              • Reporter: Carlos André (carlosandre)
              • Votes: 7
              • Watchers: 14
