Jenkins / JENKINS-8473

Handle slave death and retry job somewhere else


    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Major
    • other
    • None

      I use a lot of user workstations as slaves in Hudson,
      and I have a big compile "task" split into several steps, i.e. jobs.

      Thanks to this, I can reduce the overall compile time.

      But when a slave dies (for example, on a system shutdown) while a job is running on it, the job fails,
      and the whole process is lost.

      FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Request.call(Request.java:137)
      at hudson.remoting.Channel.call(Channel.java:629)
      at hudson.FilePath.act(FilePath.java:745)
      at hudson.FilePath.act(FilePath.java:738)
      at hudson.FilePath.mkdirs(FilePath.java:804)
      at hudson.model.AbstractProject.checkout(AbstractProject.java:1116)
      at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:480)
      at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:412)
      at hudson.model.Run.run(Run.java:1362)
      at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      at hudson.model.ResourceController.execute(ResourceController.java:88)
      at hudson.model.Executor.run(Executor.java:145)
      Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Request.abort(Request.java:257)
      at hudson.remoting.Channel.terminate(Channel.java:680)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:971)
      Caused by: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:953)
      Caused by: java.io.EOFException
      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2553)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:947)

      It would be nice if Hudson detected that the slave died and then "quietly" resubmitted the job to another slave:

      • Without incrementing the build number
      • Without triggering the downstream jobs

      In other words, if a slave dies, its jobs are relaunched silently on other slaves.
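
To make the request concrete, here is a minimal sketch of the behaviour in plain Java. It is not Hudson code: `RetryOnNodeLoss`, `NodeTask` and `Outcome` are hypothetical names made up for illustration, and an `IOException` merely stands in for the "Unexpected termination of the channel" failure shown in the trace above. The build number is passed in once and reused for every attempt, and the caller sees a single outcome, so downstream jobs would only be triggered once, after the final result.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical sketch: none of these types are Hudson/Jenkins APIs.
final class RetryOnNodeLoss {

    /** A job body that runs on a named node; an IOException models a lost channel. */
    interface NodeTask {
        String runOn(String node) throws IOException;
    }

    /** Result of one logical build, however many nodes had to be tried. */
    static final class Outcome {
        final int buildNumber; // unchanged across retries
        final String node;     // node that finally completed the job
        final String result;
        final int attempts;

        Outcome(int buildNumber, String node, String result, int attempts) {
            this.buildNumber = buildNumber;
            this.node = node;
            this.result = result;
            this.attempts = attempts;
        }
    }

    /**
     * Tries the task on each candidate node in order. A lost node is skipped
     * quietly; the build number is never incremented. Fails only when every
     * candidate node has been lost.
     */
    static Outcome run(int buildNumber, List<String> nodes, NodeTask task) throws IOException {
        IOException last = null;
        int attempts = 0;
        for (String node : nodes) {
            attempts++;
            try {
                String result = task.runOn(node);
                return new Outcome(buildNumber, node, result, attempts);
            } catch (IOException e) {
                // The node died mid-job: remember why and move on to the next one.
                last = e;
            }
        }
        throw new IOException(
                "all " + attempts + " candidate nodes failed for build #" + buildNumber, last);
    }
}
```

A caller would invoke `RetryOnNodeLoss.run(...)` once per build and trigger downstream jobs only after it returns. A real implementation would also have to decide which failures count as "the slave died" rather than a genuine build failure, which this sketch does not attempt.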

            Assignee: Unassigned
            Reporter: ebann
            Votes: 3
            Watchers: 4
