  1. Jenkins
  2. JENKINS-24758

Build crashes (network timeout) with vSphere slaves

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Component/s: vsphere-cloud-plugin
    • Labels: None
    • jenkins: 1.580 and 1.578
      vsphere-cloud-1.1.11
      master: linux
      slaves: Vista and Seven

      Hello,

      I have vSphere slaves which are "reverted" to a snapshot for each build. Those slaves are used to test the installation of our products.

      Some builds wait a long time at the start of the build and then crash with the following message:

      Remote build on slave-seven64 (x86_64 seven uiTest java7 autoRestored win x86)
      FATAL: channel is already closed
      hudson.remoting.ChannelClosedException: channel is already closed
      at hudson.remoting.Channel.send(Channel.java:541)
      at hudson.remoting.Request.call(Request.java:129)
      at hudson.remoting.Channel.call(Channel.java:739)
      at hudson.EnvVars.getRemote(EnvVars.java:404)
      at hudson.model.Computer.getEnvironment(Computer.java:927)
      at jenkins.model.CoreEnvironmentContributor.buildEnvironmentFor(CoreEnvironmentContributor.java:29)
      at hudson.model.Run.getEnvironment(Run.java:2248)
      at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:905)
      at hudson.matrix.MatrixRun$MatrixRunExecution.decideWorkspace(MatrixRun.java:175)
      at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:517)
      at hudson.model.Run.execute(Run.java:1740)
      at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
      at hudson.model.ResourceController.execute(ResourceController.java:89)
      at hudson.model.Executor.run(Executor.java:240)
      Caused by: java.io.IOException
      at hudson.remoting.Channel.close(Channel.java:1027)
      at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110)
      at hudson.remoting.PingThread.ping(PingThread.java:120)
      at hudson.remoting.PingThread.run(PingThread.java:81)
      Caused by: java.util.concurrent.TimeoutException: Ping started on 1411030796694 hasn't completed at 1411031036694
      ... 2 more
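The two numbers in the TimeoutException message can be compared directly. The sketch below assumes they are epoch timestamps in milliseconds (the usual Java convention, not confirmed by the trace itself) and uses a hypothetical helper, `ping_window`, to work out how long the ping was outstanding before the channel was closed.

```python
import re
from datetime import datetime, timezone

# Message copied verbatim from the stack trace above.
MESSAGE = "Ping started on 1411030796694 hasn't completed at 1411031036694"


def ping_window(message):
    """Parse a ping timeout message and return (start, end, elapsed_seconds).

    Assumption: both numbers are milliseconds since the Unix epoch.
    """
    match = re.search(r"Ping started on (\d+) hasn't completed at (\d+)", message)
    if match is None:
        raise ValueError("not a ping timeout message")
    start_ms, end_ms = (int(group) for group in match.groups())
    start = datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc)
    end = datetime.fromtimestamp(end_ms / 1000, tz=timezone.utc)
    return start, end, (end_ms - start_ms) / 1000


start, end, elapsed = ping_window(MESSAGE)
print("ping started:", start.isoformat())
print("gave up at:  ", end.isoformat())
print("elapsed (s): ", elapsed)
```

Under that assumption the gap is 240 seconds, so the master waited four minutes for a ping reply from the slave before closing the channel.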

      The problem happens very often but it is not 100% reproducible. I see it with 2 different sets of computers. My slave configuration worked properly until about 6 months ago, but I don't know which Jenkins update introduced the problem.

      Sometimes the Jenkins UI shows a blocked build as still running on a slave, whereas the "vSphere Client" shows that the computer hosting that slave is down.

      I also found the following message in the log of a running slave on which a build was waiting before it crashed:

      sept. 18, 2014 3:30:59 PM hudson.remoting.jnlp.Main$CuiListener error
      SEVERE: The server rejected the connection: ntedtop-seven64 is already connected to this master. Rejecting this connection.
      java.lang.Exception: The server rejected the connection: ntedtop-seven64 is already connected to this master. Rejecting this connection.
      at hudson.remoting.Engine.onConnectionRejected(Engine.java:286)
      at hudson.remoting.Engine.run(Engine.java:261)
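To check how often this rejection occurs, a slave log can be scanned for the message. This is only a minimal sketch: the pattern is derived solely from the lines quoted above, and the helper name `rejected_nodes` is hypothetical.

```python
import re

# Pattern built only from the log excerpt above; other Jenkins versions
# may word the message differently.
REJECTED = re.compile(
    r"The server rejected the connection: (?P<node>\S+) "
    r"is already connected to this master"
)


def rejected_nodes(log_lines):
    """Return distinct node names whose connection was rejected, in first-seen order."""
    seen = []
    for line in log_lines:
        match = REJECTED.search(line)
        if match and match.group("node") not in seen:
            seen.append(match.group("node"))
    return seen


sample = [
    "sept. 18, 2014 3:30:59 PM hudson.remoting.jnlp.Main$CuiListener error",
    "SEVERE: The server rejected the connection: ntedtop-seven64 is already "
    "connected to this master. Rejecting this connection.",
    "at hudson.remoting.Engine.onConnectionRejected(Engine.java:286)",
]
print(rejected_nodes(sample))
```

A node that appears here while its VM has just been reverted suggests the master may still be holding the channel from before the revert.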

      Is there a workaround for this problem?

      Regards,
      Grégoire

            Assignee: Unassigned
            Reporter: Grégoire Dupé (gregoire_dupe)
            Votes: 0
            Watchers: 2
