  1. Jenkins
  2. JENKINS-50458

JNLP agent died while reconnecting to master with java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller

    Details

    • Type: Improvement
    • Status: Fixed but Unreleased
    • Priority: Minor
    • Resolution: Fixed
    • Component/s: core
    • Environment:

      Description

      First, the agent starts correctly and is identified on the master:

      mars 22, 2018 5:40:04 PM hudson.remoting.jnlp.Main$CuiListener status
      INFOS: Locating server among [http://xxxxxxxxxx:8080/]
      mars 22, 2018 5:40:04 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
      INFOS: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
      mars 22, 2018 5:40:04 PM hudson.remoting.jnlp.Main$CuiListener status
      INFOS: Agent discovery successful
        Agent address: xxxxxxxxxx
        Agent port:    9999
        Identity:      xxxxxxxxxx
      mars 22, 2018 5:40:04 PM hudson.remoting.jnlp.Main$CuiListener status
      INFOS: Handshaking
      mars 22, 2018 5:40:04 PM hudson.remoting.jnlp.Main$CuiListener status
      INFOS: Connecting to topvm09.sesame.infotel.com:9999
      mars 22, 2018 5:40:04 PM hudson.remoting.jnlp.Main$CuiListener status
      INFOS: Trying protocol: JNLP4-connect
      mars 22, 2018 5:40:04 PM hudson.remoting.jnlp.Main$CuiListener status
      INFOS: Remote identity confirmed: xxxxxxxxxx
      mars 22, 2018 5:40:05 PM hudson.remoting.jnlp.Main$CuiListener status
      INFOS: Connected
      mars 22, 2018 5:40:06 PM com.youdevise.hudson.slavestatus.SlaveListener call
      INFOS: Slave-status listener starting
      mars 22, 2018 5:40:06 PM com.youdevise.hudson.slavestatus.SocketHTTPListener waitForConnection
      INFOS: Slave-status listener ready on port 3141
      

      Then the master became unavailable (many OutOfMemory errors) and was restarted.

      In the meantime, the JNLP agent keeps trying to reconnect to the master until the connection succeeds:

      mars 28, 2018 1:49:25 PM hudson.slaves.ChannelPinger$1 onDead
      INFOS: Ping failed. Terminating the channel JNLP4-connect connection to xxxxxxxxxx/192.168.2.98:9999.
      java.util.concurrent.TimeoutException: Ping started at 1522237525477 hasn't completed by 1522237765505
          at hudson.remoting.PingThread.ping(PingThread.java:134)
          at hudson.remoting.PingThread.run(PingThread.java:90)
      
      [... Repeated multiple times...]
      
      mars 28, 2018 2:26:45 PM hudson.remoting.jnlp.Main$CuiListener status
      INFOS: Terminated
      mars 28, 2018 2:26:45 PM hudson.util.ProcessTree getKillers
      AVERTISSEMENT: Failed to obtain killers
      hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on JNLP4-connect connection to xxxxxxxxxx/192.168.2.98:9999 failed. The channel is closing down or has closed down
          at hudson.remoting.Channel.call(Channel.java:945)
          at hudson.util.ProcessTree.getKillers(ProcessTree.java:159)
          at hudson.util.ProcessTree$OSProcess.killByKiller(ProcessTree.java:220)
          at hudson.util.ProcessTree$WindowsOSProcess.killRecursively(ProcessTree.java:436)
          at hudson.util.ProcessTree.killAll(ProcessTree.java:146)
          at hudson.Proc$LocalProc.destroy(Proc.java:384)
          at hudson.Proc$LocalProc.join(Proc.java:357)
          at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1304)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:927)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:901)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:850)
          at hudson.remoting.UserRequest.perform(UserRequest.java:210)
          at hudson.remoting.UserRequest.perform(UserRequest.java:53)
          at hudson.remoting.Request$2.run(Request.java:364)
          at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at hudson.remoting.Engine$1$1.run(Engine.java:94)
          at java.lang.Thread.run(Thread.java:748)
      Caused by: java.nio.channels.ClosedChannelException
          at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
          at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$1800(BIONetworkLayer.java:48)
          at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:264)
          ... 4 more
      
      [... Repeated multiple times...]
      
      mars 28, 2018 2:26:46 PM hudson.remoting.Request$2 run
      AVERTISSEMENT: Failed to send back a reply to the request hudson.remoting.Request$2@34a893f6
      hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@71af25fc:JNLP4-connect connection to xxxxxxxxxx/192.168.2.98:9999": channel is already closed
          at hudson.remoting.Channel.send(Channel.java:715)
          at hudson.remoting.Request$2.run(Request.java:377)
          at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at hudson.remoting.Engine$1$1.run(Engine.java:94)
          at java.lang.Thread.run(Thread.java:748)
      Caused by: java.nio.channels.ClosedChannelException
          at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
          at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$1800(BIONetworkLayer.java:48)
          at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:264)
          ... 4 more
      
      mars 28, 2018 2:27:00 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
      INFOS: Failed to connect to the master. Will try again: java.net.SocketTimeoutException connect timed out
      
      [... Repeated multiple times...]
      
      mars 28, 2018 2:31:49 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
      INFOS: Master isn't ready to talk to us on http://topvm09.sesame.infotel.com:8080/tcpSlaveAgentListener/. Will try again: response code=503
      mars 28, 2018 2:32:00 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
      INFOS: Master isn't ready to talk to us on http://topvm09.sesame.infotel.com:8080/tcpSlaveAgentListener/. Will try again: response code=503
      mars 28, 2018 2:32:15 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
      INFOS: Failed to connect to the master. Will try again: java.net.SocketTimeoutException Read timed out
      mars 28, 2018 2:32:30 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
      INFOS: Failed to connect to the master. Will try again: java.net.SocketTimeoutException Read timed out
      mars 28, 2018 2:32:40 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
      INFOS: Master isn't ready to talk to us on http://topvm09.sesame.infotel.com:8080/tcpSlaveAgentListener/. Will try again: response code=503
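
      The retry behaviour visible in the log above (the agent polling the tcpSlaveAgentListener URL and trying again after a 503 or a timeout) can be sketched roughly as follows. This is only an illustration, not Remoting's actual code: the class ReadinessPoller, the method waitForReady and its parameters are hypothetical names, and treating HTTP 200 as "ready" is an assumption.

```java
import java.util.function.IntSupplier;

/** Hypothetical illustration of a wait-until-ready loop; not the Remoting implementation. */
final class ReadinessPoller {

    /**
     * Calls the probe until it reports HTTP 200 or maxAttempts is reached.
     *
     * @param probe        returns an HTTP status code; may throw on timeouts
     * @param maxAttempts  upper bound on the number of probes
     * @param delayMillis  pause between two probes
     * @return the attempt number on which the probe succeeded, or -1 if it never did
     */
    static int waitForReady(IntSupplier probe, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int code;
            try {
                code = probe.getAsInt();
            } catch (RuntimeException e) {
                // Connect/read timeouts are treated like "not ready yet".
                code = -1;
            }
            if (code == 200) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up.
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }
}
```

      In a real agent the probe would issue an HTTP request against the master; here it is left as a parameter so the loop can be exercised with a fake sequence of status codes such as 503, 503, 200.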
      
      

       

      But once the master was back, the agent died with the following stack trace:

      mars 28, 2018 2:32:50 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
      INFOS: Master isn't ready to talk to us on http://topvm09.sesame.infotel.com:8080/tcpSlaveAgentListener/. Will try again: response code=503
      mars 28, 2018 2:33:01 PM hudson.remoting.jnlp.Main$CuiListener error
      GRAVE: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
      java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
          at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1.onReconnect(JnlpSlaveRestarterInstaller.java:97)
          at hudson.remoting.EngineListenerSplitter.onReconnect(EngineListenerSplitter.java:49)
          at hudson.remoting.Engine.innerRun(Engine.java:662)
          at hudson.remoting.Engine.run(Engine.java:469)
      Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller
          at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
          at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:171)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
          ... 4 more
      
      

      Please note that the changelog of 2.112 says Remoting has been updated to 3.18, and I am using an earlier version of the agent.

      If an agent version mismatch is the root cause, I would expect Jenkins to complain about the outdated agent version.

      PS: I don't know whether this is a "core" component issue.
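
      The kind of check I have in mind could look like the sketch below. It is purely hypothetical: RemotingVersionCheck, compareVersions and warningFor are made-up names, not existing Jenkins or Remoting APIs, and the sketch assumes purely numeric, dot-separated version strings.

```java
/** Hypothetical sketch of an agent-version warning; not an existing Jenkins API. */
final class RemotingVersionCheck {

    /** Compares dot-separated numeric versions; missing components count as 0. */
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /** Returns a warning message if the agent is older than the minimum, else null. */
    static String warningFor(String agentVersion, String minimumVersion) {
        if (compareVersions(agentVersion, minimumVersion) < 0) {
            return "Agent Remoting version " + agentVersion
                    + " is older than the expected " + minimumVersion;
        }
        return null;
    }
}
```

      For example, warningFor("3.17", "3.18") would produce a message, while warningFor("3.19", "3.18") would return null. The value 3.17 is only an example here, not the version I was actually running.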

        Attachments

          Activity

          jtancer Jon Tancer added a comment -

          I downgraded the JDK on my machine from 10 to 8 and this problem went away.

          oleg_nenashev Oleg Nenashev added a comment -

          Régis Maura, Philipp Garbe: are you also using Java 10? If so, it is not supported. Jenkins will run reliably only on Java 8.

          rmaura Régis Maura added a comment -

          Oleg Nenashev, we are using Java 8 for both master and agent.
          Note: I have not tried to reproduce the bug since updating the agent to 3.19.

          jthompson Jeff Thompson added a comment -

          Régis Maura, it looks like this has been working fine for you so we should probably just close it.

          From the provided information, I don't have enough to figure out what is going on, particularly without any steps to reproduce and with the reported variability.

          I see a couple of other similar reports, JENKINS-50730 and JENKINS-52283, but certainly no indication that it is a widespread problem. There might be some similarities with Cloud or particularly Kubernetes environments.

          In some cases the causes appear to be environment or version related. Getting the correct Remoting, Jenkins, or Java versions seems to have resolved it in some cases. In one case it appears to have been due to memory issues.

          ashtontreadway Ashton Treadway added a comment -

          Per Oleg Nenashev, closing as resolved with no response from submitter.


            People

            • Assignee: jthompson Jeff Thompson
            • Reporter: rmaura Régis Maura
            • Votes: 2
            • Watchers: 8
