Jenkins / JENKINS-50730

NoClassDefFound errors in Cloud Slaves

    Details

    • Released As:
      Remoting 3.28

      Description

      While provisioning slaves from a private Kubernetes instance, we've found that a lot of slaves terminate with the following (or similar) stack trace on the slave's side:

       

      INFO: Setting up slave: kube1-medium-r9zf4
      Apr 10, 2018 11:02:05 AM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Apr 10, 2018 11:02:05 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using /home/<user>/workDir/remoting as a remoting work directory
      Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server ...
      Apr 10, 2018 11:02:06 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
      INFO: Remoting server accepts the following protocols: [JNLP4-connect, CLI2-connect, JNLP-connect, Ping, CLI-connect, JNLP2-connect]
      Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Agent discovery successful <...>
      Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to <Jenkins Master>
      Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Trying protocol: JNLP4-connect
      Apr 10, 2018 11:02:07 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Remote identity confirmed: <...>
      Apr 10, 2018 11:02:07 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected
      Apr 10, 2018 11:02:14 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Terminated
      Apr 10, 2018 11:02:14 AM hudson.remoting.UserRequest perform
      WARNING: LinkageError while performing UserRequest:jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2@3e708317
      java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller$2$1
              at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2.call(JnlpSlaveRestarterInstaller.java:71)
              at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2.call(JnlpSlaveRestarterInstaller.java:53)
              at hudson.remoting.UserRequest.perform(UserRequest.java:207)
              at hudson.remoting.UserRequest.perform(UserRequest.java:53)
              at hudson.remoting.Request$2.run(Request.java:358)
              at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
              at java.util.concurrent.FutureTask.run(Unknown Source)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
              at hudson.remoting.Engine$1$1.run(Engine.java:98)
              at java.lang.Thread.run(Unknown Source)
      Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1
              at java.net.URLClassLoader.findClass(Unknown Source)
              at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:159)
              at java.lang.ClassLoader.loadClass(Unknown Source)
              at java.lang.ClassLoader.loadClass(Unknown Source)
              ... 11 more
      
      

      The class reported as missing isn't consistently the same. I've seen `FilePathFilter`, `LaunchConfiguration`, `StringBuilderWriter`, and some others reported as well. Sometimes there are also exceptions related to `JarCacheSupport` not being able to resolve jars (I don't have the exact stack trace at hand; I will post it if I find it again).
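
Since the missing class varies between occurrences, one way to get an overview is to tally the class names reported in `NoClassDefFoundError` lines across collected agent logs. The following is a minimal sketch, not part of Jenkins or Remoting: `MissingClassTally` and `tally` are hypothetical names, and it assumes log lines shaped like the trace above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Jenkins or Remoting.
class MissingClassTally {
    // Matches lines such as
    // "java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller$2$1"
    private static final Pattern MISSING =
        Pattern.compile("java\\.lang\\.NoClassDefFoundError: (\\S+)");

    // Counts how often each class is reported as missing.
    static Map<String, Integer> tally(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = MISSING.matcher(line);
            if (m.find()) {
                // NoClassDefFoundError prints '/' separators; normalize to dotted names.
                String name = m.group(1).replace('/', '.');
                counts.merge(name, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "INFO: Terminated",
            "java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller$2$1",
            "java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller");
        tally(sample).forEach((name, n) -> System.out.println(n + "  " + name));
    }
}
```

Only the `NoClassDefFoundError` line is counted, not its `Caused by: ClassNotFoundException` line, so each occurrence is tallied once.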

      On the master's side, these exceptions generally manifest as `ChannelClosedException`s, or as unexplained failures in pipeline branches with no exception reported:

      ERROR: Issue with creating launcher for agent kube1-medium-r9zf4. The agent has not been fully initialized yet
      
      ERROR: Issue with creating launcher for agent kube1-medium-r9zf4. The agent has not been fully initialized yet
      
      remote file operation failed: /home/<user>/workspace/<job_name> at hudson.remoting.Channel@6639429c:JNLP4-connect connection from <some-host-name>/<ip-address>:60326: hudson.remoting.ChannelClosedException: Remote call on JNLP4-connect connection from <some-host-name>/<ip-address>:60326 failed. The channel is closing down or has closed down
      

      I haven't been able to consistently reproduce the error, but it occurs often enough to cause major pain for users (especially since we make extensive use of pipelines with a large number of parallel nodes, and a failure in any one of the nodes causes the entire pipeline to fail).

            Activity

            karthikduddu Karthik Duddu created issue -
            karthikduddu Karthik Duddu made changes -
            Field Original Value New Value
            Assignee Carlos Sanchez [ csanchez ] Oleg Nenashev [ oleg_nenashev ]
            oleg_nenashev Oleg Nenashev made changes -
            Assignee Oleg Nenashev [ oleg_nenashev ] Jeff Thompson [ jthompson ]
            oleg_nenashev Oleg Nenashev added a comment -

            Karthik Duddu, sorry, I am not going to review this soon. Jeff Thompson is now responsible for triaging Remoting-related issues.

            manu86 Emmanuel Costa added a comment - - edited

            We are observing the same bug on our production system:

            2018-04-16 16:16:34.028:INFO:osjs.Server:main: Started @1199ms
            16:16:34.028 INFO - Selenium Server is up and running
            Apr 16, 2018 4:16:34 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
            INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
            Apr 16, 2018 4:16:34 PM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Agent discovery successful
              Agent address: jenkins.XXXX.XXX
              Agent port:    8888
              Identity:      62:69:42:d7:6c:31:25:9d:ab:7e:97:4f:de:36:2a:b1
            Apr 16, 2018 4:16:34 PM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Handshaking
            Apr 16, 2018 4:16:34 PM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Connecting to jenkins.XXX.XX:8888
            Apr 16, 2018 4:16:34 PM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Trying protocol: JNLP4-connect
            Apr 16, 2018 4:16:34 PM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Remote identity confirmed: 62:69:42:d7:6c:31:25:9d:ab:7e:97:4f:de:36:2a:b1
            Apr 16, 2018 4:16:34 PM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Connected

            Apr 16, 2018 4:29:51 PM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Terminated
            Apr 16, 2018 4:30:01 PM hudson.remoting.jnlp.Main$CuiListener error
            SEVERE: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
            java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
                at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1.onReconnect(JnlpSlaveRestarterInstaller.java:97)
                at hudson.remoting.EngineListenerSplitter.onReconnect(EngineListenerSplitter.java:49)
                at hudson.remoting.Engine.innerRun(Engine.java:643)
                at hudson.remoting.Engine.run(Engine.java:451)
            Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller
                at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
                at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:157)
                at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
                at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
                ... 4 more
            luke_hopkins Luke Hopkins added a comment -

            We are having similar issues. On the slave we see:

            INFO: Connected
            Apr 27, 2018 3:58:55 PM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Terminated
            Apr 27, 2018 3:59:05 PM hudson.remoting.jnlp.Main$CuiListener error
            SEVERE: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
            java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
                at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1.onReconnect(JnlpSlaveRestarterInstaller.java:97)
                at hudson.remoting.EngineListenerSplitter.onReconnect(EngineListenerSplitter.java:49)
                at hudson.remoting.Engine.innerRun(Engine.java:662)
                at hudson.remoting.Engine.run(Engine.java:469)
            Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller
                at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
                at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:171)
                at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
                at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
            

            We also see these errors in the job logs:

            java.io.IOException: remote file operation failed
            
            Cannot contact minion-srzt-svc-template-master-qgsj1-kj8cr: hudson.remoting.RequestAbortedException: java.nio.channels.ClosedChannelException
            karthikduddu Karthik Duddu added a comment -

            Oleg Nenashev, Jeff Thompson: I'd be happy to help resolve the issue, but I'm not very familiar with the Remoting codebase, and I'm not sure what I should be looking for. I've rebuilt Jenkins/remoting with extremely verbose logging for the `JarLoaderImpl` and `Checksum` classes, but that hasn't really helped because I was inundated with information. Also, one of the biggest problems with this issue is that it isn't consistently reproducible (at least in our case), which makes it difficult to test theories.

            Do you guys have any initial ideas or starting points that we can work off of?

            cloudbees CloudBees Inc. made changes -
            Remote Link This issue links to "CloudBees Internal FNDN-235 (Web Link)" [ 20775 ]
            jthompson Jeff Thompson added a comment -

            Karthik Duddu, Emmanuel Costa, Luke Hopkins: Do you continue to see this issue?

            From the provided information, I don't have enough to figure out what is going on, particularly without any steps to reproduce and with the reported variability.

            I see a couple of other similar reports, JENKINS-50458 and JENKINS-52283, but certainly no indication that it is a widespread problem. There might be some similarities with Cloud or particularly Kubernetes environments.

            In some cases the causes appear to be environment or version related. Getting the correct Remoting, Jenkins, or Java versions seems to have resolved it in some cases. In one case it appears to have been due to memory issues.

            vchijwani Vicky Chijwani added a comment -

            Jeff Thompson - I'm replying on behalf of Karthik as he is no longer working here. We've stopped seeing this issue, but haven't made any changes to Jenkins on our side. Java version = 8, Jenkins core = v2.89.2, agent version = 3.14 - we haven't upgraded any of these since April. It was also not a memory issue on the master, as there was plenty of free memory there. My suspicion is that it's related to something else in our environment (Kubernetes, networking, etc).

            We haven't tried agent v3.19 as reported in one of the linked issues - maybe that will help. But then again, it only seems to have helped in one case, so I'm not sure. If this issue comes up again I'll report back here, but for the moment it is gone.

            jthompson Jeff Thompson added a comment -

            Vicky Chijwani, thanks for the reply. I'm going to leave this open for a couple more days and see if anyone can provide further details, otherwise I'll mark it as Cannot Reproduce and close it.

            alonlavi Alon Lavi added a comment -

            Jeff Thompson, please don't close this issue. We're having the same problem. I still haven't figured out a way to reproduce it, but it's really bothersome.

            jthompson Jeff Thompson added a comment -

            Alon Lavi, there are different opinions on how issue reports like this should be handled. I tend to follow the approach that if an issue cannot be sufficiently described and reproduced, it should be closed as Cannot Reproduce and then re-opened if someone obtains better information, particularly when a significant number of the reporters have seen the problem go away after environmental or version changes.

            But, let's keep this one open for a while longer and see if we get any better information.

            jthompson Jeff Thompson added a comment -

            I saw this ClassNotFoundException yesterday. It was one of the last things in a string of exceptions and messages in my agent logs on my Windows 10 system when it went to sleep. I couldn't see that it caused any failures or problems. Things started up without any problem when the system woke up.

            This might not have any relation to other instances when people are seeing this exception. However, given other reports, this exception might not be causing any problems itself but may instead follow from other problems. Perhaps the real problem is elsewhere (environmental or configuration, as noted by many, including my experience) and this particular message is a distraction.

            jthompson Jeff Thompson added a comment -

            Since the original reporter no longer observes this issue and no one has provided additional information to describe, diagnose, or reproduce this issue in quite some time I am going to close this report down as Cannot Reproduce. If anyone can provide further reports and information that could move this forward we can re-open it at that time.

            jthompson Jeff Thompson made changes -
            Status Open [ 1 ] Closed [ 6 ]
            Resolution Cannot Reproduce [ 5 ]
            tcoan Thomas COAN added a comment - - edited

            I have been facing the same problem for a couple of days now.

            The slave agent is launched with the following command:

            java -jar agent.jar -jnlpUrl https://MY_MASTER_HOST:MY_MASTER_PORT/computer/slave-number1/slave-agent.jnlp -secret ***** -workDir "/var/lib/jenkins"

            It was working fine for months, and I don't know what has changed (I don't remember any updates or modifications on the master or slave).

            The slave works for several minutes or hours and then dies again with the following error:

            INFO: Connecting to MY_MASTER_HOST:MY_MASTER_PORT
            Oct 24, 2018 10:36:35 AM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Trying protocol: JNLP4-connect
            Oct 24, 2018 10:36:35 AM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Remote identity confirmed: 47:2d:42:ae:61:c6:79:b1:69:79:27:d7:26:b8:15:ec
            Oct 24, 2018 10:36:36 AM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Connected
            Oct 24, 2018 10:51:36 AM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Terminated
            Oct 24, 2018 10:51:46 AM hudson.remoting.jnlp.Main$CuiListener error
            SEVERE: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
            java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
                at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1.onReconnect(JnlpSlaveRestarterInstaller.java:97)
                at hudson.remoting.EngineListenerSplitter.onReconnect(EngineListenerSplitter.java:49)
                at hudson.remoting.Engine.innerRun(Engine.java:662)
                at hudson.remoting.Engine.run(Engine.java:469)
            Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller
                at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
                at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:171)
                at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
                at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
                ... 4 more

            Config:

            • Java: 1.8.0_181
            • OS: Ubuntu 16.04.4 LTS (Xenial Xerus)
            • Jenkins master version: 2.138.2

             

            tcoan Thomas COAN made changes -
            Resolution Cannot Reproduce [ 5 ]
            Status Closed [ 6 ] Reopened [ 4 ]
            jthompson Jeff Thompson added a comment -

            As far as I've been able to tell so far, this is not a cause of failures or even a direct symptom. It results after an earlier connection failure and just indicates that a reconnection attempt has failed.

            The stack traces all begin with a failure in onReconnect(). This indicates the connection has terminated and Remoting is attempting to reconnect. Unfortunately, the reconnect attempt fails and this message results.
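
The ordering described above (termination first, then the failed reconnect) can be checked against an agent log by measuring the time between `INFO: Terminated` and the next `NoClassDefFoundError`. This is a minimal sketch, assuming the java.util.logging line format seen in the logs in this issue; `TerminatedGap` is a hypothetical helper and not part of Remoting.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Jenkins or Remoting.
class TerminatedGap {
    // Header format assumed from the logs in this issue, e.g. "Oct 24, 2018 10:51:36 AM ..."
    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("MMM d, yyyy h:mm:ss a", Locale.US);

    // Returns the timestamp at the start of a header line, or null if the line has none.
    static LocalDateTime stamp(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 5) {
            return null;
        }
        try {
            return LocalDateTime.parse(String.join(" ", Arrays.copyOfRange(parts, 0, 5)), STAMP);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    // Time from the "INFO: Terminated" record to the first later record that
    // mentions NoClassDefFoundError, or null if that sequence is not found.
    static Duration gap(List<String> lines) {
        LocalDateTime last = null;          // timestamp of the most recent header line
        LocalDateTime terminatedAt = null;  // timestamp of the Terminated record
        for (String line : lines) {
            LocalDateTime t = stamp(line);
            if (t != null) {
                last = t;
                continue;
            }
            String msg = line.trim();
            if (msg.equals("INFO: Terminated")) {
                terminatedAt = last;
            } else if (msg.contains("NoClassDefFoundError") && terminatedAt != null && last != null) {
                return Duration.between(terminatedAt, last);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> excerpt = Arrays.asList(
            "Oct 24, 2018 10:51:36 AM hudson.remoting.jnlp.Main$CuiListener status",
            "INFO: Terminated",
            "Oct 24, 2018 10:51:46 AM hudson.remoting.jnlp.Main$CuiListener error",
            "SEVERE: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller",
            "java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller");
        Duration d = gap(excerpt);
        System.out.println(d == null ? "no match" : d.getSeconds() + " s after Terminated");
    }
}
```

On the excerpt used in `main`, taken from the Oct 24 log above, the error is logged 10 seconds after `Terminated`.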

            Connection failures and disconnects can happen for many reasons, commonly associated with system, network, or environment issues. Particularly if nothing has changed and this suddenly starts occurring it is likely to be one of these external causes.

            It might be possible to improve the retry logic but without further information on what is occurring it is difficult to know what changes might improve it.

            One thing I could easily do is to change the log messaging and downgrade the severity of this particular message.
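
To illustrate the idea of downgrading the message, here is a minimal sketch of a reconnect step that reports a class-loading failure as a one-line status and keeps the stack trace at `FINE`. `ReconnectHook` and `QuietReconnect` are hypothetical names used only for illustration; this is not the Remoting API and not necessarily how an actual change would be implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for a reconnect callback; not the Remoting API.
interface ReconnectHook {
    void onReconnect();
}

// Hypothetical wrapper for illustration only.
class QuietReconnect {
    private static final Logger LOGGER = Logger.getLogger(QuietReconnect.class.getName());

    // Runs the hook and returns true on success. A LinkageError (the superclass of
    // NoClassDefFoundError) is treated as "the reconnect step failed": it is
    // reported as a short status line, and the full stack trace is logged at FINE.
    static boolean runHook(ReconnectHook hook, List<String> statusSink) {
        try {
            hook.onReconnect();
            statusSink.add("onReconnect completed");
            return true;
        } catch (LinkageError e) {
            statusSink.add("onReconnect failed: " + e);
            LOGGER.log(Level.FINE, "Reconnect hook could not load a class", e);
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> status = new ArrayList<>();
        runHook(() -> {
            throw new NoClassDefFoundError("jenkins/slaves/restarter/JnlpSlaveRestarterInstaller");
        }, status);
        status.forEach(System.out::println);
    }
}
```

Catching an `Error` subclass is unusual and is done here only because the failure is expected after a disconnect. The sketch changes how the failure is reported, not whether the disconnect happens.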

            tcoan Thomas COAN added a comment -

            Thanks Jeff,

            Yes changing the log message would be great. 

             

            => As a workaround, I have changed the way the slave connects: it now uses the SSH agent method instead of Java Web Start.

            There have been no connection failures from slave to master for a couple of days.

            Thomas 

            jthompson Jeff Thompson added a comment -

            Thomas COAN, I've got a PR in review to tweak the messaging. https://github.com/jenkinsci/remoting/pull/295   If you have any comments on that proposal, please share.

            That's great news on the improved reliability using ssh agent. I wish I knew why that was the case.

            tcoan Thomas COAN added a comment -

            Jeff Thompson, I have reviewed the PR, it seems good to me. Thanks Jeff.

            As a remark, I read in the PR comments that the case should be rare, but for some weeks it was happening 5-10 times per day on my side. A web search suggests that other people are facing the same issue.

            With the new message, we will be able to investigate outside Jenkins. The previous message seemed to indicate an internal error.

             

            jthompson Jeff Thompson added a comment -

            Thomas COAN, if you have a GitHub account, you would be welcome to add your comment directly to the PR. At this point, though, I hope to merge that today so no need to bother. 

            I think the question of rarity was on the order of minutes or less. If the occurrence was that frequent these log messages could still be annoying. At 5-10 per day the disconnects are very annoying but the log messages may be helpful.

            jthompson Jeff Thompson added a comment -

            As discussed in the comments, we decided to improve the log messages on reconnect to reduce the log level and improve the clarity. This won't do anything to help reduce the original problem but we've received insufficient information to reproduce those situations.

            This will be picked up by a Jenkins weekly build soon.

            jthompson Jeff Thompson made changes -
            Status Reopened [ 4 ] Resolved [ 5 ]
            Resolution Fixed [ 1 ]
            Released As Remoting 3.28

              People

              • Assignee:
                jthompson Jeff Thompson
                Reporter:
                karthikduddu Karthik Duddu
              • Votes:
                3
                Watchers:
                9
