Jenkins / JENKINS-50730

NoClassDefFound errors in Cloud Slaves

    Details

    • Released As:
      Remoting 3.28

      Description

      While provisioning slaves from a private Kubernetes instance, we've found that a lot of slaves terminate with the following (or similar) stack trace on the slave's side:

       

      INFO: Setting up slave: kube1-medium-r9zf4
      Apr 10, 2018 11:02:05 AM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Apr 10, 2018 11:02:05 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using /home/<user>/workDir/remoting as a remoting work directory
      Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server ...
      Apr 10, 2018 11:02:06 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
      INFO: Remoting server accepts the following protocols: [JNLP4-connect, CLI2-connect, JNLP-connect, Ping, CLI-connect, JNLP2-connect]
      Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Agent discovery successful <...>
      Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to <Jenkins Master>
      Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Trying protocol: JNLP4-connect
      Apr 10, 2018 11:02:07 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Remote identity confirmed: <...>
      Apr 10, 2018 11:02:07 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected
      Apr 10, 2018 11:02:14 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Terminated
      Apr 10, 2018 11:02:14 AM hudson.remoting.UserRequest perform
      WARNING: LinkageError while performing UserRequest:jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2@3e708317
      java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller$2$1
              at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2.call(JnlpSlaveRestarterInstaller.java:71)
              at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2.call(JnlpSlaveRestarterInstaller.java:53)
              at hudson.remoting.UserRequest.perform(UserRequest.java:207)
              at hudson.remoting.UserRequest.perform(UserRequest.java:53)
              at hudson.remoting.Request$2.run(Request.java:358)
              at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
              at java.util.concurrent.FutureTask.run(Unknown Source)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
              at hudson.remoting.Engine$1$1.run(Engine.java:98)
              at java.lang.Thread.run(Unknown Source)
      Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1
              at java.net.URLClassLoader.findClass(Unknown Source)
              at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:159)
              at java.lang.ClassLoader.loadClass(Unknown Source)
              at java.lang.ClassLoader.loadClass(Unknown Source)
              ... 11 more
      
      

      The class reported as missing isn't consistently the same. I've seen `FilePathFilter`, `LaunchConfiguration`, `StringBuilderWriter`, and some others reported as well. Sometimes there are also exceptions related to `JarCacheSupport` not being able to resolve jars (I don't have the exact stack trace at hand; I will post it if I find it again).
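
Because the missing class varies from run to run, it may help to tally which classes are reported across many agent logs. The following is a minimal sketch in Python, assuming the agent logs contain `java.lang.NoClassDefFoundError:` lines in the same format as the trace above. The function name `tally_missing_classes` and the second sample class are hypothetical and are not part of Jenkins or Remoting.

```python
import re
from collections import Counter

# Matches top-level lines such as:
#   java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller$2$1
# "Caused by: ... ClassNotFoundException" lines are deliberately ignored so that
# each incident is counted once.
_MISSING = re.compile(r"java\.lang\.NoClassDefFoundError:\s*(\S+)")


def tally_missing_classes(log_lines):
    """Count how often each class is reported missing (hypothetical helper)."""
    counts = Counter()
    for line in log_lines:
        match = _MISSING.search(line)
        if match:
            # Normalize the JVM internal name (slashes) to a dotted class name.
            counts[match.group(1).replace("/", ".")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        # Taken from the trace in this report.
        "java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller$2$1",
        "Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1",
        # Made-up line for illustration only.
        "java.lang.NoClassDefFoundError: example/HypotheticalClass",
    ]
    for name, count in tally_missing_classes(sample).most_common():
        print(count, name)
```

In practice the lines would be read from the collected agent log files instead of the inline sample.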

      On the master's side, these exceptions generally manifest as `ChannelClosedException`s, or weird Exception-less failures in pipeline branches.

      ERROR: Issue with creating launcher for agent kube1-medium-r9zf4. The agent has not been fully initialized yet
      
      ERROR: Issue with creating launcher for agent kube1-medium-r9zf4. The agent has not been fully initialized yet
      
      remote file operation failed: /home/<user>/workspace/<job_name> at hudson.remoting.Channel@6639429c:JNLP4-connect connection from <some-host-name>/<ip-address>:60326: hudson.remoting.ChannelClosedException: Remote call on JNLP4-connect connection from <some-host-name>/<ip-address>:60326 failed. The channel is closing down or has closed down
      

      I haven't been able to consistently reproduce the error, but it occurs often enough to cause major pain to users (especially since we make extensive use of pipelines with a large number of parallel nodes, and a failure in any one of the nodes causes the entire pipeline to fail).

            Activity

            tcoan Thomas COAN added a comment -

            Thanks Jeff,

            Yes changing the log message would be great. 

             

            => As a workaround, I have changed how the slave connects: it now uses the SSH agent method instead of Java Web Start.

            There have been no connection failures from slave to master for a couple of days.

            Thomas 

            jthompson Jeff Thompson added a comment -

            Thomas COAN, I've got a PR in review to tweak the messaging. https://github.com/jenkinsci/remoting/pull/295   If you have any comments on that proposal, please share.

            That's great news on the improved reliability using ssh agent. I wish I knew why that was the case.

            tcoan Thomas COAN added a comment -

            Jeff Thompson, I have reviewed the PR, it seems good to me. Thanks Jeff.

            One remark: I read in the PR comments that this case should be rare, but for some weeks it was happening 5-10 times per day on my side. A web search suggests that other people are facing the same issue.

            With the new message, we will be able to investigate outside Jenkins. The previous message seemed to indicate it was an internal error.

             

            jthompson Jeff Thompson added a comment -

            Thomas COAN, if you have a GitHub account, you would be welcome to add your comment directly to the PR. At this point, though, I hope to merge that today so no need to bother. 

            I think the question of rarity was on the order of minutes or less. If the occurrence was that frequent these log messages could still be annoying. At 5-10 per day the disconnects are very annoying but the log messages may be helpful.

            jthompson Jeff Thompson added a comment -

            As discussed in the comments, we decided to improve the log messages on reconnect to reduce the log level and improve the clarity. This won't do anything to help reduce the original problem but we've received insufficient information to reproduce those situations.

            This will be picked up by a Jenkins weekly build soon.


              People

              • Assignee:
                jthompson Jeff Thompson
                Reporter:
                karthikduddu Karthik Duddu
              • Votes:
                3
                Watchers:
                9