- Bug
- Resolution: Won't Fix
- Major
- None
Since Jenkins core 1.612, Java 7 is required for both core and agents. During a migration, a user may forget to upgrade the JVM of an agent. This configuration is not supported, but the annoying part is that it causes significant memory consumption, because the connection fails repeatedly with a Java compatibility error that is not caught correctly. The problem was originally
Here is the analysis done by stephenconnolly:
I suspect that the J6 may be causing other leaks, as it is probably blowing up in unexpected places.
May 31, 2016 2:21:13 PM hudson.TcpSlaveAgentListener$ConnectionHandler run
INFO: Accepted connection #64 from /127.0.0.1:54507
May 31, 2016 2:21:13 PM hudson.TcpSlaveAgentListener$ConnectionHandler run
WARNING: Connection #64 failed
java.io.IOException: Remote call on jnlp failed
    at hudson.remoting.Channel.call(Channel.java:789)
    at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:508)
    at jenkins.slaves.JnlpSlaveAgentProtocol$Handler.jnlpConnect(JnlpSlaveAgentProtocol.java:126)
    at jenkins.slaves.DefaultJnlpSlaveReceiver.handle(DefaultJnlpSlaveReceiver.java:70)
    at jenkins.slaves.JnlpSlaveAgentProtocol2$Handler2.run(JnlpSlaveAgentProtocol2.java:57)
    at jenkins.slaves.JnlpSlaveAgentProtocol2.handle(JnlpSlaveAgentProtocol2.java:30)
    at hudson.TcpSlaveAgentListener$ConnectionHandler.run(TcpSlaveAgentListener.java:156)
Caused by: java.lang.ClassFormatError: Failed to load hudson.slaves.SlaveComputer$SlaveVersion
    at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:340)
    at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:251)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:249)
    at hudson.remoting.MultiClassLoaderSerializer$Input.resolveClass(MultiClassLoaderSerializer.java:114)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1591)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1496)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1750)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:349)
    at hudson.remoting.UserRequest.deserialize(UserRequest.java:184)
    at hudson.remoting.UserRequest.perform(UserRequest.java:98)
    at hudson.remoting.UserRequest.perform(UserRequest.java:48)
    at hudson.remoting.Request$2.run(Request.java:326)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at hudson.remoting.Engine$1$1.run(Engine.java:62)
    at java.lang.Thread.run(Thread.java:695)
    at ......remote call to jnlp(Native Method)
    at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1416)
    at hudson.remoting.UserResponse.retrieve(UserRequest.java:220)
    at hudson.remoting.Channel.call(Channel.java:781)
    ... 6 more
Caused by: java.lang.UnsupportedClassVersionError: hudson/slaves/SlaveComputer$SlaveVersion : Unsupported major.minor version 51.0
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClassCond(ClassLoader.java:637)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:471)
    at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:338)
    at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:251)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:249)
    at hudson.remoting.MultiClassLoaderSerializer$Input.resolveClass(MultiClassLoaderSerializer.java:114)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1591)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1496)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1750)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:349)
    at hudson.remoting.UserRequest.deserialize(UserRequest.java:184)
    at hudson.remoting.UserRequest.perform(UserRequest.java:98)
    at hudson.remoting.UserRequest.perform(UserRequest.java:48)
    at hudson.remoting.Request$2.run(Request.java:326)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at hudson.remoting.Engine$1$1.run(Engine.java:62)
    at java.lang.Thread.run(Thread.java:695)
I think this is an issue in Jenkins Core, to wit:
All of this should be in a try ... catch block and we should probably close the channel if any of that fails.
Instead what is happening is that the channel remains semi-half-open:
- The slave side thinks it is closed but the Jenkins side does not.
- Because we have not set the slave's channel field, subsequent connection attempts will not be rejected due to an existing connection. In fact nothing is really retaining a reference to the channel, and we never got to set up the ping thread, so at best we are awaiting the OS to decide the socket is dead.
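The try ... catch suggestion could be sketched roughly as follows. This is only an illustration and not the actual Jenkins code: `ChannelSetupSketch`, `SetupStep` and `setUpOrClose` are hypothetical names, and the channel is modelled as a plain `java.io.Closeable`. The sketch catches `Throwable` on the assumption that the failure may surface as an `Error` rather than an `Exception`; the `UnsupportedClassVersionError` in the trace is one.

```java
import java.io.Closeable;
import java.io.IOException;

public class ChannelSetupSketch {
    /** Hypothetical stand-in for the post-handshake setup steps (not a Jenkins API). */
    public interface SetupStep {
        void run() throws Exception;
    }

    /**
     * Runs the setup step. If it throws anything, including an Error such as
     * UnsupportedClassVersionError, the channel is closed before the failure
     * is rethrown wrapped in an IOException.
     */
    public static void setUpOrClose(Closeable channel, SetupStep setup) throws IOException {
        try {
            setup.run();
        } catch (Throwable t) {
            try {
                channel.close();
            } catch (IOException closeFailure) {
                // Keep the original failure as the primary cause.
                t.addSuppressed(closeFailure);
            }
            throw new IOException("Channel setup failed; channel closed", t);
        }
    }

    public static void main(String[] args) {
        final boolean[] closed = {false};
        // Fake channel that only records whether close() was called.
        Closeable fakeChannel = () -> closed[0] = true;
        try {
            setUpOrClose(fakeChannel, () -> {
                throw new UnsupportedClassVersionError("Unsupported major.minor version 51.0");
            });
        } catch (IOException e) {
            System.out.println("setup failed: " + e.getCause().getClass().getSimpleName());
        }
        System.out.println("channel closed: " + closed[0]);
    }
}
```

With the channel closed on the Jenkins side as soon as setup fails, nothing would be left waiting for the OS to declare the socket dead.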
Using a `while true ; do java -jar slave.jar -noReconnect -jnlpUrl ... ; done` loop you can trigger the issue faster:
The memory will be reclaimed once the connection is old enough to have been deemed dead by the TCP stack, but I had one slave with at most one partially set-up connection and the Channel instances just kept on growing. Every so often a full GC gets a few connections to drop off, but there would still be loads "live".
after a short while
after some more time
(next I let it run a little more then stopped the slave and triggered a full GC)
notice GC doesn't make much of a dent
1m40s later we were able to get GC to collect another instance, leaving loads still hanging around:
after another forced GC
The workaround is obviously not to have a J6 slave.
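For reference, the `Unsupported major.minor version 51.0` message in the trace means the class file was compiled for Java 7, which a Java 6 JVM (class-file major version 50) cannot load. A minimal sketch for reading that number from a class-file header is shown below; `ClassVersionSketch` and its methods are hypothetical helper names, not part of Jenkins or remoting.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassVersionSketch {
    /** Reads the class-file header and returns the major version (e.g. 51 for Java 7). */
    public static int majorVersion(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("Not a class file");
        }
        in.readUnsignedShort();        // minor version, ignored here
        return in.readUnsignedShort(); // major version
    }

    /** For major versions 49 and above, the Java release is (major - 44). */
    public static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        // Hand-built header: magic number, minor version 0, major version 51.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 51};
        int major = majorVersion(new ByteArrayInputStream(header));
        System.out.println("major " + major + " requires Java " + javaRelease(major));
    }
}
```

Run as written, this prints `major 51 requires Java 7`.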