Type: Bug
Resolution: Duplicate
Priority: Major
Component/s: core, windows-slave-installer-module
Labels:
None
Environment:
Jenkins 2.19.3

We have an issue where Windows slaves go offline every time our infrastructure team patches them. The scenario is simply this:

(1) The machines get patched with the latest Windows patches.
(2) This triggers a reboot.

(3) The slave service shuts down with a log entry in the jenkins-slave.wrapper log to the effect of:

2017-03-27 07:50:19 - Shutdown exception
Message:A system shutdown is in progress. (Exception from HRESULT: 0x8007045B)
Stacktrace:   at System.Runtime.InteropServices.Marshal.ThrowExceptionForHRInternal(Int32 errorCode, IntPtr errorInfo)
   at System.Management.ManagementScope.InitializeGuts(Object o)
   at System.Management.ManagementScope.Initialize()
   at System.Management.ManagementObjectSearcher.Initialize()
   at System.Management.ManagementObjectSearcher.Get()
   at winsw.WrapperService.GetChildPids(Int32 pid)
   at winsw.WrapperService.StopProcessAndChildren(Int32 pid)
   at winsw.WrapperService.StopIt()
   at winsw.WrapperService.OnShutdown()

(4) The slave restarts and we see this in the jenkins-slave_<date>.err log:

Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: sv20-jenddb-001
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [https://jenkins.core.cvent.org/]
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins.core.cvent.org:55087
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Server reports protocol JNLP3-connect not supported, skipping
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP2-connect
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Server didn't accept the handshake: sv20-jenddb-001 is already connected to this master. Rejecting this connection.
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins.core.cvent.org:55087
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP-connect
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Server didn't accept the handshake: sv20-jenddb-001 is already connected to this master. Rejecting this connection.
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins.core.cvent.org:55087
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: The server rejected the connection: None of the protocols were accepted
java.lang.Exception: The server rejected the connection: None of the protocols were accepted
	at hudson.remoting.Engine.onConnectionRejected(Engine.java:380)
	at hudson.remoting.Engine.run(Engine.java:352)

We then go in and restart the slave service manually and everything is fine.

What seems to be happening is that when the slave service shuts down due to a system shutdown request, it fails to notify the master that it is shutting down. As a result, when it starts back up after the reboot, the master still thinks it is connected and refuses the new connection. By the time we get in to restart the service manually, the master has realized the slave is offline, so the reconnection works fine at that point.
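The suspected race can be illustrated with a small toy model. This is not Jenkins code: `ToyMaster`, `ping_timeout`, the agent name, and all timings are hypothetical. The only behavior taken from the logs above is that a handshake is rejected while the master still lists the agent as connected.

```python
class ToyMaster:
    """Toy model: the master only forgets an agent after a period of silence."""

    def __init__(self, ping_timeout):
        # Seconds of silence before a session counts as dead (hypothetical value).
        self.ping_timeout = ping_timeout
        self.last_seen = {}  # agent name -> time of last traffic

    def traffic(self, name, now):
        # Record that the agent was heard from at time `now`.
        self.last_seen[name] = now

    def handshake(self, name, now):
        last = self.last_seen.get(name)
        # The session still looks alive, so the new connection is refused.
        if last is not None and now - last < self.ping_timeout:
            return "rejected: %s is already connected" % name
        self.last_seen[name] = now
        return "accepted"


master = ToyMaster(ping_timeout=240)
first = master.handshake("agent-1", now=0)           # initial connection
master.traffic("agent-1", now=100)                   # last traffic before the reboot
after_reboot = master.handshake("agent-1", now=250)  # service restarts 150 s later
manual = master.handshake("agent-1", now=600)        # manual restart, much later
print(first, after_reboot, manual, sep="\n")
```

In this model the automatic restart is rejected because only 150 seconds have passed since the last traffic, while the later manual restart is accepted because the timeout has expired.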

It seems there are two possible solutions here:

(1) The slave should notify the master that it is shutting down, so that the master no longer thinks it is 'online'.
(2) The master, when it receives a connection request for a slave that it thinks is 'online', should verify that the old connection is really still active before refusing to accept the new one.

Or do both?
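A minimal sketch of how the two proposals could fit together, assuming a master that keeps one session per agent name. `Session`, `Master`, `probe`, and `goodbye` are hypothetical names for illustration and do not correspond to actual Jenkins or remoting APIs.

```python
class Session:
    """Hypothetical stand-in for one agent connection held by the master."""

    def __init__(self):
        self.alive = True

    def probe(self):
        # A real probe would send a ping and wait briefly for a reply.
        return self.alive


class Master:
    def __init__(self):
        self.sessions = {}  # agent name -> Session

    def goodbye(self, name):
        # Proposal 1: the agent announces its shutdown, so the master forgets it.
        self.sessions.pop(name, None)

    def connect(self, name, new_session):
        old = self.sessions.get(name)
        # Proposal 2: refuse only if the existing session answers a probe.
        if old is not None and old.probe():
            return False
        self.sessions[name] = new_session
        return True


master = Master()
old = Session()
initial = master.connect("agent-1", old)
old.alive = False  # host rebooted without sending a goodbye
accepted_after_reboot = master.connect("agent-1", Session())
duplicate_accepted = master.connect("agent-1", Session())  # live session exists
master.goodbye("agent-1")  # clean shutdown path
accepted_after_goodbye = master.connect("agent-1", Session())
```

Under these assumptions a genuine duplicate connection is still refused, but a stale session left behind by a reboot no longer blocks the reconnect, with or without the goodbye message.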

Note that we are able to reproduce this simply by rebooting a Windows slave. It always fails to reconnect as described.

Duplicates: JENKINS-22692 Jenkins Windows-Slave throwing exception on shutdown causes connection reset issues (Resolved)

Assignee: Oleg Nenashev
Reporter: Kenneth Baltrinic
Votes: 0
Watchers: 2
Created: 2017-03-27 12:24
Updated: 2017-03-27 12:50
Resolved: 2017-03-27 12:50
