For those of us who do want to use JNLP rather than Attach mode, here is a quick and dirty workaround that has stabilized Docker launching under heavy load (in my case, a matrix job spawning 70 containers simultaneously). All it does is re-run the JNLP script a few times in case the master wasn't ready. Without this, I was left with a ton of stopped containers and no resources.
I simply changed the ENTRYPOINT in my images from the default (the /usr/local/bin/jenkins-agent script from jenkins/jnlp-slave) to this script:
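#!/bin/bash
ACTUAL_ENTRYPOINT=/usr/local/bin/jenkins-agent
# sleep between retries, if needed (s); the value is a placeholder, tune as needed
SLEEP=5
# Try to reconnect this many times (placeholder value)
TRIES=5
# Stop retrying after this many seconds regardless (placeholder value)
MAXTIME=120
# Do not retry if we're running bash to debug in this container
# more than 1 arg is probably jenkins jnlp start
if [ $# -eq 1 ] ; then exec "$ACTUAL_ENTRYPOINT" "$@" ; fi
START=$(date +%s)
while [ "$TRIES" -gt 0 ] && [ $(($(date +%s) - $START)) -lt "$MAXTIME" ] ; do
    if "$ACTUAL_ENTRYPOINT" "$@" ; then exit 0 ; else CODE=$? ; fi
    echo "$ACTUAL_ENTRYPOINT exited [$CODE], waiting $SLEEP seconds and retrying"
    sleep "$SLEEP"
    TRIES=$(($TRIES - 1))
done
echo "Exiting [$CODE] with $TRIES remaining tries and $(($(date +%s) - $START)) seconds elapsed"
exit "${CODE:-1}"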
and my Dockerfile looks like this:
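# directly or indirectly
FROM jenkins/jnlp-slave
# Assumed step: install the script above on the PATH (it must be executable)
COPY entrypoint /usr/local/bin/entrypoint
CMD [ "/bin/bash" ]
ENTRYPOINT [ "entrypoint" ]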
(My images have bash... for alpine you may want to change the shebang to /bin/sh)
For the record, network stability has been an issue for me with Attach. Since I'm using classic Swarm, the topology is too complex and the connection is sometimes lost:
Master -> Swarm Manager -> Docker Host -> Container
With JNLP, the container simply connects straight to the master:
Container -> Master
Oleg Nenashev, perhaps it would be worthwhile to integrate some (better than the above) retry logic into the jenkins-agent script or even slave.jar itself, possibly after identifying the "unknown name" error from the master? Happy to make a PR with some guidance.
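As a rough idea of what "better" could mean, here is a minimal sketch of a retry helper that doubles the wait between attempts up to a cap, so 70 containers don't all hammer the master again at the same moment. Everything in it is an assumption for illustration: retry_with_backoff and flaky_agent are hypothetical names, nothing like this exists in jenkins-agent or slave.jar today, and the stub only stands in for the real agent so the logic can be tried without a master.

```shell
#!/bin/bash
# Hypothetical helper, not part of jenkins-agent or slave.jar.
# Usage: retry_with_backoff MAX_TRIES FIRST_DELAY MAX_DELAY CMD [ARGS...]
retry_with_backoff() {
  local tries=$1 delay=$2 max_delay=$3 code=0
  shift 3
  while [ "$tries" -gt 0 ]; do
    # Run the command; remember its exit code if it fails
    if "$@"; then
      return 0
    else
      code=$?
    fi
    tries=$((tries - 1))
    # No point sleeping after the last attempt
    [ "$tries" -gt 0 ] || break
    echo "command exited [$code], retrying in ${delay}s ($tries tries left)" >&2
    sleep "$delay"
    # Double the delay for next time, capped at max_delay
    delay=$((delay * 2))
    [ "$delay" -le "$max_delay" ] || delay=$max_delay
  done
  return "$code"
}

# Stand-in for the agent (assumption for the demo): fails twice, then succeeds
ATTEMPTS=0
flaky_agent() {
  ATTEMPTS=$((ATTEMPTS + 1))
  [ "$ATTEMPTS" -ge 3 ]
}

retry_with_backoff 5 1 4 flaky_agent
echo "exit code $? after $ATTEMPTS attempts"
```

In a real entrypoint the last two lines would become something like retry_with_backoff 5 1 30 /usr/local/bin/jenkins-agent "$@", with the numbers tuned to how long the master typically needs to register the agent name.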