We're also seeing this, and I feel like the ticket priority should be bumped up until a workaround is presented. It's bad enough that we're rewriting the puppet-jenkins module to support SSH slaves instead of using this plugin, because the unreliability is causing regular job failures. With 4 slave workers, we were seeing 1-2 go down per day. Since switching away from the swarm plugin, we haven't had a single node go down in about a week.
It's also unclear whether newer versions have this problem, and it's hard to justify updating to 3.3 with so much of the changelog seemingly missing. Does 3.x's changelog combine all the changes of the prior failed releases?