  1. Jenkins
  2. JENKINS-45648

Agents keep disconnecting after some time

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Component/s: swarm-plugin
    • Labels:
    • Environment:
      ## Master
      Jenkins Version : 2.65
      OS: Alpine OS 3.5
      Swarm Plugin 3.4

      ## Slave
      Jenkins Swarm Client: 3.3
      OS: CentOS 7
      Java: 1.8

      Description

      Jenkins Swarm agents successfully connect to the master and take jobs, but they keep going offline and then reconnecting after a few seconds.

       

      This is how I connect to Jenkins Master

      java -jar /opt/swarm-client.jar \
      -master http://${master_elb} \
      -username ${user} \
      -password ${password} \
      -labels slave \
      -executors ${executors} \
      -description 'Jenkins Slave' \
      -retryInterval 5 \
      -fsroot ${app_dir} \
      -name ${LOCAL_IP}

       

      These are the logs

      INFO: Agent discovery successful
      Agent address: my-jenkins-master.com
      Agent port: 50000
      Identity: 05:d2:dc:14:94:0e:3a:e9:18:7b:2b:dc:2b:e0:06:4c
      Jul 19, 2017 6:40:36 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      Jul 19, 2017 6:40:36 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to my-jenkins-master.com:50000
      Jul 19, 2017 6:40:36 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Trying protocol: JNLP4-connect
      Jul 19, 2017 6:40:36 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Remote identity confirmed: 05:d2:dc:14:94:0e:3a:e9:18:7b:2b:dc:2b:e0:06:4c
      Jul 19, 2017 6:40:36 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected
      Jul 19, 2017 6:41:41 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Terminated
      Jul 19, 2017 6:41:41 PM hudson.plugins.swarm.Client run
      INFO: Retrying in 5 seconds
      Jul 19, 2017 6:41:46 PM hudson.plugins.swarm.Client run
      INFO: Attempting to connect to http://my-jenkins-master.com/ ae3d1dbd-4a93-4c89-b98c-f90ed62f1f71 with ID 7aef328a
      Jul 19, 2017 6:41:46 PM hudson.plugins.swarm.SwarmClient getCsrfCrumb
      SEVERE: Could not obtain CSRF crumb. Response code: 404
      Jul 19, 2017 6:41:47 PM hudson.remoting.jnlp.Main createEngine
      INFO: Setting up slave: 10-61-67-169-7aef328a
      Jul 19, 2017 6:41:47 PM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Jul 19, 2017 6:41:47 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server among [http://my-jenkins-master.com/]
      Jul 19, 2017 6:41:47 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Agent discovery successful
      Agent address: my-jenkins-master.com
      Agent port: 50000
      Identity: 05:d2:dc:14:94:0e:3a:e9:18:7b:2b:dc:2b:e0:06:4c

      Is there something else I can do to make it stable?
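When comparing setups, it can help to put a number on how long each connection survives. The following is a minimal sketch, not part of the swarm client: it assumes GNU `date` and the two-line java.util.logging format shown in the client log above (a timestamp line followed by the message line), and the function name `connection_lifetimes` is made up for illustration. It will not match journald-prefixed lines.

```shell
#!/usr/bin/env bash
# Print, one per line, how many seconds each agent connection lasted.
# Reads a swarm client log from stdin. Assumes GNU date (`date -d`).
connection_lifetimes() {
  awk '
    # Header line, e.g. "Jul 19, 2017 6:40:36 PM some.Class method":
    # remember its timestamp for the message line that follows.
    /^[ \t]*[A-Z][a-z][a-z] [0-9]+, [0-9]+ [0-9]+:[0-9]+:[0-9]+ [AP]M / {
      ts = $1 " " $2 " " $3 " " $4 " " $5
      next
    }
    /INFO: Connected[ \t]*$/  { start = ts; next }
    /INFO: Terminated[ \t]*$/ { if (start != "") print start "|" ts; start = "" }
  ' | while IFS='|' read -r start end; do
    # Convert both timestamps to epoch seconds and print the difference.
    s=$(TZ=UTC date -d "$start" +%s)
    e=$(TZ=UTC date -d "$end" +%s)
    echo "$((e - s))"
  done
}
```

Usage would be something like `connection_lifetimes < swarm-client.log`, where the log file name is an assumption. For the excerpt above, the connection runs from 6:40:36 PM to 6:41:41 PM, which is 65 seconds.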

       

          Activity

          zindello Josh Mesilane added a comment -

          I'm also seeing this error on 3.4. Traces are below:

          Jul 20, 2017 3:01:14 AM hudson.plugins.swarm.Client run
          INFO: Discovering Jenkins master
          SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
          SLF4J: Defaulting to no-operation (NOP) logger implementation
          SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
          Jul 20, 2017 3:01:14 AM hudson.plugins.swarm.Client run
          INFO: Attempting to connect to http://jenkins.us-west-2.utils.aws.geniussports.com/ b50070f6-196f-4ea9-b881-18817e73e750 with ID
          Jul 20, 2017 3:01:14 AM hudson.remoting.jnlp.Main createEngine
          INFO: Setting up slave: i-0d8cc16d1963fc8f9
          Jul 20, 2017 3:01:14 AM hudson.remoting.jnlp.Main$CuiListener <init>
          INFO: Jenkins agent is running in headless mode.
          Jul 20, 2017 3:01:14 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Locating server among http://jenkins.us-west-2.utils.aws.geniussports.com/
          Jul 20, 2017 3:01:14 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Agent discovery successful
            Agent address: 172.16.0.206
            Agent port:    5000
            Identity:      e8:9d:53:db:54:0d:59:9a:d2:8e:e4:32:5d:10:c1:f8
          Jul 20, 2017 3:01:14 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Handshaking
          Jul 20, 2017 3:01:14 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connecting to 172.16.0.206:5000
          Jul 20, 2017 3:01:14 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Trying protocol: JNLP4-connect
          Jul 20, 2017 3:01:15 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Remote identity confirmed: e8:9d:53:db:54:0d:59:9a:d2:8e:e4:32:5d:10:c1:f8
          Jul 20, 2017 3:01:15 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connected
          Jul 20, 2017 3:01:15 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Terminated
          Jul 20, 2017 3:01:15 AM hudson.plugins.swarm.Client run
          WARNING: Connection closed, exiting...

          And on the master I'm seeing:

          Jul 20, 2017 3:01:15 AM INFO hudson.TcpSlaveAgentListener$ConnectionHandler run
          Accepted JNLP4-connect connection #4 from 172.16.2.207/172.16.2.207:49842
          Jul 20, 2017 3:01:15 AM WARNING jenkins.slaves.DefaultJnlpSlaveReceiver channelClosed
          Computer.threadPoolForRemoting 8 for i-0d8cc16d1963fc8f9 terminated
          java.nio.channels.ClosedChannelException
              at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:208)
              at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:222)
              at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
              at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
              at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:181)
              at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.switchToNoSecure(SSLEngineFilterLayer.java:283)
              at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:503)
              at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:248)
              at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:200)
              at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doCloseSend(SSLEngineFilterLayer.java:213)
              at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doCloseSend(ProtocolStack.java:800)
              at org.jenkinsci.remoting.protocol.ApplicationLayer.doCloseWrite(ApplicationLayer.java:173)
              at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer$ByteBufferCommandTransport.closeWrite(ChannelApplicationLayer.java:311)
              at hudson.remoting.Channel.close(Channel.java:1295)
              at hudson.remoting.Channel.close(Channel.java:1263)
              at jenkins.slaves.DefaultJnlpSlaveReceiver.afterChannel(DefaultJnlpSlaveReceiver.java:173)
              at org.jenkinsci.remoting.engine.JnlpConnectionState$4.invoke(JnlpConnectionState.java:421)
              at org.jenkinsci.remoting.engine.JnlpConnectionState.fire(JnlpConnectionState.java:312)
              at org.jenkinsci.remoting.engine.JnlpConnectionState.fireAfterChannel(JnlpConnectionState.java:418)
              at org.jenkinsci.remoting.engine.JnlpProtocol4Handler$Handler$1.run(JnlpProtocol4Handler.java:334)
              at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:748)

          jorgziegler Jörg Ziegler added a comment -

          Confirming with 2.60.2. For me this only applies to slaves running on Linux; our Windows swarm slaves running as a service still run fine. The combination of swarm plugin 3.4 and Jenkins 2.46.3 works fine. It fails with swarm plugin 3.4 and Jenkins 2.60.2.

           

          vikas027 Vikas Kumar added a comment -

          I have moved to SSH slaves now, no issues so far.

          jj2007 Jayesh Jadhav added a comment - edited

          I am getting a similar issue when trying to run swarm as a service on a Linux machine (SLES 12), with Jenkins 2.93 and swarm 3.3 (also swarm 3.6). When I run the command manually, it works OK.

          My swarm service file:

           

          test-swarm:/opt/swarm # cat /etc/systemd/system/swarm.service
           [Unit]
          Description=Swarm Client - Jenkins Slave
          [Service]
          Type=fork
          RemainAfterExit=yes
          ExecStart=/usr/bin/java -jar /opt/swarm/swarm-client-3.3.jar -master https://bs-jenkins.domain.com-name test-swarm.domain.com -disableSslVerification -disableClientsUniqueId -executors 10 -fsroot /opt/swarm
          ExecStop=/bin/kill $MAINPID
          User=swarm
          EnvironmentFile=/opt/swarm/user.env
          TimeoutStopSec=10
          [Install]
          WantedBy=multi-user.target
          test-swarm:/opt/swarm #
          test-swarm:/opt/swarm # journalctl -lf
          Dec 13 09:45:17 test-swarm java[30083]: Dec 13, 2017 9:45:17 AM hudson.plugins.swarm.Client run
          Dec 13 09:45:17 test-swarm java[30083]: INFO: Retrying in 10 seconds
          Dec 13 09:45:27 test-swarm java[30083]: Dec 13, 2017 9:45:27 AM hudson.plugins.swarm.Client run
          Dec 13 09:45:27 test-swarm java[30083]: INFO: Attempting to connect to https://bs-jenkins.domain.com/ 5f2c3f2c-1317-447f-843d-1b041c97f9ef with ID
          Dec 13 09:45:27 test-swarm java[30083]: Dec 13, 2017 9:45:27 AM hudson.plugins.swarm.SwarmClient getCsrfCrumb
          Dec 13 09:45:27 test-swarm java[30083]: SEVERE: Could not obtain CSRF crumb. Response code: 404
          Dec 13 09:45:27 test-swarm java[30083]: Dec 13, 2017 9:45:27 AM hudson.remoting.jnlp.Main createEngine
          Dec 13 09:45:27 test-swarm java[30083]: INFO: Setting up slave: test-swarm.domain.com
          Dec 13 09:45:27 test-swarm java[30083]: Dec 13, 2017 9:45:27 AM hudson.remoting.jnlp.Main$CuiListener <init>
          Dec 13 09:45:27 test-swarm java[30083]: INFO: Jenkins agent is running in headless mode.
          Dec 13 09:45:27 test-swarm java[30083]: Dec 13, 2017 9:45:27 AM hudson.remoting.jnlp.Main$CuiListener status
          Dec 13 09:45:27 test-swarm java[30083]: INFO: Locating server among https://bs-jenkins.domain.com
          Dec 13 09:45:27 test-swarm java[30083]: Dec 13, 2017 9:45:27 AM hudson.remoting.jnlp.Main$CuiListener status
          Dec 13 09:45:27 test-swarm java[30083]: INFO: Agent discovery successful
          Dec 13 09:45:27 test-swarm java[30083]: Agent address: bs-jenkins.domain.com
          Dec 13 09:45:27 test-swarm java[30083]: Agent port: 40677
          Dec 13 09:45:27 test-swarm java[30083]: Identity: ef:b2:6f:12:aa:96:a0:5f:d4:9c:9d:5a:9f:06:96:49
          Dec 13 09:45:27 test-swarm java[30083]: Dec 13, 2017 9:45:27 AM hudson.remoting.jnlp.Main$CuiListener status
          Dec 13 09:45:27 test-swarm java[30083]: INFO: Handshaking
          Dec 13 09:45:27 test-swarm java[30083]: Dec 13, 2017 9:45:27 AM hudson.remoting.jnlp.Main$CuiListener status
          Dec 13 09:45:27 test-swarm java[30083]: INFO: Connecting to bs-jenkins.domain.com:40677
          Dec 13 09:45:27 test-swarm java[30083]: Dec 13, 2017 9:45:27 AM hudson.remoting.jnlp.Main$CuiListener status
          Dec 13 09:45:27 test-swarm java[30083]: INFO: Trying protocol: JNLP4-connect
          Dec 13 09:45:27 test-swarm java[30083]: Dec 13, 2017 9:45:27 AM hudson.remoting.jnlp.Main$CuiListener status
          Dec 13 09:45:27 test-swarm java[30083]: INFO: Remote identity confirmed: ef:b2:6f:12:aa:96:a0:5f:d4:9c:9d:5a:9f:06:96:49
          Dec 13 09:45:27 test-swarm java[30083]: Dec 13, 2017 9:45:27 AM hudson.remoting.jnlp.Main$CuiListener status
          Dec 13 09:45:27 test-swarm java[30083]: INFO: Connected
          Dec 13 09:45:27 test-swarm java[30083]: Dec 13, 2017 9:45:27 AM hudson.remoting.jnlp.Main$CuiListener status
          Dec 13 09:45:27 test-swarm java[30083]: INFO: Terminated
          

           

           

          When I run it manually, however, it works fine, and the master is able to connect to the slave.

          test-swarm:/opt/swarm # /usr/bin/java -jar /opt/swarm/swarm-client-3.3.jar -master https://bs-jenkins.domain.com -name test-swarm.domain.com -disableSslVerification -disableClientsUniqueId -executors 10 -fsroot /opt/swarm
          Dec 13, 2017 11:05:23 AM hudson.plugins.swarm.Client main
          INFO: Client.main invoked with: [-master https://bs-jenkins.domain.com -name test-swarm.domain.com -disableSslVerification -disableClientsUniqueId -executors 10 -fsroot /opt/swarm]
          Dec 13, 2017 11:05:23 AM hudson.plugins.swarm.Client run
          INFO: Discovering Jenkins master
          SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
          SLF4J: Defaulting to no-operation (NOP) logger implementation
          SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
          Dec 13, 2017 11:05:23 AM hudson.plugins.swarm.Client run
          INFO: Attempting to connect to https://bs-jenkins.domain.com 5f2c3f2c-1317-447f-843d-1b041c97f9ef with ID
          Dec 13, 2017 11:05:23 AM hudson.plugins.swarm.SwarmClient getCsrfCrumb
          SEVERE: Could not obtain CSRF crumb. Response code: 404
          Dec 13, 2017 11:05:23 AM hudson.remoting.jnlp.Main createEngine
          INFO: Setting up slave: test-swarm.domain.com
          Dec 13, 2017 11:05:23 AM hudson.remoting.jnlp.Main$CuiListener <init>
          INFO: Jenkins agent is running in headless mode.
          Dec 13, 2017 11:05:23 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Locating server among https://bs-jenkins.domain.com/
          Dec 13, 2017 11:05:23 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Agent discovery successful
          Agent address: bs-jenkins.domain.com
          Agent port: 40677
          Identity: ef:b2:6f:12:aa:96:a0:5f:d4:9c:9d:5a:9f:06:96:49
          Dec 13, 2017 11:05:23 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Handshaking
          Dec 13, 2017 11:05:23 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connecting to bs-jenkins.domain.com:40677
          Dec 13, 2017 11:05:23 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Trying protocol: JNLP4-connect
          Dec 13, 2017 11:05:23 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Remote identity confirmed: ef:b2:6f:12:aa:96:a0:5f:d4:9c:9d:5a:9f:06:96:49
          Dec 13, 2017 11:05:24 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connected
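
Two details of the unit file as pasted in this comment differ from what one would expect, though nothing in this thread confirms either as the cause: `Type=fork` is not among the `Type=` values listed in the systemd.service documentation, and the `ExecStart` line has no space between the master URL and `-name`, unlike the manual command. A minimal sketch of a check for the first point follows. The function name `check_unit_type` is hypothetical, and the list of accepted values is taken from the systemd.service man page as an assumption about the installed systemd version.

```shell
#!/usr/bin/env bash
# Report whether the Type= setting in a systemd unit file is one of the
# documented service types. $1 is the path to the unit file.
check_unit_type() {
  local type
  # Take the value of the last Type= line, if there is one.
  type=$(sed -n 's/^[[:space:]]*Type=//p' "$1" | tail -n 1)
  case "$type" in
    # An empty value means Type= is not set and the default applies.
    ""|simple|exec|forking|oneshot|dbus|notify|notify-reload|idle)
      echo "ok" ;;
    *)
      echo "unknown Type=$type" ;;
  esac
}
```

Running `check_unit_type /etc/systemd/system/swarm.service` against the file above would flag `fork`. `systemd-analyze verify` on the same file is a more thorough option where it is available.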
          oleg_nenashev Oleg Nenashev added a comment -

          KK does not maintain this plugin anymore. Moving to unassigned to set the expectation.

          ryan_white Ryan White added a comment -

          I recently had a similar problem after upgrading the Jenkins master from LTS 2.89.4 to LTS 2.150.2, and the swarm plugin from 3.10 (both master and agent) to 3.15.

          I can run the agent process in the foreground from a regular login shell and it stays connected, but when the agent is run via the system init system, it disconnects and reconnects constantly. The original java process runs the whole time.

          The agent systems where I noticed the issue were CentOS 7. The init scripts are SystemV style configured by an old version of the jenkins Puppet module maintained by rtyler.

          The only thing that has consistently worked for me to keep the agent processes connected to the master is clearing out the agent workspace directory. When the agent is stopped, the workspace dir is renamed, and the agent is restarted, the agent stays connected as expected.

          Putting the old workspace directory back causes the agents to again disconnect and reconnect.
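
The rename step of this workaround could be scripted roughly as below. This is a sketch under assumptions, not the commenter's actual procedure: the function name `set_aside_workspace` is made up, and it assumes the workspace lives at `<fsroot>/workspace`, which may not match a given agent's layout.

```shell
#!/usr/bin/env bash
# Rename the agent workspace directory out of the way so the agent starts
# with an empty one. $1 is the agent's -fsroot directory.
# Assumption: the workspace is at "$1/workspace".
set_aside_workspace() {
  local ws="$1/workspace"
  local backup
  backup="$ws.bak.$(date +%Y%m%d%H%M%S)"
  if [ ! -d "$ws" ]; then
    echo "no workspace at $ws" >&2
    return 1
  fi
  # Print the backup path on success so it can be restored or deleted later.
  mv "$ws" "$backup" && echo "$backup"
}
```

The agent would need to be stopped before calling this and started again afterwards, using whatever init script or service name the system has. Renaming rather than deleting keeps the old directory available, which is what made it possible to confirm that putting it back reproduces the disconnects.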


            People

            • Assignee: Unassigned
            • Reporter: vikas027 Vikas Kumar
            • Votes: 3
            • Watchers: 7
