
JENKINS-47953: Jobs stuck in queue "Jenkins doesn't have label ..."


      Description

      After updating to 2.73.3, jobs now randomly get stuck in the queue and Jenkins says that it doesn't have label ... I can see some slave nodes (containers) come online for a split second and then disappear, but the job(s) then stay stuck in the queue forever. The problem is that I do not see anything out of the ordinary in the Jenkins logs.

       

      Downgrading to 2.73.2 and recreating the config.xml (global config file) seems to fix the issue for us.

       

      P.S.: What's even weirder is that some jobs run while others get stuck forever (sometimes).


          Activity

          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Nicolas De Loof
          Path:
          src/main/resources/com/nirima/jenkins/plugins/docker/strategy/DockerCloudRetentionStrategy/config.groovy
          src/main/resources/com/nirima/jenkins/plugins/docker/strategy/DockerCloudRetentionStrategy/help-idleMinutes.html
          src/main/resources/com/nirima/jenkins/plugins/docker/strategy/DockerOnceRetentionStrategy/config.groovy
          http://jenkins-ci.org/commit/docker-plugin/2d0dda5a20401c34ad25dbf8df2b2948365b4f8e
          Log:
          use default timeout of 10 minutes to avoid JENKINS-47953
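
          For context, here is a minimal Java sketch of the behavior that commit message implies: treating a zero (or unset) idle timeout as a 10-minute default so a freshly provisioned agent is not reaped immediately. The class and method names below are illustrative only, not the docker-plugin's actual code.

          // Illustrative sketch only; not the docker-plugin's real DockerCloudRetentionStrategy.
          // Shows the idea behind "use default timeout of 10 minutes": never let the effective
          // idle timeout be zero, or agents can be reaped before a job is ever assigned to them.
          public class IdleTimeoutDefaults {

              // Assumed default, mirroring the 10 minutes mentioned in the commit message.
              static final int DEFAULT_IDLE_MINUTES = 10;

              // Returns a safe idle timeout: zero or negative values fall back to the default.
              static int effectiveIdleMinutes(int configuredIdleMinutes) {
                  return configuredIdleMinutes > 0 ? configuredIdleMinutes : DEFAULT_IDLE_MINUTES;
              }

              public static void main(String[] args) {
                  System.out.println(effectiveIdleMinutes(0));   // 10 -> the old default of 0 is normalized
                  System.out.println(effectiveIdleMinutes(1));   // 1  -> explicit settings are respected
              }
          }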

          ndeloof Nicolas De Loof added a comment -

          OK, so it seems the idle timeout defaulting to 0 minutes just kills your agent before it gets assigned to run your job.

          Switching this issue to Minor, as this is more of a UI/UX issue.
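
          A rough sketch of the race described above, assuming a periodic idle check of the general shape used by retention strategies (the names here are hypothetical, not Jenkins' actual RetentionStrategy API): with idleMinutes == 0, an agent that is idle for even a moment, for example right after it connects and before the queued job is assigned, already satisfies the termination condition.

          import java.util.concurrent.TimeUnit;

          // Hypothetical model of a periodic retention check, for illustration only.
          public class IdleCheckRace {

              // True when the agent is idle and has been idle at least idleMinutes.
              static boolean shouldTerminate(boolean agentIsIdle, long idleSinceMillis,
                                             long nowMillis, int idleMinutes) {
                  long idleFor = nowMillis - idleSinceMillis;
                  return agentIsIdle && idleFor >= TimeUnit.MINUTES.toMillis(idleMinutes);
              }

              public static void main(String[] args) {
                  long now = System.currentTimeMillis();
                  // Agent connected 5 seconds ago and has not been assigned a job yet.
                  long idleSince = now - TimeUnit.SECONDS.toMillis(5);

                  System.out.println(shouldTerminate(true, idleSince, now, 0)); // true: killed immediately
                  System.out.println(shouldTerminate(true, idleSince, now, 1)); // false: survives long enough to pick up the job
              }
          }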

          akom Alexander Komarov added a comment -

          Update: both of my Jenkins masters are now correctly provisioning Docker slaves, though it's not obvious why. Here is what I did:

          1. "Idle Timeout=1" for each Docker Template under Experimental Options (instead of the default value 0)
          2. Restarted the master.

          I don't know if step 1 was really necessary.  I saved the main configuration (Apply) several times in the meantime.
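
          Since it is unclear whether step 1 or the repeated saves made the difference, one way to check what actually got persisted is to read the global config.xml directly. Below is a small, self-contained Java sketch; the element name idleMinutes is an assumption based on the help-idleMinutes.html file touched by the commit above, and the default path is the usual Debian/Ubuntu JENKINS_HOME, so adjust both to your setup.

          import java.io.File;
          import javax.xml.parsers.DocumentBuilderFactory;
          import org.w3c.dom.Document;
          import org.w3c.dom.Node;
          import org.w3c.dom.NodeList;

          // Standalone sketch: print every <idleMinutes> value found in the global config.xml,
          // so you can confirm which idle timeout was actually saved for your Docker templates.
          // The element name "idleMinutes" is an assumption; check your own config.xml if it differs.
          public class DumpIdleMinutes {
              public static void main(String[] args) throws Exception {
                  File configXml = new File(args.length > 0 ? args[0] : "/var/lib/jenkins/config.xml");
                  Document doc = DocumentBuilderFactory.newInstance()
                          .newDocumentBuilder()
                          .parse(configXml);
                  NodeList nodes = doc.getElementsByTagName("idleMinutes");
                  for (int i = 0; i < nodes.getLength(); i++) {
                      Node n = nodes.item(i);
                      System.out.println("idleMinutes = " + n.getTextContent().trim());
                  }
                  if (nodes.getLength() == 0) {
                      System.out.println("No <idleMinutes> elements found in " + configXml);
                  }
              }
          }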

          On a side note, I see lots of these in the logs after each run, even though the jobs succeed (I'm using JNLP):

           

          Nov 20, 2017 5:46:22 AM jenkins.slaves.DefaultJnlpSlaveReceiver channelClosed
          WARNING: NioChannelHub keys=133 gen=41087: Computer.threadPoolForRemoting [#1] for xx-docker-swarm-01-760c11ed terminated
          java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@2fabfc23[name=Channel to /xxx.xxx.xxx.xxx]
           at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:216)
           at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:646)
           at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
           at java.lang.Thread.run(Thread.java:748)
          Caused by: java.io.IOException: Connection reset by peer
           at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
           at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
           at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
           at sun.nio.ch.IOUtil.read(IOUtil.java:197)
           at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
           at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:142)
           at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:359)
           at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:570)
           ... 6 more
           
          

           

          akom Alexander Komarov added a comment -

          I have two Jenkins masters that I upgraded simultaneously; one works, one doesn't. Both are running Jenkins 2.90 and the latest plugins (docker-plugin 1.0.4 / docker-commons 1.9, and in fact all other plugins are on the latest as of today), and both are using the same Docker cloud with more or less the same config.xml.

          The one that doesn't work does not log anything docker/cloud/provisioning related at all (in /log/all), as if provisioning simply isn't happening.
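
          One way to confirm whether provisioning is attempted at all on the broken master is to raise the log level for the provisioning machinery, either through a custom log recorder under Manage Jenkins > System Log or by running the equivalent statements inside the Jenkins JVM (for example, from the script console). Below is a minimal java.util.logging sketch; the logger names are assumptions taken from the class and package names visible in this issue, and FINE output should then reach the log handlers and /log/all.

          import java.util.logging.ConsoleHandler;
          import java.util.logging.Level;
          import java.util.logging.Logger;

          // Sketch: raise the verbosity of provisioning-related loggers so their FINE
          // messages reach the log handlers, making it visible whether the master even
          // tries to provision. Logger names are assumptions based on classes/packages
          // mentioned in this issue.
          public class EnableProvisioningLogs {
              public static void main(String[] args) {
                  String[] loggerNames = {
                          "hudson.slaves.NodeProvisioner",      // core provisioning loop
                          "com.nirima.jenkins.plugins.docker"   // docker-plugin package
                  };
                  ConsoleHandler handler = new ConsoleHandler();
                  handler.setLevel(Level.ALL);
                  for (String name : loggerNames) {
                      Logger logger = Logger.getLogger(name);
                      logger.setLevel(Level.FINE);   // FINEST would be even chattier
                      logger.addHandler(handler);
                      logger.info("Verbose logging enabled for " + name);
                  }
              }
          }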

          ffarah Fadi Farah added a comment - edited

          We updated docker-plugin to 1.0.4 (from 0.16.2) and docker-commons to 1.9 (from 1.8) while we were still on 2.73.2, and we had no problems there.

          By the way, here are the detailed steps we took to get things working again:

          • Downgraded Jenkins to 2.73.2 (we were still experiencing the issue).
          • Then we downgraded docker-plugin to 0.16.2 and docker-commons to 1.8 (we were still experiencing the issue).
          • Finally, we recreated the config.xml file; after that, everything started working normally again.
          oleg_nenashev Oleg Nenashev added a comment -

          By the way, any chance you updated Docker Plugin to 1.0 during the upgrade to 2.73.3?

          ffarah Fadi Farah added a comment - edited

          I'll try to upgrade again and see if it happens again, but this definitely isn't a one-off issue. Here's why:

          We have 9 Jenkins masters in total (spread out across different regions; some are even in Frankfurt and China). 5 of those masters are installed through APT, and 4 of them are just jars run by Tomcat.

          Some are installed on Ubuntu 16.04 and some on 14.04, so there is a good variety across all of those masters.

          danielbeck Daniel Beck added a comment -

          Other than the channel pinger, there seems to be nothing in 2.73.3 that would explain this.

          Could you try upgrading again to see whether the problem reoccurs after you've reset the configuration and downgraded, or whether it was a one-off issue?

          Do you still have the logs from the 2.73.3 run? Any error messages in them could point to a specific problem.

          CC Oleg Nenashev


            People

            • Assignee: ndeloof Nicolas De Loof
            • Reporter: ffarah Fadi Farah
            • Votes: 1
            • Watchers: 6
