JENKINS-25867

Gearman won't schedule new jobs even though there are slots available on master

      Description

      We have a setup with one Jenkins master, and Zuul triggers jobs through the Jenkins Gearman plugin.

      Sometimes no new jobs will be scheduled even though all slots are available.

      A workaround for the slaves is to disconnect and reconnect the slave; new jobs are then scheduled again.
      For the master, the only way to get jobs scheduled again is to restart the Jenkins service.

      When this happens on one node, jobs are still scheduled on the other nodes.

      I am attaching the server thread log for the Gearman threads, taken when no jobs were running and jobs were waiting in the queue.
      I am also attaching a truncated jenkins.log, produced with (grep -C 2 10.33.14.26_manager).

      Let me know if you need more logs or other info; I would be happy to help.

            Activity

            zaro Khai Do added a comment -

            @Christian, any updates on your end on this issue? Antoine reported that the related issue has been resolved.

            zaro Khai Do added a comment -

            I believe this is fixed in version 0.1.2

            hashar Antoine Musso added a comment - edited

            It happened again with gearman-plugin v0.1.2:

            Jul 28, 2015 10:30:35 AM FINE hudson.plugins.gearman.NodeAvailabilityMonitor canTake
            AvailabilityMonitor canTake request for null
            Jul 28, 2015 10:30:35 AM FINE hudson.plugins.gearman.NodeAvailabilityMonitor canTake
            AvailabilityMonitor canTake request for null
            Jul 28, 2015 10:30:35 AM FINE hudson.plugins.gearman.NodeAvailabilityMonitor canTake
            AvailabilityMonitor canTake request for null
            

            Jobs tied to that instance are stuck waiting for an available executor on deployment-bastion.

            Marking the node offline and online doesn't remove the lock :-/

            The executor threads have:

            "Gearman worker deployment-bastion.eqiad_exec-1" prio=5 WAITING
            	java.lang.Object.wait(Native Method)
            	java.lang.Object.wait(Object.java:503)
            	hudson.remoting.AsyncFutureImpl.get(AsyncFutureImpl.java:73)
            	hudson.plugins.gearman.StartJobWorker.safeExecuteFunction(StartJobWorker.java:196)
            	hudson.plugins.gearman.StartJobWorker.executeFunction(StartJobWorker.java:114)
            	org.gearman.worker.AbstractGearmanFunction.call(AbstractGearmanFunction.java:125)
            	org.gearman.worker.AbstractGearmanFunction.call(AbstractGearmanFunction.java:22)
            	hudson.plugins.gearman.MyGearmanWorkerImpl.submitFunction(MyGearmanWorkerImpl.java:593)
            	hudson.plugins.gearman.MyGearmanWorkerImpl.work(MyGearmanWorkerImpl.java:328)
            	hudson.plugins.gearman.AbstractWorkerThread.run(AbstractWorkerThread.java:166)
            	java.lang.Thread.run(Thread.java:745)
            
            "Gearman worker deployment-bastion.eqiad_exec-2" prio=5 TIMED_WAITING
            	java.lang.Object.wait(Native Method)
            	hudson.plugins.gearman.NodeAvailabilityMonitor.lock(NodeAvailabilityMonitor.java:83)
            	hudson.plugins.gearman.MyGearmanWorkerImpl.sendGrabJob(MyGearmanWorkerImpl.java:380)
            	hudson.plugins.gearman.MyGearmanWorkerImpl.processSessionEvent(MyGearmanWorkerImpl.java:421)
            	hudson.plugins.gearman.MyGearmanWorkerImpl.work(MyGearmanWorkerImpl.java:320)
            	hudson.plugins.gearman.AbstractWorkerThread.run(AbstractWorkerThread.java:166)
            	java.lang.Thread.run(Thread.java:745)
            
            "Gearman worker deployment-bastion.eqiad_exec-3" prio=5 TIMED_WAITING
            	java.lang.Object.wait(Native Method)
            	hudson.plugins.gearman.NodeAvailabilityMonitor.lock(NodeAvailabilityMonitor.java:83)
            	hudson.plugins.gearman.MyGearmanWorkerImpl.sendGrabJob(MyGearmanWorkerImpl.java:380)
            	hudson.plugins.gearman.MyGearmanWorkerImpl.processSessionEvent(MyGearmanWorkerImpl.java:421)
            	hudson.plugins.gearman.MyGearmanWorkerImpl.work(MyGearmanWorkerImpl.java:320)
            	hudson.plugins.gearman.AbstractWorkerThread.run(AbstractWorkerThread.java:166)
            	java.lang.Thread.run(Thread.java:745)
            
            "Gearman worker deployment-bastion.eqiad_exec-4" prio=5 TIMED_WAITING
            	java.lang.Object.wait(Native Method)
            	hudson.plugins.gearman.NodeAvailabilityMonitor.lock(NodeAvailabilityMonitor.java:83)
            	hudson.plugins.gearman.MyGearmanWorkerImpl.sendGrabJob(MyGearmanWorkerImpl.java:380)
            	hudson.plugins.gearman.MyGearmanWorkerImpl.processSessionEvent(MyGearmanWorkerImpl.java:421)
            	hudson.plugins.gearman.MyGearmanWorkerImpl.work(MyGearmanWorkerImpl.java:320)
            	hudson.plugins.gearman.AbstractWorkerThread.run(AbstractWorkerThread.java:166)
            	java.lang.Thread.run(Thread.java:745)
            
            "Gearman worker deployment-bastion.eqiad_exec-5" prio=5 TIMED_WAITING
            	java.lang.Object.wait(Native Method)
            	hudson.plugins.gearman.NodeAvailabilityMonitor.lock(NodeAvailabilityMonitor.java:83)
            	hudson.plugins.gearman.MyGearmanWorkerImpl.sendGrabJob(MyGearmanWorkerImpl.java:380)
            	hudson.plugins.gearman.MyGearmanWorkerImpl.processSessionEvent(MyGearmanWorkerImpl.java:421)
            	hudson.plugins.gearman.MyGearmanWorkerImpl.work(MyGearmanWorkerImpl.java:320)
            	hudson.plugins.gearman.AbstractWorkerThread.run(AbstractWorkerThread.java:166)
            	java.lang.Thread.run(Thread.java:745)
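
One possible reading of the dump above, offered as a sketch rather than a diagnosis: exec-1 sits in WAITING on a future that has not completed, while exec-2 to exec-5 sit in TIMED_WAITING inside NodeAvailabilityMonitor.lock. The self-contained Java example below reproduces that combination of thread states with a hypothetical NodeMonitor class. It assumes that the thread blocked on the future is also the one holding the node lock, which the dump does not prove. NodeMonitor, observeStall and awaitState are illustrative names and are not taken from the plugin source.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;

public class LockStallSketch {

    /** Hypothetical stand-in for a per-node lock; NOT the plugin's NodeAvailabilityMonitor. */
    static class NodeMonitor {
        private Thread holder; // thread currently allowed to grab a job, or null

        synchronized void lock() throws InterruptedException {
            Thread me = Thread.currentThread();
            while (holder != null && holder != me) {
                wait(100); // waiters poll here and show up as TIMED_WAITING
            }
            holder = me;
        }

        synchronized void unlock() {
            if (holder == Thread.currentThread()) {
                holder = null;
                notifyAll();
            }
        }
    }

    /** Polls until the thread reaches the wanted state or the timeout expires; returns the last state seen. */
    static Thread.State awaitState(Thread t, Thread.State wanted, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        Thread.State seen = t.getState();
        try {
            while (seen != wanted && System.nanoTime() < deadline) {
                Thread.sleep(5);
                seen = t.getState();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    /** Returns { state of the lock holder, state of a second worker } while the future is still pending. */
    static Thread.State[] observeStall(CompletableFuture<Void> buildStarted) {
        NodeMonitor monitor = new NodeMonitor();

        // Assumed behaviour: take the node lock, then block until the build is reported as started.
        Thread holder = new Thread(() -> {
            try {
                monitor.lock();
                try {
                    buildStarted.get(); // blocks for as long as the future stays incomplete
                } finally {
                    monitor.unlock();
                }
            } catch (InterruptedException | ExecutionException e) {
                Thread.currentThread().interrupt();
            }
        }, "exec-1");

        // A second worker that only wants the lock in order to grab the next job.
        Thread waiter = new Thread(() -> {
            try {
                monitor.lock();
                monitor.unlock();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "exec-2");

        holder.start();
        Thread.State holderState = awaitState(holder, Thread.State.WAITING, 2000);
        waiter.start();
        Thread.State waiterState = awaitState(waiter, Thread.State.TIMED_WAITING, 2000);

        // Complete the future so the demo terminates; without this both threads stay stuck.
        buildStarted.complete(null);
        awaitState(holder, Thread.State.TERMINATED, 2000);
        awaitState(waiter, Thread.State.TERMINATED, 2000);
        return new Thread.State[] { holderState, waiterState };
    }

    public static void main(String[] args) {
        Thread.State[] states = observeStall(new CompletableFuture<Void>());
        System.out.println("holder=" + states[0] + " waiter=" + states[1]);
    }
}
```

In this sketch the stall ends only because the demo completes the future itself. If the future were never completed, every other worker would keep timing out and retrying in lock(), which resembles the symptom described here. Whether the plugin actually behaves this way is not established by the example.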
            
            hashar Antoine Musso added a comment -

            The node is named deployment-bastion-eqiad, with a label deployment-bastion-eqiad. Jobs are tied to deployment-bastion-eqiad.

            The workaround I found was to remove the label from the node. Once done, the jobs show up in the queue with 'no node having label deployment-bastion-eqiad'.

            I then applied the label again on the host and the job managed to run.

            So maybe it is an issue in Jenkins itself :-}

            hashar Antoine Musso added a comment -

            The deadlock still happens from time to time with Jenkins 1.625.3 LTS and Gearman plugin 1.3.3 with https://review.openstack.org/#/c/252768/


              People

              • Assignee: zaro Khai Do
              • Reporter: ki82 Christian Bremer
              • Votes: 0
              • Watchers: 5