  1. Jenkins
  2. JENKINS-26020

Will not start builds even though there are available slots on executor

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Incomplete
    • Component/s: core
    • Labels:
      None
    • Environment:
      LTS 1.580.1

      Description

      Sometimes our nodes won't be able to start new builds even though there are free slots available.

      A workaround is to disconnect and reconnect the affected slave, after which builds are scheduled on it again.

      I have observed that when this happens, the affected slave has fewer running threads than an idle slave.

      Attaching thread dumps taken when this happens and after doing a disconnect/connect.

      We have seen this issue on Windows (JNLP) slaves and Linux (SSH) slaves, as well as on the master node, which runs Linux.
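
One way to narrow down which threads go missing is to diff the thread names in the two dumps. The following is a minimal sketch, assuming each stack in the dump starts with the thread name in double quotes at the beginning of a line (as in jstack-style output). The function names are hypothetical, and the sample thread names in the usage note are made up for illustration, not taken from the attached dumps.

```python
import re

# Matches the quoted thread name that opens each stack in a jstack-style dump.
THREAD_HEADER = re.compile(r'^"([^"]+)"')


def thread_names(dump_text):
    """Return the set of thread names found in a thread dump."""
    names = set()
    for line in dump_text.splitlines():
        match = THREAD_HEADER.match(line.strip())
        if match:
            names.add(match.group(1))
    return names


def missing_threads(stuck_dump, healthy_dump):
    """Return threads present in the healthy dump but absent from the stuck one."""
    return sorted(thread_names(healthy_dump) - thread_names(stuck_dump))
```

Calling `missing_threads(stuck, healthy)` with the dump taken while the slave refuses builds and the dump taken after the disconnect/connect lists the thread names that only exist on the working slave.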

            Activity

            oleg_nenashev Oleg Nenashev added a comment -

            I see no way to proceed with this issue without more info. There are also no other commenters or voters, so I'm closing it as Incomplete. Feel free to reopen it if you have additional info.

            oleg_nenashev Oleg Nenashev added a comment - - edited

            The Windows service issue should be fixed by JENKINS-39231. I do not see anything else we can diagnose here.

            oleg_nenashev Oleg Nenashev added a comment -

            > Dec 15, 2014 8:40:10 AM jenkins.slaves.JnlpSlaveHandshake error
            WARNING: TCP slave agent connection handler #150398 with /10.33.21.14:62740 is aborted: generic_AESL-JENKINS07 is already connected to this master. Rejecting this connection.

            It seems to be unrelated. Such an issue usually happens when two jenkins-slave processes are running. On Windows machines it can also happen, though rarely, after an improper service termination. You can also configure the Jenkins slave to use a longer reconnect attempt interval.

            ki82 Christian Bremer added a comment -

            Other than that, I see no errors in the log when it fails to schedule on a node, although I might have missed some since our logs are flooded.

            ki82 Christian Bremer added a comment -

            We get ~5000 JnlpSlaveHandshake errors per hour:

            Dec 15, 2014 8:40:10 AM jenkins.slaves.JnlpSlaveHandshake error
            WARNING: TCP slave agent connection handler #150398 with /10.33.21.14:62740 is aborted: generic_AESL-JENKINS07 is already connected to this master. Rejecting this connection.

            We get these errors all the time, even when we can schedule on all slaves.
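
To see whether the rejections cluster on particular agents or hours, the log can be tallied. The following is a minimal sketch, assuming every entry has the two-line shape quoted above (a timestamp line ending in `jenkins.slaves.JnlpSlaveHandshake error`, followed by the `WARNING:` line) and that month names and AM/PM markers are in English. The function name `rejections_per_hour` is hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# First line of an entry: timestamp followed by the logger name and method.
HEADER = re.compile(
    r"^(\w{3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) "
    r"jenkins\.slaves\.JnlpSlaveHandshake error$"
)
# Second line of an entry: captures the agent name that was rejected.
REJECTED = re.compile(r"is aborted: (\S+) is already connected to this master")


def rejections_per_hour(log_text):
    """Count 'already connected' rejections per (hour, agent name)."""
    counts = Counter()
    timestamp = None
    for line in log_text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            timestamp = datetime.strptime(header.group(1), "%b %d, %Y %I:%M:%S %p")
            continue
        rejected = REJECTED.search(line)
        if rejected and timestamp is not None:
            hour = timestamp.replace(minute=0, second=0)
            counts[(hour, rejected.group(1))] += 1
        # Any non-header line ends the current entry.
        timestamp = None
    return counts
```

Sorting the result with `counts.most_common()` shows which agent and hour combinations produce the most rejected connections.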

            danielbeck Daniel Beck added a comment -

            Any interesting errors getting logged?

            ki82 Christian Bremer added a comment - $200 is up for grabs for this issue at: https://freedomsponsors.org/issue/598/will-not-start-builds-even-though-there-are-available-slots-on-executor

              People

              • Assignee:
                Unassigned
              • Reporter:
                ki82 Christian Bremer
              • Votes:
                0
              • Watchers:
                2
