Jenkins / JENKINS-26020

Will not start builds even though there are available slots on executor

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Incomplete
    • Component/s: core
    • Labels:
      None
    • Environment:
      LTS 1.580.1

      Description

      Sometimes our nodes do not start new builds even though they have free executor slots available.
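      The symptom can be checked from outside through the Jenkins remote access API. Below is a minimal sketch, assuming the parsed JSON of `/computer/api/json` and `/queue/api/json` and the field names `computer`, `displayName`, `offline`, `idle`, `numExecutors`, `items` and `buildable` (verify these against your Jenkins version). The function name `find_stuck_nodes` is hypothetical. It only flags candidates: a queued item restricted to a label the idle node does not carry would also be reported.

```python
from typing import Any


def find_stuck_nodes(computers: dict[str, Any], queue: dict[str, Any]) -> list[str]:
    """Return names of online, idle nodes while buildable items are waiting.

    Assumption: `computers` is the parsed JSON of <jenkins>/computer/api/json
    and `queue` the parsed JSON of <jenkins>/queue/api/json.
    """
    # Items that are ready to run but are still waiting in the queue.
    waiting = [item for item in queue.get("items", []) if item.get("buildable")]
    if not waiting:
        return []
    stuck = []
    for node in computers.get("computer", []):
        # An online node with executors that reports itself idle while
        # buildable work is queued is a candidate for this bug.
        if not node.get("offline") and node.get("idle") and node.get("numExecutors", 0) > 0:
            stuck.append(node["displayName"])
    return stuck
```

      Polling both endpoints periodically and logging the result would show when a node enters this state, so the time can be matched against thread dumps.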

      As a workaround, disconnecting and reconnecting an affected slave makes it start scheduling builds again.

      I have observed that when this happens, the affected slave has fewer running threads than an idle slave.

      I am attaching thread dumps taken while the problem occurs and after a disconnect/reconnect.
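      To see which threads are missing on an affected slave, the two thread dumps can be compared by thread name. Below is a minimal sketch, assuming jstack-style dumps in which every thread header line begins with the thread name in double quotes (dumps from other sources may use a different format). `thread_names` and `missing_threads` are hypothetical helpers. Digits in names are masked so that numbered threads of the same kind are grouped together.

```python
import re
from collections import Counter

# Matches a jstack-style thread header such as: "main" #1 prio=5 ...
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]*)"')


def thread_names(dump: str) -> Counter:
    """Count threads in a dump, keyed by name with digits masked as N."""
    names = Counter()
    for line in dump.splitlines():
        match = THREAD_HEADER.match(line)
        if match:
            # "Executor #3 for x" and "Executor #4 for x" land in the same group.
            names[re.sub(r"\d+", "N", match.group("name"))] += 1
    return names


def missing_threads(idle_dump: str, stuck_dump: str) -> Counter:
    """Thread-name groups with fewer instances in the stuck dump than the idle one."""
    # Counter subtraction keeps only positive differences.
    return thread_names(idle_dump) - thread_names(stuck_dump)
```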

      We have seen this issue on Windows (JNLP) slaves, on Linux (SSH) slaves, and on the master node, which runs Linux.

        Attachments

          Issue Links

            Activity

            ki82 Christian Bremer created issue -
            ki82 Christian Bremer added a comment - $200 is up for grabs for this issue at: https://freedomsponsors.org/issue/598/will-not-start-builds-even-though-there-are-available-slots-on-executor
            danielbeck Daniel Beck added a comment -

            Any interesting errors getting logged?

            ki82 Christian Bremer added a comment -

            We get ~5000 JnlpSlaveHandshake errors per hour:

            Dec 15, 2014 8:40:10 AM jenkins.slaves.JnlpSlaveHandshake error
            WARNING: TCP slave agent connection handler #150398 with /10.33.21.14:62740 is aborted: generic_AESL-JENKINS07 is already connected to this master. Rejecting this connection.

            We get these errors all the time, including when builds can be scheduled on all slaves.
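            Because the log is flooded, counting these warnings per hour and per agent shows whether their rate changes when scheduling stops and which agents keep reconnecting. Below is a minimal sketch, assuming the default java.util.logging two-line format shown above (timestamp and source on one line, `WARNING: ...` on the next) and English month names. `count_rejections` is a hypothetical helper.

```python
import re
from collections import Counter
from datetime import datetime

# Header line of a JnlpSlaveHandshake record, e.g.
# "Dec 15, 2014 8:40:10 AM jenkins.slaves.JnlpSlaveHandshake error"
HEADER = re.compile(
    r"^(?P<ts>\w{3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) "
    r"jenkins\.slaves\.JnlpSlaveHandshake"
)
# Message line naming the agent whose duplicate connection was rejected.
REJECTED = re.compile(r"aborted: (?P<agent>\S+) is already connected to this master")


def count_rejections(lines):
    """Count 'already connected' rejections per hour and per agent name."""
    per_hour, per_agent = Counter(), Counter()
    current = None  # timestamp of the most recent JnlpSlaveHandshake header
    for raw in lines:
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = datetime.strptime(header.group("ts"), "%b %d, %Y %I:%M:%S %p")
            continue
        rejected = REJECTED.search(line)
        if rejected and current is not None:
            # Truncate to the start of the hour for the per-hour histogram.
            per_hour[current.replace(minute=0, second=0)] += 1
            per_agent[rejected.group("agent")] += 1
            current = None
    return per_hour, per_agent
```

            The function accepts any iterable of lines, so an open log file can be passed in directly.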

            ki82 Christian Bremer added a comment -

            Other than that, I see no errors in the log when a node fails to schedule builds, although I might have missed them since our logs are flooded.

            oleg_nenashev Oleg Nenashev added a comment -

            > Dec 15, 2014 8:40:10 AM jenkins.slaves.JnlpSlaveHandshake error
            WARNING: TCP slave agent connection handler #150398 with /10.33.21.14:62740 is aborted: generic_AESL-JENKINS07 is already connected to this master. Rejecting this connection.

            It seems to be unrelated. This usually happens when two jenkins-slave processes are running for the same node. On Windows machines it can also happen, rarely, after improper service termination, etc. You can also configure the Jenkins slave to use a longer reconnect interval.

            rtyler R. Tyler Croy made changes -
            Field Original Value New Value
            Workflow JNJira [ 160008 ] JNJira + In-Review [ 180215 ]
            oleg_nenashev Oleg Nenashev added a comment (edited) -

            The Windows service issue should be fixed by JENKINS-39231. I do not see anything else we can diagnose here.

            oleg_nenashev Oleg Nenashev made changes -
            Link This issue is related to JENKINS-39231 [ JENKINS-39231 ]
            oleg_nenashev Oleg Nenashev added a comment -

            I see no way to proceed with this issue without more info. There are also no other commenters or voters, so I'm closing it as Incomplete. Feel free to reopen it if you have additional info.

            oleg_nenashev Oleg Nenashev made changes -
            Status Open [ 1 ] Resolved [ 5 ]
            Resolution Incomplete [ 4 ]

              People

              • Assignee:
                Unassigned
                Reporter:
                ki82 Christian Bremer
              • Votes:
                0
                Watchers:
                2

                Dates

                • Created:
                  Updated:
                  Resolved: