Jenkins / JENKINS-28604

Parallel step with node blocks for the same agent will create 2nd executor on single-executor slaves


Description

Create slaves with 1 executor, labels, and "Only build jobs with label restrictions matching this node." Create a build with a dozen long-running parallel steps, each containing a node step that matches the slave labels. In about 10% of the builds executed, Workflow spawns a 2nd executor on one of these slaves, as evidenced by a 2nd workspace folder, despite only 1 executor being available.
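A minimal scripted Pipeline matching this description might look like the sketch below. The 'slave' label, the branch names, and the use of sleep as the stand-in long-running step are illustrative assumptions, not taken from the report:

```groovy
// Hypothetical reproduction sketch (assumptions: 'slave' is the label on
// the restricted one-executor agents; sleep stands in for the real work).
// A dozen long-running parallel branches each request a node by the same
// label; with 1 executor per agent, each branch should wait its turn.
def branches = [:]
for (int i = 0; i < 12; i++) {
    def idx = i  // capture the loop variable for the closure
    branches["branch-${idx}"] = {
        node('slave') {
            // stand-in for the real long-running batch step
            sleep time: 30, unit: 'MINUTES'
        }
    }
}
parallel branches
```

Per the description, roughly 1 run in 10 ends with a 2nd workspace folder on one of the one-executor agents.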

Activity

sumdumgai A C added a comment -

Workflow 1.7 has made this problem worse. Now whenever a slave fails a long-running process, Workflow tries to run all the queued parallel branches at once when the node is freed, and generates a new workspace for each one, so I end up with 6 workspaces on a slave machine that has only 1 executor.

This is sometimes, but not always, correlated with Workflow hanging on a batch-file step that failed.

            From the outside, this partly looks like a race condition in node assignment for parallel steps.

jglick Jesse Glick added a comment -

            Sounds like a core bug. Any steps to reproduce?

sumdumgai A C added a comment - edited

            I don't have the time right now to write a full test case, but here's a cut-down version of the idiom I am trying to use, hopefully helpful:

            • 1 restricted master executor with no labels, 2 restricted slaves with labels and 1 executor each (restricted = only run jobs with matching labels)

try {
    parallel(
        branch1: {
            node('slave') {
                <batch step that fails>
            }
        }
        // <repeat a node('slave') branch like this several times>
    )
} catch (Exception e) {
    node('slave') {
        <another batch step that fails>
    }
    throw e
}

sumdumgai A C added a comment -

After browsing the bug DB some more, one other potentially relevant note: these slaves have a high-latency connection to the master, on the order of many seconds to (rarely) a few minutes.

jglick Jesse Glick added a comment -

            Might share some underlying cause with JENKINS-28759.


People

Assignee: Unassigned
Reporter: sumdumgai A C