Our team is also hoping for a solution to this. Our setup is similar to what Shahaf Duenyas describes: some sequential pre-work, then a split into Windows and Linux builds, and then, ideally, further splits several times down the pipeline (multiple tests, each parameterized over OS and HW configurations and fully parallelizable).
Right now our pipeline takes three times as long as it should, because we don't want to give up the user-friendly visualization we get from Blue Ocean. This is of course not ideal: time to failure is longer than it should be, and the dev cycle time suffers.
I think this issue should add support for any level of nesting, rather than hard-coding some arbitrary magic number of parallel levels.
The issue should be renamed to "Support nested parallel stages".
I also agree that the "pipeline_mockup.png" in JENKINS-38442 looks like the natural way to render this, and it should scale well. Rendering the X (for example, 1) top-level parallel stages differently from the N-X inner levels, as I have seen proposed, seems like a hack tailored to a specific use case.