Details

    • Sprint:
      Pipeline - October, Pipeline - December

      Description

      The current design of SimpleXStreamFlowNodeStorage and LogActionImpl, using workflow/$id.xml and $id.log, was considered the minimum necessary for a working 1.0 release, not a serious implementation. It has two major problems:

      • When there are a lot of steps, as in JENKINS-30055, many small files are created, which is bad for I/O performance.
      • When there is a large amount of output, WorkflowRun.copyLogs must duplicate it all to log, doubling disk space requirements per build.

      It would be better to keep all flow node information in one file. (Perhaps build.xml itself. In principle we could avoid loading non-head nodes with a historical build record, though I believe CpsFlowExecution currently winds up loading them all anyway. Need to check.)

      More importantly, there should be a single log file for the build. LogActionImpl should be deprecated in favor of an implementation that simply stores a rangeset of offsets into that file. When parallel blocks are producing concurrent output, the single log file will be a bit jumbled (probably still human-readable in most cases), but the rangesets will keep track of what output came from where. The final output produced by WorkflowRun will still be processed to split at line boundaries, add in thread labels, etc. (TBD how and whether JENKINS-30777 could be supported in this mode.)
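The rangeset idea can be illustrated with a minimal sketch. It is not the actual Jenkins implementation: the class RangeIndexedLog and its methods are hypothetical, an in-memory buffer stands in for the single build log file, and node IDs are treated as opaque strings.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one shared log plus, per node, a list of byte ranges into it. */
class RangeIndexedLog {
    // Stands in for the single per-build log file (assumption: in-memory for brevity).
    private final ByteArrayOutputStream log = new ByteArrayOutputStream();
    // Each int[] is a half-open byte range {start, end} within the shared log.
    private final Map<String, List<int[]>> rangesByNode = new HashMap<>();

    /** Appends a node's output to the shared log and records where it landed. */
    synchronized void append(String nodeId, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        int start = log.size();
        log.write(bytes, 0, bytes.length);
        int end = log.size();
        List<int[]> ranges = rangesByNode.computeIfAbsent(nodeId, k -> new ArrayList<>());
        if (!ranges.isEmpty() && ranges.get(ranges.size() - 1)[1] == start) {
            // Contiguous with this node's previous write: extend the last range.
            ranges.get(ranges.size() - 1)[1] = end;
        } else {
            ranges.add(new int[] {start, end});
        }
    }

    /** Reassembles one node's output by reading only its recorded ranges. */
    synchronized String outputOf(String nodeId) {
        byte[] all = log.toByteArray();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int[] r : rangesByNode.getOrDefault(nodeId, Collections.<int[]>emptyList())) {
            out.write(all, r[0], r[1] - r[0]);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Number of separate ranges recorded for a node. */
    synchronized int rangeCount(String nodeId) {
        return rangesByNode.getOrDefault(nodeId, Collections.<int[]>emptyList()).size();
    }
}
```

Interleaved writes from two nodes leave the shared log jumbled, but outputOf still returns each node's text in order, and no output is stored twice.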

            Activity

            svanoort Sam Van Oort added a comment -

            Capturing notes from discussion (and scattered across various JIRAs/PR comments/etc):

            • It would be advantageous to store an inline index of block start/stop nodes, allowing for resolution of node to block via range queries
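One way such an index could work is sketched below. This is an illustration under assumptions, not a design from the discussion: the BlockIndex class is hypothetical, node IDs are taken to be increasing integers, and blocks are assumed to nest properly with no interleaving. Parallel branches can interleave node IDs, so a pure range query would need extra bookkeeping there.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: index of block start ID -> block end ID, queried by range. */
class BlockIndex {
    private final TreeMap<Integer, Integer> endByStart = new TreeMap<>();

    /** Records a block delimited by its start and end node IDs. */
    void addBlock(int startId, int endId) {
        if (endId < startId) {
            throw new IllegalArgumentException("block end precedes block start");
        }
        endByStart.put(startId, endId);
    }

    /**
     * Returns the start ID of the innermost block containing nodeId
     * (start and end nodes included), or -1 if no block contains it.
     * Assumes properly nested blocks: walking candidate starts downward,
     * the first block whose end is at or after nodeId is the innermost one.
     */
    int enclosingBlockStart(int nodeId) {
        for (Map.Entry<Integer, Integer> e
                : endByStart.headMap(nodeId, true).descendingMap().entrySet()) {
            if (e.getValue() >= nodeId) {
                return e.getKey();
            }
        }
        return -1;
    }
}
```

With blocks 2..9, 3..5, and 6..8 registered, node 4 resolves to the block starting at 3 and node 9 resolves to the outer block starting at 2.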
            svanoort Sam Van Oort added a comment -

            To give one order-of-magnitude estimate for scaling, for the CloudBees internal CI, mean and median iota across the last successful run of each pipeline job are 86 & 52 respectively.

            Runs with several hundred nodes are not uncommon.

            As long as we take some measures to simplify serialization/deserialization in storage to reduce the CPU hit, batching nodes in chunks of 100 would be reasonable (1-2 kB per node, an entire flow can be loaded/saved in one 50-200 kB I/O).
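The arithmetic behind that estimate can be written out as a small sketch. The ChunkEstimate class and its method names are hypothetical; the chunk size of 100 and the 1-2 kB per-node figure come from the comment above.

```java
/** Hypothetical sketch of the chunking arithmetic; not part of any Jenkins API. */
final class ChunkEstimate {
    /** Nodes per stored chunk, as proposed above. */
    static final int CHUNK_SIZE = 100;

    private ChunkEstimate() {}

    /** Chunk that holds the node at the given zero-based position in the flow. */
    static int chunkIndex(int position) {
        return position / CHUNK_SIZE;
    }

    /** Number of chunks needed to store nodeCount nodes (ceiling division). */
    static int chunkCount(int nodeCount) {
        return (nodeCount + CHUNK_SIZE - 1) / CHUNK_SIZE;
    }

    /** Rough serialized size in kB, given an assumed per-node size in kB. */
    static long estimatedKilobytes(int nodeCount, int kilobytesPerNode) {
        return (long) nodeCount * kilobytesPerNode;
    }
}
```

For the figures quoted above, a 52-node flow at 1 kB per node comes to 52 kB and an 86-node flow at 2 kB per node comes to 172 kB, each fitting in a single chunk.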

            svanoort Sam Van Oort added a comment -

            Note for later implementation of log consolidation:

            WorkflowRun.copyLogs + logNodeMessage (writing to a WorkflowConsoleLogger)

            IIUC there's a StreamBuildListener attached to the WorkflowRun which receives the log output. That output is periodically sliced and copied into a new log file for each FlowNode (at the least upon completion, via copyLogs), and that file becomes part of a LogActionImpl.

            If we wanted, we could write directly to an output file and then use an alternate LogAction version that gives offsets within this file. Interleaved streams from parallels are an issue, though: we probably need to keep separate log streams for each, numbered by parallel # or iota and branch order, i.e. 1-1.log, 1-2.log, 2-1.log, 2-2.log. We would need to create new StreamBuildListeners in this case.
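The per-branch naming scheme could look roughly like the following sketch. The BranchLogs class is hypothetical, in-memory buffers stand in for the separate log files, and nothing here uses the real StreamBuildListener API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: one log stream per parallel branch, keyed by file name. */
class BranchLogs {
    // Assumption: StringBuilders stand in for the per-branch log files.
    private final Map<String, StringBuilder> streams = new LinkedHashMap<>();

    /** Builds names such as 1-1.log from the parallel number and branch order. */
    static String fileName(int parallelNumber, int branchOrder) {
        return parallelNumber + "-" + branchOrder + ".log";
    }

    /** Routes one line of output to the stream belonging to its branch. */
    synchronized void write(int parallelNumber, int branchOrder, String line) {
        streams.computeIfAbsent(fileName(parallelNumber, branchOrder), k -> new StringBuilder())
                .append(line)
                .append('\n');
    }

    /** Returns everything written so far to the named stream, or "" if none. */
    synchronized String contents(String name) {
        StringBuilder sb = streams.get(name);
        return sb == null ? "" : sb.toString();
    }
}
```

Because each branch has its own stream, concurrent output never interleaves within a file, at the cost of going back to multiple files per build.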

            Possibly also a case for something like NIO with channels, since we're basically redirecting inputs from one channel into multiplexing (reactor pattern?).

            jglick Jesse Glick added a comment -

            Splitting log portion into JENKINS-38381 since I think it is independent.

            svanoort Sam Van Oort added a comment -

            Most of this is covered by JENKINS-47173 with the bulk storage plus some other enhancements done since then. 

            At least for now, I'm closing this until and unless we feel a need for more complex storage that will incrementally persist FlowNodes – not convinced that's worthwhile though, and the bulk persistence allows a whole wad of other optimizations.


              People

              • Assignee:
                Unassigned
                Reporter:
                jglick Jesse Glick
              • Votes:
                3
                Watchers:
                5
