Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-48344

Log files generated by Jenkins pipeline scripts are bloated

    Details

    • Similar Issues:
    • Released As:
      1.9

      Description

      We recently had one of our production Jenkins masters nearly run out of disk space due to an over-allocation of log data by just a couple of jobs running on our farm. In fact, 2 such builds consumed nearly a half a terabyte for their build logs alone! Further, closer examination of those log files revealed that the data therein was being excessively bloated by what appears to be some sort of markup or metadata attached to each line in the build log that looks something like this:

       

      ^[[8mha:////4OsXDMICFQEJVGu5QN07bZyJAnhssncpc0tH8m8uvnrSAAAAaB+LCAAAAAAAAP9b85aBtbiIwTG/KF0vKzUvOzOvODlTryCnNB3I0ivPL8pOy8kv18vKT9JLzs8rzs9J1QuHCgaV5jlDhPzyS1IZIICRiYGhoohBKqM0pTg/D64Hh8ICAFt0h+h/AAAA^[[0m[Pipeline] node

       

      To illustrate the problem I created a super trivial pipeline script as follows:

      node {
        stage('first') {
          echo "hello world"
        }
      }
      

      This simple example produced 5kb worth of build logs! To make matters worse, it appears as though the raw text (ie: excluding the markup) is also duplicated among several other .log files in the same folder as the main build log, causing even further bloat.

      I am creating this issue in the hopes that (a) someone can explain what this extra log metadata is in the main build log of a pipeline build and (b) someone can suggest some way to either eliminate this bloat and superfluous duplication, or to at least offer some way to detect large build logs and perhaps truncate, purge or even prevent them by causing the build to fail. Really anything other than filling up the Jenkins home folder would be preferable.

        Attachments

          Issue Links

            Activity

            leedega Kevin Phillips created issue -
            Hide
            leedega Kevin Phillips added a comment -

            Note: I did find the build log compression plugin found here which sounds encouraging, but given that it changes the actual format of the log file itself it risks breaking other plugins that use the log. I haven't tested it out with our current production system but it seems overly fragile / risky to me. Also, it doesn't address the duplication of the independent .log files either apparently. 

            Show
            leedega Kevin Phillips added a comment - Note: I did find the build log compression plugin found here  which sounds encouraging, but given that it changes the actual format of the log file itself it risks breaking other plugins that use the log. I haven't tested it out with our current production system but it seems overly fragile / risky to me. Also, it doesn't address the duplication of the independent .log files either apparently. 
            leedega Kevin Phillips made changes -
            Field Original Value New Value
            Comment [ Note: I did find the build log compression plugin found [here|https://plugins.jenkins.io/compress-buildlog] which sounds encouraging, but given that it changes the actual format of the log file itself it risks breaking other plugins that use the log. I haven't tested it out with our current production system but it seems overly fragile / risky to me. Also, it doesn't address the duplication of the independent .log files either apparently.  ]
            leedega Kevin Phillips made changes -
            Environment jenkins core 2.46.3
            pipeline v2.6
            Hide
            jamesdumay James Dumay added a comment - - edited

            Kevin Phillips what you are seeing in the log is known as a log annotation. You likely have some plugins installed that annotate log files, such as the ASCII color plugin, which are adding this data for each log line. Please review your plugins for any that annotate log files.

            If you generate a support bundle and post the plugins.txt here perhaps we can identify the issue.

            Show
            jamesdumay James Dumay added a comment - - edited Kevin Phillips what you are seeing in the log is known as a log annotation. You likely have some plugins installed that annotate log files, such as the ASCII color plugin, which are adding this data for each log line. Please review your plugins for any that annotate log files. If you generate a support bundle and post the plugins.txt here perhaps we can identify the issue.
            Hide
            leedega Kevin Phillips added a comment -

            I am going to try and isolate this behavior to a specific plugin, however in my sample test case I provided above I'm not actually making use of any third party plugins as part of the build and the logs are still bloated. If a plugin were to blame wouldn't they all be disabled in this case?

            Show
            leedega Kevin Phillips added a comment - I am going to try and isolate this behavior to a specific plugin, however in my sample test case I provided above I'm not actually making use of any third party plugins as part of the build and the logs are still bloated. If a plugin were to blame wouldn't they all be disabled in this case?
            Hide
            abayer Andrew Bayer added a comment -

            Kevin Phillips - if I remember correctly, it'd probably be one of the plugins listed here.

            Show
            abayer Andrew Bayer added a comment - Kevin Phillips - if I remember correctly, it'd probably be one of the plugins listed here .
            Hide
            leedega Kevin Phillips added a comment -

            Thanks everyone for the suggestions.

            So, in order to help isolate the cause of this problem I created a new Jenkins sandbox that was a mirror configuration of our production server, and I started installing plugins one by one to try and isolate the root cause of this issue.

            So, just to summarize, I installed Jenkins core v2.60.2 with no plugins selected for the initial installation (ie: a clean slate). Then I started by installing 'just' the pipeline-* plugins (ie: those required to create the trivial pipeline job described above) and their dependencies. I then ran a test build of the sample job presented above. I had expected to see the log file would be small and clean since there were no third party plugins installed at all on this server. However, to my surprise the first build of my sample job resulted in a bloated log file! So this suggests to me that something in the core set of pipeline dependencies is causing this bloat.

            For reference, here's a list of the plugins I have installed along with their versions (reminder: every single one of these plugins was required in order for the base 'pipeline' plugin to run):

            matrix-project:
              version: "1.11"
            script-security:
              version: "1.29.1"
            mailer:
              version: "1.20"
            scm-api:
              version: "2.1.1"
            docker-commons:
              version: "1.6"
            ssh-credentials:
              version: "1.13"
            credentials:
              version: "2.1.13"
            junit:
              version: "1.20"
            git:
              version: "3.3.2"
            durable-task:
              version: "1.14"
            git-client:
              version: "2.4.6"
            git-server:
              version: "1.7"
            cloudbees-folder:
              version: "6.0.4"
            workflow-basic-steps:
              version: "2.6"
            workflow-durable-task-step:
              version: "2.12"
            workflow-cps-global-lib:
              version: "2.8"
            workflow-step-api:
              version: "2.12"
            workflow-api:
              version: "2.18"
            authentication-tokens:
              version: "1.3"
            jquery-detached:
              version: "1.2.1"
            workflow-scm-step:
              version: "2.5"
            ace-editor:
              version: "1.1"
            pipeline-rest-api:
              version: "2.8"
            workflow-support:
              version: "2.14"
            workflow-cps:
              version: "2.36.1"
            workflow-job:
              version: "2.11.1"
            docker-workflow:
              version: "1.10"
            display-url-api:
              version: "2.0"
            pipeline-github-lib:
              version: "1.0"
            structs:
              version: "1.9"
            pipeline-milestone-step:
              version: "1.3.1"
            pipeline-build-step:
              version: "2.5.1"
            pipeline-input-step:
              version: "2.7"
            pipeline-stage-step:
              version: "2.2"
            pipeline-graph-analysis:
              version: "1.4"
            handlebars:
              version: "1.1.1"
            momentjs:
              version: "1.1.1"
            pipeline-stage-view:
              version: "2.8"
            branch-api:
              version: "2.0.10"
            workflow-multibranch:
              version: "2.16"
            icon-shim:
              version: "2.0.3"
            plain-credentials:
              version: "1.4"
            credentials-binding:
              version: "1.11"
            pipeline-utility-steps:
              version: "1.3.0"
            pipeline-model-declarative-agent:
              version: "1.1.1"
            pipeline-stage-tags-metadata:
              version: "1.1.8"
            pipeline-model-api:
              version: "1.1.8"
            pipeline-model-extensions:
              version: "1.1.8"
            pipeline-model-definition:
              version: "1.1.8"

             

            Show
            leedega Kevin Phillips added a comment - Thanks everyone for the suggestions. So, in order to help isolate the cause of this problem I created a new Jenkins sandbox that was a mirror configuration of our production server, and I started installing plugins one by one to try and isolate the root cause of this issue. So, just to summarize, I installed Jenkins core v2.60.2 with no plugins selected for the initial installation (ie: a clean slate). Then I started by installing 'just' the pipeline-* plugins (ie: those required to create the trivial pipeline job described above) and their dependencies. I then ran a test build of the sample job presented above. I had expected to see the log file would be small and clean since there were no third party plugins installed at all on this server. However, to my surprise the first build of my sample job resulted in a bloated log file! So this suggests to me that something in the core set of pipeline dependencies is causing this bloat. For reference, here's a list of the plugins I have installed along with their versions (reminder: every single one of these plugins was required in order for the base 'pipeline' plugin to run): matrix-project: version: "1.11" script-security: version: "1.29.1" mailer: version: "1.20" scm-api: version: "2.1.1" docker-commons: version: "1.6" ssh-credentials: version: "1.13" credentials: version: "2.1.13" junit: version: "1.20" git: version: "3.3.2" durable-task: version: "1.14" git-client: version: "2.4.6" git-server: version: "1.7" cloudbees-folder: version: "6.0.4" workflow-basic-steps: version: "2.6" workflow-durable-task-step: version: "2.12" workflow-cps-global-lib: version: "2.8" workflow-step-api: version: "2.12" workflow-api: version: "2.18" authentication-tokens: version: "1.3" jquery-detached: version: "1.2.1" workflow-scm-step: version: "2.5" ace-editor: version: "1.1" pipeline-rest-api: version: "2.8" workflow-support: version: "2.14" workflow-cps: version: "2.36.1" workflow-job: version: "2.11.1" docker-workflow: version: "1.10" display-url-api: version: "2.0" pipeline-github-lib: version: "1.0" structs: version: "1.9" pipeline-milestone-step: version: "1.3.1" pipeline-build-step: version: "2.5.1" pipeline-input-step: version: "2.7" pipeline-stage-step: version: "2.2" pipeline-graph-analysis: version: "1.4" handlebars: version: "1.1.1" momentjs: version: "1.1.1" pipeline-stage-view: version: "2.8" branch-api: version: "2.0.10" workflow-multibranch: version: "2.16" icon-shim: version: "2.0.3" plain-credentials: version: "1.4" credentials-binding: version: "1.11" pipeline-utility-steps: version: "1.3.0" pipeline-model-declarative-agent: version: "1.1.1" pipeline-stage-tags-metadata: version: "1.1.8" pipeline-model-api: version: "1.1.8" pipeline-model-extensions: version: "1.1.8" pipeline-model-definition: version: "1.1.8"  
            leedega Kevin Phillips made changes -
            Environment jenkins core 2.46.3
            pipeline v2.6
            jenkins core 2.60.2
            pipeline v2.6
            leedega Kevin Phillips made changes -
            Environment jenkins core 2.60.2
            pipeline v2.6
            jenkins core v2.43.3, v2.60.2, v2.89.2
            pipeline v2.6
            Hide
            leedega Kevin Phillips added a comment - - edited

            A quick update: our production servers are running core v2.46.3, as was our sandbox environment. I had set up a v2.60.2 core version a few months back to test upgrading. I also took the liberty of setting up another test environment using the latest core version, v2.89.2. In all 3 environments this problem is reproducible.

            Further, my test environments AND production environments were using the same plugin versions that I mentioned in my previous comment, however as a test I upgraded all of them to the latest versions and the problem still persists. In fact the problem got measurably worse! My test build produced a 2.6kb 'log' file before the upgrade, and it produced a 2.8kb 'log' file afterwards (ie: this doesn't include the duplicate "numbered" log files produced as well)! I haven't audited the files in great detail to see where the additional bloat came from, but the results are very concerning.

            GIven these results I think it's pretty safe to say that the underlyng problem is somewhere in the Jenkins core or the Pipeline plugins (or one of their transitive dependencies) which, in turn, makes it difficult if not impossible to work around without an appropriate fix in the implementation.

            Hopefully someone can look into this sooner rather than later because this problem is affecting our production systems, and it is of high severity because when it occurs it crashes our Jenkins masters taking the entire build farm offline (ie: due to disk space filling up on the master).

            Any input from anyone would be greatly appreciated.

            Show
            leedega Kevin Phillips added a comment - - edited A quick update: our production servers are running core v2.46.3, as was our sandbox environment. I had set up a v2.60.2 core version a few months back to test upgrading. I also took the liberty of setting up another test environment using the latest core version, v2.89.2. In all 3 environments this problem is reproducible. Further, my test environments AND production environments were using the same plugin versions that I mentioned in my previous comment, however as a test I upgraded all of them to the latest versions and the problem still persists. In fact the problem got measurably worse! My test build produced a 2.6kb 'log' file before the upgrade, and it produced a 2.8kb 'log' file afterwards (ie: this doesn't include the duplicate "numbered" log files produced as well)! I haven't audited the files in great detail to see where the additional bloat came from, but the results are very concerning. GIven these results I think it's pretty safe to say that the underlyng problem is somewhere in the Jenkins core or the Pipeline plugins (or one of their transitive dependencies) which, in turn, makes it difficult if not impossible to work around without an appropriate fix in the implementation. Hopefully someone can look into this sooner rather than later because this problem is affecting our production systems, and it is of high severity because when it occurs it crashes our Jenkins masters taking the entire build farm offline (ie: due to disk space filling up on the master). Any input from anyone would be greatly appreciated.
            leedega Kevin Phillips made changes -
            Attachment log [ 40709 ]
            Hide
            leedega Kevin Phillips added a comment - - edited

            I've attached a sample log file produced by my trivial build described above for review in case it helps.

            Show
            leedega Kevin Phillips added a comment - - edited I've attached a sample log file produced by my trivial build described above for review in case it helps.
            leedega Kevin Phillips made changes -
            Attachment support_2017-12-15_14.zip [ 40710 ]
            Hide
            leedega Kevin Phillips added a comment - - edited

            Also attached a support bundle to this issue as requested previously for review. This bundle was created from my latest test enviroment using the latest Jenkins core version and with all plugins mentioned above upgraded to their latest versions (with the addition of the Support Core Plugin of course).

            Show
            leedega Kevin Phillips added a comment - - edited Also attached a support bundle to this issue as requested previously for review. This bundle was created from my latest test enviroment using the latest Jenkins core version and with all plugins mentioned above upgraded to their latest versions (with the addition of the Support Core Plugin of course).
            Hide
            leedega Kevin Phillips added a comment -

            Just a quick follow up to see if there has been any headway / investigation on this front. This problem is adversely affecting our production servers, and the problem is escalating as more and more teams are adopting the Jenkins pipeline plugins.

            Show
            leedega Kevin Phillips added a comment - Just a quick follow up to see if there has been any headway / investigation on this front. This problem is adversely affecting our production servers, and the problem is escalating as more and more teams are adopting the Jenkins pipeline plugins.
            Hide
            markewaite Mark Waite added a comment -

            I see the same data in some (but not all) of the jobs that I run from my "lts-with-plugins docker image". Oddly, I don't see it with all my pipeline jobs, just some of them. I haven't identified any pattern for which jobs include the extra annotation and which do not.

            Unfortunately, I have no hints about the source of the extra information, only that I can see it on some of my Pipeline jobs, and not on others.

            Show
            markewaite Mark Waite added a comment - I see the same data in some (but not all) of the jobs that I run from my " lts-with-plugins docker image ". Oddly, I don't see it with all my pipeline jobs, just some of them. I haven't identified any pattern for which jobs include the extra annotation and which do not. Unfortunately, I have no hints about the source of the extra information, only that I can see it on some of my Pipeline jobs, and not on others.
            Hide
            leedega Kevin Phillips added a comment -

            Again, another ping back to see what the status is on this defect. This problem / feature / bug is crippling our build farm at the moment and it may very well force us to scrap the use of the Jenkins pipeline completely.

            Even if something could be done in the short term to, say, stop duplicate log files from being created for each build that would go a long way to easing the pain for the short term (ie: a 3-10GB log file is bad enough, but it gets much worse when the log file is duplicated).

            Show
            leedega Kevin Phillips added a comment - Again, another ping back to see what the status is on this defect. This problem / feature / bug is crippling our build farm at the moment and it may very well force us to scrap the use of the Jenkins pipeline completely. Even if something could be done in the short term to, say, stop duplicate log files from being created for each build that would go a long way to easing the pain for the short term (ie: a 3-10GB log file is bad enough, but it gets much worse when the log file is duplicated).
            Hide
            svanoort Sam Van Oort added a comment - - edited

            Kevin Phillips I am sorry to hear that you're having such difficulty.  As others have mentioned, the extra binary gibberish is usually ConsoleAnnotations –- generally things like TimeStamper plugin are responsible, although Pipeline adds one as well.   Unfortunately, the format is Base64-encoded binary in a specialized format and sometimes encrypted (with reason) so it is hard to actually determine what it contains.  Some suggestions that should help:

            1. Limit the number of lines in your logs – these annotations should generally be quite short and applied generally per-line.  One way to do this is to pass longer build output to a dedicated logfile and archive that as a build artifact rather than printing it all to console.  Another thing to do is to limit the number of Pipeline steps run, since there's trivial logging for each.  
            2. System-level solution: enable filesystem compression for the folders holding builds.  This may be the easiest win for you.
            3. It won't help for logging itself, but if you install the latest Pipeline plugins, there are some optimizations available that reduce disk IO by a lot – https://jenkins.io/doc/book/pipeline/scaling-pipeline/
            4. In the intermediate-term, we are planning to execute https://issues.jenkins-ci.org/browse/JENKINS-38381 which will dramatically change the way logging works for pipeline.  It will avoid double-logging at the very least.  It may include some items that help with this, depending – I've considered implementing support for one of the faster compression algorithms as a passthrough filter, but that may reduce usability of the log for other roles.

            I'm not clear why your builds are resulting in such large data file – are you logging large amounts of content to start with?  The overhead of this should be on the order of a few hundred bytes per log line.  That's a bit obnoxious but I've never heard of it resulting in GB of logs on its own.

            How much plaintext log output do you have?

            Show
            svanoort Sam Van Oort added a comment - - edited Kevin Phillips  I am sorry to hear that you're having such difficulty.  As others have mentioned, the extra binary gibberish is usually ConsoleAnnotations –- generally things like TimeStamper plugin are responsible, although Pipeline adds one as well.   Unfortunately, the format is Base64-encoded binary in a specialized format and sometimes encrypted (with reason) so it is hard to actually determine what it contains.  Some suggestions that should help: Limit the number of lines in your logs – these annotations should generally be quite short and applied generally per-line.  One way to do this is to pass longer build output to a dedicated logfile and archive that as a build artifact rather than printing it all to console.  Another thing to do is to limit the number of Pipeline steps run, since there's trivial logging for each.   System-level solution: enable filesystem compression for the folders holding builds.  This may be the easiest win for you. It won't help for logging itself, but if you install the latest Pipeline plugins, there are some optimizations available that reduce disk IO by a lot –  https://jenkins.io/doc/book/pipeline/scaling-pipeline/ In the intermediate-term, we are planning to execute https://issues.jenkins-ci.org/browse/JENKINS-38381  which will dramatically change the way logging works for pipeline.  It will avoid double-logging at the very least .  It may include some items that help with this, depending – I've considered implementing support for one of the faster compression algorithms as a passthrough filter, but that may reduce usability of the log for other roles. I'm not clear why your builds are resulting in such large data file – are you logging large amounts of content to start with?  The overhead of this should be on the order of a few hundred bytes per log line.  That's a bit obnoxious but I've never heard of it resulting in GB of logs on its own. How much plaintext log output do you have?
            Hide
            leedega Kevin Phillips added a comment -

            Sam Van Oort Thank you for the response.

            Yeah, the installed plugins was the first thing I eliminated as being a possible cause. As mentioned in one of my ealier comments I had set up a test sandbox with nothing but Jenkins core and the pipeline plugin (along with it's transitive dependencies) so no customizations at all and no third party stuff, and the problem is reproducible. So it is definitely pipeline related.

            Our next plan of attack was going to be your suggestion #1 - redirecting output to a file on disk and archive it. We are reluctant to do this because in many of our use cases we have long running unit tests running under Jenkin (ie: ones that take days to complete) and it is helpful to see the current progress in real time as they are running. If test output is redirected to a file, that is much harder if not impossible to do (maybe users could try finding the file using the workspace links on the Jenkins dashboard, but that would be quite invasive / disruptive for their workflows).

            Your second suggestion might work in the short-to-medium term, so long as adding compression to the file system doesn't adversely affect the performance of the Jenkins masters too much. We'd need to do some performance testing to see if this is practical for us or not.

            As for your third suggestion, we are planning an upgrade of the Jenkins core and all installed plugins next month so we'll have that in place sooner or later anyway. But as you said, there does not seem to be anything therein that would address this particular problem.

            I am interested in your 4th suggestion for sure. Having super large log files is definitely a problem, but the fact that they are duplicated automatically doubles the impact of this problem so this would be of great value in any situation. Looking forward to seeing a fix for that.

            Thanks again for the suggestions. I will investigate the workarounds you have suggested, but imo this "bloated log file" issue is something that needs attention regardless of the workarounds. Perhaps you could clarify what sorts of markup are being injected here by the Jenkins pipeline and why it is so important that it justifies such an large increase in log data? If I had some extra information as to why our servers are filling up, then perhaps I could provide some rationale to those affected on our teams. It might set some minds at ease.

            Show
            leedega Kevin Phillips added a comment - Sam Van Oort Thank you for the response. Yeah, the installed plugins was the first thing I eliminated as being a possible cause. As mentioned in one of my ealier comments I had set up a test sandbox with nothing but Jenkins core and the pipeline plugin (along with it's transitive dependencies) so no customizations at all and no third party stuff, and the problem is reproducible. So it is definitely pipeline related. Our next plan of attack was going to be your suggestion #1 - redirecting output to a file on disk and archive it. We are reluctant to do this because in many of our use cases we have long running unit tests running under Jenkin (ie: ones that take days to complete) and it is helpful to see the current progress in real time as they are running. If test output is redirected to a file, that is much harder if not impossible to do (maybe users could try finding the file using the workspace links on the Jenkins dashboard, but that would be quite invasive / disruptive for their workflows). Your second suggestion might work in the short-to-medium term, so long as adding compression to the file system doesn't adversely affect the performance of the Jenkins masters too much. We'd need to do some performance testing to see if this is practical for us or not. As for your third suggestion, we are planning an upgrade of the Jenkins core and all installed plugins next month so we'll have that in place sooner or later anyway. But as you said, there does not seem to be anything therein that would address this particular problem. I am interested in your 4th suggestion for sure. Having super large log files is definitely a problem, but the fact that they are duplicated automatically doubles the impact of this problem so this would be of great value in any situation. Looking forward to seeing a fix for that. Thanks again for the suggestions. I will investigate the workarounds you have suggested, but imo this "bloated log file" issue is something that needs attention regardless of the workarounds. Perhaps you could clarify what sorts of markup are being injected here by the Jenkins pipeline and why it is so important that it justifies such an large increase in log data? If I had some extra information as to why our servers are filling up, then perhaps I could provide some rationale to those affected on our teams. It might set some minds at ease.
            Hide
            leedega Kevin Phillips added a comment - - edited

            Sam Van Oort I wanted to answer your last couple of questions separately from my response to your suggested workarounds so as to not confuse the issue.

            We are building software at a large scale, which requires builds that do extensive build-test-publish operations. Some of these build operations take hours, others take days. Further, in order to easily monitor the progress and status of these various builds we rely on detailed log information being displayed in real time on the Jenkins dashboard. This requires an extensive amount of verbose output to be created from our builds to provide progress information, state information, debugging information, etc. To put it into context, it's not uncommon for us to have logs for individual builds with millions of lines in them.

            Until our migration to pipeline, these log files consume megabytes of storage - with maybe some of the larger logs being upwards of 100MB in size. After migrating the builds to pipeline these files are now gigabytes in size, and are being duplicated for each build. So, from an end user perspective, that's an increase in storage requirements of like 60x (ie: a 100MB log -> 3GB of log data times 2). That's pretty significant.

            You had mentioned that the "bloat" should be a few hundred bytes per line, which it does appear to be. At a glance this sounds insignificant, but when you consider that the bloat applies to each line in the log output, and that the markup tends to be significantly larger than the raw data it is describing, that seems prohibitive to me. At least on average in our case anyway. I mean, the markup appears on blank links in the output which increases the line size from 2 bytes to hundreds, which is at least a 100x increase! Even in the average cases, like this small line produced by the built-in pipeline tasks, the bloat is still somewhere around 10x the size of the raw log output:

            ^[[8mha:////4FIxIpb1zBghtLKbAdX+TaHiHbTrY49ofoRS4QuhQA4kAAAAaB+LCAAAAAAAAP9b85aBtbiIwTG/KF0vKzUvOzOvODlTryCnNB3I0ivPL8pOy8kv18vKT9JLzs8rzs9J1QuHCgaV5jlD hPzyS1IZIICRiYGhoohBKqM0pTg/D64Hh8ICAFt0h+h/AAAA^[[0m[Pipeline] End of Pipeline

            vs

            [Pipeline] End of Pipeline

            That should be unfathomable really. I can't imagine what metadata could be so important to warrant such a bloat in the output data. I do realize that this assessment is coming from a somewhat naiive Jenkins user who isn't familiar with the internals, so I'm hoping you may be able to shed some light on that aspect of the problem. Some rationale as to why this markup is so important and why it needs to be so large would be great.

            Show
            leedega Kevin Phillips added a comment - - edited Sam Van Oort I wanted to answer your last couple of questions separately from my response to your suggested workarounds so as to not confuse the issue. We are building software at a large scale, which requires builds that do extensive build-test-publish operations. Some of these build operations take hours, others take days. Further, in order to easily monitor the progress and status of these various builds we rely on detailed log information being displayed in real time on the Jenkins dashboard. This requires an extensive amount of verbose output to be created from our builds to provide progress information, state information, debugging information, etc. To put it into context, it's not uncommon for us to have logs for individual builds with millions of lines in them. Until our migration to pipeline, these log files consume megabytes of storage - with maybe some of the larger logs being upwards of 100MB in size. After migrating the builds to pipeline these files are now gigabytes in size, and are being duplicated for each build. So, from an end user perspective, that's an increase in storage requirements of like 60x (ie: a 100MB log -> 3GB of log data times 2). That's pretty significant. You had mentioned that the "bloat" should be a few hundred bytes per line, which it does appear to be. At a glance this sounds insignificant, but when you consider that the bloat applies to each line in the log output, and that the markup tends to be significantly larger than the raw data it is describing, that seems prohibitive to me. At least on average in our case anyway. I mean, the markup appears on blank links in the output which increases the line size from 2 bytes to hundreds, which is at least a 100x increase! Even in the average cases, like this small line produced by the built-in pipeline tasks, the bloat is still somewhere around 10x the size of the raw log output: ^[[8mha: ////4FIxIpb1zBghtLKbAdX+TaHiHbTrY49ofoRS4QuhQA4kAAAAaB+LCAAAAAAAAP9b85aBtbiIwTG/KF0vKzUvOzOvODlTryCnNB3I0ivPL8pOy8kv18vKT9JLzs8rzs9J1QuHCgaV5jlD hPzyS1IZIICRiYGhoohBKqM0pTg/D64Hh8ICAFt0h+h/AAAA^[[0m[Pipeline] End of Pipeline vs [Pipeline] End of Pipeline That should be unfathomable really. I can't imagine what metadata could be so important to warrant such a bloat in the output data. I do realize that this assessment is coming from a somewhat naiive Jenkins user who isn't familiar with the internals, so I'm hoping you may be able to shed some light on that aspect of the problem. Some rationale as to why this markup is so important and why it needs to be so large would be great.
            Hide
            svanoort Sam Van Oort added a comment -

            Kevin Phillips The optimizations to storage access won't reduce the disk space consumed, but they will reduce the performance impacts on the master somewhat because there will be less IO activity with Pipelines in general. That means you won't see the systemic slowdowns.

            Here's a more general-purpose solution to pair with the others: 

            1. Set up your pipelines to discard old builds (see https://cinhtau.github.io/2017/07/20/discard-old-builds-jenkins/ for an example), so you only keep, say the last 20 builds per job.  This limits overall disk space used. 
            2. If you need to retain logs long-term AND see them while builds run, fetch the consoleText from the master (HTTP request), compress it, and store it somewhere outside of Jenkins.  I've seen people just publish an on-master directory via HTTP (security implications apply), or use an external store (Elastic Search, etc).   This does come with a bit of a performance hit because fetching the entire consoleText will read that binary log again. 
              1. If you just need output of a complex build step and not the entire pipeline, you can tee, compress, and publish that directly from the build agent.  This will be much lighter than fetching consoleText. 

            Until our migration to pipeline, these log files consume megabytes of storage - with maybe some of the larger logs being upwards of 100MB in size

            That's... not really the best idea in general, for Jenkins.  It's not structured to be a general-purpose store for very large logs, so I'd strongly consider using a dedicated piece of software if you need that.  Feed the logs from mega-steps from build agents directly to software designed for bulk log ingestion (and perhaps analysis/visualization).

            The implementation of logging in Jenkins in general is tightly coupled to line-by-line operations, via LineTransformationOutputStream instances that insert binary ConsoleNotes to every line of output, which are converted into markup for display to users to get more usable output.  This is... well, it's kind of horrible.  There were reasons for it (before my time), but I wish it weren't done that way – but it's also something that would be rather difficult to alter since it's quite embedded.

            Show
            svanoort Sam Van Oort added a comment - Kevin Phillips  The optimizations to storage access won't reduce the disk space consumed, but they will reduce the performance impacts on the master somewhat because there will be less IO activity with Pipelines in general. That means you won't see the systemic slowdowns. Here's a more general-purpose solution to pair with the others:  Set up your pipelines to discard old builds (see https://cinhtau.github.io/2017/07/20/discard-old-builds-jenkins/  for an example), so you only keep, say the last 20 builds per job.  This limits overall disk space used.  If you need to retain logs long-term AND see them while builds run, fetch the consoleText from the master (HTTP request), compress it, and store it somewhere outside of Jenkins.  I've seen people just publish an on-master directory via HTTP (security implications apply), or use an external store (Elastic Search, etc).   This does come with a bit of a performance hit because fetching the entire consoleText will read that binary log again.  If you just need output of a complex build step and not the entire pipeline, you can tee, compress, and publish that directly from the build agent.  This will be much lighter than fetching consoleText.  Until our migration to pipeline, these log files consume megabytes of storage - with maybe some of the larger logs being upwards of 100MB in size That's... not really the best idea in general, for Jenkins.  It's not structured to be a general-purpose store for very large logs, so I'd strongly consider using a dedicated piece of software if you need that.  Feed the logs from mega-steps from build agents directly to software  designed for bulk log ingestion (and perhaps analysis/visualization). The implementation of logging in Jenkins in general is tightly coupled to line-by-line operations, via LineTransformationOutputStream instances that insert binary ConsoleNotes to every line of output, which are converted into markup for display to users to get more usable output.  This is... well, it's kind of horrible.  There were reasons for it (before my time), but I wish it weren't done that way – but it's also something that would be rather difficult to alter since it's quite embedded.
            Hide
            jglick Jesse Glick added a comment -

            The fact that output is stored in two files—log, as with traditional job types, as well as 123.log per step—is a longstanding problem, to be corrected in JENKINS-38381. So that accounts for a factor of two.

            (JENKINS-38381 also offers the capability for the single copy of the log file to be stored externally. You may use a bunch of disk space on some log server, but $JENKINS_HOME is unaffected.)

            Then there are console notes, also used in traditional job types but used more widely in Pipeline builds, and consisting of Base64-encoded serialized data embedded in an ANSI escape sequence. These do tend to consume more space than the plain text of the line they appear on (and got bigger after a security fix, alas). However these should only be printed on a tiny fraction of the lines in a typical large build. For example, if you run a sh step, that [Pipeline] sh line will be annotated, but the potentially thousands of lines of output from the process should not be. So typically the contribution to log file size would be minor.

            Since you claim to be seeing a far larger bloat (in each copy of the log file), and it sounds like you are seeing annotations on every line, something is going wrong. My guess is that you are using the timestamper block-scoped step, which in Pipeline mode does annotate every interior line. (For traditional jobs, unless overridden by a system property, it saves timestamp information to a side file. Also potentially big, though probably not as big.) Until we can think of a way to deal with that, for example by adjusting the Timestamper plugin to skip annotations on most lines if they are being spat out in rapid succession (few users care about millisecond precision in the middle of a 100kLOC log file!), you should try turning this wrapper off and see if that helps.

            (And of course if we do have an external log service integration with JENKINS-38381, the service may do its own timestamping anyway.)

            Show
            jglick Jesse Glick added a comment - The fact that output is stored in two files— log , as with traditional job types, as well as 123.log per step—is a longstanding problem, to be corrected in JENKINS-38381 . So that accounts for a factor of two. ( JENKINS-38381 also offers the capability for the single copy of the log file to be stored externally. You may use a bunch of disk space on some log server, but $JENKINS_HOME is unaffected.) Then there are console notes, also used in traditional job types but used more widely in Pipeline builds, and consisting of Base64-encoded serialized data embedded in an ANSI escape sequence. These do tend to consume more space than the plain text of the line they appear on (and got bigger after a security fix , alas). However these should only be printed on a tiny fraction of the lines in a typical large build. For example, if you run a sh step, that [Pipeline] sh line will be annotated, but the potentially thousands of lines of output from the process should not be. So typically the contribution to log file size would be minor. Since you claim to be seeing a far larger bloat (in each copy of the log file), and it sounds like you are seeing annotations on every line, something is going wrong. My guess is that you are using the timestamper block-scoped step, which in Pipeline mode does annotate every interior line. (For traditional jobs, unless overridden by a system property, it saves timestamp information to a side file. Also potentially big, though probably not as big.) Until we can think of a way to deal with that, for example by adjusting the Timestamper plugin to skip annotations on most lines if they are being spat out in rapid succession (few users care about millisecond precision in the middle of a 100kLOC log file!), you should try turning this wrapper off and see if that helps. (And of course if we do have an external log service integration with JENKINS-38381 , the service may do its own timestamping anyway.)
            jglick Jesse Glick made changes -
            Link This issue relates to JENKINS-38381 [ JENKINS-38381 ]
            jbochenski Jakub Bochenski made changes -
            Link This issue is blocked by JENKINS-44195 [ JENKINS-44195 ]
            Hide
            jglick Jesse Glick added a comment -
            Show
            jglick Jesse Glick added a comment - Code reference
            Hide
            jglick Jesse Glick added a comment -

            Demo; the junk goes away if you remove the timestamps step:

            @Test
            public void lines() throws Exception {
                WorkflowJob p = r.createProject(WorkflowJob.class, "p");
                p.setDefinition(new CpsFlowDefinition("timestamps {node {sh 'yes | head -1000'}}", true));
                WorkflowRun b = r.buildAndAssertSuccess(p);
                IOUtils.copy(b.getLogInputStream(), System.out);
            }
            
            Show
            jglick Jesse Glick added a comment - Demo; the junk goes away if you remove the timestamps step: @Test public void lines() throws Exception { WorkflowJob p = r.createProject(WorkflowJob.class, "p" ); p.setDefinition( new CpsFlowDefinition( "timestamps {node {sh 'yes | head -1000' }}" , true )); WorkflowRun b = r.buildAndAssertSuccess(p); IOUtils.copy(b.getLogInputStream(), System .out); }
            jglick Jesse Glick made changes -
            Issue Type Improvement [ 4 ] Bug [ 1 ]
            Component/s timestamper-plugin [ 15749 ]
            Component/s pipeline [ 21692 ]
            Labels performance pipeline
            oleg_nenashev Oleg Nenashev made changes -
            Link This issue relates to JENKINS-51229 [ JENKINS-51229 ]
            Hide
            jglick Jesse Glick added a comment -

            The API defined in JENKINS-45693 would allow there to be a system-wide option to add a simple timestamp prefix to every log line, which would be rendered nicely at least in the classic UI without ConsoleNote overhead.

            Show
            jglick Jesse Glick added a comment - The API defined in JENKINS-45693 would allow there to be a system-wide option to add a simple timestamp prefix to every log line, which would be rendered nicely at least in the classic UI without ConsoleNote overhead.
            jglick Jesse Glick made changes -
            Link This issue relates to JENKINS-45693 [ JENKINS-45693 ]
            jglick Jesse Glick made changes -
            Assignee Jesse Glick [ jglick ]
            jglick Jesse Glick made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            Hide
            jglick Jesse Glick added a comment -

            PR 25 is simple enough and basically solves this. Using ISO-8601 dates in plain text, which is reasonably compact, precise, and human-readable in case you are not using classic UI rendering.

            Show
            jglick Jesse Glick added a comment - PR 25 is simple enough and basically solves this. Using ISO-8601 dates in plain text, which is reasonably compact, precise, and human-readable in case you are not using classic UI rendering.
            jglick Jesse Glick made changes -
            Remote Link This issue links to "PR 25 (Web Link)" [ 21848 ]
            jglick Jesse Glick made changes -
            Status In Progress [ 3 ] In Review [ 10005 ]
            jglick Jesse Glick made changes -
            Link This issue relates to JENKINS-54081 [ JENKINS-54081 ]
            Hide
            jglick Jesse Glick added a comment -

            For anyone interested in trying this, use this build.

            Show
            jglick Jesse Glick added a comment - For anyone interested in trying this, use this build .
            jglick Jesse Glick made changes -
            Link This issue relates to JENKINS-54133 [ JENKINS-54133 ]
            Hide
            medianick Nick Jones added a comment - - edited

            Running the custom Timestamper plugin build in Jesse Glick's previous comment, on our test Jenkins instance, with a timestamps block in a Scripted Pipeline job but with the global "Enabled for all Pipeline builds" checkbox checked, I'm seeing 6681 "deprecated*.log" files – each the same size as the "log" file (2328MB) – created on our master node by a single build, all within a few minutes of each other. This is consuming 14GB+ of space for this one build, which would be catastrophic for disk space if we applied the same updates to our production Jenkins instance (where there are hundreds of jobs building hundreds of times per day).

            This timestamps block is still in place because our production Jenkins system hasn't had its plugins changed, and uses the same Jenkinsfile for this particular job, and with the hundreds of jobs we have in place, coordinating the removal of a timestamps block in GitHub across all these Jenkinsfiles with the upgrade of the Timestamper (and/or other plugins) would be exceptionally difficult. I'm assuming (perhaps wrongly) that these "deprecated" files are being generated by some interaction between this block and the newer plugin(s) in this system, but perhaps it's something else.

            Is the only way forward to remove all the timestamps blocks or options from all our Jenkinsfiles first, then upgrading these plugins? Or is something else going on?

            Show
            medianick Nick Jones added a comment - - edited Running the custom Timestamper plugin build in Jesse Glick 's previous comment, on our test Jenkins instance, with a timestamps block in a Scripted Pipeline job but with the global "Enabled for all Pipeline builds" checkbox checked, I'm seeing 6681 "deprecated*.log" files – each the same size as the "log" file (2328MB) – created on our master node by a single build , all within a few minutes of each other. This is consuming 14GB+ of space for this one build, which would be catastrophic for disk space if we applied the same updates to our production Jenkins instance (where there are hundreds of jobs building hundreds of times per day). This timestamps block is still in place because our production Jenkins system hasn't had its plugins changed, and uses the same Jenkinsfile for this particular job, and with the hundreds of jobs we have in place, coordinating the removal of a timestamps block in GitHub across all these Jenkinsfiles with the upgrade of the Timestamper (and/or other plugins) would be exceptionally difficult. I'm assuming (perhaps wrongly) that these "deprecated" files are being generated by some interaction between this block and the newer plugin(s) in this system, but perhaps it's something else. Is the only way forward to remove all the timestamps blocks or options from all our Jenkinsfiles first, then upgrading these plugins? Or is something else going on?
            Hide
            jglick Jesse Glick added a comment -

            Nick Jones no, with the global setting enabled the timestamps step/option does nothing besides print a warning that it is doing nothing. Your issue is some instance of JENKINS-54128, with the exact cause visible by checking the stack trace in your system log.

            Show
            jglick Jesse Glick added a comment - Nick Jones no, with the global setting enabled the timestamps step/option does nothing besides print a warning that it is doing nothing. Your issue is some instance of JENKINS-54128 , with the exact cause visible by checking the stack trace in your system log.
            jglick Jesse Glick made changes -
            Link This issue is duplicated by JENKINS-55257 [ JENKINS-55257 ]
            jglick Jesse Glick made changes -
            Status In Review [ 10005 ] Resolved [ 5 ]
            Resolution Fixed [ 1 ]
            Released As 1.9
            jglick Jesse Glick made changes -
            Link This issue is blocking INFRA-2028 [ INFRA-2028 ]
            jglick Jesse Glick made changes -
            Link This issue relates to JENKINS-45661 [ JENKINS-45661 ]
            Hide
            drulli Ulli Hafner added a comment -

            Jesse Glick I just came around this issue due to JENKINS-56484. Before Timestamper 1.9 I used ConsoleNote::removeNotes to get rid of additional console decorations when parsing warnings. This does obviously not work anymore. Is there a general way to remove the timestamper plugin output from the console log?

            Show
            drulli Ulli Hafner added a comment - Jesse Glick I just came around this issue due to JENKINS-56484 . Before Timestamper 1.9 I used ConsoleNote::removeNotes to get rid of additional console decorations when parsing warnings. This does obviously not work anymore. Is there a general way to remove the timestamper plugin output from the console log?
            drulli Ulli Hafner made changes -
            Link This issue causes JENKINS-56484 [ JENKINS-56484 ]
            Hide
            jglick Jesse Glick added a comment -

            Better to discuss there.

            Show
            jglick Jesse Glick added a comment - Better to discuss there.
            basil Basil Crow made changes -
            Link This issue relates to JENKINS-58102 [ JENKINS-58102 ]
            Hide
            timur87 Tim Oz added a comment -

            Please, help

            I have the same issue on Jenkins ver. 2.176.1, Timestamper 1.10 and AnsiColor 0.6.2 plugins.

            Show
            timur87 Tim Oz added a comment - Please, help I have the same issue on Jenkins ver. 2.176.1, Timestamper 1.10 and AnsiColor 0.6.2 plugins.

              People

              • Assignee:
                jglick Jesse Glick
                Reporter:
                leedega Kevin Phillips
              • Votes:
                11 Vote for this issue
                Watchers:
                29 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: