Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-61452

Illegal base64 character in FileLogStorage$1.writeHtmlTo(FileLogStorage.java:203)

    Details

    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Critical
    • Resolution: Unresolved
    • Labels:
      None
    • Environment:
      workflow-api-plugin 2.36

      Jenkins 2.222.3, workflow-api-plugin 2.40, splunk-devops 1.9.3, splunk-devops-extend 1.9.3
    • Similar Issues:

      Description

      sporadic console error

      java.lang.IllegalArgumentException: Illegal base64 character 5b
      	at java.util.Base64$Decoder.decode0(Base64.java:714)
      	at java.util.Base64$Decoder.decode(Base64.java:526)
      	at hudson.util.UnbufferedBase64InputStream.read(UnbufferedBase64InputStream.java:41)
      	at hudson.util.UnbufferedBase64InputStream.read(UnbufferedBase64InputStream.java:53)
      	at java.io.DataInputStream.readFully(DataInputStream.java:195)
      	at java.io.DataInputStream.readFully(DataInputStream.java:169)
      	at hudson.console.ConsoleNote.readFrom(ConsoleNote.java:248)
      	at hudson.console.ConsoleAnnotationOutputStream.eol(ConsoleAnnotationOutputStream.java:111)
      	at hudson.console.LineTransformationOutputStream.eol(LineTransformationOutputStream.java:60)
      	at hudson.console.LineTransformationOutputStream.write(LineTransformationOutputStream.java:56)
      	at java.io.FilterOutputStream.write(FilterOutputStream.java:77)
      	at org.jenkinsci.plugins.workflow.log.FileLogStorage$1$1.write(FileLogStorage.java:238)
      	at java.io.FilterOutputStream.write(FilterOutputStream.java:125)
      	at org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:89)
      	at org.kohsuke.stapler.framework.io.LargeText$HeadMark.moveTo(LargeText.java:314)
      	at org.kohsuke.stapler.framework.io.LargeText.writeLogTo(LargeText.java:229)
      	at hudson.console.AnnotatedLargeText.writeRawLogTo(AnnotatedLargeText.java:168)
      	at org.jenkinsci.plugins.workflow.log.FileLogStorage$1.writeHtmlTo(FileLogStorage.java:203)
      	at hudson.console.AnnotatedLargeText.writeLogTo(AnnotatedLargeText.java:147)
      	at org.kohsuke.stapler.framework.io.LargeText.doProgressText(LargeText.java:264)
      	at hudson.console.AnnotatedLargeText.doProgressiveHtml(AnnotatedLargeText.java:95)
      

        Attachments

          Issue Links

            Activity

            Hide
            fengxx Ted Xiao added a comment -

            The console page will go back to normal if the job completes, either normally or aborted. probably a caching bug in the FileLogStorage

            Show
            fengxx Ted Xiao added a comment - The console page will go back to normal if the job completes, either normally or aborted. probably a caching bug in the FileLogStorage
            Hide
            mikeryan_tan Mike Ryan added a comment -

            This began happening in my environment (Jenkins 2.222.3, workflow-api-plugin 2.40, splunk-devops 1.9.3, splunk-devops-extend 1.9.3) after installing the splunk-devops-extend 1.9.3 plugin. 

             

            I've done a bit of debugging on this, and it appears that log lines within a run's console log can be corrupted. ConsoleNote.java attempts to decode base64 data sitting between an ANSI escape PREAMBLE and POSTAMBLE. I get "Illegal base64 character" exception when a log line contains the PREAMBLE but not the POSTAMBLE.

            Like I mentioned, this only started happening after we installed the "splunk-devops-extend" plugin in our environment, which allowed us to start forwarding pipeline step logs to Splunk.

            I can find instances of the bad log lines by greping the original console log (on the jenkins server), looking for lines that contain the preamble, but not the postamble.

            cat some/build/console/log | grep '\[8mha' |grep -v '\[0m'|wc -l 

            .

            Here's an example of what I see in the logs. Please understand that I had to replace the actual base64 encoded string with "base64data" ---- I don't know exactly what is in there, so I don't want to expose anything.

            ESC[8mha:////4[2020-06-02T13:57:07.191Z] 09:57:06.854 stuff is logged here 
            base64database64database64database64database64database64database64database64data+j[2020-06-02T13:57:07.196Z] 09:57:06.856 more stuff is logged here 
            base64database64database64dataESC[0m[Pipeline] // timeout
            

             

            Show
            mikeryan_tan Mike Ryan added a comment - This began happening in my environment (Jenkins 2.222.3, workflow-api-plugin 2.40, splunk-devops 1.9.3, splunk-devops-extend 1.9.3) after installing the splunk-devops-extend 1.9.3 plugin.    I've done a bit of debugging on this, and it appears that log lines within a run's console log can be corrupted. ConsoleNote.java attempts to decode base64 data sitting between an ANSI escape PREAMBLE and POSTAMBLE . I get "Illegal base64 character" exception when a log line contains the PREAMBLE but not the POSTAMBLE. Like I mentioned, this only started happening after we installed the "splunk-devops-extend" plugin in our environment, which allowed us to start forwarding pipeline step logs to Splunk. I can find instances of the bad log lines by greping the original console log (on the jenkins server), looking for lines that contain the preamble, but not the postamble. cat some/build/console/log | grep '\[8mha' |grep -v '\[0m'|wc -l . Here's an example of what I see in the logs. Please understand that I had to replace the actual base64 encoded string with "base64data" ---- I don't know exactly what is in there, so I don't want to expose anything. ESC[8mha:////4[2020-06-02T13:57:07.191Z] 09:57:06.854 stuff is logged here base64database64database64database64database64database64database64database64data+j[2020-06-02T13:57:07.196Z] 09:57:06.856 more stuff is logged here base64database64database64dataESC[0m[Pipeline] // timeout  
            Hide
            fengxx Ted Xiao added a comment -

            Mike Ryan I think you can reproduce the issue just with workflow-api-plugin 2.40, https://github.com/jenkinsci/workflow-api-plugin/blob/master/src/main/java/org/jenkinsci/plugins/workflow/log/DelayBufferedOutputStream.java  may not flush data in-time and caused partial base64 data written to disk. 

            splunk-devops-extend depends on workflow-api-plugin , I guess when you install the plugin, Jenkins probably upgraded workflow-api-plugin to latest version.

             

            Show
            fengxx Ted Xiao added a comment - Mike Ryan I think you can reproduce the issue just with workflow-api-plugin 2.40,  https://github.com/jenkinsci/workflow-api-plugin/blob/master/src/main/java/org/jenkinsci/plugins/workflow/log/DelayBufferedOutputStream.java   may not flush data in-time and caused partial base64 data written to disk.  splunk-devops-extend depends on workflow-api-plugin , I guess when you install the plugin, Jenkins probably upgraded workflow-api-plugin to latest version.  
            Hide
            mikeryan_tan Mike Ryan added a comment -

            No, we have definitely had workflow-api-plug-in 2.40 installed for some time, as it that was released in February. splunk-devops-extend was the only plugin that we had installed on May 29th, and we started seeing the exceptions shortly after the restart.

            Perhaps forwarding console logs to splunk introduces enough of a delay into the system to make the buffer flushing issue present more often?

            Show
            mikeryan_tan Mike Ryan added a comment - No, we have definitely had workflow-api-plug-in 2.40 installed for some time, as it that was released in February. splunk-devops-extend was the only plugin that we had installed on May 29th, and we started seeing the exceptions shortly after the restart. Perhaps forwarding console logs to splunk introduces enough of a delay into the system to make the buffer flushing issue present more often?
            Hide
            fengxx Ted Xiao added a comment -

            I can easily reproduce the issue without any splunk plugin installed.
            Assume you have 2 jobs, one is trigger job, and the other one is downstream job called my-downstream

            // downstream job  my-downstream config

            properties([
             parameters([string(defaultValue: '1', description: '', name: 'counter', trim: false)])
            ])
            echo "${env.counter}"
            sleep 600 

            // trigger job config

            import hudson.console.ModelHyperlinkNote
            
            def kickJobs(count){
            	def runs=[:]
               	for(int i=0; i<count; i++){
               	    def jobId = i+1
            		def quietPeriod = 11*jobId.intdiv(9) + jobId%9
            		runs["run_$i"] = {
            		   echo "running $jobId"
            	       def userJob=build job: "my-downstream", parameters:[new StringParameterValue("counter", "job_$jobId")],
            	       quietPeriod: quietPeriod
            	       def consoleMsg = ModelHyperlinkNote.encodeTo("/${userJob.rawBuild.url}console", "Console")
            	       echo consoleMsg
            		}
                }
                parallel runs
                return true
            }
            
            node{
                timestamps{
                    ansiColor{
                      kickJobs(1000)
                    }
                }
            }
            

            Please first run the downstream job manually to update job property so it can be run concurrently by the trigger job, then start the trigger job and wait 1-2 minutes, then you will see the error on console

            the offending line in log looks like

            ^[[8mha:^[[8mha:////4OoLEwBdjfr/tkaLP4ecvUmMRmZlK////4NgeI/uDMeiCxyag5umRs/gNaD0jg/WoUHqu9mv3ESxOAAAAqh+LCAAAAAAAAP9b85aBtbiIQTGjNKU4P08vOT+vOD8nVc83PyU1x6OyILUoJzMv2y+/JJUBAhiZGBgqihhk0NSjKDWzXb3RdlLBUSYGJk8GtpzUvPSSDB8G5tKinBIGIZ+sxLJE/ZzEvHT94JKizLx0a6BxUmjGOUNodHsLgAypEgZF/az8JH2X1KTSdDArt1I3Jb88r7ikKDUxV9/EX0cYK7AAV8Tp3+zAAAAqh+LCAAAAAAAAP9b85aBtbiIQTGjNKU4P08vOT+vOD8nVc83FgfAL+zp+LWAAAA^[[0mDebug
            
            Show
            fengxx Ted Xiao added a comment - I can easily reproduce the issue without any splunk plugin installed. Assume you have 2 jobs, one is trigger job, and the other one is downstream job called  my-downstream // downstream job  my-downstream config properties([ parameters([string(defaultValue: '1' , description: '', name: ' counter', trim: false )]) ]) echo "${env.counter}" sleep 600 // trigger job config import hudson.console.ModelHyperlinkNote def kickJobs(count){ def runs=[:] for ( int i=0; i<count; i++){ def jobId = i+1 def quietPeriod = 11*jobId.intdiv(9) + jobId%9 runs[ "run_$i" ] = { echo "running $jobId" def userJob=build job: "my-downstream" , parameters:[ new StringParameterValue( "counter" , "job_$jobId" )], quietPeriod: quietPeriod def consoleMsg = ModelHyperlinkNote.encodeTo( "/${userJob.rawBuild.url}console" , "Console" ) echo consoleMsg } } parallel runs return true } node{ timestamps{ ansiColor{ kickJobs(1000) } } } Please first run the downstream job manually to update job property so it can be run concurrently by the trigger job, then start the trigger job and wait 1-2 minutes, then you will see the error on console the offending line in log looks like ^[[8mha:^[[8mha: ////4OoLEwBdjfr/tkaLP4ecvUmMRmZlK////4NgeI/uDMeiCxyag5umRs/gNaD0jg/WoUHqu9mv3ESxOAAAAqh+LCAAAAAAAAP9b85aBtbiIQTGjNKU4P08vOT+vOD8nVc83PyU1x6OyILUoJzMv2y+/JJUBAhiZGBgqihhk0NSjKDWzXb3RdlLBUSYGJk8GtpzUvPSSDB8G5tKinBIGIZ+sxLJE/ZzEvHT94JKizLx0a6BxUmjGOUNodHsLgAypEgZF/az8JH2X1KTSdDArt1I3Jb88r7ikKDUxV9/EX0cYK7AAV8Tp3+zAAAAqh+LCAAAAAAAAP9b85aBtbiIQTGjNKU4P08vOT+vOD8nVc83FgfAL+zp+LWAAAA^[[0mDebug
            Hide
            fengxx Ted Xiao added a comment -

            Jesse Glick any suggestion ?

            Show
            fengxx Ted Xiao added a comment - Jesse Glick any suggestion ?
            Hide
            jglick Jesse Glick added a comment -

            Not sure offhand. I have a dim memory of some cases where plugin code printing annotations needed to synchronize on the PrintStream in order to avoid printing part of a line, as in this example. That should not affect the echo step which uses a single call to PrintStream.println(String), but I can imagine something in com.splunk.splunkjenkins.console violating this rule.

            Show
            jglick Jesse Glick added a comment - Not sure offhand. I have a dim memory of some cases where plugin code printing annotations needed to synchronize on the PrintStream in order to avoid printing part of a line, as in this example . That should not affect the echo step which uses a single call to PrintStream.println(String) , but I can imagine something in com.splunk.splunkjenkins.console violating this rule.
            Hide
            mikeryan_tan Mike Ryan added a comment -

            I don't know if it sheds any light, but I found the following in a build slave console log. The build was affected by the base64 decoding issue.

            Jun 09, 2020 1:40:58 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
            WARNING: Attempt to (de-)serialize anonymous class org.jfrog.hudson.pipeline.common.Utils$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
            Jun 09, 2020 1:40:58 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
            WARNING: Attempt to (de-)serialize anonymous class org.jfrog.hudson.pipeline.scripted.steps.conan.RunCommandStep$Execution$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
            Jun 09, 2020 1:41:04 PM com.splunk.splunkjenkins.console.SplunkTaskListenerFactory flushLog
            SEVERE: flush log error
            java.lang.IllegalStateException: Jenkins.instance is missing. Read the documentation of Jenkins.getInstanceOrNull to see what you are doing wrong.
            	at jenkins.model.Jenkins.get(Jenkins.java:779)
            	at jenkins.model.Jenkins.getActiveInstance(Jenkins.java:791)
            	at com.splunk.splunkjenkins.SplunkJenkinsInstallation.get(SplunkJenkinsInstallation.java:106)
            	at com.splunk.splunkjenkins.utils.SplunkLogService.sendBatch(SplunkLogService.java:162)
            	at com.splunk.splunkjenkins.console.SplunkTaskListenerFactory.flushLog(SplunkTaskListenerFactory.java:85)
            	at com.splunk.splunkjenkins.console.SplunkTaskListenerFactory.enqueue(SplunkTaskListenerFactory.java:74)
            	at com.splunk.splunkjenkins.console.LabelConsoleLineStream.eol(LabelConsoleLineStream.java:52)
            	at com.splunk.splunkjenkins.console.LabelConsoleLineStream.write(LabelConsoleLineStream.java:34)
            	at java.io.FilterOutputStream.write(Unknown Source)
            	at java.io.PrintStream.write(Unknown Source)
            	at hudson.util.StreamCopyThread.run(StreamCopyThread.java:61)
            
            Jun 09, 2020 1:41:04 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
            WARNING: Attempt to (de-)serialize anonymous class org.jfrog.hudson.util.ExtractorUtils$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
            Jun 09, 2020 1:46:06 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
            WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
            
            Show
            mikeryan_tan Mike Ryan added a comment - I don't know if it sheds any light, but I found the following in a build slave console log. The build was affected by the base64 decoding issue. Jun 09, 2020 1:40:58 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn WARNING: Attempt to (de-)serialize anonymous class org.jfrog.hudson.pipeline.common.Utils$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/ Jun 09, 2020 1:40:58 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn WARNING: Attempt to (de-)serialize anonymous class org.jfrog.hudson.pipeline.scripted.steps.conan.RunCommandStep$Execution$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/ Jun 09, 2020 1:41:04 PM com.splunk.splunkjenkins.console.SplunkTaskListenerFactory flushLog SEVERE: flush log error java.lang.IllegalStateException: Jenkins.instance is missing. Read the documentation of Jenkins.getInstanceOrNull to see what you are doing wrong. at jenkins.model.Jenkins.get(Jenkins.java:779) at jenkins.model.Jenkins.getActiveInstance(Jenkins.java:791) at com.splunk.splunkjenkins.SplunkJenkinsInstallation.get(SplunkJenkinsInstallation.java:106) at com.splunk.splunkjenkins.utils.SplunkLogService.sendBatch(SplunkLogService.java:162) at com.splunk.splunkjenkins.console.SplunkTaskListenerFactory.flushLog(SplunkTaskListenerFactory.java:85) at com.splunk.splunkjenkins.console.SplunkTaskListenerFactory.enqueue(SplunkTaskListenerFactory.java:74) at com.splunk.splunkjenkins.console.LabelConsoleLineStream.eol(LabelConsoleLineStream.java:52) at com.splunk.splunkjenkins.console.LabelConsoleLineStream.write(LabelConsoleLineStream.java:34) at java.io.FilterOutputStream.write(Unknown Source) at java.io.PrintStream.write(Unknown Source) at hudson.util.StreamCopyThread.run(StreamCopyThread.java:61) Jun 09, 2020 1:41:04 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn WARNING: Attempt to (de-)serialize anonymous class org.jfrog.hudson.util.ExtractorUtils$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/ Jun 09, 2020 1:46:06 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
            Hide
            jglick Jesse Glick added a comment -

            You can indeed not call Jenkins.get from an agent JVM; that is clearly a bug in the Splunk plugin. Whether it is the root cause of this issue I am not sure, but given that the error interrupted a call to a method eol, it seems plausible that this led to incomplete annotation being written to the log stream.

            Show
            jglick Jesse Glick added a comment - You can indeed not call Jenkins.get from an agent JVM; that is clearly a bug in the Splunk plugin. Whether it is the root cause of this issue I am not sure, but given that the error interrupted a call to a method eol , it seems plausible that this led to incomplete annotation being written to the log stream.
            Hide
            fengxx Ted Xiao added a comment -

            Mike Ryan I think it is a separate issue, would you mind if we tracking it in another ticket?
            I wrote most of the Splunk plugin code, SplunkTaskListenerFactory is supposed to be loaded in master jvm only, either Jenkins was in restart phase or a bug in splunk plugin.

            just FYI, the reproduce step (trigger job config) doesn't require splunk plugin installed.

            Show
            fengxx Ted Xiao added a comment - Mike Ryan I think it is a separate issue, would you mind if we tracking it in another ticket? I wrote most of the Splunk plugin code, SplunkTaskListenerFactory is supposed to be loaded in master jvm only, either Jenkins was in restart phase or a bug in splunk plugin. just FYI, the reproduce step (trigger job config) doesn't require splunk plugin installed.
            Hide
            jglick Jesse Glick added a comment -

            the reproduce step (trigger job config) doesn't require splunk plugin installed

            Understood, but this is not obviously the same issue—could be a bug in Pipeline infrastructure, with echo, with ansicolor, with timestamps, etc.

            Show
            jglick Jesse Glick added a comment - the reproduce step (trigger job config) doesn't require splunk plugin installed Understood, but this is not obviously the same issue—could be a bug in Pipeline infrastructure, with echo , with ansicolor , with timestamps , etc.
            Hide
            fengxx Ted Xiao added a comment -

            Agrees that ansicolor/timestamps may have bugs and should be fixed.
            What I observed is that workflow job tolerated the issue before, e.g. workflow-api 2.34 + workflow-job 2.25 + ansicolor 0.5.0 works fine (at least /console page is fine), but it throws error on workflow-api 2.39 + workflow-job 2.37 + ansicolor 0.5.0. Hope it is not too much to ask if we catch error and ignore the bad line in workflow-api

            Show
            fengxx Ted Xiao added a comment - Agrees that ansicolor/timestamps may have bugs and should be fixed. What I observed is that workflow job tolerated the issue before, e.g. workflow-api 2.34 + workflow-job 2.25 + ansicolor 0.5.0 works fine (at least /console page is fine), but it throws error on workflow-api 2.39 + workflow-job 2.37 + ansicolor 0.5.0. Hope it is not too much to ask if we catch error and ignore the bad line in workflow-api
            Hide
            fengxx Ted Xiao added a comment -

            Mike Ryan splunk JVM issue will be tracked in JENKINS-62663

            Show
            fengxx Ted Xiao added a comment - Mike Ryan splunk JVM issue will be tracked in JENKINS-62663
            Hide
            jglick Jesse Glick added a comment -

            Have not taken the time yet to reproduce and analyze the situation in the debugger, but

            ignore the bad line in workflow-api

            AFAICT this aspect of the bug is in Jenkins core: malformed Base64 results in an IllegalArgumentException, rather than being suppressed quietly in ConsoleAnnotationOutputStream due to a null return value or IOException from ConsoleNote.readFrom. That would be a bug in UnbufferedBase64InputStream and/or ConsoleNote.

            The deeper cause is of course that the annotations are mangled to begin with, which as previously mentioned might have a similar root cause as previously fixed synchronization bugs.

            Show
            jglick Jesse Glick added a comment - Have not taken the time yet to reproduce and analyze the situation in the debugger, but ignore the bad line in workflow-api AFAICT this aspect of the bug is in Jenkins core: malformed Base64 results in an IllegalArgumentException , rather than being suppressed quietly in ConsoleAnnotationOutputStream due to a null return value or IOException from ConsoleNote.readFrom . That would be a bug in UnbufferedBase64InputStream and/or ConsoleNote . The deeper cause is of course that the annotations are mangled to begin with, which as previously mentioned might have a similar root cause as previously fixed synchronization bugs.
            Hide
            mikeryan_tan Mike Ryan added a comment -

            Just a heads up: the act of viewing a corrupt log file produced by this bug results in Jenkins leaking a file descriptor: JENKINS-62985

            Show
            mikeryan_tan Mike Ryan added a comment - Just a heads up: the act of viewing a corrupt log file produced by this bug results in Jenkins leaking a file descriptor: JENKINS-62985

              People

              • Assignee:
                Unassigned
                Reporter:
                fengxx Ted Xiao
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated: