Jenkins / JENKINS-53901

Using readFile does not handle UTF-8 with BOM files

    Details

    • Type: Bug
    • Status: Reopened
    • Priority: Blocker
    • Resolution: Unresolved
    • Labels:
      None
    • Environment:
      Jenkins 2.121.2 and Jenkins 2.81 Pipeline Groovy Plugin 2.54

      Description

      I'm extracting the XML file (.nuspec) from some NuGet packages and trying to parse it. In most cases this works fine, but in some packages the XML was written as UTF-8 with a BOM, and the parser then fails with:

      org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
      

      This is how I parse the XML:

      import groovy.xml.XmlUtil

      @NonCPS
      def parsePackage(packageName, packageVersion) {
          def packageFullName = "${packageName}.${packageVersion}"
          bat """curl -L https://www.nuget.org/api/v2/package/${packageName}/${packageVersion} -o ${packageFullName}.nupkg"""
          bat """unzip ${packageFullName}.nupkg -d ${packageFullName}"""

          def nuspecPath = """${packageFullName}\\${packageName}.nuspec"""
          def nuspecContent = readFile file: nuspecPath
          // Fails with "Content is not allowed in prolog" when the .nuspec starts with a UTF-8 BOM
          def nuspecXML = new XmlSlurper(false, false).parseText(nuspecContent)
          println nuspecXML.metadata.version

          def newXml = XmlUtil.serialize(nuspecXML)
          return newXml
      }
      

      It looks like readFile does not support UTF-8 with a BOM: it passes the leading BOM characters through into the returned string.
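To show what a parser would receive in this situation, here is a minimal Java sketch (plain JDK, not Jenkins code; which charset readFile actually uses by default is not established here). It decodes the same BOM-prefixed bytes two ways: as UTF-8 the BOM survives as the single character U+FEFF, while a single-byte charset such as ISO-8859-1 turns it into the three characters "ï»¿". In both cases the first character is not `<`, which is consistent with the column-1 "Content is not allowed in prolog" error. The class name is illustrative.

```java
import java.nio.charset.StandardCharsets;

public class BomDecodingDemo {
    // A tiny XML document prefixed with the UTF-8 BOM bytes EF BB BF
    static final byte[] BOM_XML = {
        (byte) 0xEF, (byte) 0xBB, (byte) 0xBF,
        '<', 'a', '/', '>'
    };

    public static void main(String[] args) {
        // Java's UTF-8 decoder keeps the BOM as the character U+FEFF
        String asUtf8 = new String(BOM_XML, StandardCharsets.UTF_8);
        // A single-byte charset turns each BOM byte into its own character
        String asLatin1 = new String(BOM_XML, StandardCharsets.ISO_8859_1);

        System.out.printf("UTF-8:      first char U+%04X, length %d%n",
                (int) asUtf8.charAt(0), asUtf8.length());
        System.out.printf("ISO-8859-1: first char U+%04X, length %d%n",
                (int) asLatin1.charAt(0), asLatin1.length());
    }
}
```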

      I tried to reproduce it directly in Groovy with:

      def xmldata = new File("Newtonsoft.Json.nuspec").text
      def pkg = new XmlSlurper().parseText(xmldata) 
      println pkg.metadata.version.text()
      

      But there, the leading BOM characters are not passed into the xmldata variable.

      An example .nuspec file with a BOM is attached.

        Attachments

          Activity

          quas Jakub Pawlinski created issue -
          quas Jakub Pawlinski made changes -
          Field Original Value New Value
          Description The readFile step, when used inside an environment closure, whether top-level or in a stage, causes the following error:
          an exception which occurred:
          in field com.cloudbees.groovy.cps.impl.BlockScopeEnv.locals
          in object com.cloudbees.groovy.cps.impl.LoopBlockScopeEnv@29044815
          in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
          in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@25c9f135
          in field com.cloudbees.groovy.cps.impl.CallEnv.caller
          in object com.cloudbees.groovy.cps.impl.FunctionCallEnv@307ab985
          in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
          in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@5a92c230
          in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
          in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@37a0a42f
          in field com.cloudbees.groovy.cps.impl.CallEnv.caller
          in object com.cloudbees.groovy.cps.impl.ClosureCallEnv@184a6ff5
          in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
          in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@676c6c8d
          in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
          in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@19f01356
          in field com.cloudbees.groovy.cps.impl.CallEnv.caller
          in object com.cloudbees.groovy.cps.impl.ClosureCallEnv@74d1467b
          in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
          in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@4d098490
          in field com.cloudbees.groovy.cps.impl.ProxyEnv.parent
          in object com.cloudbees.groovy.cps.impl.BlockScopeEnv@28223d82
          in field com.cloudbees.groovy.cps.impl.CallEnv.caller
          in object com.cloudbees.groovy.cps.impl.FunctionCallEnv@6e27611b
          in field com.cloudbees.groovy.cps.Continuable.e
          in object org.jenkinsci.plugins.workflow.cps.SandboxContinuable@78ff9c41
          in field org.jenkinsci.plugins.workflow.cps.CpsThread.program
          in object org.jenkinsci.plugins.workflow.cps.CpsThread@7841b6fe
          in field org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.threads
          in object org.jenkinsci.plugins.workflow.cps.CpsThreadGroup@4d2d90ce
          in object org.jenkinsci.plugins.workflow.cps.CpsThreadGroup@4d2d90ce
          Caused: java.io.NotSerializableException: java.util.TreeMap$Entry
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:860)
          at org.jboss.marshalling.river.BlockMarshaller.doWriteObject(BlockMarshaller.java:65)
          at org.jboss.marshalling.river.BlockMarshaller.writeObject(BlockMarshaller.java:56)
          at org.jboss.marshalling.MarshallerObjectOutputStream.writeObjectOverride(MarshallerObjectOutputStream.java:50)
          at org.jboss.marshalling.river.RiverObjectOutputStream.writeObjectOverride(RiverObjectOutputStream.java:179)
          at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:344)
          at java.util.HashMap.internalWriteEntries(HashMap.java:1785)
          at java.util.HashMap.writeObject(HashMap.java:1362)
          at sun.reflect.GeneratedMethodAccessor134.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at org.jboss.marshalling.reflect.SerializableClass.callWriteObject(SerializableClass.java:273)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:976)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:967)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
          at org.jboss.marshalling.river.BlockMarshaller.doWriteObject(BlockMarshaller.java:65)
          at org.jboss.marshalling.river.BlockMarshaller.writeObject(BlockMarshaller.java:56)
          at org.jboss.marshalling.MarshallerObjectOutputStream.writeObjectOverride(MarshallerObjectOutputStream.java:50)
          at org.jboss.marshalling.river.RiverObjectOutputStream.writeObjectOverride(RiverObjectOutputStream.java:179)
          at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:344)
          at java.util.TreeMap.writeObject(TreeMap.java:2438)
          at sun.reflect.GeneratedMethodAccessor176.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at org.jboss.marshalling.reflect.SerializableClass.callWriteObject(SerializableClass.java:273)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:976)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
          at org.jboss.marshalling.AbstractObjectOutput.writeObject(AbstractObjectOutput.java:58)
          at org.jboss.marshalling.AbstractMarshaller.writeObject(AbstractMarshaller.java:111)
          at org.jenkinsci.plugins.workflow.support.pickles.serialization.RiverWriter.writeObject(RiverWriter.java:140)
          at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:458)
          at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:434)
          at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgramIfPossible(CpsThreadGroup.java:422)
          at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:362)
          at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:82)
          at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:242)
          at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:230)
          at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
          at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:748)

          A test repo was created to replicate this.

          https://github.com/sflynn-dell/pipeline-test

          Branches:
               declarative-script - readFile is successful when used inside a script closure.
               declarative-env - readFile fails when used inside an environment closure.
          New value: the description shown above, prefixed with "Using Jenkins ver. 2.121.2".
          quas Jakub Pawlinski made changes -
          Attachment Newtonsoft.Json.nuspec [ 44656 ]
          quas Jakub Pawlinski made changes -
          Environment: Jenkins 2.73.1 and Jenkins 2.81, Pipeline Groovy Plugin 2.40 (original value) → Jenkins 2.121.2 and Jenkins 2.81, Pipeline Groovy Plugin 2.54 (new value)
          quas Jakub Pawlinski made changes -
          Description: the leading "Using Jenkins ver. 2.121.2" was removed from the description; the rest of the text is unchanged.
          abayer Andrew Bayer made changes -
          Component/s workflow-basic-steps-plugin [ 21712 ]
          Component/s pipeline-model-definition-plugin [ 21706 ]
          Assignee Andrew Bayer [ abayer ]
          svanoort Sam Van Oort added a comment -

          Jakub Pawlinski: this is a known issue with the Unicode spec and the Java platform's implementation of it, not with Pipeline. In UTF-8 the BOM is neither required nor recommended, and since it is essentially meaningless in UTF-8, Java transparently passes it through.

          First, I'd make sure to add the encoding: 'UTF-8' argument to your readFile step so that the file is read as UTF-8. Then do post-processing to correct for the nonstandard input.

          Some suggested solutions are available on Stack Overflow.

          Personally, I'd do something like this to sanitize your input:

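          /** Removes the BOM: the UTF-8 bytes EF BB BF decode to the single character U+FEFF */
          private static String removeUTF8BOM(String s) {
              return s.replace("\uFEFF", "");
          }

          (The bytes EF BB BF are the UTF-8 encoding of U+FEFF, so once the file has been decoded as UTF-8 the BOM is the single character \uFEFF; that is the value to match.)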

          There are also code snippets out there that take a more efficient approach, checking only the start of the String.
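Putting the two suggested steps together (decode explicitly as UTF-8, then strip only a leading BOM) with the parse from the original report, a minimal Java sketch might look like this. It swaps the JDK's DOM parser (javax.xml.parsers) in for Groovy's XmlSlurper, the class and method names are illustrative, and the sample document is made up; it is not the attached Newtonsoft.Json.nuspec.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NuspecVersion {
    /** Removes one leading U+FEFF; only the first character is examined. */
    static String stripLeadingBom(String s) {
        if (s != null && !s.isEmpty() && s.charAt(0) == '\uFEFF') {
            return s.substring(1);
        }
        return s;
    }

    /** Decodes as UTF-8, strips a leading BOM, parses, and returns the first <version> text. */
    static String readVersion(byte[] fileBytes) {
        String text = stripLeadingBom(new String(fileBytes, StandardCharsets.UTF_8));
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(text)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        // The factory is not namespace-aware by default, so the plain tag name matches
        Node version = doc.getElementsByTagName("version").item(0);
        if (version == null) {
            throw new IllegalArgumentException("No <version> element found");
        }
        return version.getTextContent();
    }

    public static void main(String[] args) {
        // Hypothetical sample document with a BOM in front of the XML declaration
        String xml = "\uFEFF<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<package><metadata><version>1.2.3</version></metadata></package>";
        System.out.println(readVersion(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Checking only the first character keeps the cost constant and leaves any U+FEFF that legitimately appears later in the text untouched, unlike a global replace.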

          svanoort Sam Van Oort added a comment -

          This is due to a known problem with Java's implementation of the UTF-8 spec. I've suggested an easy workaround in Pipeline code above.

          svanoort Sam Van Oort made changes -
          Status Open [ 1 ] Closed [ 6 ]
          Resolution Not A Defect [ 7 ]
          quas Jakub Pawlinski added a comment -

          OK, but if it's a Java issue, why couldn't I reproduce it locally using Groovy Version: 2.6.0-alpha-1, JVM: 1.8.0_111, Vendor: Oracle Corporation, OS: Windows 10?

          ilatypov Ilguiz Latypov added a comment - edited

          I guess Sam used vague wording. It's the files that carry the UTF-8-encoded BOM at the beginning, which is useless because UTF-8's byte-oriented storage does not depend on the architecture's byte order.

          $ python -c 'u = b"\xEF\xBB\xBF".decode("utf-8"); print("%04X" % ord(u[0]))'
          FEFF
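The same relationship can be checked in the other direction with a short Java sketch (class and method names are illustrative): encoding U+FEFF yields EF BB BF in UTF-8 on any platform, while in UTF-16 the byte order decides whether it comes out as FE FF or FF FE, which is where the mark actually carries information.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomBytes {
    /** Formats the bytes of U+FEFF in the given charset as hex, e.g. "EF BB BF". */
    static String bomHex(Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : "\uFEFF".getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("UTF-8:    " + bomHex(StandardCharsets.UTF_8));
        System.out.println("UTF-16BE: " + bomHex(StandardCharsets.UTF_16BE));
        System.out.println("UTF-16LE: " + bomHex(StandardCharsets.UTF_16LE));
    }
}
```

The explicit UTF_16BE/UTF_16LE charsets are used on purpose: plain UTF_16 would prepend its own BOM to the output.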
          

          Microsoft software writes these useless but confusing 3 bytes at the beginning of its UTF-8-encoded files. As a result, every program that reads such files has to trim the Unicode BOM character from the start of the contents after decoding them.

          public static CharSequence deBOM(CharSequence s) {
              if (s == null) {
                  return null
              } else if (s.length() == 0) {
                  return s
              } else if (s[0] == '\uFEFF') {
                  return s.drop(1)
              } else {
                  return s
              }
          }
          

          https://stackoverflow.com/questions/5406172/utf-8-without-bom

          Perhaps newer versions of XmlSlurper perform this sanitization.

          ilatypov Ilguiz Latypov added a comment - edited

          This appears to be a "post-modern" BOM, one that is supposed to indicate the file's encoding before the contents are decoded.

                    +------------------+----------+
                    | Leading sequence | Encoding |
                    +------------------+----------+
                    | FF FE 00 00      | UTF-32LE |
                    | 00 00 FE FF      | UTF-32BE |
                    | FF FE            | UTF-16LE |
                    | FE FF            | UTF-16BE |
                    | EF BB BF         | UTF-8    |
                    +------------------+----------+

          http://www.rfc-editor.org/rfc/rfc4329.txt

          So readFile needs a mode (or a special value for the encoding parameter) that detects such a BOM and decodes the rest of the contents accordingly.
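A BOM-sniffing mode like the one proposed could be sketched in Java as follows. This is a hypothetical illustration of the table above, not an existing readFile option; the class and method names are made up, and when no BOM is found the bytes are decoded with a caller-supplied fallback charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomSniffer {
    /** Detected charset plus the number of BOM bytes to skip. */
    static final class Detected {
        final Charset charset;
        final int bomLength;

        Detected(Charset charset, int bomLength) {
            this.charset = charset;
            this.bomLength = bomLength;
        }
    }

    /** True if data begins with the given unsigned byte values. */
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** UTF-32LE is checked before UTF-16LE because both start with FF FE. */
    static Detected sniff(byte[] data, Charset fallback) {
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return new Detected(Charset.forName("UTF-32LE"), 4);
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return new Detected(Charset.forName("UTF-32BE"), 4);
        if (startsWith(data, 0xFF, 0xFE)) return new Detected(StandardCharsets.UTF_16LE, 2);
        if (startsWith(data, 0xFE, 0xFF)) return new Detected(StandardCharsets.UTF_16BE, 2);
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return new Detected(StandardCharsets.UTF_8, 3);
        return new Detected(fallback, 0);
    }

    /** Decodes the bytes after the BOM (if any) with the detected charset. */
    static String decode(byte[] data, Charset fallback) {
        Detected d = sniff(data, fallback);
        return new String(data, d.bomLength, data.length - d.bomLength, d.charset);
    }

    public static void main(String[] args) {
        byte[] utf16le = "\uFEFFhi".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(decode(utf16le, StandardCharsets.UTF_8));
    }
}
```

Because the UTF-32LE mark begins with the UTF-16LE mark, the longer sequence has to be tested first. The table is inherently ambiguous for a UTF-16LE file whose first real character is U+0000; this sketch would treat it as UTF-32LE.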

          quas Jakub Pawlinski added a comment -

          Same issue with readCSV, and possibly with every other way of reading files in Jenkins. The readCSV case is more severe because I cannot intervene between reading the file contents and their conversion into a Commons CSV structure. The only way around it is to use readFile and parse the contents manually, which makes readCSV (and similar steps) redundant.

          I still don't understand why you say this is a Java issue rather than a Jenkins issue, when it does not reproduce even with a newer Groovy version.
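Steps that parse a file directly, such as readCSV, could in principle drop the BOM at the Reader level before handing the stream to their parser, so no readFile round trip would be needed. The Java sketch below shows that general technique with a one-character PushbackReader; it is an illustration under that assumption, with made-up names, and not how readCSV or Commons CSV are actually implemented.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class BomSkippingReader {
    /** Wraps a Reader so a leading U+FEFF is consumed before any parser sees it. */
    static Reader skipBom(Reader in) throws IOException {
        PushbackReader pr = new PushbackReader(in, 1);
        int first = pr.read();
        if (first != -1 && first != '\uFEFF') {
            pr.unread(first); // not a BOM: put the character back
        }
        return pr;
    }

    /** Reads everything through skipBom; stands in for a CSV/JSON/YAML parser here. */
    static String readAll(Reader in) {
        try (Reader r = skipBom(in)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(readAll(new StringReader("\uFEFFname,version\nfoo,1.0\n")));
    }
}
```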

          quas Jakub Pawlinski added a comment - edited

          Affected functionalities: readCSV, readJSON, readManifest, readMavenPom, readProperties, readYaml
          quas Jakub Pawlinski made changes -
          Resolution Not A Defect [ 7 ]
          Status Closed [ 6 ] Reopened [ 4 ]

            People

            • Assignee:
              Unassigned
              Reporter:
              quas Jakub Pawlinski
            • Votes:
              0
              Watchers:
              3
