JENKINS-53901 (Jenkins)

Using readFile does not handle UTF-8 files with a BOM

    Details

    • Type: Bug
    • Status: Reopened
    • Priority: Blocker
    • Resolution: Unresolved
    • Labels:
      None
    • Environment:
      Jenkins 2.121.2 and Jenkins 2.81; Pipeline: Groovy plugin 2.54

      Description

      I'm extracting the XML manifest (.nuspec) from some NuGet packages and parsing it. In most cases this works fine. However, some of these files are saved as UTF-8 with a byte order mark (BOM), and for those the parser fails with:

      org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
      

      This is how I parse the XML:

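      import groovy.xml.XmlUtil

      @NonCPS
      def parsePackage(packageName, packageVersion) {
        def packageFullName = "${packageName}.${packageVersion}"
        // Download the package and unpack it (a .nupkg is a zip archive)
        bat """curl -L https://www.nuget.org/api/v2/package/${packageName}/${packageVersion} -o ${packageFullName}.nupkg"""
        bat """unzip ${packageFullName}.nupkg -d ${packageFullName}"""

        def nuspecPath = """${packageFullName}\\${packageName}.nuspec"""
        // For a .nuspec saved as UTF-8 with a BOM, the returned string still
        // starts with the BOM, so parseText() fails with
        // "Content is not allowed in prolog."
        def nuspecContent = readFile file: nuspecPath
        // Non-validating, not namespace-aware parser
        def nuspecXML = new XmlSlurper(false, false).parseText(nuspecContent)
        println nuspecXML.metadata.version

        def newXml = XmlUtil.serialize(nuspecXML)
        return newXml
      }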
      

      It looks like readFile does not support UTF-8 with a BOM: it passes the leading BOM through into the returned string.
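Until readFile handles this itself, one possible workaround is to drop a leading BOM before handing the string to the parser. The sketch below is plain Java, and the same `String` calls can be made from Groovy. The class and method names `BomStrip`, `stripBom` and `parseXml` are hypothetical. It assumes the file was decoded as UTF-8 (for example by passing `encoding: 'UTF-8'` to readFile), in which case the BOM shows up as the single character U+FEFF. If the file was decoded with a different charset, the BOM bytes become other characters and this check will not catch them.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class BomStrip {
    // A UTF-8 BOM (bytes EF BB BF) decodes to the character U+FEFF.
    private static final char BOM = '\uFEFF';

    // Returns the text without a single leading BOM character, if present.
    static String stripBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    // Parses XML text with the JDK DOM parser after removing a leading BOM.
    static Document parseXml(String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(stripBom(text))));
    }

    public static void main(String[] args) throws Exception {
        String withBom = "\uFEFF<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<package><metadata><version>1.0.0</version></metadata></package>";
        Document doc = parseXml(withBom);
        // Prints the <version> element's text: 1.0.0
        System.out.println(doc.getElementsByTagName("version").item(0).getTextContent());
    }
}
```

In the pipeline from the description, the equivalent step is to strip the BOM from `nuspecContent` before calling `parseText()`.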

       

      I tried to reproduce it in plain Groovy with:

      def xmldata = new File("Newtonsoft.Json.nuspec").text
      def pkg = new XmlSlurper().parseText(xmldata) 
      println pkg.metadata.version.text()
      

      Here, however, the leading BOM is not included in the xmldata variable.
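The difference depends on whether the BOM is consumed while the bytes are decoded. Java's own UTF-8 decoder does not strip it: `new String(bytes, StandardCharsets.UTF_8)` keeps the BOM as a leading U+FEFF, so a BOM-aware reader has to skip the three bytes EF BB BF itself. The sketch below decodes a byte array that way. The names `Utf8BomDecode`, `hasUtf8Bom` and `decodeUtf8SkippingBom` are hypothetical, and this shows the general technique only. It is not how readFile or Groovy's `File.text` is actually implemented.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8BomDecode {
    // The UTF-8 encoding of U+FEFF.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // True if the data starts with the three UTF-8 BOM bytes.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && Arrays.equals(Arrays.copyOfRange(data, 0, 3), UTF8_BOM);
    }

    // Decodes UTF-8 bytes, skipping a leading BOM if there is one.
    static String decodeUtf8SkippingBom(byte[] data) {
        int offset = hasUtf8Bom(data) ? 3 : 0;
        return new String(data, offset, data.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        // Plain decoding keeps the BOM as an extra first character.
        String plain = new String(data, StandardCharsets.UTF_8);
        System.out.println(plain.length());              // 4
        System.out.println(decodeUtf8SkippingBom(data)); // <a/>
    }
}
```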

       

      I've attached an example .nuspec file that contains a BOM.

       

       


            People

            • Assignee: Unassigned
              Reporter: Jakub Pawlinski (quas)
            • Votes: 0
              Watchers: 3
