Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-53030

Intermittent hudson.util.IOException2 + org.xml.sax.SAXParseException

    Details

    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Major
    • Resolution: Unresolved
    • Environment:
    • Similar Issues:

      Description

      I am working on a build + test + deployment script for multiple platforms. So far the script is still mostly a dummy with the actual work still left to fill in. Nevertheless, when running the script I get intermittent (about one in three or four runs) failures caused by always the same two exceptions:

       

      org.xml.sax.SAXParseException; systemId: file:/var/lib/jenkins/jobs/Temp.ScriptTest/branches/branches-nodenesting.skpm19/builds/109/changelog2.xml; lineNumber: 1; columnNumber: 1; Premature end of file.
       at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239)
       at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
       at org.apache.commons.digester.Digester.parse(Digester.java:1871)
       at hudson.scm.SubversionChangeLogParser.parse(SubversionChangeLogParser.java:76)
      Caused: hudson.util.IOException2: Failed to parse /var/lib/jenkins/jobs/Temp.ScriptTest/branches/branches-nodenesting.skpm19/builds/109/changelog2.xml
       at hudson.scm.SubversionChangeLogParser.parse(SubversionChangeLogParser.java:80)
       at hudson.scm.SubversionChangeLogParser.parse(SubversionChangeLogParser.java:43)
       at org.jenkinsci.plugins.workflow.job.WorkflowRun.onCheckout(WorkflowRun.java:1024)
       at org.jenkinsci.plugins.workflow.job.WorkflowRun.access$1400(WorkflowRun.java:143)
       at org.jenkinsci.plugins.workflow.job.WorkflowRun$SCMListenerImpl.onCheckout(WorkflowRun.java:1232)
       at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:127)
       at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:85)
       at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:75)
       at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1$1.call(AbstractSynchronousNonBlockingStepExecution.java:47)
       at hudson.security.ACL.impersonate(ACL.java:290)
       at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1.run(AbstractSynchronousNonBlockingStepExecution.java:44)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       at java.lang.Thread.run(Thread.java:745)

       

      The file involved is always changelog2.xml.

      The script I am running is this:

       

      // --------------------------------------------------------------------------
      // Project definitions used in the declarative pipeline below
      // --------------------------------------------------------------------------
      def maintainers = "${env.MAIL_VGIMPLE}"
          // these people will be notified when the build fails; comma separated
      def platforms = util.StringToArray(env.CONAN_PLATFORMS)
          // alternative: ['win32', 'x64', ...]
      def upstreamjobs = "BuildSystem.CMake.all"
          // these jobs will trigger this job; comma separated
      def parallelbuild = true
          // set to true to execute builds parallel;
      def crossplatformbuild = true
          // use cross platform builds where possible (i.e. Linux ARM builds will
          // be carried out on Linux PC nodes)
      
      // --------------------------------------------------------------------------
      // Diagnostic output and preparations
      // --------------------------------------------------------------------------
      echo "**** Building for the following platforms:"
      print platforms
      
      builds = [:]
      builds.failFast = true
      
      stashNameArtifacts = "artifacts." + env.BUILD_NUMBER + "."
      stashNameTests = "tests." + env.BUILD_NUMBER + "."
      
      // --------------------------------------------------------------------------
      // The build and test step
      // --------------------------------------------------------------------------
      def BuildAndTestStep(platform) {
          def isCrossPlatformBuild = util.IsCrossPlatformBuild(platform)
          try {        testplatform = platform.startsWith('build.') ? platform - 'build.' : platform
              node (platform) {
                  checkout scm
                  echo "--------> build step for platform " + platform
                  echo "--------> stash " + stashNameArtifacts + testplatform
                  if (isCrossPlatformBuild) {
                      echo "--------> stash " + stashNameTests + testplatform
                  } else {
                      echo "--------> execute tests directly on the node where the build ran"
                  }
              }
              if (isCrossPlatformBuild) {
                  node (testplatform) {
                      echo "--------> unstash " + stashNameTests + testplatform
                      echo "--------> execute cross platform test"
                  }
              }
          } catch(Exception e) {
              echo "**** An error occurred on build platform " + platform
              print e
              throw e
          }
      }
      
      // --------------------------------------------------------------------------
      // Pipeline Declaration
      // --------------------------------------------------------------------------
      pipeline {
          agent none
          triggers {
              pollSCM('@hourly')
              upstream(upstreamProjects: upstreamjobs, threshold: hudson.model.Result.SUCCESS)
          }
          stages {
              // ------------------------------------------------------------------
              // Build and Test step
              // ------------------------------------------------------------------
              stage ('build & test') {
                  steps {
                      script {
                          // now execute the builds as determined - either parallel
                          // or sequentially
                          prefix = crossplatformbuild ? "build." : ""
                          if (parallelbuild) {
                              for (int i = 0; i < platforms.size(); ++i) {
                                  def plt = prefix + platforms[i]
                                  builds[plt] = { -> BuildAndTestStep(plt) }
                              }
                              parallel builds
                          } else {
                              for (int i = 0; i < platforms.size(); ++i) {
                                  BuildAndTestStep(prefix + platforms[i])
                              }
                          }
                      }
                  }
              }
              // ------------------------------------------------------------------
              // Deployment step
              // ------------------------------------------------------------------
              stage ('deploy') {
                  steps {
                      script {
                          for (int i = 0; i < platforms.size(); ++i) {
                              node { // actual node does not matter!
                                  echo "--------> unstash " + stashNameArtifacts + platforms[i]
                                  echo "--------> install unstashed artifacts to either cvdev2 or conan"
                              }
                          }
                      }
                  }
              }
          }
          post {
              failure {
                  echo "*** Build Regression ***"
                  script {
                      mail (to: "${env.MAIL_VGIMPLE}",
                      subject: "Build failed in Jenkins: ${JOB_NAME} #${BUILD_NUMBER}",
                      body: 'See ' + currentBuild.absoluteUrl)
                  }
              }
              fixed {
                  echo "*** Build Fixed ***"
                  script {
                      mail (to: "${env.MAIL_VGIMPLE}",
                      subject: "Jenkins build is back to normal: ${JOB_NAME} #${BUILD_NUMBER}",
                      body: 'See ' + currentBuild.absoluteUrl)
                  }
              }
          }
      }

      A typical output log is attached (file Log114.txt).

      Looking at the failed builds it looks like in most cases the (generally slower) ARM-based test nodes are the ones where the exception is raised. The exceptions never occur when executing the script sequentially (i.e. when setting 'parallelbuild' to false and the 'parallel' block is never hit).

      My next move will be to try and filter out these particular exceptions and see if I have a chance at completing the build regardless but an approach where this is not necessary would of course be my preference.

       

        Attachments

          Issue Links

            Activity

            Hide
            vgimple Volker Gimple added a comment -

            After doing some more testing with more finely grained and selective try/catch blocks I can add the following details:

            • If it does occur, then the IOException2 always occurs during the scm checkout call in BuildAndTestStep()
            • I cannot confirm my earlier impression that the exception is more likely to happen for one of the ARM platforms (in fact the scm step is usually not executed on one of those as these are typically used for testing only, not for building)
            • I have not seen the exception fly when there is no workspace left from the previous run
            • When the exception is thrown, the sources on the affected machines have not been updated to the latest scm version

            For the moment I am working around this by sleeping something between 500 and 1500 milliseconds before retrying the scm checkout step.

            Show
            vgimple Volker Gimple added a comment - After doing some more testing with more finely grained and selective try/catch blocks I can add the following details: If it does occur, then the IOException2 always occurs during the scm checkout call in BuildAndTestStep() I cannot confirm my earlier impression that the exception is more likely to happen for one of the ARM platforms (in fact the scm step is usually not executed on one of those as these are typically used for testing only, not for building) I have not seen the exception fly when there is no workspace left from the previous run When the exception is thrown, the sources on the affected machines have not been updated to the latest scm version For the moment I am working around this by sleeping something between 500 and 1500 milliseconds before retrying the scm checkout step.
            Hide
            tom_ghyselinck Tom Ghyselinck added a comment -

            Hi all,

            I had a similar issue when running some (declarative) pipeline steps in parallel:

            org.xml.sax.SAXParseException; systemId: file:/var/lib/jenkins/jobs/XRA31_components-software/branches/branches-1-x-sc-qam.9mj31f/builds/468/changelog2.xml; lineNumber: 1; columnNumber: 537; Content is not allowed in trailing section.
            	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239)
            	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
            	at org.apache.commons.digester.Digester.parse(Digester.java:1871)
            	at hudson.scm.SubversionChangeLogParser.parse(SubversionChangeLogParser.java:76)
            Caused: hudson.util.IOException2: Failed to parse /var/lib/jenkins/jobs/XRA31_components-software/branches/branches-1-x-sc-qam.9mj31f/builds/468/changelog2.xml
            	at hudson.scm.SubversionChangeLogParser.parse(SubversionChangeLogParser.java:80)
            	at hudson.scm.SubversionChangeLogParser.parse(SubversionChangeLogParser.java:43)
            	at org.jenkinsci.plugins.workflow.job.WorkflowRun.onCheckout(WorkflowRun.java:1045)
            	at org.jenkinsci.plugins.workflow.job.WorkflowRun.access$1400(WorkflowRun.java:143)
            	at org.jenkinsci.plugins.workflow.job.WorkflowRun$SCMListenerImpl.onCheckout(WorkflowRun.java:1253)
            	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:136)
            	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:90)
            	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:77)
            	at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1$1.call(SynchronousNonBlockingStepExecution.java:50)
            	at hudson.security.ACL.impersonate(ACL.java:290)
            	at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1.run(SynchronousNonBlockingStepExecution.java:47)
            	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            	at java.lang.Thread.run(Thread.java:748)

            The created changelog2.xml was:

            <?xml version="1.0" encoding="UTF-8"?><log><logentry revision="3931"><author>tom.g</author><date>2018-10-03T10:29:53.640919Z</date><paths><path action="M" localPath="src/xra31rpc/xra31rpc.thrift" kind="file">/software/branches/1.x-sc_qam/rpc/src/xra31rpc/xra31rpc.thrift</path><path action="M" localPath="src/samplecode/test_cpp_client.cpp" kind="file">/software/branches/1.x-sc_qam/rpc/src/samplecode/test_cpp_client.cpp</path></paths><msg>The `inputLevel` is not yet available for the `PhysicalConnectorStatus`.</msg></logentry></log>ware/branches/1.x-sc_qam/libxra31ctrl/src/xra31ctrl/PcapFileSaver.cpp</path><path action="M" localPath="src/xra31ctrl/PcapFileSaver.h" kind="file">/software/branches/1.x-sc_qam/libxra31ctrl/src/xra31ctrl/PcapFileSaver.h</path></paths><msg>- Remove packet filtering (i.e. `PacketFilter`) support in `PcapFileSaver`.
            - Use correct thread protection by using `std::atomic` where applicable.
            - Disable storing unused (member) variables.
            - Code cleanup.</msg></logentry><logentry revision="3930"><author>tom.g</author><date>2018-10-03T09:41:08.941382Z</date><paths><path action="M" localPath="src/xra31ctrl/DsSynchronizer.cpp" kind="file">/software/branches/1.x-sc_qam/libxra31ctrl/src/xra31ctrl/DsSynchronizer.cpp</path></paths><msg>- Use C++ interface for `stdio`.</msg></logentry></log>

            The localPath in this changelog file are part of different checkouts (from the same commit in the same repository though), but still I suspect it has something to do with the parallel execution...

            I hope this helps!

            With best regards,
            Tom.

            Show
            tom_ghyselinck Tom Ghyselinck added a comment - Hi all, I had a similar issue when running some ( declarative ) pipeline steps in parallel : org.xml.sax.SAXParseException; systemId: file:/var/lib/jenkins/jobs/XRA31_components-software/branches/branches-1-x-sc-qam.9mj31f/builds/468/changelog2.xml; lineNumber: 1; columnNumber: 537; Content is not allowed in trailing section. at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239) at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643) at org.apache.commons.digester.Digester.parse(Digester.java:1871) at hudson.scm.SubversionChangeLogParser.parse(SubversionChangeLogParser.java:76) Caused: hudson.util.IOException2: Failed to parse /var/lib/jenkins/jobs/XRA31_components-software/branches/branches-1-x-sc-qam.9mj31f/builds/468/changelog2.xml at hudson.scm.SubversionChangeLogParser.parse(SubversionChangeLogParser.java:80) at hudson.scm.SubversionChangeLogParser.parse(SubversionChangeLogParser.java:43) at org.jenkinsci.plugins.workflow.job.WorkflowRun.onCheckout(WorkflowRun.java:1045) at org.jenkinsci.plugins.workflow.job.WorkflowRun.access$1400(WorkflowRun.java:143) at org.jenkinsci.plugins.workflow.job.WorkflowRun$SCMListenerImpl.onCheckout(WorkflowRun.java:1253) at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:136) at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:90) at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:77) at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1$1.call(SynchronousNonBlockingStepExecution.java:50) at hudson.security.ACL.impersonate(ACL.java:290) at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1.run(SynchronousNonBlockingStepExecution.java:47) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) The created changelog2.xml was: <?xml version= "1.0" encoding= "UTF-8" ?> <log> <logentry revision= "3931" > <author> tom.g </author> <date> 2018-10-03T10:29:53.640919Z </date> <paths> <path action= "M" localPath= "src/xra31rpc/xra31rpc.thrift" kind= "file" > /software/branches/1.x-sc_qam/rpc/src/xra31rpc/xra31rpc.thrift </path> <path action= "M" localPath= "src/samplecode/test_cpp_client.cpp" kind= "file" > /software/branches/1.x-sc_qam/rpc/src/samplecode/test_cpp_client.cpp </path> </paths> <msg> The `inputLevel` is not yet available for the `PhysicalConnectorStatus`. </msg> </logentry> </log> ware/branches/1.x-sc_qam/libxra31ctrl/src/xra31ctrl/PcapFileSaver.cpp </path> <path action= "M" localPath= "src/xra31ctrl/PcapFileSaver.h" kind= "file" > /software/branches/1.x-sc_qam/libxra31ctrl/src/xra31ctrl/PcapFileSaver.h </path> </paths> <msg> - Remove packet filtering (i.e. `PacketFilter`) support in `PcapFileSaver`. - Use correct thread protection by using `std::atomic` where applicable. - Disable storing unused (member) variables. - Code cleanup. </msg> </logentry> <logentry revision= "3930" > <author> tom.g </author> <date> 2018-10-03T09:41:08.941382Z </date> <paths> <path action= "M" localPath= "src/xra31ctrl/DsSynchronizer.cpp" kind= "file" > /software/branches/1.x-sc_qam/libxra31ctrl/src/xra31ctrl/DsSynchronizer.cpp </path> </paths> <msg> - Use C++ interface for `stdio`. </msg> </logentry> </log> The localPath in this changelog file are part of different checkouts ( from the same commit in the same repository though ), but still I suspect it has something to do with the parallel execution... I hope this helps! With best regards, Tom.
            Hide
            p4paul Paul Allen added a comment - - edited

            Sorry updating the wrong job.

            Internally we have the same problem with the changelog files being read before the writer has finished. For the moment we work around this issue in the `p4-plugin` by catching the exception, logging the error and returning an empty ChangeLogSet.

            Show
            p4paul Paul Allen added a comment - - edited Sorry updating the wrong job. Internally we have the same problem with the changelog files being read before the writer has finished. For the moment we work around this issue in the `p4-plugin` by catching the exception, logging the error and returning an empty ChangeLogSet.
            Hide
            duael Emmanuel Ruhnau added a comment -

            I had this error before when doing parallel checkouts.

            I removed parallel and I now do sequential checkouts. It's working fine since several months.

            Show
            duael Emmanuel Ruhnau added a comment - I had this error before when doing parallel checkouts. I removed parallel and I now do sequential checkouts. It's working fine since several months.
            Hide
            duael Emmanuel Ruhnau added a comment -
            Show
            duael Emmanuel Ruhnau added a comment - Seems to be fixed by  https://issues.jenkins-ci.org/browse/JENKINS-34313

              People

              • Assignee:
                vaimr Denis Saponenko
                Reporter:
                vgimple Volker Gimple
              • Votes:
                3 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated: