Jenkins / JENKINS-59668

Run wrapper process in the background fails with the latest changes

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Defect
    • Component/s: durable-task-plugin
    • Labels:
    • Environment:
      Jenkins 2.197
      durable-task 1.30
      Docker version 19.03.1, build 74b1e89e8a

      Description

      Some erratic errors started to happen as a consequence of https://issues.jenkins-ci.org/browse/JENKINS-58290 

       

      [2019-09-30T15:00:13.698Z] process apparently never started in /var/lib/jenkins/workspace/ejs_apm-agent-nodejs-mbp_PR-1393@tmp/durable-3a70569b
      [2019-09-30T15:00:13.698Z] (running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
      script returned exit code -2  
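
      For reference, a minimal sketch of how that diagnostics flag can be toggled temporarily from the controller's Script Console (Manage Jenkins » Script Console); the property name is taken from the log message above, and it resets on the next restart:

      // Script Console sketch: enable verbose durable-task diagnostics until the next restart
      System.setProperty('org.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS', 'true')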

       

      Unfortunately, for some reason the error only shows up in one particular PR rather than affecting the whole CI!

      https://github.com/elastic/apm-agent-nodejs/pull/1393

      Apparently it happens when running Docker inside a worker.

       

      Besides, I'd expect LAUNCH_DIAGNOSTICS to be enabled by default for backward compatibility, rather than the other way around.

       

      Please let me know if you need further details. My one big concern is why on earth this only fails in one particular PR of a multibranch pipeline (MBP) rather than in all of them... that's really weird.

       

       

        Attachments

          Issue Links

            Activity

            jglick Jesse Glick added a comment -

            I presume the issue is trying to run sh on Windows and it not existing. BTW the full Pipeline script does not seem to make much sense: you are grabbing a Linux node, then holding an executor lock while also grabbing a Windows node?
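
            As a sketch, one common way to avoid accidentally calling sh on a Windows agent is to branch on the built-in isUnix() step (the commands here are only placeholders):

            // Sketch: pick the shell step that matches the agent's OS
            if (isUnix()) {
              sh 'make test'     // placeholder command
            } else {
              bat 'make test'    // placeholder command
            }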

            jglick Jesse Glick added a comment -

            Sometimes this results from script mistakes like

            withEnv(["PATH=${tool 'whatever'}/bin"]) {
              sh 'whatever'
            }
            

            when what was meant was

            withEnv(["PATH+WHATEVER=${tool 'whatever'}/bin"]) {
              sh 'whatever'
            }
            
            v2v Victor Martinez added a comment -

            Much appreciated for the debugging and the answers. I do think the current pipeline might be a bit too complicated, but that's likely another story.

             

            I was able to narrow the issue down a bit more based on your feedback: for some reason the PATH environment variable gets corrupted, and the snippet below produces the error.

             

            def forLinux() {
              return {
                node('linux'){ 
                  try {
                    sh label: 'Pre-Environment', script: 'env | sort'   // only for debugging purposes
                    deleteDir()
                    unstash 'source'
                    retry(2){
                      sleep  23
                      sh(label: "Run Tests"...
                    }
                    sh label: 'Post-Environment', script: 'env | sort'  // only for debugging purposes
                  } catch(e){ 
                    error(e.toString())
                  } finally {
                    sh label: 'Environment', script: 'env | sort'  // only for debugging purposes
                    ...
                  }
                }
              }
            }
            

             

             

             

            I just managed to work around it by adding a withEnv to ensure the PATH is defined, although I'm still not sure how it gets corrupted, since that particular behaviour does not happen when the above snippet runs in a parallel step that contains nothing but calls to forLinux.
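
            The workaround looked roughly like this (a sketch only; the prepended directories and the script name are placeholders, not the exact values from the real pipeline):

            node('linux') {
              // Prepend the usual system locations so the durable-task wrapper can always find sh and nohup,
              // even if something earlier in the build overwrote PATH
              withEnv(['PATH+SYSTEM=/usr/local/bin:/usr/bin:/bin']) {
                sh label: 'Run Tests', script: './run-tests.sh'   // placeholder script
              }
            }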

             

            For instance, the snippet below shows how the parallel is configured in the stage that hits the nohup issue, alongside a similar stage which does not fail:

             

            stage('FailedWithNoHup') {
              environment {
                HOME = "${env.WORKSPACE}"
              }
              steps {
                def parallelTasks = [:]
                parallelTasks["Linux-1"] = forLinux(version: 1) 
                parallelTasks["Linux-2"] = forLinux(version: 2)
                parallelTasks["Windows-1"] = forWindows(version: 1)
              }
            }
            stage('Works') {
              environment {
                HOME = "${env.WORKSPACE}"
              }
              steps {
                def parallelTasks = [:]
                parallelTasks["Linux-1"] = forLinux(version: 1) 
                parallelTasks["Linux-2"] = forLinux(version: 2)
                parallel(parallelTasks)
              }
            }
            def forLinux(Map params = [:]){
              def version = params?.version
              return {
                node('linux'){ 
                  try {
                    sh label: 'Pre-Environment', script: 'env | sort'   // only for debugging purposes
                    deleteDir()
                    unstash 'source'
                    retry(2){
                      sleep  23
                      sh(label: "Run Tests"...
                    }
                    sh label: 'Post-Environment', script: 'env | sort'  // only for debugging purposes
                  } catch(e){ 
                    error(e.toString())
                  } finally {
                    sh label: 'Environment', script: 'env | sort'  // only for debugging purposes
                    ...
                  }
                }
              }
            }
            def forWindows(Map params = [:]){
              return {
                node('windows'){
                  ...
                }
              }
            }
            
            
             

             

            jglick Jesse Glick added a comment -

            Using

            sh 'env | sort'
            

            is obviously not going to help diagnosis if the problem is a broken sh step! You can try

            echo "PATH set to $PATH"
            

            which is not quite as convincing but may pinpoint your issue: either something in your build or Jenkins node configuration corrupting $PATH, or an accidental usage of sh inside a Windows node (without a properly configured Cygwin environment or whatever).

            Can probably be closed as this looks like a user error, not related to recent changes except to the extent that for technical reasons we had to turn off LAUNCH_DIAGNOSTICS by default to avoid wasting system resources in the normal case that $PATH is OK. Maybe it could be automatically enabled for the rest of the JVM session upon encountering one of these errors, or something like that.
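
            As a sketch of what that check might look like around the failing step (the label and script name are placeholders; in scripted Pipeline the value is reached via env.PATH):

            node('linux') {
              // Print the PATH the agent actually sees, without depending on a working sh step
              echo "PATH set to ${env.PATH}"
              sh label: 'Run Tests', script: './run-tests.sh'   // placeholder for the real test command
            }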

            v2v Victor Martinez added a comment -

            Thanks again for the help. I finally managed to find the issue with $PATH: it was caused by a manipulation of the env map. env seems to be a global map, so when running a parallel step across different OSes its behaviour can become unpredictable.
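
            A hypothetical illustration of that kind of mistake (not the actual pipeline code): env is shared across the whole build, so overwriting env.PATH inside one parallel branch leaks into every other branch.

            parallel(
              'Windows-1': {
                node('windows') {
                  // Assigning env.PATH changes it for the whole build, not just this branch
                  env.PATH = 'C:\\tools\\bin'
                }
              },
              'Linux-1': {
                node('linux') {
                  // The durable-task wrapper behind sh can no longer find nohup/sh on the corrupted PATH,
                  // which surfaces as "process apparently never started"
                  sh 'env | sort'
                }
              }
            )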

             

            Thanks again, and thanks for considering making LAUNCH_DIAGNOSTICS the default behaviour.

             

            I'll close this ticket now if you don't mind


              People

              • Assignee:
                Unassigned
                Reporter:
                v2v Victor Martinez
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: