JENKINS-55308

intermittent "terminated" messages using sh in Pipelines

    Details

    • Type: Bug
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Labels:
      None

      Description

      While testing Jenkins 2.138.2 LTS, Pipelines that use sh intermittently print the following message in the console log …

       

      sh: line 1:  4449 Terminated              sleep 3

       

       … and sometimes this … 

       

      sh: line 1: 13136 Terminated              { while [ ( -d /proc/$pid -o ! -d /proc/$$ ) -a -d '/home/ec2-user/workspace/admin-smoke-test@tmp/durable-523481b0' -a ! -f '/home/ec2-user/workspace/admin-smoke-test@tmp/durable-523481b0/jenkins-result.txt' ]; do    touch '/home/ec2-user/workspace/admin-smoke-test@tmp/durable-523481b0/jenkins-log.txt'; sleep 3; done; }

       

      The Jenkins master runs from a Docker image based on jenkins/jenkins:2.138.2-alpine, with specific plugins baked into the image by /usr/local/bin/install-plugins.sh.

      The message originates in durable-task-plugin, which must be a dependency of one of the plugins.txt plugins.  

      Two important observations:

      1) The issue does not occur when starting with the base jenkins/jenkins:2.138.2-alpine image and manually installing plugins via the UI. That suggests the issue may lie in how install-plugins.sh installs plugins and/or their dependencies.

      2) The issue does not occur on our production image, which is also 2.138.2-alpine + plugins built 2018-10-11. Rebuilding the same image from the same Dockerfile results in a different set of installed plugins, which makes me think the results of install-plugins.sh are not deterministic (see the version-pinning sketch below).
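      One way to rule that out is to pin exact plugin versions when baking the image, since install-plugins.sh accepts plugin:version arguments. A minimal sketch, assuming the standard jenkins/jenkins image layout (the plugin IDs and versions below are purely illustrative):

      # Build-time step (e.g. a RUN line in the Dockerfile); pinning plugin:version
      # makes the installed set reproducible instead of "whatever is latest at build time".
      /usr/local/bin/install-plugins.sh \
          workflow-job:2.25 \
          durable-task:1.22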

        Attachments

          Issue Links

            Activity

            gc875 gc875 created issue -
            jl68 Jonas Lindström added a comment - - edited

            We run the jenkins-alpine docker image and we also get these messages. I tried switching to the Debian image (2.150.2) and the intermittent "terminated" messages are still there, so I don't think it's specifically related to the OS.

            This seems to have been introduced after v1.22 of durable-task-plugin. I guess the error is lurking somewhere in this line in BourneShellScript.java:

                        cmd = String.format("pid=$$; { while [ \\( -d /proc/$pid -o \\! -d /proc/$$ \\) -a -d '%s' -a \\! -f '%s' ]; do touch '%s'; sleep 3; done } & jsc=%s; %s=$jsc %s '%s' > '%s' 2> '%s'; echo $? > '%s.tmp'; mv '%s.tmp' '%s'; wait", 
            ... more args ...

            It's difficult to downgrade to 1.22 since so many other plugins depend on later versions of this plugin.
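
            Filled in, that format string produces a wrapper of roughly the following shape. This is only a sketch: CONTROL_DIR, script.sh and the redirect targets stand in for the values the plugin substitutes, and the JENKINS_SERVER_COOKIE handling is omitted.

            pid=$$
            # Heartbeat: while the script's process is alive and no result file exists yet,
            # touch the log file every 3 seconds so the master sees a fresh timestamp.
            { while [ \( -d /proc/$pid -o \! -d /proc/$$ \) -a -d "$CONTROL_DIR" -a \! -f "$CONTROL_DIR/jenkins-result.txt" ]; do
                touch "$CONTROL_DIR/jenkins-log.txt"
                sleep 3
              done; } &
            # Run the user script, then record its exit code atomically via a temp file.
            sh "$CONTROL_DIR/script.sh" > "$CONTROL_DIR/jenkins-log.txt" 2>&1
            echo $? > "$CONTROL_DIR/jenkins-result.txt.tmp"
            mv "$CONTROL_DIR/jenkins-result.txt.tmp" "$CONTROL_DIR/jenkins-result.txt"
            # Reap the background heartbeat; if it was killed mid-sleep, the shell
            # reports something like "Terminated    sleep 3".
            wait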

            jguigui Jean-Paul G made changes -
            Field Original Value New Value
            Component/s docker [ 20834 ]
            jguigui Jean-Paul G added a comment - - edited

            Not specific to docker.

            Have the very same issue when running sh step with parallel, on ssh slave with Jenkins 2.156

            avonnieda Adam vonNieda added a comment -

            I've got the same issue when running an sh step without parallel, on an SSH slave with Jenkins 2.150.2 LTS; I'm not running on Docker.

            cyu4322 Connor Yu added a comment -

            I am seeing this as well. Upgraded to 2.46.3 yesterday and started seeing this in my simple pipeline. 

            sh: line 1: 27323 Terminated              sleep 3
            

             

            karlparry Karl Parry added a comment -

            I am seeing this as well. We have just seen it randomly. We do get it when we run a custom clean up script that deletes the @tmp/@script/@script@tmp folders 

            sh: line 1: 27323 Terminated              sleep 3
            mhall4 Matthew Hall added a comment -

            Also seeing this. Jenkins 2.150.3 and durable-task 1.29, which was an upgrade from 1.25.

            mkozell Mike Kozell added a comment -

            I upgraded Pipeline: Job (workflow-job) plugin from 2.25 to 2.31 and now see the issue.  I'm running durable-task 1.27.

            dnusbaum Devin Nusbaum made changes -
            Assignee Carlos Sanchez [ csanchez ] Devin Nusbaum [ dnusbaum ]
            dnusbaum Devin Nusbaum added a comment -

            The sleep 3 is part of the execution of a shell step. A background process touches a specific file on the agent every 3 seconds, and the Jenkins master checks the timestamp on that file as a proxy to know whether the script is still running or not. It seems based on the reports here that something is causing that process to be killed on some systems, but I don't have any ideas about what it could be offhand.

            Does anyone have a Pipeline that can reproduce the issue consistently? If so, please add it here (simplifying as much as possible) and specify the agent OS and sh implementation (e.g., dash, bash) for the agents running the sh steps.

            dnusbaum Devin Nusbaum added a comment -

            Natalia Natalia Thanks for posting your Pipeline! Here are some further questions to try to help narrow down what could be happening:

            • Just to confirm, your agents are running OS X?
            • You say you see the issue constantly, but can you be more specific (do you see the issue every other build, every 10th build, ...)?
            • Approximately how many parallel tasks are you running at once (10s, 100s, ...)?
            • How resource-intensive is your shell script (is it possibly triggering memory or CPU limits that would cause the Kernel to start killing processes)?
            • How are the agents connected to the Jenkins master (SSH, JNLP), and are they static machines or are they dynamically provisioned by the Jenkins master through something like the Docker Swarm or EC2 plugins?
            • If they are static machines, are they running anything else or are they only used as Jenkins agents? Assuming they are static, I would see if there are any particular agents where the shell script seems to be getting terminated consistently based on previous failed builds, then log onto those machines and check general system logs and running processes to see if there is anything that looks like it might be responsible for killing the process.
            mayqueen Natalia Natalia added a comment - - edited

            Devin Nusbaum

            • My agents are Apple nodes, MacMinis
            • every run
            • I have around 20 nodes gathered in one label and it's just one same task for all of the nodes
            • shell code exactly like the above - keeping only sh """ """ produces the error in every run; so I would say it doesn't matter at all
            • agents are connected through ssh, they are static physical machines in the racks
            • they are only used as Jenkins agents

            I'm not sure if you are getting the problem - the shell code is executed, basically it all works as expected. The only 'not-working' part for me is the noise in the log. It started when we upgraded core and plugins to the newest version.

            EDIT: I also have other nodes provisioned in the cloud (manually), connected through ssh, just tested on these and it's the same issue.

            # cat /etc/redhat-release
            CentOS Linux release 7.6.1810 (Core)
            $ /bin/bash --version
            GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu)
            Copyright (C) 2011 Free Software Foundation, Inc.
            License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
            
            This is free software; you are free to change and redistribute it.
            There is NO WARRANTY, to the extent permitted by law.
            
            dnusbaum Devin Nusbaum added a comment -

            the shell code is executed, basically it all works as expected. The only 'not-working' part for me is the noise in the log.

            Thanks for clarifying, I didn't realize that things were working apart from the log message and that you were able to reproduce the issue by literally running a blank sh step. I'll try to create a unit test that can reproduce the problem based on your info when I have some time.

            wl2776 Vladimir Eremeev added a comment -

            I see this error on Jenkins 2.164.1

            One of my pipeline jobs started failing today for an unclear reason.

            Both successful and failed jobs contain "Terminated" in their console logs.

            danielzabel Daniel Zabel made changes -
            Link This issue is caused by JENKINS-55867 [ JENKINS-55867 ]
            vennila Vennila Perumal added a comment -

            Hi,

            I am also having the same problem: the console log contains unwanted "sh: line 1:  xxx Terminated              sleep 3" messages.

            May I know how to get rid of this?

            Jenkins version: 2.150

             

            ekdmdhn Dhanshree Kadam made changes -
            Assignee Devin Nusbaum [ dnusbaum ] Dhanshree Kadam [ ekdmdhn ]
            ekdmdhn Dhanshree Kadam made changes -
            Assignee Dhanshree Kadam [ ekdmdhn ] Devin Nusbaum [ dnusbaum ]
            jl68 Jonas Lindström added a comment - - edited

            Mike Kozell was on to something: the bug is not just in the durable-task-plugin, although the symptoms come from there. It is introduced when you upgrade workflow-job. I have managed to pin it down to a specific version.

            The following setup does not trigger the bug (relevant plugins listed). All are at latest version at the time of writing, except workflow-job:

            • durable-task 1.29
            • workflow-api 2.33
            • workflow-durable-task-step 2.30
            • workflow-step-api 2.19
            • workflow-support 3.3
            • workflow-job 2.25

            Upgrading workflow-job to 2.27 or later triggers the bug. (2.26 does not exist.)

            jl68 Jonas Lindström made changes -
            Component/s workflow-job-plugin [ 21716 ]
            avi_kivity Avi Kivity added a comment -

            I am seeing the same problem, parallel jobs running on Linux agents.  I can share the pipeline code if it will help.

            epkaxma Martin Wiklundh added a comment -

            We see the same problem intermittently, in the afternoons only. Appeared without any known change in our Jenkins setup. Network/storage issues may produce the same error.

            kierzo Kieron Kierzo added a comment -

            Hey,

            Had this issue on my Mac build.

            My fix was to set the JAVA_HOME variable which was missing on the machine.

            I created this file on the Mac: "~/.bash_profile"

            and added this line:

            export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk-11.0.2.jdk/Contents/Home

            Rebooted the Mac and that fixed it.

            Your "jdk-11.0.2.jdk" folder may be different, so just replace that with whatever is in there.

            If you run the command

            echo ${JAVA_HOME}

            and it's empty, this could be the cause.

            Hope this helps others.

             

            Cheers Kieron.
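
            For reference, a consolidated sketch of the workaround described above (the JDK folder name is machine-specific and purely illustrative):

            # Append the variable to the agent user's ~/.bash_profile, then restart the agent.
            echo 'export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk-11.0.2.jdk/Contents/Home' >> ~/.bash_profile
            # Verify in a fresh login shell; empty output means JAVA_HOME is still unset.
            echo "${JAVA_HOME}"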

             

             

            jsosic Jakov Sosic added a comment -

            It seems that adding $JAVA_HOME in Global properties indeed fixes this!

            Since I run CentOS 7, this is what I've added:

            Name: JAVA_HOME

            Value: /usr/lib/jvm/jre-1.8.0-openjdk

            ctran Cuong Tran added a comment -

            Was this property added to the agent or the master?

            zdtsw Wen Zhou added a comment -

            if the bug is really introduced by the workflow-job plugin, can we expect a fix in the new release?

             

            parogui Pablo Rodriguez added a comment - - edited

            The issue could be reproduced by running this simple Pipeline:

            Pipeline script
            pipeline {
                agent { label 'nodelinux' }
                stages {
                    stage('build') {
                        steps {
                            echo "Hello World!"
                            sh "echo Hello from the shell"
                        }
                    }
                }
            }
            

            Console output:

            [Pipeline] Start of Pipeline
            [Pipeline] node
            [Pipeline] {
            [Pipeline] stage
            [Pipeline] { (build)
            [Pipeline] echo
            Hello World!
            [Pipeline] sh
            + echo Hello from the shell
            Hello from the shell
            [Pipeline] }
            [Pipeline] // stage
            [Pipeline] }
            [Pipeline] // node
            [Pipeline] End of Pipeline
            sh: line 1: 18285 Terminated sleep 3
            Finished: SUCCESS
            

            Environment:

            • CloudBees Jenkins Platform - Master 2.176.2.3

            • workflow-job:2.32 'Pipeline: Job'

            It started happening after workflow-job was updated from 2.25 to 2.32

            carroll Carroll Chiou added a comment - - edited

            The sleep 3 process was introduced when the heartbeat check feature was added in version 1.16 of durable-task, which translates to 2.18 of workflow-durable-task-step, which translates to 2.27 of workflow-job.

            Unless I missed a comment, it looks like the pipeline behaves as expected outside of the "Terminated sleep 3" line.

            At least in Pablo Rodriguez’s example, this might be a race condition between the sleep 3 process and when the script completes. Like Devin Nusbaum mentioned earlier, the sleep 3 process is used to touch the output log file to show the script is still alive. However, when the script completes, it will write to a separate result file. A watcher service is checking for that result file every 100ms. Once that result file is found, results are transmitted and everything related to that specific step’s workspace is purged. It might be possible that the output file gets cleaned up right after the sleep 3 process checks if the file still exists, but before it gets touched again?
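
            For what it's worth, the message itself is just the shell's notification for a background job killed by SIGTERM while its sleep 3 was still running. A standalone sketch (not the plugin's actual cleanup path) that should produce a very similar line on stderr:

            bash -c '
              sleep 3 &    # stands in for the heartbeat caught mid-sleep
              hb=$!
              kill "$hb"   # cleanup tears the background job down
              wait         # the shell then prints: line 1: <pid> Terminated  sleep 3
            '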

            There is a new release of durable-task (1.31) that removes the sleep 3 process so this line won't pop up anymore.

            Update: I have not been able to reproduce this issue, so I can't say for certain if this issue is resolved. Technically, it should, but it could be possible that the new version just changes the behavior of this bug.

            carroll Carroll Chiou made changes -
            Assignee Devin Nusbaum [ dnusbaum ] Carroll Chiou [ carroll ]
            carroll Carroll Chiou made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            aal A Alvarez added a comment -

            Can confirm that after upgrading durable-task to the latest version, the Terminated messages are gone from our jobs.

            carroll Carroll Chiou added a comment -

            A Alvarez I'm curious, if you are running 1.33, are you passing in the FORCE_BINARY_WRAPPER=true system property?

            aal A Alvarez added a comment -

            Hi Carroll Chiou, not as far as I'm aware, although this could be a default or coming from another plugin. All we did was upgrade the plugin, and the messages disappeared from the console output when running shell commands within Pipeline scripts.

            carroll Carroll Chiou added a comment -

            A Alvarez thanks, that's good to know


              People

              • Assignee:
                carroll Carroll Chiou
                Reporter:
                gc875 gc875
              • Votes:
                24
                Watchers:
                44
