Jenkins / JENKINS-32859

tests hung in docker container due to PID1 not reaping zombies

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • docker-workflow-plugin
    • None

      Description:
      I first encountered this problem when running 'make check' for our software inside a Docker container created by the Docker Pipeline plugin; the run hung forever:
      http://gitweb.skylable.com/gitweb/?p=sx.git;a=blob;f=Jenkinsfile;h=93a60925efe2fff0e5428e987ec3d24e593b76e9;hb=refs/heads/ci

      For this bug report I created a minimal test case that consists only of a Groovy script with no external dependencies (see the attached config.xml), and tracked the problem down to this line in the docker-workflow plugin:
      https://github.com/jenkinsci/docker-workflow-plugin/blob/566738205795b939b72d337557fa3514c141295a/src/main/java/org/jenkinsci/plugins/docker/workflow/WithContainerStep.java#L138

      Please provide a way to override the 'cat' command in the Groovy DSL, or update the workflow plugin to avoid the zombie issue.

      Steps to reproduce:
      1. Use the attached config.xml to create a new Pipeline project (I called it docker-zombies)
      2. Press "Build Now"
      3. Watch the console output of the build

      Expected results:
      job finishes

      Actual results:
      job runs forever

      Additional information:
      The job waits for a process to exit gracefully by checking whether its PID is still alive with kill -0 (the process is not a direct child of the waiting shell, so 'wait' cannot be used).
      The process has exited and become a zombie (<defunct>) that is supposed to be reaped by PID 1. However, because PID 1 (the first process started in the Docker container) is 'cat', which does not reap children, the zombie stays around forever and kill -0 keeps succeeding.
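To illustrate why kill -0 never fails here, the following is a minimal Python sketch. It is not taken from the attached job: the helper names pid_alive and zombie_demo are made up for illustration, and a Linux host is assumed. It forks a child that exits at once, leaves it unreaped, and shows that a signal-0 probe still succeeds until the parent calls waitpid.

```python
import os


def pid_alive(pid):
    """Mimic `kill -0 PID`: True while the PID is still in the process table."""
    try:
        os.kill(pid, 0)  # signal 0 performs only the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def zombie_demo():
    """Return (alive_before_reap, alive_after_reap) for a child that has exited."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    # Block until the child has exited, but leave it unreaped (a zombie).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    before = pid_alive(pid)  # zombie: the probe still succeeds
    os.waitpid(pid, 0)       # reap the zombie
    after = pid_alive(pid)   # the process table entry is gone now
    return before, after


if __name__ == "__main__":
    print(zombie_demo())
```

In the container from this report nothing ever performs the waitpid step for the orphaned python processes, so the probe stays in the "before" state indefinitely.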

      This is a well-known problem with Docker: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/. The usual solution is to run a shell such as bash as PID 1 and run any other commands, such as cat, as its children.
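As a rough illustration of what "reaping" means for whatever runs as PID 1, here is a minimal Python sketch of a non-blocking reap loop. It is an assumption-laden example, not what bash or the plugin actually does; the function name reap_exited_children is hypothetical.

```python
import os
import time


def reap_exited_children():
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


if __name__ == "__main__":
    # Demo: start three short-lived children, then collect them.
    for _ in range(3):
        if os.fork() == 0:
            os._exit(0)
    time.sleep(0.2)  # crude pause so the children have time to exit
    print("reaped:", reap_exited_children())
```

An init-like process would typically call such a loop whenever it receives SIGCHLD. 'cat' does nothing of the kind, which is why the orphaned processes reparented to it remain <defunct>.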

      Here is the console output when I run the attached job:
      Console Output (in progress)

      Started by user Admin
      [Pipeline] Allocate node : Start
      Running on master in /var/jenkins_home/workspace/docker-zombies
      [Pipeline] node {
      [Pipeline] sh
      [docker-zombies] Running shell script
      + docker inspect -f . buildpack-deps:latest
      .
      [Pipeline] Run build steps inside a Docker container : Start
      $ docker run -t -d -u 1000:1000 -w /var/jenkins_home/workspace/docker-zombies -v /var/jenkins_home/workspace/docker-zombies:/var/jenkins_home/workspace/docker-zombies:rw -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** buildpack-deps:latest cat
      [Pipeline] withDockerContainer {
      [Pipeline] writeFile
      [Pipeline] sh

      [docker-zombies] Running shell script
      + exec /usr/bin/python test.py

      [Pipeline] sh

      [docker-zombies] Running shell script
      + ps -ef
      UID PID PPID C STIME TTY TIME CMD
      1000 1 0 0 16:02 ? 00:00:00 cat
      1000 11 1 0 16:02 ? 00:00:00 [python] <defunct>
      1000 12 1 0 16:02 ? 00:00:00 [python] <defunct>
      1000 45 0 0 16:02 ? 00:00:00 sh -c echo $$ > '/var/jenkins_home/workspace/docker-zombies/.jenkins-3a360d3d/pid'; jsc=durable-0b3272bed0cac7d970b36ad24e8c046c; JENKINS_SERVER_COOKIE=$jsc '/var/jenkins_home/workspace/docker-zombies/.jenkins-3a360d3d/script.sh' > '/var/jenkins_home/workspace/docker-zombies/.jenkins-3a360d3d/jenkins-log.txt' 2>&1; echo $? > '/var/jenkins_home/workspace/docker-zombies/.jenkins-3a360d3d/jenkins-result.txt'
      1000 49 45 0 16:02 ? 00:00:00 /bin/sh -xe /var/jenkins_home/workspace/docker-zombies/.jenkins-3a360d3d/script.sh
      1000 50 49 0 16:02 ? 00:00:00 ps -ef
      [Pipeline] sh
      [docker-zombies] Running shell script

      + cat pidfile
      + PID=11
      + kill -0 11
      + echo Waiting for 11 to exit
      Waiting for 11 to exit
      + sleep 1

      + kill -0 11
      + echo Waiting for 11 to exit
      Waiting for 11 to exit
      + sleep 1
      + kill -0 11
      + echo Waiting for 11 to exit
      Waiting for 11 to exit
      + sleep 1

      + kill -0 11
      + echo Waiting for 11 to exit
      Waiting for 11 to exit
      + sleep 1
      + kill -0 11
      + echo Waiting for 11 to exit
      Waiting for 11 to exit
      + sleep 1
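The loop in the log above spins forever because it only asks whether PID 11 still exists. As a possible workaround on the waiting side (not something proposed in this report), the sketch below treats a zombie as "exited" by reading the state field from /proc/<pid>/stat. It assumes Linux with /proc mounted; the names process_state and wait_for_exit are hypothetical.

```python
import os
import time


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Format is "pid (comm) state ..."; comm may contain spaces or ')',
    # so split on the last ')' before taking the state field.
    return data.rsplit(")", 1)[1].split()[0]


def wait_for_exit(pid, timeout=10.0, interval=0.1):
    """Return True once the process is gone or a zombie ('Z'), False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if process_state(pid) in (None, "Z"):
            return True
        time.sleep(interval)
    return False


if __name__ == "__main__":
    child = os.fork()
    if child == 0:
        os._exit(0)
    print("exited:", wait_for_exit(child))  # child is left unreaped on purpose
    os.waitpid(child, 0)
```

This only stops the waiting loop from hanging; the zombies themselves still accumulate until something running as PID 1 reaps them.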

            Assignee: Unassigned
            Reporter: Edwin Török (edwintorok)
            Votes: 4
            Watchers: 5
