Jenkins / JENKINS-39307

pipeline docker execution aborts without reason


    Details


      Description

      I'm trying to compile ArangoDB in one of these docker containers:
      https://github.com/arangodb-helper/build-docker-containers/tree/master/distros
      It works flawlessly for most of the containers; however, I've seen it abort without any reason for docker containers derived from these two base containers:

      fedora:23 ubuntu:12.04

      The output in the web interface looks like this:

      [ 24%] Building CXX object lib/CMakeFiles/arango.dir/ApplicationFeatures/ApplicationServer.cpp.o
        CXX(target) /var/lib/jenkins/workspace/ArangoDB_Release/build-EPpackage-ubuntutwelveofour/3rdParty/V8/v8/x64.release/obj.target/icui18n/third_party/icu/source/i18n/fpositer.o
        CXX(target) /var/lib/jenkins/workspace/ArangoDB_Release/build-EPpackage-ubuntutwelveofour/3rdParty/V8/v8/x64.release/obj.target/icui18n/third_party/icu/source/i18n/funcrepl.o
        CXX(target) /var/lib/jenkins/workspace/ArangoDB_Release/build-EPpackage-ubuntutwelveofour/3rdParty/V8/v8/x64.release/obj.target/icui18n/third_party/icu/source/i18n/gender.o
      
        CXX(target) /var/lib/jenkins/workspace/ArangoDB_Release/build-EPpackage-ubuntutwelveofour/3rdParty/V8/v8/x64.release/obj.target/icui18n/third_party/icu/source/i18n/gregocal.o
      [ 24%] Building CXX object 3rdParty/rocksdb/rocksdb/CMakeFiles/rocksdblib.dir/db/db_iter.cc.o
        CXX(target) /var/lib/jenkins/workspace/ArangoDB_Release/build-EPpackage-ubuntutwelveofour/3rdParty/V8/v8/x64.release/obj.target/icui18n/third_party/icu/source/i18n/gregoimp.o
      [Pipeline] stage
      [Pipeline] { (Send Notification for failed build)
      [Pipeline] sh
      [ArangoDB_Release] Running shell script
      + git --no-pager show -s --format=%ae
      [Pipeline] mail
      
      [Pipeline] }
      [Pipeline] // stage
      
      [Pipeline] }
      $ docker stop --time=1 e0c5a42869989172c87fd272a714980602d7ec6c6b1be4655589b23f88b54760
      $ docker rm -f e0c5a42869989172c87fd272a714980602d7ec6c6b1be4655589b23f88b54760
      [Pipeline] // withDockerContainer
      [Pipeline] }
      [Pipeline] // withDockerRegistry
      [Pipeline] }
      [Pipeline] // withEnv
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] stage (Send Notification for build)
      Using the ‘stage’ step without a block argument is deprecated
      Entering stage Send Notification for build
      Proceeding
      [Pipeline] mail
      

      That log is from the Ubuntu 12 container. In the Fedora container, the build barely gets to start the configure part of cmake before aborting in a similar manner, again without any apparent reason.

      When running the container in an interactive terminal session, the whole build goes through without any issues.


            Activity

            jglick Jesse Glick added a comment -

            Oh, and no, it should not be ps -o pid=7. The space in ps -o pid= 7 is intentional.
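The distinction can be demonstrated directly. This is a hedged illustration using the current shell's PID rather than the hard-coded 7 from the plugin's probe:

```shell
#!/bin/sh
# Illustration of the "ps -o pid=" distinction discussed above; the PID
# used here is the current shell's, not the plugin's hard-coded 7.

# "pid=" followed by a separate PID operand: the "=" sets an EMPTY column
# header, so the output is just the PID (or nothing if it has exited).
ps -o pid= $$

# Without the space, "pid=7" would instead set the column HEADER text to
# "7" and apply ps's default process selection, which is not the intent.
ps -o pid=7 | head -n 1
```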

            dothebart Wilfried Goesgens added a comment -

            As discussed on IRC, so it's not lost: stat'ing the /proc/<pid> directory should be just as portable and would remove the need for the ps command. It should also be faster, since [ -d /proc/<pid> ] can be done as a shell built-in and doesn't need to fork a ps command.

            Another cure would be to count the ps calls; if there was just one attempt and the docker shutdown finds processes still running in this container, add a warning to the error message stating whether ps is available in the container or not.
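The suggestion above can be sketched as follows. This is a minimal sketch, not the plugin's actual code; the PID is illustrative:

```shell
#!/bin/sh
# Minimal sketch of the suggestion above: probe process liveness via
# /proc instead of forking ps. Works on Linux, where /proc/<pid> exists
# exactly while the process is alive.

pid=$$   # illustrative target; the plugin would use the sh step's PID

if [ -d "/proc/$pid" ]; then          # shell built-in test, no fork needed
    echo "process $pid is still running"
else
    echo "process $pid has exited"
fi

# The ps-based check this would replace fails outright when the container
# image does not ship a ps binary at all:
# ps -o pid= "$pid"
```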

            kufi Patrick Kaufmann added a comment -

            Ok, after some poking around in the code (two days' worth), I've probably found a fix for this problem. The culprit seems to be the durable-task-plugin, not the docker-workflow-plugin.

            Basically, what happens is that the "ps" command which determines whether an "sh" command is still running gets run on the wrong docker host whenever the docker host is anything other than localhost. This makes the durable-task-plugin believe that the script terminated unexpectedly, which in turn aborts the whole script.

            The fix adds the same env vars to the ps command as were set for the sh command, so that both commands run on the same host.

            The corresponding merge request is here: https://github.com/jenkinsci/durable-task-plugin/pull/40
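The environment mismatch can be illustrated generically. Here run_probe is an illustrative stand-in for the plugin's liveness check, not its real code, and the daemon address is hypothetical; DOCKER_HOST itself is the real variable the docker CLI reads:

```shell
#!/bin/sh
# Generic illustration of the bug described above: a probe sees a
# different environment than the step it is checking.

run_probe() {
    # Stand-in for the liveness check: report which daemon it would hit.
    echo "probe targets: ${DOCKER_HOST:-local daemon}"
}

# The sh step was started against a remote daemon:
DOCKER_HOST="tcp://remote:2376" run_probe

# A probe launched without that variable talks to the wrong daemon, finds
# no such process there, and so the step wrongly looks dead:
unset DOCKER_HOST
run_probe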

            dothebart Wilfried Goesgens added a comment -

            Patrick Kaufmann: since my original problem was that there was no ps command inside of that docker container, I don't think your problem is the same bug.

            As suggested above, using ps should be avoided altogether so the dependency can be removed.

            jglick Jesse Glick added a comment -

            Possibly solved by JENKINS-47791; not sure.


              People

              • Assignee:
                kufi Patrick Kaufmann
                Reporter:
                dothebart Wilfried Goesgens