Jenkins / JENKINS-59903

durable-task v1.31 breaks sh steps in pipeline when running in a Docker container

Details

    • Released As: 1.33

      Description

      A pipeline like this:

      pipeline {
          agent {
              docker {
                  label 'docker'
                  image 'busybox'
              }
          }
          stages {
              stage("Test sh script in container") {
                  steps {
                    sh label: 'Echo "Hello World...', script: 'echo "Hello World!"'
                  }
              }
          }
      }
      

      Fails with this log:

      Running in Durability level: PERFORMANCE_OPTIMIZED
      [Pipeline] Start of Pipeline
      [Pipeline] node
      Running on docker-node in /...
      [Pipeline] {
      [Pipeline] isUnix
      [Pipeline] sh
      + docker inspect -f . busybox
      .
      [Pipeline] withDockerContainer
      got-legaci-3 does not seem to be running inside a container
      $ docker run -t -d -u 1002:1002 -w <<hidden>> busybox cat
      $ docker top 645fd28fda5fa3c61a4b49e8a38e46e0eec331ddf6037d3f77821dd6984a185f -eo pid,comm
      [Pipeline] {
      [Pipeline] stage
      [Pipeline] { (Test sh script in container)
      [Pipeline] sh (Echo "Hello World...)
      process apparently never started in /...
      (running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] }
      $ docker stop --time=1 645fd28fda5fa3c61a4b49e8a38e46e0eec331ddf6037d3f77821dd6984a185f
      $ docker rm -f 645fd28fda5fa3c61a4b49e8a38e46e0eec331ddf6037d3f77821dd6984a185f
      [Pipeline] // withDockerContainer
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] End of Pipeline
      ERROR: script returned exit code -2
      Finished: FAILURE
      

      Adding the -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true parameter gives this log:

      Running in Durability level: PERFORMANCE_OPTIMIZED
      [Pipeline] Start of Pipeline
      [Pipeline] node
      Running on docker-node in /...
      [Pipeline] {
      [Pipeline] isUnix
      [Pipeline] sh
      + docker inspect -f . busybox
      .
      [Pipeline] withDockerContainer
      got-legaci-3 does not seem to be running inside a container
      $ docker run -t -d -u 1002:1002 -w <<hidden>> busybox cat
      $ docker top 31b7474756f8ff5b1f0d12d0df952347e584b47113108d1f965adeeb0ee78e5e -eo pid,comm
      [Pipeline] {
      [Pipeline] stage
      [Pipeline] { (Test sh script in container)
      [Pipeline] sh (Echo "Hello World...)
      OCI runtime exec failed: exec failed: container_linux.go:346: starting container process caused "exec: \"/var/jenkins/caches/durable-task/durable_task_monitor_1.31_unix_64\": stat /var/jenkins/caches/durable-task/durable_task_monitor_1.31_unix_64: no such file or directory": unknown
      process apparently never started in /...
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] }
      $ docker stop --time=1 31b7474756f8ff5b1f0d12d0df952347e584b47113108d1f965adeeb0ee78e5e
      $ docker rm -f 31b7474756f8ff5b1f0d12d0df952347e584b47113108d1f965adeeb0ee78e5e
      [Pipeline] // withDockerContainer
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] End of Pipeline
      ERROR: script returned exit code -2
      Finished: FAILURE
      

      Tested on three different Jenkins masters with similar, but not identical, configurations.

      Reverting to Durable Task Plugin v. 1.30 "solves" the problem.

        Activity

            Jesper Andersson (njesper) added a comment:

            Harald Albers How are you running your container?

            I'm guessing wildly here, but it looks to me like your node config sets "Remote root directory" to /. I'm also guessing that you run the container as a specific user, e.g. '-u jenkins:jenkins', probably mount the workspace with something like '-v /home/jenkins/workspace:/workspace', and then start the agent inside the container.

            With such a setup the Jenkins agent will probably not have enough permissions to create '/cache', which the plugin perhaps still tries to do even when it is configured not to use the new wrapper.

            Try adding e.g. '-v /home/jenkins/cache:/cache' (adapted to your config), or pre-create a /cache folder in your image that is owned by 'jenkins:jenkins' (the user you run the container as).
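            The suggested workaround can be sketched as a hypothetical agent launch; the image name, user, and host paths below are illustrative, not taken from this issue:

            ```shell
            # Illustrative only: the image name, user, and host paths are assumptions;
            # adjust them to your own setup. The key line is the extra bind mount that
            # gives the build user a writable cache directory.
            docker run -t -d \
              -u jenkins:jenkins \
              -v /home/jenkins/workspace:/workspace \
              -v /home/jenkins/cache:/cache \
              my-agent-image
            ```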

            Harald Albers (albers) added a comment:

            Jesper Andersson Your questions pointed me to a solution, thanks a lot.

            But first the answers:

            The Docker image of the agent runs as the user jenkins. The swarm client plugin sets the "Remote root directory" to "/" when connecting to the master and dynamically creating an agent. The image has an existing /workspace directory that is writable by the user jenkins. The user jenkins obviously does not have sufficient permissions to create a directory in /.

            The swarm client can be configured to use a specific root directory. If I set it to a directory where the user jenkins has write permission, the build successfully creates a caches directory alongside the workspace directory.

            Another solution would be to pre-create the /caches directory in the image.
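            Both workarounds can be sketched roughly as follows; the paths, user, and master URL are illustrative assumptions, not values from this issue:

            ```shell
            # Workaround 1 (illustrative): point the swarm client at a root directory
            # the build user can write to, via its -fsroot option.
            java -jar swarm-client.jar -fsroot /home/jenkins/agent -master https://jenkins.example.com

            # Workaround 2 (illustrative): pre-create the caches directory in the
            # agent image. Dockerfile fragment:
            #   RUN mkdir /caches && chown jenkins:jenkins /caches
            ```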

            I'm fine with this solution.

            But the bottom line is that we need documentation that the user who performs the build must have sufficient permissions to create directories in the build root, or that specific directories need to exist with appropriate permissions.

            Carroll Chiou (carroll) added a comment:

            I apologize: what 1.31 did was disable the binary wrapper by default, but it did not resolve the caching issue, because the plugin still tries to create the cache directory. I am in the process of merging my current fix (https://github.com/jenkinsci/durable-task-plugin/pull/114) into master.

            Harald Albers once the fix gets through, those users who do not have permissions to create directories in the build root will have caching disabled.
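            The fallback behavior described here (disable caching rather than fail the build when the cache directory cannot be created) can be sketched as a hypothetical shell check; this is not the plugin's actual code:

            ```shell
            #!/bin/sh
            # Hypothetical sketch (not the plugin's code): try to create the cache
            # directory; if that fails, disable caching instead of failing the build.
            setup_cache() {
              cache_dir="$1"
              if mkdir -p "$cache_dir" 2>/dev/null && [ -w "$cache_dir" ]; then
                echo "caching enabled: $cache_dir"
              else
                echo "caching disabled: cannot write to $cache_dir"
              fi
            }

            # Example: an unwritable location falls back gracefully.
            setup_cache "/proc/no-such-dir/caches"
            ```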

            Carroll Chiou (carroll) added a comment:

            Version 1.33 has now been released. It includes the fix that disables caching when there are insufficient permissions to access the cache directory. The binary wrapper remains disabled by default.

            Harald Albers (albers) added a comment:

            Carroll Chiou 1.33 works for my use case (build root in /, user not having permission to create a /caches directory).


              People

              • Assignee: Carroll Chiou (carroll)
              • Reporter: Jesper Andersson (njesper)
              • Votes: 34
              • Watchers: 46