Jenkins / JENKINS-53750

Shell scripts keep the working directory locked on Windows

      Description

      Try to run the following pipeline on a Windows node (you'll need a Unix shell port; I am using bash from Git for Windows):

      node('mynode') {
        stage('Test') {
          dir('foo/bar') { sh 'sleep 1' }
          bat 'rename foo baz'
        }
      }
      

      The bat step will fail with an "Access is denied" message.

      It seems that the reason for this is that when the Durable Task plugin runs its wrapper script (in BourneShellScript.launchWithCookie), it doesn't wait for the wrapper process to finish before it allows pipeline execution to proceed to the next step. It only waits until the result file gets created. This means that for a fraction of time, the bat step will be running concurrently with these three processes created for the sh step:

      1. The wrapper script.
      2. The nohup process that is its parent.
      3. The background job that the wrapper script creates to periodically touch the log file.

      And that fraction can be significant, because the background job sleeps in intervals of three seconds.

      Now, those processes have as their working directory the directory of the sh step, which on Windows means that that directory is locked - it cannot be deleted, and its parent cannot be moved. And since the bat step starts executing before those processes terminate, it cannot rename the foo directory, even though it really should be able to.
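The timing described above can be reproduced outside Jenkins with a simplified stand-in for the wrapper. This is only a sketch of the pattern, not the Durable Task plugin's actual script; the file names `log.txt` and `result.txt` and the two temporary directories are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical, simplified stand-in for the wrapper pattern described above.
# It is NOT the plugin's real script; all file names are made up.
work=$(mktemp -d)   # stands in for the step's working directory (foo/bar)
ctl=$(mktemp -d)    # stands in for the plugin's control directory
cd "$work" || exit 1

# Background job: touches the log until the result file appears.
# It inherits $work as its working directory and sleeps 3 seconds at a time.
( while [ ! -f "$ctl/result.txt" ]; do touch "$ctl/log.txt"; sleep 3; done ) &
heartbeat=$!

sleep 1                       # the user's script, as in sh 'sleep 1'
echo $? > "$ctl/result.txt"   # from here on the step looks finished

# The result file exists, yet the background job may still be asleep in $work.
if kill -0 "$heartbeat" 2>/dev/null; then
  still_running=yes
else
  still_running=no
fi
echo "result written; background job still running: $still_running"

wait "$heartbeat"             # only now has every process left $work
cd / && rm -rf "$work" "$ctl"
```

Anything that acts on the result file alone, without waiting for the background job, runs inside the window between the `echo` and the `wait`.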

      To solve this, ideally, the Durable Task plugin should wait for the nohup process to terminate before proceeding with the next step. Failing that, it should at least make sure that the working directories of the auxiliary processes it spawns aren't within the workspace.
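The second option could look roughly like the sketch below, which starts the background job from `/` instead of the step's directory. It is an illustration under assumptions, not a patch for the plugin: the control-directory layout and the `heartbeat-cwd.txt` file are hypothetical, and the wrapper process itself would still need separate handling.

```shell
#!/bin/sh
# Hypothetical sketch, not the plugin's code: start the background job from
# outside the step's directory so that it does not hold that directory open.
work=$(mktemp -d)   # stands in for the step's working directory
ctl=$(mktemp -d)    # stands in for the control directory (made-up layout)
cd "$work" || exit 1

# "cd /" runs inside the subshell only; the calling shell stays in $work.
( cd / || exit 1
  pwd > "$ctl/heartbeat-cwd.txt"   # record where the background job runs
  while [ ! -f "$ctl/result.txt" ]; do touch "$ctl/log.txt"; sleep 3; done ) &
heartbeat=$!

sleep 1                            # the user's script
echo $? > "$ctl/result.txt"
wait "$heartbeat"                  # let the background job notice the result

heartbeat_cwd=$(cat "$ctl/heartbeat-cwd.txt")
echo "background job cwd: $heartbeat_cwd"
cd / && rm -rf "$work" "$ctl"
```

Because the background job refers to its files by absolute path, moving its working directory does not change what it does.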

      (I'm unable to easily test this using the most recent version of the plugin, but I reviewed the changes made since 1.22 and I'm fairly confident that the latest version still exhibits this issue.)

          Activity

          dnusbaum Devin Nusbaum added a comment (edited):

          While it would be nice to improve the behavior here, it seems like there are some straightforward workarounds if I understand correctly:

          • Move the directory manipulation into your sh/bat script, and combine your scripts into one (it's generally better to combine your scripts anyway).
          • Add a manual sleep between sh/bat steps in cases where both scripts modify the same directory in the workspace.
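As a rough illustration of the first workaround, the directory work and the rename can live in one shell script, so the rename only happens after the subshell that entered `foo/bar` has exited. This is a sketch: the directory names come from the pipeline in the description, and the scratch workspace is an assumption added to keep the example self-contained.

```shell
#!/bin/sh
# Sketch of a single combined script; foo, bar and baz follow the example
# pipeline in the description.
ws=$(mktemp -d)   # hypothetical scratch workspace, for illustration only
cd "$ws" || exit 1

mkdir -p foo/bar
( cd foo/bar && sleep 1 )   # work that used to run inside dir('foo/bar')
mv foo baz                  # runs after the subshell above has exited

renamed=no
[ -d baz/bar ] && [ ! -e foo ] && renamed=yes
echo "renamed: $renamed"
cd / && rm -rf "$ws"
```

In a pipeline, the three middle commands would go into a single sh step run from the workspace root.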

          We do not explicitly test sh on Windows. You may be able to get it working with Cygwin, MSYS, etc., and we will do our best to fix any regressions we introduce, but it is not recommended, and something like this that has never worked (as far as I can tell) does not seem like something we would prioritize for a fix.

          That said, the code is open source, and if you want to work on fixing it, I am happy to help review your changes. I would start by creating a reproduction test in workflow-durable-task-step along the lines of the following to demonstrate the issue:

          @Issue("JENKINS-53750")
          @Test public void cleanupBeforeStepCompletes() throws Exception {
              Assume.assumeTrue("Test is Windows-specific", Functions.isWindows());
              WorkflowJob p = j.jenkins.createProject(WorkflowJob.class, "p");
              p.setDefinition(new CpsFlowDefinition("node {\n" +
                      "  dir('foo/bar') {\n" +
                      "    sh 'sleep 1'\n" +
                      "  }\n" +
                      "  bat 'rename foo baz'\n" +
                      "}", true));
              j.assertBuildStatusSuccess(p.scheduleBuild2(0));
          }
          

          Once you have that test failing in the way you intend, you could make changes to BourneShellScript/FileMonitoringTask in durable-task to see how they affect the test. I don't have an environment to be able to reproduce the issue myself, but maybe something like switching these two lines so the workspace is cleaned up before the script exits would help?

          rdonchen_intel Roman Donchenko added a comment:

          > While it would be nice to improve the behavior here, it seems like there are some straightforward workarounds if I understand correctly:

          Indeed. My workaround was simply to run sh using the bat step, instead of using the sh step.

          > something like this that has never worked (as far as I can tell)

          I think the issue was introduced in this commit: https://github.com/jenkinsci/durable-task-plugin/commit/8c3504398e5c9292512410077da0e67eaae1c4c4, which is the one that added the timestamp-updating background job.

          I should also note that, while I encountered this issue on Windows, I think in rare cases it might cause problems on Linux, as well. Something like

          sh 'mount -t tmpfs none foo'
          dir('foo') { sh 'sleep 1' }
          sh 'umount foo'
          

          will probably fail in a similar way, although I have not tested this.

          dnusbaum Devin Nusbaum added a comment (edited):

          > I think the issue was introduced in this commit: https://github.com/jenkinsci/durable-task-plugin/commit/8c3504398e5c9292512410077da0e67eaae1c4c4, which is the one that added the timestamp-updating background job.

          Ah, you're right, it probably was introduced in that commit.

          Either way, I'd recommend moving the directory manipulation into your shell scripts and then combining them so that your script can be run and tested independently from Jenkins and so it is not as susceptible to changes in the implementation of the sh/bat steps.


            People

            • Assignee: Unassigned
            • Reporter: rdonchen_intel Roman Donchenko
            • Votes: 0
            • Watchers: 3