Jenkins / JENKINS-48485

Aborting a job running on Windows terminates the process immediately with no chance to run build cleanup code, leaving build-related lock files behind on the agent.

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: core
    • Labels:
      None

      Description

      I have a Jenkins setup with many build and test jobs running on Windows and Linux.

      I am noticing an issue with aborting/cancelling jobs running on Windows hosts. Aborting terminates the job but leaves lock files in various places, depending on the build phase in which the job was aborted, and this affects later jobs that land in the same workspace.

      On Linux it works fine, as I have a build wrapper that detects the SIGTERM signal received on abort and terminates the build gracefully by clearing all the locks etc.

      But I am unable to do the same on Windows.
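For illustration, the kind of Linux build wrapper described above might look roughly like the following. This is only a minimal sketch, not the reporter's actual wrapper: the `BuildLock` class, the helper names, and the lock file layout are all hypothetical. It relies on the abort arriving as SIGTERM, which is exactly what does not happen on Windows.

```python
import os
import signal
import sys


class BuildLock:
    """Hypothetical lock file guarding a workspace (illustrative only)."""

    def __init__(self, path):
        self.path = path

    def acquire(self):
        # O_EXCL makes creation fail if another build already holds the lock.
        fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.write(fd, str(os.getpid()).encode())
        os.close(fd)

    def release(self):
        # Removing an already-removed lock is not an error.
        try:
            os.remove(self.path)
        except FileNotFoundError:
            pass


def make_cleanup_handler(lock):
    """Build a signal handler that releases the lock and then exits."""

    def handler(signum, frame):
        lock.release()
        # Conventional exit status for termination by signal: 128 + number.
        sys.exit(128 + signum)

    return handler


def install_cleanup(lock):
    """Register the cleanup handler for SIGTERM (call from the main thread)."""
    signal.signal(signal.SIGTERM, make_cleanup_handler(lock))
```

A wrapper would call `lock.acquire()` and `install_cleanup(lock)` before starting the build, and `lock.release()` on normal completion.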

      I learned from https://wiki.jenkins.io/display/JENKINS/Aborting+a+build that on Linux the job is aborted through java.lang.UnixProcess.destroyProcess, which sends SIGTERM on Sun's JREs, while on Windows this is done through the TerminateProcess API.

      If a process is terminated by TerminateProcess, all threads of the process are terminated immediately with no chance to run additional code. This means that the thread does not execute code in termination handler blocks. In addition, no attached DLLs are notified that the process is detaching. (source: https://msdn.microsoft.com/en-us/library/windows/desktop/ms686722(v=vs.85).aspx)
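The difference can be reproduced outside Jenkins. The sketch below is an illustration only and does not use any Jenkins code: it uses Python's `subprocess.Popen.terminate()`, which is documented to send SIGTERM on POSIX and to call TerminateProcess on Windows. The child script, the marker file name, and the function name are assumptions made for the demo.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Child process: installs a SIGTERM handler that writes a marker file,
# reports that it is ready, then simulates a long-running build.
CHILD = textwrap.dedent("""
    import signal, sys, time
    marker = sys.argv[1]
    def cleanup(signum, frame):
        open(marker, "w").close()
        sys.exit(0)
    signal.signal(signal.SIGTERM, cleanup)
    print("ready", flush=True)
    time.sleep(60)
""")


def cleanup_ran_after_terminate():
    """Terminate a child and report whether its cleanup handler ran."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "cleanup.done")
        child = subprocess.Popen(
            [sys.executable, "-c", CHILD, marker],
            stdout=subprocess.PIPE,
            text=True,
        )
        child.stdout.readline()  # wait until the handler is installed
        child.terminate()        # SIGTERM on POSIX, TerminateProcess on Windows
        child.wait(timeout=30)
        child.stdout.close()
        return os.path.exists(marker)


if __name__ == "__main__":
    print("cleanup ran:", cleanup_ran_after_terminate())
```

On a POSIX system the handler gets a chance to run and the marker file is created. On Windows the child is ended by TerminateProcess before any handler code executes, so no marker is expected.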

      From the above, it looks like Jenkins on Windows terminates jobs in a way that the build process on the Windows host has no opportunity to handle gracefully.

      Can Windows job termination follow a procedure similar to the one used for Linux job termination?

       


            Activity

            shaupa01 Sharad Upadhyaya created issue -
            oleg_nenashev Oleg Nenashev added a comment -

            Please provide...

            1) Your version of Jenkins
            2) Version of Remoting you use on the agent
            3) Version of Java on the agent (which version? 32 or 64 bit?)
            4) Output of http://file-leak-detector.kohsuke.org/

            shaupa01 Sharad Upadhyaya added a comment -

            Here is the requested information:

            1) Your version of Jenkins - 2.60.3
            2) Version of Remoting you use on the agent - 3.7
            3) Version of Java on the agent (which version? 32 or 64 bit?) - 1.8.0_144; some agents run 64-bit Java and some run 32-bit.
            4) Output of http://file-leak-detector.kohsuke.org/

            oleg_nenashev Oleg Nenashev added a comment -

            If you are running 32-bit Java on a 64-bit system, Jenkins won't be able to abort processes correctly: https://github.com/kohsuke/winp#platform-support . I am not sure whether that leads to file leaks; I need the File Leak Detector output to say for sure.

            shaupa01 Sharad Upadhyaya made changes -
            Summary
            Original value: Aborting a Job running on Windows host leaves hanging lock files.
            New value: Aborting a Job running on Windows terminates the process immediately with no chance to run build clean up code, thus leaves build related lock files hanging at slave.
            shaupa01 Sharad Upadhyaya added a comment -

            I have seen this issue on both 64-bit and 32-bit agents.

            Our build scripts contain code to handle the termination signal and run the build cleanup, because our build uses lock files in multiple phases and also mounts temporary drives to shorten build paths on Windows.

            On Windows agents, it seems the process is terminated immediately using the TerminateProcess API, with no chance for the process to execute any build cleanup code, which leaves the lock files and mounted drives behind.

            It works fine on Linux agents, as the build process detects the SIGTERM signal received on abort and executes the build cleanup code.

            I think the issue here is how Jenkins terminates the job on Windows agents, which differs from the way it is done on Linux.

            oleg_nenashev Oleg Nenashev added a comment -

            Yes, Windows process termination flow is different. There are known issues like JENKINS-19156 which do not allow doing graceful process tree termination like in Unix.

            Switching to ExitProcess in https://github.com/kohsuke/winp/blob/4bdec1e8d28d4f5fcf2cf309074284eef1813736/native/winp.cpp#L35 could be reasonable, but I feel that this approach is not sufficient. IMHO more complex logic is required to ensure that the operation does not hang, etc.
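One common shape for such logic is "ask politely, wait for a bounded time, then kill". The sketch below illustrates that pattern in Python and is not what winp or Jenkins does. The function names and the grace period are assumptions. On Windows it substitutes a different technique, CTRL_BREAK_EVENT, which Python can only deliver to a child started with `creationflags=subprocess.CREATE_NEW_PROCESS_GROUP`, and which only helps if the child actually handles that event.

```python
import signal
import subprocess
import sys


def request_stop(proc):
    """Ask the child to stop in a way it has a chance to handle."""
    if sys.platform == "win32":
        # Assumption: the child was started with
        # creationflags=subprocess.CREATE_NEW_PROCESS_GROUP and handles
        # the console break event to run its cleanup.
        proc.send_signal(signal.CTRL_BREAK_EVENT)
    else:
        proc.terminate()  # sends SIGTERM on POSIX


def stop_with_timeout(proc, grace_seconds=10.0):
    """Stop proc gracefully, escalating to a hard kill after a timeout.

    Returns True if the process exited within the grace period and
    False if it had to be killed, so the caller never waits forever.
    """
    request_stop(proc)
    try:
        proc.wait(timeout=grace_seconds)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX, TerminateProcess on Windows
        proc.wait()
        return False
```

The bounded wait addresses the hang concern: a child that ignores the polite request is still removed once the grace period expires, at the cost of skipping its cleanup in that case.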

            oleg_nenashev Oleg Nenashev made changes -
            Link: This issue is related to JENKINS-19156

              People

              • Assignee:
                Unassigned
              • Reporter:
                shaupa01 Sharad Upadhyaya
              • Votes:
                2
              • Watchers:
                4

                Dates

                • Created:
                  Updated: