  1. Jenkins
  2. JENKINS-9104

Visual studio builds started by Jenkins fail with "Fatal error C1090" because mspdbsrv.exe gets killed

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: core
    • Labels:
      None
    • Environment:
      Windows XP, Windows 7 using MSBuild or devenv.exe to build MS Visual Studio Projects

      Description

      I run into errors when using a customized build system which uses Visual Studio's devenv.exe under the hood to compile Visual Studio 2005 projects (with the VC++ compiler). When starting two parallel builds with Jenkins (on different code bases), the second job always fails with "Fatal error C1090: PDB API call failed, error code '23' : '(" in exactly the same second the first job finishes processing. Running both jobs outside Jenkins does not produce the error.
      This has also been reported for builds executed by MSBuild on the Jenkins user mailing list [1].

      I analysed this issue thoroughly and can track the problem down to the usage of mspdbsrv.exe. This program is automatically spawned when building a Visual Studio project. All Visual Studio instances normally share one common PDB server, which shuts itself down after an idle period (the default is 10 minutes). "It ensures access to .pdb files is properly serialized in parallel builds when multiple instances of the compiler try to access the same .pdb file" [2].
      I assume that Jenkins cleans up its build environment when an automatically started job finishes (as described at http://wiki.jenkins-ci.org/display/JENKINS/Aborting+a+build). I checked mspdbsrv.exe with Process Explorer, and the process indeed has the variable JENKINS_COOKIE/HUDSON_COOKIE set in its environment if started through Jenkins. Killing mspdbsrv.exe while projects are still connected will break compilation.

      Jenkins mustn't kill mspdbsrv.exe to be able to build more than one Visual Studio project at the same time.


      [1] http://jenkins.361315.n4.nabble.com/MSBuild-fatal-errors-when-build-triggered-by-timer-td385181.html
      [2] http://social.msdn.microsoft.com/Forums/en-US/vcgeneral/thread/b1d1bceb-06b6-47ef-a0ea-23ea752e0c4f/

            Activity

            markewaite Mark Waite added a comment - - edited

            You might refer to JENKINS-3105 for a workaround or alternative technique. I assume the process tree killer is what is killing the mspdbsrv.exe process, and disabling the process tree killer for these jobs may avoid the problem.

            gordin Christoph Vogtländer added a comment - - edited

            Yes, you are right. Thanks a lot for pointing this out. I set BUILD_ID=dontKillMe globally in the Jenkins configuration and can confirm that mspdbsrv.exe is not killed any more which works around the problem.

            gordin Christoph Vogtländer added a comment -

            changed priority to Minor because an easy work around is available

            chantivlad chanti vlad added a comment -

            "Jenkins mustn't kill mspdbsrv.exe to be able to build more than one Visual Studio project at the same time."

            Although I will try the indicated workaround, I think it would be good to have this as a feature somewhere, maybe in the node configuration?

            danielweber Daniel Weber added a comment -

            How about adding a configurable list of process names which shall not be killed by the "process tree killer"?
            The workaround described above leaves mspdbsrv.exe running, but also any other "dangling" processes a build
            might have left behind. I'd still like to use the process tree killer in general, it should just not kill
            mspdbsrv.exe.
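
The exclusion list proposed here could look roughly like the following sketch. It is not Jenkins code: `ProcInfo`, `select_processes_to_kill`, and the choice of `BUILD_ID` as the marker variable are all hypothetical names used only for illustration, and the process snapshot is made-up sample data.

```python
from dataclasses import dataclass, field


@dataclass
class ProcInfo:
    """Hypothetical snapshot of one running process (not a Jenkins type)."""
    pid: int
    name: str
    env: dict = field(default_factory=dict)


def select_processes_to_kill(processes, marker_var, marker_value, protected_names):
    """Return processes tagged with the build's marker, minus protected names."""
    # Windows executable names are case-insensitive, so compare in lower case.
    protected = {n.lower() for n in protected_names}
    return [
        p for p in processes
        if p.env.get(marker_var) == marker_value
        and p.name.lower() not in protected
    ]


# Made-up sample data: two processes belong to build "42", one is unrelated.
procs = [
    ProcInfo(101, "cl.exe", {"BUILD_ID": "42"}),
    ProcInfo(102, "mspdbsrv.exe", {"BUILD_ID": "42"}),
    ProcInfo(103, "notepad.exe", {}),
]
victims = select_processes_to_kill(procs, "BUILD_ID", "42", ["MSPDBSRV.EXE"])
print([p.name for p in victims])
```

With such a filter, dangling processes of the build would still be cleaned up, while anything on the protected list would be left running.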

            fnx Neil Bird added a comment -

            Visual Studio is so ubiquitous that I personally think this could warrant a special case built into Jenkins, but anyway this is what we have just started doing (before calling devenv but after the MSC setup.bat has been called) to work around this (for all Windows builds):

            :: PITA to keep MSPDBSRV alive
            set ORIG_BUILD_ID=%BUILD_ID%
            set BUILD_ID=DoNotKillMe
            start mspdbsrv -start -spawn
            set BUILD_ID=%ORIG_BUILD_ID%
            set ORIG_BUILD_ID=

            It seems, from reading around, that the Jenkins process “tree” killer rummages through the whole process tree looking for processes with the environment variable BUILD_ID set to what Jenkins originally set it to, and killing any it finds. The above temporarily changes that to something else (anything, you could even just blank it), launches mspdbsrv manually (so it has the altered env. var.) and then puts it back (to restore the usual no-resource-leak fix).

            When you run mspdbsrv, it seems to immediately exit if there's already an appropriate version running, so it doesn't matter having multiple jobs trying to kick it off.
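
For build steps written in Python rather than batch, the same idea can be sketched as follows. This is only an illustration under assumptions: the helper names `child_env_with_build_id` and `run_with_build_id` are made up, and a harmless stand-in child is launched instead of mspdbsrv so that the effect on the inherited environment can be observed.

```python
import os
import subprocess
import sys


def child_env_with_build_id(new_build_id, base_env=None):
    """Copy the environment and override BUILD_ID in the copy only."""
    env = dict(os.environ if base_env is None else base_env)
    env["BUILD_ID"] = new_build_id
    return env


def run_with_build_id(cmd, new_build_id):
    """Run cmd with an altered BUILD_ID and return what it printed."""
    result = subprocess.run(
        cmd,
        env=child_env_with_build_id(new_build_id),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Stand-in child that reports the BUILD_ID it inherited. In a real job the
# command would be the mspdbsrv invocation from the batch snippet above.
probe = [sys.executable, "-c", "import os; print(os.environ['BUILD_ID'])"]
seen_by_child = run_with_build_id(probe, "DoNotKillMe")
print(seen_by_child)
```

Because the override lives in a copied dictionary, the parent's own BUILD_ID is never touched, so no restore step is needed. A long-running server would be started with `subprocess.Popen` instead of `subprocess.run`, which waits for the child to exit.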

            danielweber Daniel Weber added a comment -

            We adopted Neil's workaround, but found a small issue. The problem is that checking if
            mspdbsrv is running does not reset its timeout. Consider this scenario where the compiler
            is first used 5 min after build start:

            00:00 job A starts -> mspdbsrv not yet running -> mspdbsrv is started with the changed BUILD_ID
            01:00 job A finishes (mspdbsrv default timeout of 10 min starts here)
            01:07 job B starts -> mspdbsrv still running -> mspdbsrv is not restarted
            01:10 mspdbsrv timeout elapses, shuts down
            01:12 job B tries to use mspdbsrv for the first time. As mspdbsrv is not running, it starts a new
            instance with unchanged BUILD_ID, which is what we wanted to avoid in the first place.

            I suggest changing the build script as follows:

            start mspdbsrv -start -spawn -shutdowntime 2147483647

            This sets the shutdown time to the max value of 2^31-1. This timeout (~68 years) should be long enough.
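
The scenario and the "~68 years" figure can be checked with a small sketch. It assumes that the timeline above is in hh:mm, that the shutdown time is given in seconds (which the 68-year figure implies), and that the server exits once it has been idle for the full timeout. `server_alive` is a hypothetical helper, not part of any tool.

```python
from datetime import timedelta

INT32_MAX = 2**31 - 1  # 2147483647, the value passed to -shutdowntime

# Assumption: the value is in seconds; 365.25 days per year.
years = INT32_MAX / (365.25 * 24 * 3600)


def server_alive(now, last_activity, idle_timeout):
    """True while the idle timeout since the last activity has not elapsed."""
    return now - last_activity < idle_timeout


timeout = timedelta(minutes=10)                        # default idle timeout
job_a_finished = timedelta(hours=1)                    # 01:00, last real use
job_b_started = timedelta(hours=1, minutes=7)          # 01:07, only a check
job_b_first_compile = timedelta(hours=1, minutes=12)   # 01:12

alive_at_start = server_alive(job_b_started, job_a_finished, timeout)
alive_default = server_alive(job_b_first_compile, job_a_finished, timeout)
print(f"{years:.1f} years; alive at 01:07: {alive_at_start}; alive at 01:12: {alive_default}")
```

The check at 01:07 does not count as activity, so the idle clock keeps running from 01:00 and the server is gone by the time job B first needs it.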

            gordin Christoph Vogtländer added a comment -

            I would not recommend setting the shutdown time to such a high value. mspdbsrv.exe tends to hang after some time because of handle leaks (at least the version shipped with Visual Studio 2005), causing the compiler to exit with an error when trying to access pdb files. The mspdbsrv process must then be killed manually.

            yimin Yimin Li added a comment -

            Is it possible to add the workaround to the next Jenkins release and solve the problem later?

            peteboyrocket Pete W added a comment -

            Seconded Yimin Li's comment...

            laro Lars Rosenboom added a comment -

            Still happening with Visual Studio 2010 (under Windows 7), BTW.

            sweavo Steve Carter added a comment -

            Just want to add my own "me too" to this. (Visual Studio 2010, Win7x64, MSBUILD plugin) My thanks to the thread contributors so far for saving me a lot of detective work.

            lukast_dev Lukas Tvrdy added a comment -

            Just want to add my own "me too" to this. I see failed build with fatal error C1090: PDB API call failed, error code '23'
            Visual Studio 2008, Windows Server 2003, MSBuild plugin, Jenkins 1.527
            I tried the workaround of setting BUILD_ID to dontKillMe, but that did not help.

            sweavo Steve Carter added a comment -

            A (mostly unsatisfactory) workaround for me has been to use the Throttle Concurrent Builds plugin and make all MSBUILD projects be members of the same category, with a concurrency limit of 1. This means that my builds are no longer failing for arbitrary reasons, but it means that jobs like build_release_config_and_run_10_hour_integration_tests block the build_head_on_branch_x_and barf_if_unit_tests_fail jobs.

            sweavo Steve Carter added a comment - - edited

            Hi all,

            I've written a python script that basically implements reference counting and resetting of timeouts as a wrapper around MSPDBSRV.EXE.

            I'm using this locally as of today and would love it if others could try it, improve it, give feedback, etc.

            https://github.com/sweavo/pdbsrv_srv

            I have an Execute Shell build step before my MSBUILD build step that contains:

            set -e
            # We must be in the script's dir because it may try to execute itself again
            cd Tools/bin
            python pdbsrv_srv.py -l ../../pdbsrv.log & 
            echo pdbsrv_srv is pid $!
            sleep 5
            

            Note that the path supplied to -l should not be in the workspace because a lock will be held after the end of the job, and a subsequent build might then fail to delete the workspace if required.

            leedega Kevin Phillips added a comment -

            We have been using Jenkins LTS edition for over a year now without error. We recently updated to v1.532.3, also without error. However last night we just upgraded to v1.554.1 to get a couple of minor bug fixes and now we are experiencing this issue constantly. We have a fairly large CI farm with a dozen agents and ~700 jobs, and we are getting this new compilation error across the board. This suggests that whatever is causing this issue was somehow introduced in the v1.554.1 update.

            That being said, I've noticed the comment thread above predates this release, so perhaps the problem was only affecting the latest non-LTS edition until now. Maybe this can help isolate the problem further.

            leedega Kevin Phillips added a comment -

            Also, I can confirm that the source of this bug is most likely caused by the fact that something is killing the mspdbsrv.exe service while it is in use. We were able to reproduce this problem a long time ago when we first adopted Visual Studio 2008, well before we started using Jenkins. The way we reproduced the problem was independent from any tool, as follows:

            1. Set up a background process / system service to run as a local user profile which will perform the build. Let's call this user 'MyUser'. This may be a scheduled task, a Jenkins agent, or any number of other service-oriented processes available on Windows.
            2. Log in to the system locally using the MyUser profile.
            3. Launch a compile using the background process
            4. Open Process Explorer or Task Manager and look at the processes launched by your active user. You'll see mspdbsrv.exe in that list
            5. Log the MyUser user out of the system
            6. The background compilation will fail

            Cause: when the background process is launched, Visual Studio will spawn an independent process for mspdbsrv.exe, which is apparently used to synchronize file accesses across parallel builds. When the user profile associated with this background service is also logged in to the local machine, this new process will be launched in the active user's session. When this local user then logs out from the local machine, the process is terminated, causing any other processes (such as those which continue compiling in the background) to fail because they depend on this service.

            At the end of the day this is just a horrible design flaw in Visual Studio which has been in place since the introduction of their multi-threaded builds many years ago, and from what I've read on the forums it is considered a "feature by design" and is not expected to be fixed - ever. Consequently, the workaround we decided upon at the time was just to adopt the convention of never logging out of the user profile associated with our automated builds. In this way we avoid accidental termination of this critical process.

            So now enter Jenkins. If what I have read is true and Jenkins tries to be smart by scanning the agent systems for "rogue" threads after the completion of each build and terminates them at will, then I have even more cause for concern. Those concerns aside, assuming this is being done for a good reason, I concur with an earlier comment that recommended that this termination logic have an explicit exception to prevent killing this particular process. Given that the information I have gathered and stated above is correct, this seems to be the only reasonable solution to this problem. Finally, if I am correct and this 'bug' was just introduced by the latest update to the LTS edition, then conceivably it should be easy to isolate and promptly fix.

            leedega Kevin Phillips added a comment -

            I also just noticed that the severity of this issue has been set to "minor"; however, I would recommend increasing it to "production stopped". In our case, with ~700 jobs spread across a dozen powerful build servers, this bug is causing dozens of superfluous build failures per hour, making this latest update to the LTS edition completely unusable.

            danielbeck Daniel Beck added a comment -

            Aren't you able to launch that service manually instead of having it launched by the first build to come along?

            danielbeck Daniel Beck added a comment -

            Recursive process killing on Windows was added between versions 1.16 (1.532.x) and 1.19 (1.554.x) of the winp library, see here.

            Workaround could be to block loading of winp in some way.

            gordin Christoph Vogtländer added a comment -

            I set the severity to minor because an easy workaround is available. What is the reason you can't use "BUILD_ID=dontKillMe" environment variable? This disables the process killer for the job (or globally if set for the whole Jenkins instance). Generally I think the process killer is a good thing, but normally it shouldn't be needed.

            danielbeck Daniel Beck added a comment -

            Christoph: Have you tried this on Windows with winp doing the killing? I think it works differently there.

            gordin Christoph Vogtländer added a comment -

            Sorry, no, I haven't tried this with winp (to be honest, I don't even know what winp is). I will try with a current Jenkins setup.

            danielbeck Daniel Beck added a comment -

            Winp is the library doing the recursive killing on Windows (if available), and that is what was fixed between 1.532.x and 1.554.x – so Kevin is correct that this was changed between these versions.

            There are a few reported issues related to winp not working reliably; maybe one of them can be exploited as a workaround to prevent it from killing mspdbsrv.

            sweavo Steve Carter added a comment -

            "Aren't you able to launch that service manually instead of having it launched by the first build to come along?"

            This is covered in the comments. The service times out, which can still happen mid-build. If you set the timeout long, then you risk memory leaks.

            gordin Christoph Vogtländer added a comment -

            Setting the BUILD_ID to "dontKillMe" still works as expected with Jenkins 1.554.1 LTS. Even though I'm not able to test the original set up (as I don't use VS and mspdbsrv.exe any longer) a new process spawned during run with python subprocess.Popen() will not be killed by the process tree killer. Running without setting the BUILD_ID will kill the subprocess as expected.

            danielbeck Daniel Beck added a comment -

            Great! In that case, this is not a defect but behaves as intended.

            What would be a good location to document setting BUILD_ID to prevent process killing? Obviously, there's a need there...

            gordin Christoph Vogtländer added a comment -

            It is already documented at https://wiki.jenkins-ci.org/display/JENKINS/ProcessTreeKiller

            danielbeck Daniel Beck added a comment -

            Random wiki pages aren't exactly discoverable. Unless you know it's there you wouldn't even bother searching.

            Maybe add to the description of shell/batch build steps that launched processes are cleaned up after the script exits, and that this can be disabled?

            gordin Christoph Vogtländer added a comment -

            I don't know if the build step is the right scope for the documentation. The descriptions of the different build steps are provided by the plug-ins, aren't they? Would be hard to have a consistent message across all plug-ins. At least python, msbuild, shell, windows batch come to my mind. Maybe also groovy, qmake, cmake and others that provide an api to spawn a process.
            In order to solve this issue it would be nice to have some sort of process name white-list with processes that will never be killed by the tree killer. This could then be configured globally (master/per slave). What do you think?

            danielbeck Daniel Beck added a comment -

            Christoph: Shell and Batch are both in core and the most straightforward choices for launching programs. Specialist plugins might not even allow this much flexibility.

            I'd just like to get more discoverability, and that doesn't need annotating at every conceivable location. If users know this solution from Batch/Shell descriptions, and maybe transfer it over to similar plugin-provided builders, that's perfect.


            Tree killer configuration might be helpful, but that should be filed in a new issue. AFAICT this needs to touch a lot of parts, so this would be a rather large project.

            leedega Kevin Phillips added a comment - - edited

            I have a few follow up questions:
            1. If I understand correctly, this "process tree killer" feature was pre-existing in earlier Jenkins releases, but only in the latest update was it "changed" to add recursive killing of processes, correct?

            2. That being the case, does setting "BUILD_ID=dontKillMe" disable termination of all processes or just this new "recursive" behavior? If it disables all process terminations I'd say this proposal would not be a viable workaround since it could risk leaving other rogue processes orphaned on a build machine, which has many adverse side effects (which, I'm guessing you already know since I suspect this feature was implemented to resolve these exact problems)

            3. Won't setting the "BUILD_ID=dontKillMe" affect other parts of the build? The BUILD_ID env var is used as a unique identifier throughout the job after all. Changing it from the unique identifier it is meant to be, to a statically defined character string seems fragile at best.

            leedega Kevin Phillips added a comment - - edited

            So far, based on the recent comment threads, my admittedly superficial understanding of the root cause, and some quick Googling, it seems there are only a few viable options to resolve this issue:

            1. A python script was written by an earlier commenter, which leverages the BUILD_ID env var to strategically control the lifetime of the pdbsrv process itself without affecting other parts of the build.

            • This seems like a pretty harsh workaround to what is obviously a problem introduced by changes made in the latest Jenkins LTS update.

            2. Roll back the version of this "process tree killer" used by Jenkins LTS to v1.16, before this new "recursive" behavior was added according to an earlier comment.

            • I assume LTS releases are expected to maintain a certain level of stability and consistency in their behaviors. That being the case, this change obviously caused critical, debilitating side effects to Visual Studio users and thus should not have been included in an update release.

            3. Provide some kind of workaround within the "process tree killer" or the Jenkins core libraries to compensate for this newly discovered problem.

            • From what I gather from the earlier comments, this may be a non trivial task. However, if this new recursive logic in the process tree killer is absolutely required in Jenkins LTS for some reason, I think this work must be done. Anything else (scripting, documentation notes, etc.) would just be trying to hide the fact that this is an underlying architectural problem - imo.

            4. Accept the fact that Visual Studio users will likely never use Jenkins versions that include this "new feature", forcing them to use versions of Jenkins that predate this change.

            • Currently this is the solution that my team and I have chosen to adopt until a more reasonable solution can be found.
            • Just to clarify our rationale for this decision: Using v1.532.x works just fine with Visual Studio. Upgrading to v1.554.x does not work - at all. Period. To do otherwise would require extra time (and, hence, money) on our part to work around the problem, for little to no benefit on our part.
            leedega Kevin Phillips added a comment -

            Aside
            I probably should say that I truly believe the real root cause of this problem is an underlying architectural issue with Visual Studio and its use of this pdbsrv process in their newer compilers, but numerous forum posts and bug reports to Microsoft appear to fall on deaf ears (i.e. they claim it's working this way by design). Given the fact that this has been a problem in Visual Studio for several releases spread across many years, it's unlikely to change any time soon, so you may be forced to compensate for it here in your tool. To do otherwise will simply make it more difficult (and, by extension, less likely) for Visual Studio users to adopt / continue using your tool.

            danielbeck Daniel Beck added a comment -

            Does this also happen with MSBuild, or only Devenv? Can you switch to the former? What about systems without Visual Studio installed, instead using only MSBuild/Windows SDK?

            (I'm not too familiar with Visual Studio projects beyond pressing an F-key to build them, so this might well be a stupid question)

            leedega Kevin Phillips added a comment -

            From what I understand this is a problem with the compiler, which I think is the same compiler used under the hood by both msbuild and devenv. However, I have not confirmed first hand that the same problems arise in both situations. I'd be surprised if they didn't.

            As for building our projects without Visual Studio, with just MSBuild / Windows SDK, we have as of yet been unable to do so. We have heavy dependencies on MFC which hasn't, until recently, been available outside of Visual Studio. Plus we have had numerous technical issues migrating to the newer versions of the SDK / MSBuild that do include them. Regardless, again I'd be surprised if any of this made any difference unless the compiler that ships with the SDK is fundamentally architecturally different than the one that ships with VS.

            If I can spare some time to confirm a few of these details I'll let you know, even if just for curiosity's sake.

            sweavo Steve Carter added a comment -

            Solution 5: Don't run MsBuild projects in parallel.

            Before I built the python workaround, that's what I did using a throttling plugin. Works fine. pdbsrv gets killed at the end of each build, and started afresh by the Microsoft toolchain on the next job. But if you are trying to do continuous builds on development branches, then this won't have the capacity to keep up.

            Solution 6: Set BUILD_ID to hide pdbsrv from the processtreekiller. Live with the chance that once in a while pdbsrv might time out mid-build.
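
            A minimal Python sketch of the idea behind Solution 6, assuming the tree killer matches processes by the BUILD_ID value in their environment, as described in this thread. The helper names detached_env and start_detached are hypothetical, and the command to launch (for example, the path to mspdbsrv.exe and whatever arguments your toolchain expects) is left to the caller.

```python
import os
import subprocess

# Value taken from the BUILD_ID=dontKillMe suggestion discussed in this thread.
DETACH_VALUE = "dontKillMe"


def detached_env(base_env):
    """Return a copy of base_env whose BUILD_ID no longer matches the running build."""
    env = dict(base_env)
    env["BUILD_ID"] = DETACH_VALUE
    return env


def start_detached(command, base_env=None):
    """Start `command` with the modified environment and return the Popen handle.

    `command` is supplied by the caller (hypothetical example: the path to
    mspdbsrv.exe plus any arguments); nothing here is specific to that program.
    """
    env = detached_env(os.environ if base_env is None else base_env)
    return subprocess.Popen(command, env=env)
```

            Only the pre-started process gets the altered BUILD_ID; the rest of the build keeps the original value, so other child processes are still cleaned up as usual.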

            leedega Kevin Phillips added a comment -

            Solution 5: Don't run MsBuild projects in parallel.

            That may be fine for small projects but not for larger ones. For example, our main codebase is configured with about 40 jobs per configuration to build each "tier" or "module" in our codebase more efficiently - running jobs in parallel whenever possible. Doing so reduced our "clean" build times from 14 hours to 3. Numbers like that are hard to argue against.

            Are there other ways we could achieve similar results? Possibly, but they all require time and effort (aka: money) which we do not have.

            Solution 6: Set BUILD_ID to hide pdbsrv from the processtreekiller. Live with the chance that once in a while pdbsrv might time out mid-build.

            Could you clarify what you are referring to here? I assume you mean something other than using your python script since that was the very first "potential fix" I had mentioned above.

            It has been my experience that so long as you leave Visual Studio to its own internal details to manage pdbsrv, it works reliably for extended periods, keeping the service alive when needed and terminating it safely when it isn't, even if you run multiple builds in parallel via Jenkins. In fact that is what we do now and it never causes problems with our builds. This is saying something considering the size and scale of our build farm, with hundreds of jobs spread across nearly a dozen servers, all running 24/7!

            danielbeck Daniel Beck added a comment -

            Maybe the following workaround would work: If mspdbsrv.exe runs as the user launching devenv, you could create a whole bunch of slaves all running on the same machine, but as different users, each having a single executor.

            leedega Kevin Phillips added a comment -

            Seems a bit heavy. The extra overhead of running multiple agents alone seems like it would be significant, let alone the complexities involved with having multiple user profiles being used, all of which would need to have a consistent configuration to ensure the agents all behave the same, not to mention managing security and permissions and whatnot. Given that each of our agents currently runs with between 4 and 6 executors, that would increase our agent count by the same factor.

            Also, this would make managing overall load on a given system more complex. Consider jobs that are configured to use 100% of the agent's resources to prevent parallel build problems, as an example. These would need to be configured to work across agents somehow. I'm not even sure that is possible....

            zorbathut Ben Rog-Wilhelm added a comment -

            I looked into the difficulty of adding a "process whitelist" for processes that must not be killed. It would require some changes to winp but it's the only workable solution, besides "disable process killing for this entire task", which can, itself, cause build failures.

            Unfortunately, because the necessary changes have to span two projects, it'll be a bit of a large task without cooperation from everyone involved.

            > It has been my experience that so long as you leave Visual Studio to it's own internal details to manage pdbsrv it works reliably for extended periods, keeping the service alive when needed and terminating it safely when it isn't, even if you run multiple builds in parallel via Jenkins. In fact that is what we do now and it never causes problems with our builds. This is saying something considering the size and scale of our build farm, with hundreds of jobs spread across nearly a dozen servers, all running 24/7!

            Unfortunately I've found this isn't the case - there seem to be situations where mspdbsrv times out mid-build and is restarted cleanly, and if that doesn't happen within a BUILD_ID replacement block, then when the restarting build finishes, Jenkins will happily kill mspdbsrv and break other builds.

            I suspect "running 24/7" is why you're not seeing this - it's happening somewhat frequently on a much smaller farm of mine with much fewer jobs.
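
            To make the "process whitelist" idea concrete, here is a small Python model of the selection step. It is only a sketch and not how Jenkins or winp are implemented: the Proc tuple, the processes_to_kill function, and matching on BUILD_ID and executable name are all assumptions made for illustration.

```python
from collections import namedtuple

# Simplified stand-in for a running process: pid, executable name, environment.
Proc = namedtuple("Proc", ["pid", "name", "env"])


def processes_to_kill(procs, build_id, whitelist):
    """Select processes that carry this build's BUILD_ID and are not whitelisted."""
    # Compare names case-insensitively, since Windows executable names are.
    protected = {name.lower() for name in whitelist}
    return [
        p for p in procs
        if p.env.get("BUILD_ID") == build_id and p.name.lower() not in protected
    ]
```

            With a whitelist containing mspdbsrv.exe, a pdb server that was restarted inside a finishing build would be left alone while that build's other leftover processes are still selected.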

            leedega Kevin Phillips added a comment -

            I suspect "running 24/7" is why you're not seeing this - it's happening somewhat frequently on a much smaller farm of mine with much fewer jobs.

            That is totally possible. Because we run so many jobs in parallel so often, it is probably rare that no jobs are running at all on any given server on our farm, and this may be preventing the service from timing out.

            Thanks for pointing that out.

            ajomaa Tony Jomaa added a comment -

            I am very new to the Jenkins world. I am running into this issue a lot. This would be a show stopper for us when it comes to adopting Jenkins for our build processes. Our builds get manually triggered by many users at random times. We could have 20 or more builds running at the same time, all in parallel. I tried the Python script given by Steve Carter in an Execute Shell command box, but I get an error saying that "sh" -ex was not found. What gives? I thought I was running a Python script, not Linux. Or do they both need to run on Linux?

            In short, if I do not get this resolved, we will have to go back to our previous way of building.
            Has anyone solved this issue yet?

            Thank you,

            danielbeck Daniel Beck added a comment -

            Tony: Please address requests for assistance to the jenkinsci-users mailing list, or #jenkins IRC channel on Freenode.

            kerrhome Shannon Kerr added a comment -

            I just ran into this one for the first time as far as I can tell. I did a quick look back and see no other instances and I don't recall seeing this before. For now, I'll take no action. Daniel Beck or anyone else, please let me know if I can provide you with any information that could help in resolving this. Build env where we saw this error: MS Win 7 x64, VS2010

            kerrhome Shannon Kerr added a comment -

            I hit three more instances of this. Two yesterday and one other a week ago.

            danielbeck Daniel Beck added a comment -

            How are you starting these builds? Batch? MSBuild plugin? What exact commands? If batch, did you try setting BUILD_ID as described on https://wiki.jenkins-ci.org/display/JENKINS/ProcessTreeKiller ?

            kerrhome Shannon Kerr added a comment -

            Batch. In the Jenkins project, we use "Execute Windows Batch Command" to call a batch script that automates a bunch of pre build work and ends up calling the builds via devenv.

            I did not try the BUILD_ID suggestion as I saw that there were still issues mentioned in this ticket with this workaround. I was trying to hang in there until the final solution was provided, but the failures seem to be picking up for us lately. I guess we'll use this workaround for now.

            kerrhome Shannon Kerr added a comment -

            I am trying the BUILD_ID suggestion now, but this is a hack (right?) and not a final solution? The final solution is to have Jenkins not kill specified processes like mspdbsrv.exe. Whether that is in a whitelist managed by the user or hardcoded by Jenkins for now doesn't matter to me. Hopefully there will be a long-term fix to Jenkins for this.

            delboyjay Del Hyman-Jones added a comment - - edited

            Does anyone know if there is a global option to stop Jenkins from killing processes completely, instead of having to add the BUILD_ID to every single job? I have tried adding this as an env variable at the node level but it doesn't appear to give the desired results (the same PDB errors were still occurring). Maybe I'm doing something wrong or misunderstanding how this is working under the hood?

            We were running CruiseControl for years and never had this problem, but we did have issues where processes were not terminating properly and builds would run forever until someone intervened. Sometimes we still get this with Jenkins, so from my point of view one problem is better than two, and I'd rather just have an option to tell Jenkins not to force terminate anything - ever. If this cannot be done with the current version (ours is 1.566), can we at least add a check box that says "Do not auto-terminate processes" as an option in a future release and let the user decide?

            kerrhome Shannon Kerr added a comment -

            How are you trying to run this, Del? At first, I didn't have success getting it going, but now I seem to have it working fine. The BUILD_ID does seem to be an effective solution (I do worry about the memory leak though). I'm using the simple batch solution in comment 6, not the python solution. You just have to make sure that the mspdbsrv file is in your path and it should work fine. We use a batch wrapper, which is under version control, for our builds and I added code that says "If this is a Jenkins build, execute this block". To decide if this is a Jenkins build, I just check to see if JENKINS_URL is defined. Since I added that, we've not seen this issue return. Let me know if I can help in some way.
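
            The "if this is a Jenkins build" check described above could look roughly like this in Python. This is a hedged sketch rather than the batch wrapper actually in use: the function names are hypothetical, and the only things taken from the thread are the JENKINS_URL test and the BUILD_ID=dontKillMe value.

```python
import os


def is_jenkins_build(env=None):
    """Treat the build as a Jenkins build when JENKINS_URL is defined."""
    env = os.environ if env is None else env
    return "JENKINS_URL" in env


def pdb_server_env(env=None):
    """Return the environment for starting the pdb server.

    BUILD_ID is replaced only under Jenkins; local builds are left untouched.
    """
    env = dict(os.environ if env is None else env)
    if is_jenkins_build(env):
        env["BUILD_ID"] = "dontKillMe"
    return env
```

            Keeping the check in the version-controlled wrapper means the same script behaves normally on a developer machine, where JENKINS_URL is not set.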

            delboyjay Del Hyman-Jones added a comment - - edited

            I've added the block from above into the Jenkins command for the job at the moment but yesterday I got this error and there was only one build running so it is likely a different issue.

            33>X509Helper.h(118): fatal error C1090: PDB API call failed, error code '23' : '(

            I've even tried setting BUILD_ID=dontKillMe under the node configuration in Environment variables but I have been getting the original problem with that setting also. I even tried restarting the jenkins client service on the build server just in case it was needed for the env variable to be set for all child processes but it's not helping it seems. If this is working for yourself (@Shannon) I have to be doing something stupid.

            It seems that a BUILD_ID set under the node settings is overridden when the build runs: Jenkins sets BUILD_ID back to the build time. That rules out a global setting that would let me turn this off.

            leedega Kevin Phillips added a comment -

            One thing I felt needed to be expressed here: the fact that this defect arose in an update to the LTS edition at all worries me. The fact that it has been open and under active discussion for months now without any 'real' resolution, other than some hacks and workarounds, is even more concerning. According to the Jenkins website LTS editions should "...change(s) less often and only for important bug fixes...". This policy seems to have been completely negated here. Given the severity / impact of this change I would have expected that whatever "improvement" caused this problem would have been reserved for the "latest" release, or at the very least reverted from the LTS edition after this problem was discovered.

            Perhaps someone with more knowledge about the cause of this error could elaborate on why neither of these approaches has been taken here.

            kerrhome Shannon Kerr added a comment -

            @Del, Yes, you cannot set BUILD_ID for a slave setting. It is set by Jenkins on a per build basis. You'd either have to set it in the batch section of the job itself (we did this for our most frequently used builds) or if you call a batch script or some other script, you can put it there.

            ki82 Christian Bremer added a comment - 200$ is up for grabs for solving this issue at: https://freedomsponsors.org/issue/596/visual-studio-builds-started-by-jenkins-fail-with-fatal-error-c1090-because-mspdbsrvexe-gets-killed
            danielweber Daniel Weber added a comment -

            I implemented a whitelist solution, see pull request: https://github.com/jenkinsci/jenkins/pull/1562

            sweavo Steve Carter added a comment - - edited

            Nice work Daniel. Will be interesting to see whether that solves the problem.

            For the good of the thread, I'm going to try to summarize this from the top down as there's a lot of talk on here that seems to miss the key points.

            1) BUILD_ID is an environment variable, set by Jenkins when it starts a job.

            2) Environment variables are inherited when processes start other processes, except when overwritten. For example, in bash scripts you can write

            MYVAR=myvalue myscript.sh

            and myscript.sh will run with MYVAR set to myvalue.

            3) Therefore, all processes started by a jenkins job have the same BUILD_ID. This is recursive.

            4) Jenkins, in order to catch rogue processes at job end (i.e. those that have broken ties with their parent process) scans the whole process space for those with the particular BUILD_ID in their environment, and kills them.

            This is correct and good behavior by Jenkins.

            5) When you start an MSBUILD job, pdbsrv is started, which catches requests from parallel compilations and serializes them to write pdb files. When started from Jenkins, that pdbsrv process inherits BUILD_ID from the job.

            6) If you run two MSBUILD builds at once, then they share the same pdbsrv process.

            7) When the first job ends, it kills the pdbsrv process – because its BUILD_ID matches the first job's build id. The second job then fails.

            8) Solution 1: start pdbsrv with a BUILD_ID that doesn't match the build jobs. Then pdbsrv will not be killed at the end of the job.

            9) Solution 2: use Daniel's whitelist feature to not kill pdbsrv at the end of the job.

            Casual readers stop here.
            =========================

            10) The problem with Solutions 1 and 2 is this: pdbsrv still has a timeout, so you will get sporadic failures when the server goes away.

            11) My "heavyweight" python fix is trying to deal with that. Basically wrapping pdbsrv with a proper timeout and reference counting so that pdbsrv is present exactly when needed.

            12) pdbsrv's timeout doesn't get a new lease every time you use pdbsrv. I regard this as a bug in pdbsrv.

            13) You can't leave pdbsrv running forever because it (allegedly) has memory leaks. I regard this as a bug in pdbsrv.

            I really think to roll back Jenkins' ProcessTreeKiller is NOT a solution. The use of BUILD_ID brings the Jenkins machine under better control against rogue processes, and the workaround (for well-behaved servers) is easy, set BUILD_ID before starting the server, or use Daniel's whitelist.

            14) Solution 3: start pdbsrv periodically, e.g. every day with a day-long timeout. That will mitigate against the memory leaks. If you use some concurrency control, e.g. Job Weight plugin, you can make sure this "kill and restart pdbsrv" job does not fire during a build.

            =========================

            Solution 0: Finally, it would be remiss of me not to mention again my python workaround, which has been happily keeping parallel builds working for 54 weeks now without trouble.
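
Points 3) to 9) above can be modelled in a few lines. The following is only a sketch of the selection logic as described in this comment, not Jenkins' actual ProcessTreeKiller code; the function name `processes_to_kill`, the `(name, env)` process model and the name-based `whitelist` parameter are assumptions made for illustration.

```python
def processes_to_kill(processes, cookie_name, cookie_value, whitelist=()):
    """Select the processes a job-end cleanup would kill.

    processes: iterable of (name, env) pairs, env being a dict.
    whitelist: process names that are never selected (a hypothetical
    stand-in for the whitelist proposed in the pull request).
    """
    protected = {name.lower() for name in whitelist}
    victims = []
    for name, env in processes:
        if env.get(cookie_name) != cookie_value:
            continue  # not started by this job
        if name.lower() in protected:
            continue  # started by this job, but exempt
        victims.append(name)
    return victims


# Job A started the shared server, so the server carries job A's BUILD_ID
# even though job B's compiler is also using it.
procs = [
    ("cl.exe", {"BUILD_ID": "job-A"}),
    ("mspdbsrv.exe", {"BUILD_ID": "job-A"}),
    ("cl.exe", {"BUILD_ID": "job-B"}),
]
```

With no whitelist, `processes_to_kill(procs, "BUILD_ID", "job-A")` selects the server along with job A's compiler, which is the failure in point 7). Passing `whitelist=["mspdbsrv.exe"]` spares the server without touching its environment, whereas Solution 1 spares it by giving it a BUILD_ID that never matches.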

            sweavo Steve Carter added a comment -

            (penny drops) Just seen how whitelisting differs from the BUILD_ID solution. Subtle, but it might just work...

            leedega Kevin Phillips added a comment -

            Just a quick ping-back on this issue. Outstanding for like 4 years, no comments for months now, and all for a debilitating, crippling problem in the system! I did notice the pull request Daniel Weber created, which does seem to have some more recent activity on it, but there is still no complete resolution to the issue even in the latest LTS release.

            Are there plans for finishing this work any time soon? We are still stuck on an LTS version from like a year or two ago because we cannot accept this bug into our production environment. If there is any way to get this fix in sooner rather than later I know I'd appreciate it and I'm sure many others would as well.

            leedega Kevin Phillips added a comment -

            @steve carter
            First, let me thank you for summarizing the earlier comment threads. That does help bring everything into focus.

            4) Jenkins, in order to catch rogue processes at job end (i.e. those that have broken ties with their parent process) scans the whole process space for those with the particular BUILD_ID in their environment, and kills them. This is correct and good behavior by Jenkins.

            Agreed. This is a perfectly valid and useful enhancement for the majority of cases. Where I take issue is the debilitating effect it has on this specific use case, combined with the fact that the change was included in an LTS release, which is expected to be kept as stable as possible. I see this problem as a bug, albeit a difficult-to-detect one and admittedly one that is really caused by some questionable behavior of the Microsoft build tools, but a bug nonetheless. Critical, production-halting bugs like this should be fixed immediately or reverted until an appropriate fix can be made. Doing otherwise reduces users' confidence in the stability of the tool. There is a reason shops like ours choose to use LTS editions for production work: to avoid problems like this that may be found on the latest, cutting-edge versions.

            8) Solution 1: start pdbsrv with a BUILD_ID that doesn't match the build jobs. Then pdbsrv will not be killed at the end of the job.

            This should be called a workaround or hack rather than a solution. That point aside, this workaround again won't work for our particular build environment. We use the BUILD_ID throughout our build processes to embed metadata in the binary files we generate. If we reset that environment variable as part of our build this metadata will essentially get corrupted. Changing our tooling to use an alternative environment variable would require significant effort as well, having to be propagated out to dozens of products across several release branches each.

            9) Solution 2: use Daniel's whitelist feature to not kill pdbsrv at the end of the job.

            Based on my review of his pull request, Daniel's feature has not yet been completed nor has it been included in any actual LTS release. I do believe this would be a reasonable and appropriate solution to this defect though, so hopefully this work can be completed sooner rather than later.

            10) The problem with Solutions 1 and 2 is this: pdbsrv still has a timeout, so you will get sporadic failures when the server goes away.

            I know some earlier posters did indicate that this was an issue for them, but I have not been able to reproduce the problem as described. When a compile begins and this process is already running, the compile makes use of the existing process; if the process is not running, the compile starts it. I have never had a compile running and seen the mspdbsrv process terminate mid-compile without any other background process or system event occurring. Also, I work with many development teams including many dozens of developers and have never once had a report of this bug outside of the reproducible use cases I've stated before.

            Conversely, I have shown the problem is reproducible outside of Jenkins in very hard to detect ways which I suspect may appear to some to be an intermittent timeout. For example, if you are logged in to a system which is performing a compile in a background process which is also running under the same user profile as your local session, by simply logging out of the system the service terminates. The reason for this is the pdbsrv process is shared by the background process and your local user session and when you log out from the local session all processes in that memory space are terminated, including pdbsrv. This was a very difficult use case to isolate and not very obvious to users of the target systems and even went undiagnosed at my place of work for months under the assumption that the failure was unpredictable and intermittent.

            I know that my argument doesn't prove that this particular problem couldn't ever happen but I am extremely skeptical to say the least. If someone does believe that this problem does in fact exist I would greatly appreciate a detailed description on how to reproduce the problem. Maybe we're using a slightly older or slightly newer version of the compiler that doesn't exhibit the problem or something. Either way, if these individuals were willing to compare notes maybe we can help further isolate the root of this discrepancy.

            12) pdbsrv's timeout doesn't get a new lease every time you use pdbsrv. I regard this as a bug in pdbsrv.

            As I've stated in earlier posts, my team manages a build farm with close to a dozen agents now, running over 1000 build jobs, and never once have I had this error occur on any of those systems, nor have any of the development teams we support reported this problem on any of their local development machines. I would have to say that if this were in fact a core issue with the Microsoft toolset we would have discovered it by now. Again, if anyone can give me a reproducible use case that proves otherwise I would be happy to hear from them. Maybe we are doing something they aren't, or vice versa.

            13) You can't leave pdbsrv running forever because it (allegedly) has memory leaks. I regard this as a bug in pdbsrv.

            Again, this is something we have not been able to reproduce. For example, I have watched some of our agents that are under the most considerable load wrt build operations: machines which essentially run 24/7, compiling one or more projects in parallel nearly all the time. These systems continue to run stably day after day, week after week without requiring any outside intervention from me or my team. The pdbsrv process is nearly always active, the memory consumption increases and decreases with the load on the machines, and it never causes any fatal errors in our build processes.

            If anyone can provide specific, reproducible criteria for this problem I would be interested to hear it. If there is something we have overlooked that may be causing us grief elsewhere that we have not yet considered I would definitely want to know about it.

            I really think to roll back Jenkins' ProcessTreeKiller is NOT a solution.

            Agreed. I don't think 'just' rolling back this change is the best solution. I think fixing this bug is the best solution. However, in the absence of an appropriate fix for this bug, combined with the severity of its impact, I think that rolling back the change until an appropriate fix was put in place would have been a better solution than stranding users of your tool on an old, out-of-date release as we have been.

            Just my 2 cents.

            The use of BUILD_ID brings the Jenkins machine under better control against rogue processes...

            Totally agree that the improvement is well worth the effort. My concern is that the change includes a relatively significant bug.

            ...and the workaround (for well-behaved servers) is easy, set BUILD_ID before starting the server, or use Daniel's whitelist.

            Again, 'easy' workaround is a relative term. As just mentioned we would need to rework our build tools, roll that change out to many teams for many products, and backport those changes to many branches for this to work, after which we'd need to go through all 1000+ jobs on our farm and update them with the hack to the environment variable. Obviously significant effort in our case. Also the whitelist solution has yet to be completed from what I can tell, so that is not a usable solution yet.

            14) Solution 3: start pdbsrv periodically, e.g. every day with a day-long timeout. That will mitigate against the memory leaks. If you use some concurrency control, e.g. Job Weight plugin, you can make sure this "kill and restart pdbsrv" job does not fire during a build.

            Again, just to be clear this is clearly a workaround and not a solution.

            This hack may work for us in the interim until an appropriate fix can be made. I will test it out as soon as I can and report back. In our case we'll likely just set up a scheduled task that runs on boot and forces the service to start and stay running indefinitely, as we have never seen any need for it to shut down.

            However, for those individuals who claim that the service does need periodic resetting, a solution like this would likely be more complex. Assuming they need to ensure the utmost stability of their build farm as we do, they would need to ensure the pdbsrv service gets started before any compilation operation runs, including after reboots, power outages, crashes and the like. I don't believe there is any way to achieve this using a Jenkins operation. This means an external process would be needed, like the Scheduled Task idea I just mentioned. But then the external process would be running independently from the Jenkins agent, making it even more difficult to coordinate the two. For example, I suspect it would be difficult at best to make sure the scheduled task restarts the service at an opportune moment when no compilation operations are happening on the agent. Just something else for those users to keep in mind.

            leedega Kevin Phillips added a comment -

            PS: Sorry for the rant. My team and I have been aggravated for some time now, hoping this bug would be fixed so we can move off the old version of Jenkins we're currently stuck on and thus able to pick up some new bug fixes both in the core as well as in numerous plugins which only support newer versions. Hopefully I don't come across as overly adversarial.

            laro Lars Rosenboom added a comment - - edited

            Maybe there is a way to shut down mspdbsrv.exe softly, so that it stops only after all active requests (from parallel builds) are done. It should then simply restart on the next request.

            Another solution would be to allow the user to give a list of process names not to kill (or maybe hardcode not to kill mspdbsrv.exe).

            s7726 Gavin Swanson added a comment -

            Stopping after a timeout period after all active requests and continuing to run when it gets a new request are the way mspdbsrv runs normally when something doesn't go around killing it (ala Jenkins).

            I believe the correct solution is a whitelist.

            leedega Kevin Phillips added a comment -

            Update
            So, it turns out setting up some kind of background process to spawn a copy of the pdbsrv process isn't going to work as expected. From what I can tell Windows seems to be able to tell when a process has been launched from a system service and it will prevent those sub-processes from using other processes that are spawned elsewhere. The particulars of my test case are as follows:

            1. Set up a small Python script that launches a copy of mspdbsrv.exe when called
            2. Set up a scheduled task in Windows to run the Python script on boot
            3. Reboot the agent and confirm the mspdbsrv.exe process is running
            4. Trigger a compilation operation via the Jenkins dashboard
            5. A new, secondary copy of mspdbsrv.exe is spawned to serve the Jenkins agent. This sub-process is then terminated as usual once the Jenkins build is complete.

            I have confirmed that both the service that runs the Jenkins agent and the scheduled task use the same user profile and credentials and that both environments are using the same version of mspdbsrv.exe with the same set of command line parameters (ie: -start -spawn).

            Looks like I have to head back to the drawing board.
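For reference, the launcher script from step 1 could look roughly like the following minimal sketch. It assumes mspdbsrv.exe is on PATH; the function names and the choice of process creation flags are illustrative assumptions, not the script actually used in this test. As described above, a server started this way was not reused by builds started through the Jenkins agent.

```python
import subprocess
import sys

# Command line quoted in the comment above; assumes mspdbsrv.exe is on PATH.
MSPDBSRV_COMMAND = ["mspdbsrv.exe", "-start", "-spawn"]

# Win32 process creation flags (numeric values from the Windows API).
DETACHED_PROCESS = 0x00000008
CREATE_NEW_PROCESS_GROUP = 0x00000200


def build_launch_kwargs(platform=sys.platform):
    """Return Popen keyword arguments for starting the server detached."""
    kwargs = {
        "stdin": subprocess.DEVNULL,
        "stdout": subprocess.DEVNULL,
        "stderr": subprocess.DEVNULL,
    }
    if platform == "win32":
        # Only Windows accepts creationflags; detach from the parent console.
        kwargs["creationflags"] = DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP
    return kwargs


def launch_pdb_server():
    """Start mspdbsrv.exe and return without waiting for it to exit."""
    return subprocess.Popen(MSPDBSRV_COMMAND, **build_launch_kwargs())
```

The scheduled task would call `launch_pdb_server()` once at boot.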

            leedega Kevin Phillips added a comment -

            Update
            As a quick sanity check I threw together an ad-hoc test configuration whereby I override BUILD_ID in the environment for one of my compilation jobs, just to see whether one of the hacks proposed earlier could work. Unfortunately it looks like this is not a robust solution either. I have confirmed that the solution does work in the trivial case, as in:

            1. Set up a job with a single shell operation as a build step, configured as follows:
              • override the BUILD_ID env var with some arbitrary value
              • call into MSBuild to perform the compilation
            2. Run a build of the given job
            3. Upon completion, confirm that the mspdbsrv.exe process is still running - TEST SUCCESSFUL

            However, I've found another case where this solution doesn't work. Apparently, if you manually kill the build while it is running, Jenkins still somehow manages to locate the orphaned pdbsrv process and kill it, despite the changes described above. So, to put it more clearly:

            1. Set up a job with a single shell operation as a build step, configured as follows:
              • override the BUILD_ID env var with some arbitrary value
              • call into MSBuild to perform the compilation
            2. Run a build of the given job
            3. While the compilation operation is running, and once you have confirmed the mspdbsrv.exe process has been launched, manually force the running build to terminate (i.e. by clicking on the X icon next to the running build on the Jenkins dashboard)
            4. FAILURE - Jenkins still terminates the pdbsrv process

            I have confirmed that the pdbsrv process does correctly inherit the overridden BUILD_ID, yet Jenkins is somehow still able to locate and terminate the process in this case. I suspect that, at the point at which I manually kill the build in my test environment, Jenkins is still running one or more Visual Studio operations which have a direct link to the mspdbsrv.exe process, and thus it detects and kills the process by recursively traversing the process tree, killing all running processes / threads that are tied to the agent at the time.

            Either way, this example shows that even this 'hack' of overriding the BUILD_ID is fragile at best. It looks like we may have no choice but to wait for a fix for that 'whitelist' solution before we can consider upgrading our Jenkins instance.

            leedega Kevin Phillips added a comment -

            Update
            While reporting the issue in my last comment I had the idea for a slight variation of the configuration described there which does appear to work in both use cases. The main modification that I made was to separate the build operation into two separate build operations:

            • the first is a simple Windows command line call which overrides BUILD_ID and then launches mspdbsrv.exe. Once this first operation completes, Jenkins terminates the shell session that is linked to the pdbsrv process thus decoupling it from the agent. Combined with the overloaded BUILD_ID env var, Jenkins can no longer track the process.
            • the second operation is just another instance of a Windows shell session that then calls into msbuild to proceed with the build.

            Theoretically even this solution "could" fall prey to the same problem I described in my previous comment, however the execution time of this initial build step is negligible and is highly unlikely to be exploited in practice (ie: a user would need to hit the kill button on the build at just that small fraction of a second it takes Jenkins to launch mspdbsrv.exe).

            I'm not sure how easy this hack will be for us to roll out into production at the scale we need, but just in case others find this tidbit of information helpful I thought I'd provide it here.
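To make the first build step concrete, here is a minimal Python sketch of the idea: copy the environment, replace BUILD_ID, and start the server with that copy. The function names and the placeholder value "DoNotKillMe" are assumptions for illustration, and the `-start -spawn` arguments are the ones mentioned earlier in this thread; the actual job used a plain Windows command line step.

```python
import os
import subprocess


def env_with_build_id(build_id, base_env=None):
    """Return a copy of the environment with BUILD_ID replaced."""
    env = dict(os.environ if base_env is None else base_env)
    env["BUILD_ID"] = build_id
    return env


def start_pdb_server(build_id="DoNotKillMe"):
    """Build step 1: start mspdbsrv.exe with the overridden BUILD_ID.

    Assumes mspdbsrv.exe is on PATH. The call returns immediately, so the
    step finishes while the server keeps running.
    """
    return subprocess.Popen(
        ["mspdbsrv.exe", "-start", "-spawn"],
        env=env_with_build_id(build_id),
    )
```

The second build step then calls msbuild with its normal, unmodified environment.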

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Daniel Weber
            Path:
            core/src/main/java/hudson/util/ProcessKillingVeto.java
            core/src/main/java/hudson/util/ProcessTree.java
            test/src/test/java/hudson/util/ProcessTreeKillerTest.java
            http://jenkins-ci.org/commit/jenkins/a220431770cfe716e4f69fd76a4a59bbb27aa045
            Log:
            JENKINS-9104 Add ProcessKillingVeto extension point

            This allows extensions to veto killing of certain processes.

            Issue 9104 is not yet solved by this, it is only part of the solution. The
            rest should be taken care of in plugins.
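The veto idea in the commit message above can be illustrated with a simplified model. This Python sketch is not the Jenkins Java API: `ProcessInfo`, `select_processes_to_kill` and `veto_mspdbsrv` are hypothetical names, and matching processes by an environment variable is a simplification of the cleanup behaviour discussed in this issue.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ProcessInfo:
    """Hypothetical snapshot of a running process."""
    pid: int
    name: str
    env: Dict[str, str] = field(default_factory=dict)


def veto_mspdbsrv(proc):
    """Veto: return True if the process must be spared."""
    return proc.name.lower() == "mspdbsrv.exe"


def select_processes_to_kill(processes, cookie_name, cookie_value, vetoes=()):
    """Pick processes carrying the build's cookie, minus vetoed ones."""
    victims = []
    for proc in processes:
        if proc.env.get(cookie_name) != cookie_value:
            continue  # not started by this build
        if any(veto(proc) for veto in vetoes):
            continue  # some extension asked to keep it alive
        victims.append(proc)
    return victims
```

With a veto registered for mspdbsrv.exe, a compiler process from the finished build would still be cleaned up while the shared PDB server survives.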

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Daniel Beck
            Path:
            core/src/main/java/hudson/util/ProcessKillingVeto.java
            core/src/main/java/hudson/util/ProcessTree.java
            test/src/test/java/hudson/util/ProcessTreeKillerTest.java
            http://jenkins-ci.org/commit/jenkins/9a047acd4b5a4e805cee7260f3d091405dc7b930
            Log:
            Merge pull request #1684 from DanielWeber/JENKINS-9104

            JENKINS-9104 Add extension point that allows extensions to veto killing...

            Compare: https://github.com/jenkinsci/jenkins/compare/3c785d5af0ad...9a047acd4b5a

            dogfood dogfood added a comment -

            Integrated in jenkins_main_trunk #4205
            JENKINS-9104 Add ProcessKillingVeto extension point (Revision a220431770cfe716e4f69fd76a4a59bbb27aa045)

            Result = UNSTABLE
            daniel.weber.dev : a220431770cfe716e4f69fd76a4a59bbb27aa045
            Files :

            • core/src/main/java/hudson/util/ProcessKillingVeto.java
            • core/src/main/java/hudson/util/ProcessTree.java
            • test/src/test/java/hudson/util/ProcessTreeKillerTest.java
            mifoe MiFoe added a comment -

            When you use the commandline switch /Z7 the debug info is stored in the object and no server process is needed. This should also solve the problem.

            s7726 Gavin Swanson added a comment -

            How does the /Z7 flag affect performance? My impression is that the point of mspdbsrv.exe is to keep the data around for other builds to use, thus decreasing build times for subsequent builds.

            mifoe MiFoe added a comment -

            It does not affect performance, only the size of the object files. With this option the debug information is stored in each object file instead of in one PDB. At link time, the debug information is written to a PDB file.
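For builds that assemble the compiler command line themselves, switching to /Z7 could be sketched as below. This is a minimal Python illustration under stated assumptions: the helper name is hypothetical, it only handles the slash-prefixed spellings, and it treats /Zi and /ZI as the options that produce a separate PDB during compilation.

```python
# MSVC options that write debug info to a PDB file during compilation.
PDB_DEBUG_FLAGS = {"/Zi", "/ZI"}


def use_embedded_debug_info(flags):
    """Return a copy of flags with /Zi or /ZI replaced by a single /Z7."""
    result = []
    for flag in flags:
        if flag in PDB_DEBUG_FLAGS:
            flag = "/Z7"
        if flag == "/Z7" and "/Z7" in result:
            continue  # avoid emitting /Z7 twice
        result.append(flag)
    return result
```

For example, `use_embedded_debug_info(["/nologo", "/Zi", "/O2"])` yields `["/nologo", "/Z7", "/O2"]`.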

            solstice333 Kevin Navero added a comment -

            Just wanted to note that this also occurs on my slave nodes and each slave node only has one executor. So at first glance, since I'm not running concurrent builds on any individual slave node, it seems like this error occurring on my slave nodes doesn't make any sense.

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Daniel Weber
            Path:
            pom.xml
            src/main/java/hudson/plugins/msbuild/MsBuildKillingVeto.java
            src/test/java/hudson/plugins/msbuild/MsBuildKillingVetoTest.java
            http://jenkins-ci.org/commit/msbuild-plugin/855a84479b64f32ceb30f73433858dfe2efb5e9f
            Log:
            [FIXED JENKINS-9104] Veto killing mspdbsrv.exe

            Making use of the newly introduced ProcessKillingVeto extension point,
            we now make sure that mspdbsrv.exe survives process killing during build
            cleanup.

            This requires a Jenkins version >= 1.625, the new extension point was
            added there. I marked the extension as optional, so that the msbuild
            plugin should still work with older Jenkins releases.

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Gregory Boissinot
            Path:
            pom.xml
            src/main/java/hudson/plugins/msbuild/MsBuildKillingVeto.java
            src/test/java/hudson/plugins/msbuild/MsBuildKillingVetoTest.java
            http://jenkins-ci.org/commit/msbuild-plugin/48084be76d434195c9e8b2ddc66f1fb5255a78de
            Log:
            Merge pull request #19 from DanielWeber/master

            [FIXED JENKINS-9104] Veto killing mspdbsrv.exe

            Compare: https://github.com/jenkinsci/msbuild-plugin/compare/98f71956d897...48084be76d43

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Gregory Boissinot
            Path:
            pom.xml
            src/main/java/hudson/plugins/msbuild/MsBuildKillingVeto.java
            src/test/java/hudson/plugins/msbuild/MsBuildKillingVetoTest.java
            http://jenkins-ci.org/commit/msbuild-plugin/b9a5b02117e0ee097aaf030ab2574daa3dcd217d
            Log:
            Revert "[FIXED JENKINS-9104] Veto killing mspdbsrv.exe"

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Gregory Boissinot
            Path:
            pom.xml
            src/main/java/hudson/plugins/msbuild/MsBuildKillingVeto.java
            src/test/java/hudson/plugins/msbuild/MsBuildKillingVetoTest.java
            http://jenkins-ci.org/commit/msbuild-plugin/031a05982b16e42cba5544c4ba9511515941c62f
            Log:
            Merge pull request #20 from jenkinsci/revert-19-master

            Revert "[FIXED JENKINS-9104] Veto killing mspdbsrv.exe"

            Compare: https://github.com/jenkinsci/msbuild-plugin/compare/48084be76d43...031a05982b16

            damiandixon damian dixon added a comment - - edited

            > Revert "[FIXED JENKINS-9104] Veto killing mspdbsrv.exe"

            I'm confused: why has the code fix been reverted?

            The reason I am looking at this again is that the BUILD_ID workaround is no longer working for me.

            Neither is the 1.25 msbuild plugin, which is meant to contain the fix.

            I upgraded from 1.595 to 1.645.

            danielbeck Daniel Beck added a comment -

            damian dixon https://github.com/jenkinsci/msbuild-plugin/pull/20

            danielweber Daniel Weber added a comment -

            damian dixon: My changes have been reverted by accident, the msbuild plugin release 1.25 does not contain the change required to fix this issue.
            There is a new PR reverting the revert: https://github.com/jenkinsci/msbuild-plugin/pull/21

            danielweber Daniel Weber added a comment -

            This is still not resolved. We need an update of the msbuild-plugin, see PR https://github.com/jenkinsci/msbuild-plugin/pull/21

            danielbeck Daniel Beck added a comment -

            Daniel Weber This issue is filed against the core component, and that change was included a long time ago.

            akb Antony Bartlett added a comment -

            Is there a plan for Visual Studio builds not started by the msbuild-plugin, please?

            I'm asking because our job configurations use an "Execute Windows batch command" build step rather than a "Build a Visual Studio project or solution using MSBuild" build step (and our batch process is non-trivial).

            danielbeck Daniel Beck added a comment -

            Antony Bartlett The proposed MSBuild Plugin change only requires the plugin to be installed to be effective (assuming mspdbsrv.exe is what you don't want killed).

            akb Antony Bartlett added a comment -

            That's great - thank you very much for clarifying this, and for your efforts to fix the wider issue - I'm looking forward to having more projects and configurations built automatically in a timely fashion through judicious use of parallelization

            danielbeck Daniel Beck added a comment -

            Antony Bartlett Forwarding the praise to my (first)namesake Daniel Weber who did all the work

            danielweber Daniel Weber added a comment -

            Daniel Beck: Well, the core stuff is done. But from a user's perspective the issue still exists.

            How can I get someone to merge the pending PR and create a release of the msbuild plugin?

            peteboyrocket Pete W added a comment -

            What's happened to this fix? It sounds like it's ready to go. How can we get a new release of the plugin?

            ykamezac Yannick Kamezac added a comment -

            I tried parallel builds with MSBuild plugin 1.25 on top of Jenkins 1.580.1 but unfortunately I still get this error (fatal error C1090: PDB API call failed, error code '23'). Did I miss something?

            ostojan Aleksander Stojanowski added a comment -

            When will you publish a new version of the plugin with the fix? It's been a month since you released the version with(out) the fix...

            jxramos Jaime Ramos added a comment - - edited

            I'm in need of a fix for this too, it's consistently failing numerous jobs for me. Is there an old version of Jenkins to revert to that avoids this particular problem? I'm willing to go that route as a workaround.
            So far this has made for a pretty bad first impression on a team I set up a CI build for, who had never seen Jenkins before.
            I'm using VS2010 devenv.exe to build the solution files.

            olexandr_maltsev Olexandr Maltsev added a comment - - edited

            Hello Jaime,
            I found a solution.
            I think it is a workaround, but it works for me.
            I added an additional String parameter to every project.
            Go to the Jenkins project and set "This build is parameterized", with "Name": "BUILD_ID" and "Default Value": "DoNotKillMe".

            gl1koz3 Edgars Batna added a comment - - edited

            Stumbled upon this issue immediately after trying parallel builds. Been open for 5 years now, so I guess you can simply check for 'mspdbsrv.exe' and leave it alone? Please free us of our pain.

            zzayats Ilya I. added a comment -

            Somebody, publish the new version please. Apparently, the fix is already in the source code on GitHub. Can someone else (other than the maintainer) release the new version?

            teljj001 James Telfer added a comment -

            FWIW, we implemented a workaround to this issue that doesn't involve wiping out the BUILD_ID variable (as we need to use it). Having a release with the Veto would be better, but this avoids random crashes in the meantime.

            Instead of allowing the MSBuild process to start the daemon itself, you cause the daemon to start using an environment that you choose. MSBuild then just uses the instance you started rather than starting its own.

            The Powershell we use is as follows. Use the Powershell plugin to run this as a step before the MSBuild plugin step (could be translated to Windows batch too if you like).

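            # https://wiki.jenkins-ci.org/display/JENKINS/ProcessTreeKiller
            
            # Save BUILD_ID so it can be restored once mspdbsrv has been started.
            $originalBuildID = $Env:BUILD_ID
            # Start mspdbsrv with a changed BUILD_ID so Jenkins does not kill it (see the link above).
            $Env:BUILD_ID = "DoNotKillMe"
            try
            {
                start mspdbsrv -argumentlist '-start','-spawn' -NoNewWindow
            }
            catch {}
            $Env:BUILD_ID = $originalBuildID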
            
            danielbeck Daniel Beck added a comment -

            msbuild-1.26 should contain the fix. Can we finally resolve this, or is something missing?

            teljj001 James Telfer added a comment -

            IMO, as soon as 1.26 is released.

            danielbeck Daniel Beck added a comment -

            *sigh*

            1.26 is tagged in GitHub but no artifacts are uploaded. Looks like a failed release. Sorry about that.

            Note that MSBuild Plugin is almost certainly not currently maintained, as Gregory stopped working on his plugins, so if someone here wants to take over (Daniel Weber perhaps?) that should be possible.

            teljj001 James Telfer added a comment -

            Daniel Beck no need to apologise, I appreciate you looking at it.

            josch Johannes Schmieder added a comment -

            As a workaround I have created a Jenkins job that executes a Windows batch command on the Jenkins node where Visual Studio is installed.
            The Jenkins job triggers the batch command once a day and has worked in my environment for several years now.
            The batch command looks like this:

            set MSPDBSRV_EXE=mspdbsrv.exe
            set MSPDBSRV_PATH=C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE
            
            set PATH=%MSPDBSRV_PATH%;%PATH%
            set ORIG_BUILD_ID=%BUILD_ID%
            set BUILD_ID=DoNotKillMe
            
            echo stop mspdbsrv.exe
            %MSPDBSRV_EXE% -stop
            
            echo wait 7 sec
            %windir%\system32\ping.exe -n 7 localhost> nul
            
            echo restart mspdbsrv.exe with a shutdowntime of 25 hours
            start /b %MSPDBSRV_EXE% -start -spawn -shutdowntime 90000
            
            set BUILD_ID=%ORIG_BUILD_ID%
            set ORIG_BUILD_ID=
            exit 0
            

            What the batch command does:
            • Stop mspdbsrv.exe to free up resources.
            • Start mspdbsrv.exe with BUILD_ID=DoNotKillMe and a shutdown time of 25 hours. This leaks the mspdbsrv process so that it does not get killed, and it keeps running for 25 hours so that other build jobs can use the already running process.

            You may have to change the path to mspdbsrv: set MSPDBSRV_PATH=C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE

            mbrock Michael Brock added a comment -

            Updating the msbuild plugin won't work in our situation. We run into this issue, but we don't have the plugin installed. Rather the issue comes for us in the Final Builder scripts we run via Jenkins that call msbuild.

            danielbeck Daniel Beck added a comment -

            Then install it. MSBuild will veto all mspdbsrv killing.

            mwinter69 Markus Winter added a comment - - edited

            Set the environment variable
            _MSPDBSRV_ENDPOINT_=$JENKINS_COOKIE
            (The variable starts and ends with a single '_')
            This will lead to a separate instance of mspdbsrv being started.

            grillba Mark Grills added a comment - - edited

            Markus Winter, thanks for the pointer.

            We couldn't get it working with $JENKINS_COOKIE but managed to correct it by adding the following property via EnvInject prior to kicking off the build

            _MSPDBSRV_ENDPOINT_=$BUILD_TAG

            This resulted in a separate process being started for each build, with no conflicts or errors.

            Edit: Correction due to formatting. Refer below

            hidminds Daniel Fischer added a comment - - edited

            It is

            _MSPDBSRV_ENDPOINT_

            (with underlines) not MSPDBSRV_ENDPOINT.

            Just realized it myself that it's a formatting issue. If you enclose the word in underlines it will get italicised and the underlines disappear.

            grillba Mark Grills added a comment -

            Apologies, yes an underscore at each end.

            andne Andy Neebel added a comment -

            We recently re-encountered this on our build network and I did some investigation, here's what I found:

            • On the master node, the veto from the MSBuild plugin works properly; I was able to confirm that the log message shows it.
            • On a slave node, I do not see the log message from the veto. Instead I see a message that my process is being killed recursively (I was watching the process list to get the id during the build).

            It appears that the veto logic doesn't execute on the slave nodes. Is there something special that has to be done in order for it to be detected and executed there? I don't understand enough about how the remoting logic in Jenkins operates to know the answer to this.

            Most of the other work-arounds for this are ones that we cannot easily deploy in our environment. If this is truly the issue, does anyone have an idea what it would take to fix it and how long that would take to carry out?

            andne Andy Neebel added a comment -

            I spent some more time chasing code and I have a suspicion as to the cause of the issue. In ProcessTree.java, there are two different functions that appear to need information from the master and yet operate in different ways:

            • getVeto() is how the whitelist extension is accessed to block the killing of the process. This function just gets the list as it exists, no attempt to go ask the master for any information.
            • getKillers() is used to access the list of ProcessKillers if there are any classes implementing that extension point. This function gets the channel back to the master so it can ask for the master's list of classes implementing this extension.

            I think that getVeto() needs to have part of it implemented more like getKillers(), so that it will go to the master for the list. It may also be that the accessor belongs in ProcessTree instead, so that it caches the data and doesn't go back to the master quite as much. Then, I think the veto logic should work properly on both a master and a slave. Unfortunately, this means a change to Jenkins core and upgrading the full instance to fix the issue instead of just a fix to the plugin itself.

            walteste Stefan Walter added a comment -

            Is there any workaround to this issue, because it completely breaks our usage of Jenkins?

            grillba Mark Grills added a comment -

            Hi Stefan, refer my comments above. This fixed it for us. Cheers

            walteste Stefan Walter added a comment -

            Hi Mark Grills, thanks a lot for your suggestion. It seems that this solved our issues.

            ext3h Andreas Ringlstetter added a comment -

            Little side note: It might not be sufficient to just specify the _MSPDBSRV_ENDPOINT_ env variable in order to avoid conflicts. I recommend additionally setting TMP , TEMP and TEMPDIR to an isolated folder if you plan on invoking MSBUILD in parallel, as various plugins for MSBUILD as well as MSBUILD itself will place files there.

            A further catch of using _MSPDBSRV_ENDPOINT_ is that serialization of parallel builds in the same working directory will now break in return, unless you make sure that the temporary files for the different architectures (e.g. the temporary program database created with the individual object files, commonly named just e.g. "Debug\vc120.pdb"; notice the lack of a prefix for the architecture) are completely isolated as well. Otherwise the different mspdbsrv instances will now collide accessing the same file.

            billhoo Bill Hoo added a comment - - edited

            Mark Grills, Stefan Walter Hi there, we've got this issue too, and we followed your suggestions to config the master Jenkins node like this:

            Configure system > Environment variables > Add new key value pair below:

             

            KEY: _MSPDBSRV_ENDPOINT_

            VALUE: $BUILD_TAG

             

            But this had no effect; the error still occurs on the Windows slave. Could you please explain the solution in detail? Should we set this key-value pair on the slave node? Thanks in advance.

            grillba Mark Grills added a comment -

            @billhoo,

            You need to do it at the Job level - Not the system level. Use envinject to add the environment variable

            Have a look here for how to use envinject,  https://wiki.jenkins.io/display/JENKINS/EnvInject+Plugin

            Make sure you follow the "Inject variables as a build step" topic

            Regards

            Mark
            billhoo Bill Hoo added a comment -

            Mark Grills,

            Thanks for the timely reply. We followed your guide and found that 3 separate mspdbsrv.exe processes were running in the background (for test purposes, we ran 3 jobs concurrently on one Windows slave), so it seemed to work. Unfortunately, one of our jobs still failed with the C1090 error.

             

            This is the screenshot of EnvInject in each of our 3 Pipeline jobs configuration page,

            I don't think there's anything wrong here. Am I missing something?

             

            Thanks,

            Bill.

            adam1book Adam Cornwell added a comment - - edited

            Just in case this helps anyone, I was able to fix all problems mentioned so far in this issue and comments by following the recommendations on this blog post:
            http://blog.peter-b.co.uk/2017/02/stop-mspdbsrv-from-breaking-ci-build.html

            The solution involves
            1. Installing the MSBuild plugin ver. 1.26 or higher in Jenkins. Setup for use on the server is optional, only needs to be installed. This stops Jenkins from killing the mspdbsrv process automatically.

            2. Using the _MSPDBSRV_ENDPOINT_ environment variable as done in the comment above.

            3. Spawning and killing a new specific mspdbsrv instance of the right Visual Studio version at the beginning and end of each job which uses it.

            Powershell implementation of the Python solution in the blog (change VS140COMNTOOLS to the version of Visual Studio being used):

            # Manually start mspdbsrv so a parallel job's instance isn't used, works because _MSPDBSRV_ENDPOINT_ is set to a unique value
            # (otherwise results in "Fatal error C1090: PDB API call failed, error code '23'" when one of the builds completes).
            $mspdbsrv_proc = Start-Process -FilePath "${env:VS140COMNTOOLS}\..\IDE\mspdbsrv.exe" -ArgumentList ('-start','-shutdowntime','-1') -passthru
            
            .\{PowershellBuildScriptName}.ps1
            
            # Manually kill mspdbsrv once the build completes using the previously saved process id
            Stop-Process $mspdbsrv_proc.Id

             

            jakuborava Jakub Orava added a comment -

            I had the same problem with parallel builds (e.g. running job A from trunk and job A from a branch in parallel). I tried the solution of setting _MSPDBSRV_ENDPOINT_ to BUILD_TAG and it worked for almost all jobs. In one situation I still had the error. So I replaced BUILD_TAG with the JOB_NAME environment variable and suddenly it was fine; for now we are out of problems. If anyone still has the problem with the ENDPOINT solution, try changing BUILD_TAG to something else. If you do not allow parallel builds in a single job, JOB_NAME should be enough; otherwise you can try a JOB_NAME + BUILD_NUMBER combination.

            Maybe ENDPOINT has some restrictions, but I did not have time to inspect this more deeply. What I know is that the problematic job has the longest name in my Jenkins, approx. 48 characters.

            davida2009 David Aldrich added a comment -

            Please can anyone advise me how to set _MSPDBSRV_ENDPOINT_ with value BUILD_TAG in a pipeline declarative script?

            I don’t really understand the difference between defining and injecting an environment variable. I could do:

            stage('build_VisualStudio') {
                    environment { _MSPDBSRV_ENDPOINT_=$BUILD_TAG }
            etc.
            

            Would that be sufficient or must environment variable injection be done in a different way?

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Daniel Beck
            Path:
            content/_data/changelogs/weekly.yml
            http://jenkins-ci.org/commit/jenkins.io/0391fcb9b4c957e9e41fde03409de330a3de571d
            Log:
            Remove JENKINS-9104 fix from release to unblock it

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Daniel Beck
            Path:
            content/_data/changelogs/weekly.yml
            http://jenkins-ci.org/commit/jenkins.io/62409d42a5769cac66337cbd4b5df5754f0e2384
            Log:
            Merge pull request #1522 from daniel-beck/changelog-2.119-amended

            Remove JENKINS-9104 fix from release to unblock it

            Compare: https://github.com/jenkins-infra/jenkins.io/compare/58f029c79331...62409d42a576

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Jesse Glick
            Path:
            core/src/main/java/hudson/util/ProcessTree.java
            test/src/test/java/hudson/util/ProcessTreeKillerTest.java
            http://jenkins-ci.org/commit/jenkins/3465da4764c322baf4fb5b90651ef6b9bcd409fb
            Log:
            Merge pull request #3419 from dwnusbaum/JENKINS-9104-test-fix

            Fix test failure by cleaning up static state after tests

            Compare: https://github.com/jenkinsci/jenkins/compare/ddbc4bbce7d3...3465da4764c3
            *NOTE:* This service been marked for deprecation: https://developer.github.com/changes/2018-04-25-github-services-deprecation/

            Functionality will be removed from GitHub.com on January 31st, 2019.

            danielbeck Daniel Beck added a comment -

            Jenkins 2.120 contains a fix for the previous problem of the ProcessKillingVeto extension point not working on agents.

            vuiletgiraffe John Doe added a comment -

            I'm occasionally getting this error with the latest versions of Jenkins and all the plugins. It started in recent months and hadn't been a problem for a year before that. The problem seems to have NOT been resolved, or possibly re-emerged.

            What can I do, is there a workaround? Sporadic build failures for no reason are super annoying.

            billhoo Bill Hoo added a comment -

            Same error with latest Jenkins ver. 2.150.3

            The error always occurs when running two jobs concurrently on the same agent with VS2015:
            fatal error C1090: PDB API

            vuiletgiraffe John Doe added a comment -

            Bill Hoo, thanks for the tip! I was running VS 2017 (v141 toolset), but there were indeed two simultaneous jobs! So the workaround is to limit this agent to one job at a time. Pity, as it's a pretty powerful multicore server, but it's better than flaky builds.

            billhoo Bill Hoo added a comment -

            John Doe, totally the same. We have many different jobs that use MSVC14 as the toolchain, but now we can only perform one build at a time. It's a huge waste of machine resources ;(

            Hope it can be truly solved.

            ext3h Andreas Ringlstetter added a comment - - edited

            The solution is still the same: before invoking `msbuild`, set the following environment variables to something unique:

            _MSPDBSRV_ENDPOINT_=<UUID>
            TMP=<Unique Tempdir>
            TEMP=$TMP
            TMPDIR=$TMP

            Once you have done that, you can launch as many parallel MSBuild instances as you like, even mixing different MSBuild versions. They will not interfere with each other in any way. I do this on a regular basis with mixed MSVC12, MSVC14 and MSVC15 toolchains on the same machine and haven't had any issues since.
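
For anyone who wants to script this, here is a minimal sketch of a wrapper that applies the four variables listed above before starting a build. It is only an illustration, not something taken from Jenkins or MSBuild: the function names `isolated_build_env` and `run_msbuild` are made up for this example, and it assumes `msbuild` is on the PATH of the build step.

```python
import os
import subprocess
import tempfile
import uuid


def isolated_build_env(base_env=None):
    """Return a copy of the environment with a unique PDB endpoint and temp dir.

    Hypothetical helper: the variable names come from the comment above,
    everything else is an assumption made for illustration.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Unique value per build, as the comment suggests (<UUID>).
    env["_MSPDBSRV_ENDPOINT_"] = str(uuid.uuid4())
    # Unique temp directory per build, mirrored into TMP, TEMP and TMPDIR.
    tmp = tempfile.mkdtemp(prefix="msbuild-")
    env["TMP"] = env["TEMP"] = env["TMPDIR"] = tmp
    return env


def run_msbuild(args, base_env=None):
    """Run msbuild (assumed to be on PATH) inside the isolated environment."""
    env = isolated_build_env(base_env)
    # check=True raises CalledProcessError if the build fails.
    return subprocess.run(["msbuild", *args], env=env, check=True)
```

A build step could then call something like `run_msbuild(["MySolution.sln"])`, where the solution name is a placeholder. The sketch does not delete the temp directory afterwards, so a real job would probably want to clean it up once the build finishes.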

            The "official" fix for this problem (trying not to kill the job scheduler) is plain wrong, and causes massive issues. Mostly because MSBuild itself isn't exactly stable either when using the same job server for multiple parallel builds. And if the builds are using different toolchains, a crash is ensured.


              People

              • Assignee:
                danielweber Daniel Weber
                Reporter:
                gordin Christoph Vogtländer
              • Votes:
                71
                Watchers:
                92

                Dates

                • Created:
                  Updated:
                  Resolved: