Jenkins / JENKINS-9104

Visual Studio builds started by Jenkins fail with "Fatal error C1090" because mspdbsrv.exe gets killed

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: core
    • Labels:
      None
    • Environment:
      Windows XP and Windows 7, using MSBuild or devenv.exe to build MS Visual Studio projects

      Description

      I run into errors when using a customized build system that uses Visual Studio's devenv.exe under the hood to compile Visual Studio 2005 projects (with the VC++ compiler). When I start two parallel builds with Jenkins (on different code bases), the second job always fails with "Fatal error C1090: PDB API call failed, error code '23' : '(" at exactly the second the first job finishes. Running both jobs outside Jenkins does not produce the error.
      This has also been reported for builds executed by MSBuild on the Jenkins user mailing list [1].

      I analysed this issue thoroughly and tracked the problem down to the use of mspdbsrv.exe. This program is spawned automatically when a Visual Studio project is built. All Visual Studio instances normally share one common PDB server, which shuts itself down after an idle period (10 minutes by default). "It ensures access to .pdb files is properly serialized in parallel builds when multiple instances of the compiler try to access the same .pdb file" [2].
      I assume that Jenkins cleans up its build environment when an automatically started job finishes (as described at http://wiki.jenkins-ci.org/display/JENKINS/Aborting+a+build). I checked mspdbsrv.exe with Process Explorer, and when started through Jenkins the process does indeed have a JENKINS_COOKIE/HUDSON_COOKIE variable set in its environment. Killing mspdbsrv.exe while projects are still connected to it breaks compilation.
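A simplified model of the cleanup described above, assuming (as the Process Explorer observation suggests) that processes are selected for killing by a cookie variable inherited through the environment. This is a hypothetical sketch, not Jenkins's actual implementation; `Proc`, `belongs_to_build` and `processes_to_kill` are made-up names. It shows why a shared mspdbsrv.exe that was first started by job A is killed when job A ends, even though job B's compiler still depends on it.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Variable names as observed in the report; the exact set Jenkins checks may differ.
COOKIE_VARS = ("JENKINS_COOKIE", "HUDSON_COOKIE")


@dataclass
class Proc:
    pid: int
    name: str
    env: dict[str, str] = field(default_factory=dict)


def belongs_to_build(proc: Proc, cookie: str) -> bool:
    """True if the process inherited this build's cookie through its environment."""
    return any(proc.env.get(var) == cookie for var in COOKIE_VARS)


def processes_to_kill(procs: list[Proc], cookie: str) -> list[Proc]:
    """Every process tagged with the finished build's cookie, whoever else uses it."""
    return [p for p in procs if belongs_to_build(p, cookie)]


# Job A started first, so the shared PDB server inherited job A's cookie.
snapshot = [
    Proc(101, "mspdbsrv.exe", {"HUDSON_COOKIE": "job-a"}),
    Proc(200, "devenv.exe", {"HUDSON_COOKIE": "job-b"}),  # job B, still compiling
    Proc(201, "cl.exe", {"HUDSON_COOKIE": "job-b"}),      # talks to pid 101
]
print([p.name for p in processes_to_kill(snapshot, "job-a")])  # → ['mspdbsrv.exe']
```

In this model, job B's cl.exe loses its PDB server the moment job A's cleanup runs, which matches the timing of the C1090 failure.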

      Jenkins must not kill mspdbsrv.exe if it is to build more than one Visual Studio project at the same time.


      [1] http://jenkins.361315.n4.nabble.com/MSBuild-fatal-errors-when-build-triggered-by-timer-td385181.html
      [2] http://social.msdn.microsoft.com/Forums/en-US/vcgeneral/thread/b1d1bceb-06b6-47ef-a0ea-23ea752e0c4f/


          Activity

          laro Lars Rosenboom added a comment (edited)

          Maybe there is a way to shut down mspdbsrv.exe gracefully, so that it stops only after all active requests (from parallel builds) are done. It should then simply restart on the next request.

          Another solution would be to let the user supply a list of process names not to kill (or perhaps to hard-code an exception for mspdbsrv.exe).
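The list-of-names idea can be sketched as one extra filter on top of the build-cookie match. Everything here is hypothetical: no existing Jenkins setting is implied, and `Proc`, `DEFAULT_WHITELIST` and `processes_to_kill` are made-up names. Image names are compared case-insensitively because Windows treats them that way.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical, user-configurable set of image names that cleanup must never kill.
DEFAULT_WHITELIST = frozenset({"mspdbsrv.exe"})


@dataclass
class Proc:
    pid: int
    name: str
    cookie: str  # build cookie inherited through the environment


def processes_to_kill(
    procs: list[Proc], cookie: str, whitelist: frozenset[str] = DEFAULT_WHITELIST
) -> list[Proc]:
    """Processes left over from the finished build, minus whitelisted image names."""
    spared = {name.lower() for name in whitelist}
    return [p for p in procs if p.cookie == cookie and p.name.lower() not in spared]


leftovers = [
    Proc(101, "MSPDBSRV.EXE", "job-a"),
    Proc(102, "devenv.exe", "job-a"),
    Proc(200, "cl.exe", "job-b"),
]
print([p.name for p in processes_to_kill(leftovers, "job-a")])  # → ['devenv.exe']
```

The server simply survives cleanup and then exits through its own idle timeout, which is the behaviour the next comment describes.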

          s7726 Gavin Swanson added a comment

          Stopping after a timeout once all active requests are done, and continuing to run when a new request arrives, is exactly how mspdbsrv behaves normally, as long as nothing (à la Jenkins) goes around killing it.

          I believe the correct solution is a whitelist.

          leedega Kevin Phillips added a comment

          Update
          So, it turns out that setting up some kind of background process to spawn a copy of the pdbsrv process isn't going to work as expected. From what I can tell, Windows can detect when a process has been launched from a system service, and it prevents such sub-processes from using processes that were spawned elsewhere. The particulars of my test case are as follows:

          1. Set up a small Python script that launches a copy of mspdbsrv.exe when called
          2. Set up a scheduled task in Windows to run the Python script on boot
          3. Reboot the agent and confirm that the mspdbsrv.exe process is running
          4. Trigger a compilation operation via the Jenkins dashboard
          5. A new, secondary copy of mspdbsrv.exe is spawned to serve the Jenkins agent. This sub-process is then terminated as usual once the Jenkins build is complete.
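Steps 3 and 5 can be checked from a script rather than by eye: Windows' `tasklist` lists running mspdbsrv.exe instances, and a count of 2 after step 4 corresponds to the secondary copy described in step 5. A minimal sketch, assuming Python 3.7+ and the `tasklist /NH /FO CSV` output format (one quoted row per process, image name first); `count_instances` and `running_count` are hypothetical helpers.

```python
import csv
import io
import subprocess


def count_instances(tasklist_csv: str, image: str = "mspdbsrv.exe") -> int:
    """Count rows of `tasklist /NH /FO CSV` output whose image name matches."""
    rows = csv.reader(io.StringIO(tasklist_csv))
    # Blank lines parse as empty rows; an "INFO: ..." line has no matching first field.
    return sum(1 for row in rows if row and row[0].lower() == image.lower())


def running_count(image: str = "mspdbsrv.exe") -> int:
    """Windows only: ask tasklist for processes with this image name and count them."""
    out = subprocess.run(
        ["tasklist", "/NH", "/FO", "CSV", "/FI", f"IMAGENAME eq {image}"],
        capture_output=True,
        text=True,
    ).stdout
    return count_instances(out, image)
```

Calling `running_count()` once after the reboot and again during the Jenkins build would show the jump from one instance to two.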

          I have confirmed that the service that runs the Jenkins agent and the scheduled task use the same user profile and credentials, and that both environments use the same version of mspdbsrv.exe with the same set of command-line parameters (i.e. -start -spawn).

          Looks like I have to head back to the drawing board.

          leedega Kevin Phillips added a comment

          Update
          As a sanity check, I put together a quick ad-hoc test configuration in which I override BUILD_ID in the environment of one of my compilation jobs, to see whether one of the hacks proposed earlier might work. Unfortunately, it looks like this is not a robust solution either. I have confirmed that it does work in the trivial case:

          1. Set up a job with a single shell operation as a build step, configured as follows:
            • override the BUILD_ID env var with some arbitrary value
            • call into MSBuild to perform the compilation
          2. run a build of the given job
          3. upon completion, confirm that the mspdbsrv.exe process is still running - TEST SUCCESSFUL
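Written as a Python build step instead of a shell one, the trivial case above amounts to running MSBuild with a copy of the environment in which BUILD_ID has been replaced, so that every child process (including mspdbsrv.exe) inherits the new value. A sketch under stated assumptions: `msbuild` is on PATH, the project path is supplied by the caller, and `dontKillMe` merely stands in for the "arbitrary value"; `build_env` and `run_msbuild` are hypothetical names.

```python
import os
import subprocess
from typing import Mapping, Optional


def build_env(base: Optional[Mapping[str, str]] = None, build_id: str = "dontKillMe") -> dict:
    """Copy the environment and replace BUILD_ID; the original mapping is untouched."""
    env = dict(os.environ if base is None else base)
    env["BUILD_ID"] = build_id  # any value that differs from the real build's ID
    return env


def run_msbuild(project: str) -> int:
    """Run MSBuild with the overridden BUILD_ID so its child processes inherit it."""
    result = subprocess.run(["msbuild", project], env=build_env())
    return result.returncode
```

As the next test shows, this alone does not survive a manually aborted build.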

          However, I've found another case where this solution doesn't work: if you manually kill the build while it is running, Jenkins still somehow manages to locate the orphaned pdbsrv process and kill it, despite the changes described above. To put it more clearly:

          1. Set up a job with a single shell operation as a build step, configured as follows:
            • override the BUILD_ID env var with some arbitrary value
            • call into MSBuild to perform the compilation
          2. run a build of the given job
          3. while the compilation is running, and once you have confirmed that the mspdbsrv.exe process has been launched, manually force the running build to terminate (i.e. by clicking the X icon next to the running build on the Jenkins dashboard)
          4. FAILURE - Jenkins still terminates the pdbsrv process

          I have confirmed that the pdbsrv process does correctly inherit the overridden BUILD_ID, so Jenkins is somehow able to locate and terminate the process in this case. I suspect that, at the point at which I manually kill the build in my test environment, Jenkins is still running one or more Visual Studio operations that have a direct link to the mspdbsrv.exe process, so it detects and kills that process by recursively traversing the process tree and killing all running processes/threads tied to the agent at the time.
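The suspicion above can be illustrated with a toy process tree. Assuming, hypothetically, that on abort Jenkins also walks down from the build's root process and kills every descendant, an environment override cannot help while mspdbsrv.exe's parent (for example cl.exe) is still alive; once that parent has exited, the server is orphaned and no longer reachable from the root. The pids, the child-to-parent snapshots and the `descendants` helper are invented for illustration.

```python
from collections import defaultdict


def descendants(root: int, parent_of: dict) -> list:
    """Every pid below `root`, given a snapshot mapping each live pid to its parent pid."""
    children = defaultdict(list)
    for pid, ppid in parent_of.items():
        children[ppid].append(pid)
    found, stack = [], [root]
    while stack:
        for child in children[stack.pop()]:
            found.append(child)
            stack.append(child)
    return found


# Abort mid-compile: cmd (10) -> msbuild (11) -> cl (12) -> mspdbsrv (13).
during_compile = {11: 10, 12: 11, 13: 12}
# After the compile, cl.exe (12) has exited; mspdbsrv (13) still records 12 as its parent.
after_compile = {11: 10, 13: 12}

print(descendants(10, during_compile))  # → [11, 12, 13]
print(descendants(10, after_compile))   # → [11]
```

Under this model, the BUILD_ID override only protects mspdbsrv.exe once nothing in the build's tree links to it any more.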

          Either way, this example shows that even this 'hack' of overriding the BUILD_ID is fragile at best. It looks like we may have no choice but to wait for a fix for that 'whitelist' solution before we can consider upgrading our Jenkins instance.

          leedega Kevin Phillips added a comment

          Update
          While reporting the issue in my last comment, I had the idea for a slight variation of the configuration described there, which does appear to work in both use cases. The main modification I made was to split the build into two separate build steps:

          • the first is a simple Windows command-line call that overrides BUILD_ID and then launches mspdbsrv.exe. Once this first step completes, Jenkins terminates the shell session that is linked to the pdbsrv process, thus decoupling it from the agent. Combined with the overridden BUILD_ID environment variable, this means Jenkins can no longer track the process.
          • the second step is just another Windows shell session that then calls into MSBuild to proceed with the build.
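A Python equivalent of the first build step might look like the sketch below: start mspdbsrv.exe with the `-start -spawn` arguments mentioned earlier and an overridden BUILD_ID, then return at once so that nothing belonging to the build remains its parent. Assumptions: the mspdbsrv.exe path is a placeholder for your Visual Studio install; `DETACHED_PROCESS` and `CREATE_NEW_PROCESS_GROUP` are Windows-only `subprocess` constants, read with `getattr` so the module still imports elsewhere; redirecting the standard streams to `DEVNULL` is an extra precaution, not part of the original recipe, so the server does not keep the step's output pipe open. All function names are hypothetical.

```python
import os
import subprocess
from typing import Mapping, Optional

# Placeholder path; adjust to the Visual Studio version in use.
MSPDBSRV = r"C:\Program Files\Microsoft Visual Studio 8\Common7\IDE\mspdbsrv.exe"


def launch_command(exe: str = MSPDBSRV) -> list:
    """Same arguments as the mspdbsrv.exe instances observed above."""
    return [exe, "-start", "-spawn"]


def detached_env(base: Optional[Mapping[str, str]] = None, build_id: str = "dontKillMe") -> dict:
    """Environment for the server: a copy of `base` with BUILD_ID overridden."""
    env = dict(os.environ if base is None else base)
    env["BUILD_ID"] = build_id
    return env


def creation_flags() -> int:
    """Detach from the step's console and process group (0 on non-Windows)."""
    return getattr(subprocess, "DETACHED_PROCESS", 0) | getattr(
        subprocess, "CREATE_NEW_PROCESS_GROUP", 0
    )


def start_pdb_server(exe: str = MSPDBSRV) -> int:
    """Fire and forget: start the server and return its pid without waiting for it."""
    proc = subprocess.Popen(
        launch_command(exe),
        env=detached_env(),
        creationflags=creation_flags(),
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        close_fds=True,
    )
    return proc.pid
```

Step 1 would consist only of calling `start_pdb_server()` and letting the script exit; step 2 then invokes MSBuild as usual.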

          Theoretically even this solution "could" fall prey to the same problem I described in my previous comment; however, the execution time of this initial build step is negligible, so the problem is highly unlikely to occur in practice (i.e. a user would need to hit the kill button during the small fraction of a second it takes Jenkins to launch mspdbsrv.exe).

          I'm not sure how easy this hack will be for us to roll out into production at the scale we need, but just in case others find this tidbit of information helpful I thought I'd provide it here.


            People

            • Assignee:
              danielweber Daniel Weber
              Reporter:
              gordin Christoph Vogtländer
            • Votes:
              38
              Watchers:
              48
