Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-9104

Visual studio builds started by Jenkins fail with "Fatal error C1090" because mspdbsrv.exe gets killed

    Details

    • Type: Bug
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: core
    • Labels:
      None
    • Environment:
      Windows XP, Windows 7 using MSBuild or devenv.exe to build MS Visual Studio Projects
    • Similar Issues:

      Description

      I run into errors when using a customized build system which uses Visual Studio's devenv.exe under the hood to compile VisualStudio 2005 projects (with VC++ compiler). When starting two parallel builds with Jenkins (on different code base) the second job will always fail with "Fatal error C1090: PDB API call failed, error code '23' : '(" in exactly the same second the first job finishes processing. Running both jobs outside Jenkins does not produce the error.
      This has also been reported for builds executed by MSBuild on the Jenkins user mailing list [1].

      I analysed this issue thoroughly and can track the problem down to the usage of mspdbsrv.exe. This program is automatically spawned when building a VisualStudio project. All Visual Studio instances normally share one common pdb-server which shutdown itself after a idle period (standard is 10 minutes). "It ensures access to .pdb files is properly serialized in parallel builds when multiple instances of the compiler try to access the same .pdb file" [2].
      I assume that Jenkins does a clean up of its build environment when a automatically started job finishes (like as described at http://wiki.jenkins-ci.org/display/JENKINS/Aborting+a+build). I checked mspbsrv.exe with ProcessExplorer and the process indeed has a variable JENKINS_COOKIE/HUDSON_COOKIE set in its environment if started through Jenkins. Killing mspdbsrv.exe while projects are still connected will break compilation.

      Jenkins mustn't kill mspdbsrv.exe to be able to build more than one Visual Studio project at the same time.


      [1] http://jenkins.361315.n4.nabble.com/MSBuild-fatal-errors-when-build-triggered-by-timer-td385181.html
      [2] http://social.msdn.microsoft.com/Forums/en-US/vcgeneral/thread/b1d1bceb-06b6-47ef-a0ea-23ea752e0c4f/

        Attachments

          Issue Links

            Activity

            Hide
            andne Andy Neebel added a comment -

            We recently re-encountered this on our build network and I did some investigation, here's what I found:

            • On the master node, the veto from MSBuild plugin works properly, I was able to confirm the log message show it.
            • On a slave node, I do not see the log message from the veto. Instead I see a message that my process is being killed recursively (I was watching the process list to get the id during the build).

            It appears that the veto logic doesn't execute on the slave nodes. Is there something special that has to be done in order for it to be detected and executed there? I don't understand enough about how the remoting logic in Jenkins operates to know the answer to this.

            Most of the other work-arounds for this are ones that we cannot easily deploy in our environment. If this is truly the issue, does anyone have an idea what it would take to fix it and how long that would take to carry out?

            Show
            andne Andy Neebel added a comment - We recently re-encountered this on our build network and I did some investigation, here's what I found: On the master node, the veto from MSBuild plugin works properly, I was able to confirm the log message show it. On a slave node, I do not see the log message from the veto. Instead I see a message that my process is being killed recursively (I was watching the process list to get the id during the build). It appears that the veto logic doesn't execute on the slave nodes. Is there something special that has to be done in order for it to be detected and executed there? I don't understand enough about how the remoting logic in Jenkins operates to know the answer to this. Most of the other work-arounds for this are ones that we cannot easily deploy in our environment. If this is truly the issue, does anyone have an idea what it would take to fix it and how long that would take to carry out?
            Hide
            andne Andy Neebel added a comment -

            I spent some more time chasing code and I have a suspicion as to the cause of the issue. In ProcessTree.java, there are two different functions that appear to need information from the master and yet operate in different manners

            • getVeto() is how the whitelist extension is accessed to block the killing of the process. This function just gets the list as it exists, no attempt to go ask the master for any information.
            • getKillers() is used to access the list of ProcessKillers if there are any classes implementing that extension point. This function gets the channel back to the master so it can ask for the master's list of classes implementing this extension.

            I think that getVeto() needs to have part of it implemented more like getKillers(), so that it will go to the master for the list. It may be also that the accessor belongs in ProcessTree instead, so that it caches the data and doesn't go back to the master quite as much. Then, I think the veto logic should work properly on both a master and a slave. Unfortuntely, this means a change to Jenkins core and upgrading the full instance to fix the issue instead of just a fix to the plugin itself.

            Show
            andne Andy Neebel added a comment - I spent some more time chasing code and I have a suspicion as to the cause of the issue. In ProcessTree.java, there are two different functions that appear to need information from the master and yet operate in different manners getVeto() is how the whitelist extension is accessed to block the killing of the process. This function just gets the list as it exists, no attempt to go ask the master for any information. getKillers() is used to access the list of ProcessKillers if there are any classes implementing that extension point. This function gets the channel back to the master so it can ask for the master's list of classes implementing this extension. I think that getVeto() needs to have part of it implemented more like getKillers(), so that it will go to the master for the list. It may be also that the accessor belongs in ProcessTree instead, so that it caches the data and doesn't go back to the master quite as much. Then, I think the veto logic should work properly on both a master and a slave. Unfortuntely, this means a change to Jenkins core and upgrading the full instance to fix the issue instead of just a fix to the plugin itself.
            Hide
            walteste Stefan Walter added a comment -

            Is there any workaround to this issue, because it completely breaks our usage of Jenkins?

            Show
            walteste Stefan Walter added a comment - Is there any workaround to this issue, because it completely breaks our usage of Jenkins?
            Hide
            grillba Mark Grills added a comment -

            Hi Stefan, refer my comments above. This fixed it for us. Cheers

            Show
            grillba Mark Grills added a comment - Hi Stefan, refer my comments above. This fixed it for us. Cheers
            Hide
            walteste Stefan Walter added a comment -

            Hi Mark Grills, thanks a lot for your suggestion. It seems that this solved our issues.

            Show
            walteste Stefan Walter added a comment - Hi Mark Grills , thanks a lot for your suggestion. It seems that this solved our issues.

              People

              • Assignee:
                danielweber Daniel Weber
                Reporter:
                gordin Christoph Vogtländer
              • Votes:
                65 Vote for this issue
                Watchers:
                84 Start watching this issue

                Dates

                • Created:
                  Updated: