  Jenkins / JENKINS-59400

Jenkins slave nodes hang for up to 12+ minutes after the build phase completes

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Component/s: core

      Description

      Jenkins slave nodes hang for up to 12+ minutes after the build phase completes.

      I have a master node and 7 nodes in the build pool. The master is configured to run only jobs labeled master, so builds always run on the build nodes.

      There are no post-build steps configured.

      The hang occurs whether the build step passes or fails.

      Example console output:
      ---------------------------------------------------------------------------
      ....
      2019-09-16 16:51:45 ERROR: last command returned failure: 1
      2019-09-16 16:51:45 
      2019-09-16 16:51:45 build.bat failed with error code '1'
      2019-09-16 16:51:45 
      2019-09-16 16:51:45 Build step 'Execute Windows batch command' marked build as failure
      2019-09-16 17:03:48 Finished: FAILURE
      --------------------------------------------------------------------------- 

      Observe that the 'Execute Windows batch command' build step finishes at 16:51:45. However, the build keeps running for another 12 minutes before finally completing with the FAILURE result at 17:03:48.
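      To measure this delay from a saved console log, a small script can compare the timestamp of the last line before "Finished:" with the timestamp on the "Finished:" line itself. This is a minimal sketch that assumes every relevant line starts with a YYYY-MM-DD HH:MM:SS timestamp, as in the excerpt above; the helper names parse_ts and finish_delay are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

TS_FORMAT = "%Y-%m-%d %H:%M:%S"
TS_LEN = len("2019-09-16 16:51:45")  # width of the timestamp prefix: 19 characters


def parse_ts(line: str) -> Optional[datetime]:
    """Return the timestamp at the start of a console line, or None if there isn't one."""
    try:
        return datetime.strptime(line[:TS_LEN], TS_FORMAT)
    except ValueError:
        return None


def finish_delay(console: str) -> Optional[timedelta]:
    """Gap between the last timestamped line before 'Finished:' and the 'Finished:' line."""
    last_ts = None
    for line in console.splitlines():
        ts = parse_ts(line)
        if ts is None:
            continue  # skip lines without a leading timestamp, e.g. "...."
        if line[TS_LEN:].strip().startswith("Finished:"):
            return ts - last_ts if last_ts is not None else None
        last_ts = ts
    return None  # no 'Finished:' line found


if __name__ == "__main__":
    sample = (
        "2019-09-16 16:51:45 Build step 'Execute Windows batch command' marked build as failure\n"
        "2019-09-16 17:03:48 Finished: FAILURE\n"
    )
    print(finish_delay(sample))  # 0:12:03
```

      Applied to the two lines above, the gap is 12 minutes and 3 seconds, which matches the hang described in this report.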

      There are 6 nodes in the build pool being used for these builds.

          Activity

          John Rocha (rocha_stratovan) added a comment:

          Update:

          The issue doesn't always happen; it seems to depend on what the job does. For example, if the job performs a Perforce update, there is no noticeable delay. A simple Visual Studio compile doesn't seem to cause a delay either. By a simple compile I mean one with few objects, which doesn't seem to trigger parallel compilation.

          When it does happen, it appears to be with bigger Visual Studio builds that have parallel compilation enabled. Moreover, I've noticed that multiple MSBuild.exe processes may still be running even after Jenkins reports "build.bat existing with success".

          For example, during my most recent reproduction, 5 MSBuild.exe processes were still lingering after Jenkins reported that the script had exited with success, and the build didn't report its final result until about 8 minutes later.

          The MSBuild.exe processes would slowly go away one by one.

          Once all of the MSBuild.exe processes had terminated, the Jenkins job reported its final "Finished: SUCCESS" result.
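          To check a build node for lingering MSBuild.exe processes after the script exits, the output of the Windows tasklist command can be filtered by image name. This is a minimal sketch, assuming Python is available on the Windows build node; the helper names parse_tasklist_csv and msbuild_pids are hypothetical, and the parser assumes "tasklist /FO CSV /NH" output, where the PID is the second column.

```python
import csv
import io
import subprocess
import sys
from typing import List


def parse_tasklist_csv(output: str) -> List[int]:
    """Extract PIDs from `tasklist /FO CSV /NH` output (PID is the second column)."""
    pids = []
    for row in csv.reader(io.StringIO(output)):
        # When nothing matches, tasklist prints a single "INFO: ..." line instead of
        # CSV rows; blank lines parse as empty rows. Both are skipped here.
        if len(row) >= 2 and row[1].isdigit():
            pids.append(int(row[1]))
    return pids


def msbuild_pids() -> List[int]:
    """Return the PIDs of currently running MSBuild.exe processes (Windows only)."""
    result = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq MSBuild.exe", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
    )
    return parse_tasklist_csv(result.stdout)


if __name__ == "__main__" and sys.platform == "win32":
    print("Lingering MSBuild.exe PIDs:", msbuild_pids())
```

          Running this right after build.bat exits would show whether worker processes are still alive while Jenkins waits for the final result.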

          John Rocha (rocha_stratovan) added a comment:

          Root cause: user error (how MSBuild was being called)

          I ran the script manually from the CLI and observed that the MSBuild.exe processes never went away. Ever.

          I searched for this online and found a Stack Overflow answer describing the cause and the solution.

          When parallel compiles are enabled and used, MSBuild by default keeps its MSBuild.exe worker processes running after the build so they can be reused by future compiles. This seems to cause a problem with remote Jenkins build pools.

          This reuse/linger behavior can be disabled by passing /nr:false to MSBuild.

          Adding this flag resolved my issue.
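          As an illustration of the fix, here is one way a build wrapper could invoke MSBuild with parallel builds enabled but node reuse disabled. It is a sketch, not the reporter's actual build.bat: it assumes MSBuild.exe is on PATH, and the helper msbuild_command and the project name MySolution.sln are hypothetical. The /m switch enables multi-process builds, and /nr:false stops the worker processes from lingering after the build.

```python
import subprocess
import sys
from typing import List


def msbuild_command(project: str, parallel: bool = True) -> List[str]:
    """Build an MSBuild command line that does not leave reusable worker nodes behind."""
    cmd = ["MSBuild.exe", project]
    if parallel:
        cmd.append("/m")  # build with multiple worker processes
    cmd.append("/nr:false")  # disable node reuse so workers exit when the build ends
    return cmd


if __name__ == "__main__" and sys.platform == "win32":
    # Propagate MSBuild's exit code so the Jenkins build step passes or fails correctly.
    completed = subprocess.run(msbuild_command("MySolution.sln"))
    sys.exit(completed.returncode)
```

          In a batch file, the equivalent change is simply appending /nr:false to the existing MSBuild invocation.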

          This problem doesn't happen when I build without a build pool (i.e. a single Jenkins node that does its own compiles). It only occurs in a master/slave build-pool setup, and then only when building on the slave nodes.

            People

            • Assignee: Unassigned
            • Reporter: John Rocha (rocha_stratovan)
            • Votes: 0
            • Watchers: 1
