Jenkins / JENKINS-60430

Stopping a job during a reconcile does not stop the reconcile (the job itself does stop)

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: p4-plugin
    • Labels:
      None

      Description

      We had set up our project to use AutoCleanImpl as the sync method, but noticed that when we stopped a job that was in the middle of a reconcile, the job in Jenkins would stop normally while the agent carried on with the reconcile.

      Here is a stack trace obtained from an otherwise idle agent on which we had just cancelled a job during a reconcile:

      pool-1-thread-146 for JNLP4-connect connection to jenkis-server-url/10.144.6.28:20555 id=17503
      java.io.WinNTFileSystem.list(Native Method)
      java.io.File.list(File.java:1134)
      java.io.File.listFiles(File.java:1219)
      com.perforce.p4java.impl.mapbased.rpc.func.client.ClientSystemFileMatchCommands.traverseDirs(ClientSystemFileMatchCommands.java:726)
      com.perforce.p4java.impl.mapbased.rpc.func.client.ClientSystemFileMatchCommands.traverseDirs(ClientSystemFileMatchCommands.java:790)
      com.perforce.p4java.impl.mapbased.rpc.func.client.ClientSystemFileMatchCommands.traverseDirs(ClientSystemFileMatchCommands.java:790)
      com.perforce.p4java.impl.mapbased.rpc.func.client.ClientSystemFileMatchCommands.traverseDirs(ClientSystemFileMatchCommands.java:790)
      com.perforce.p4java.impl.mapbased.rpc.func.client.ClientSystemFileMatchCommands.traverseDirs(ClientSystemFileMatchCommands.java:790)
      com.perforce.p4java.impl.mapbased.rpc.func.client.ClientSystemFileMatchCommands.traverseDirs(ClientSystemFileMatchCommands.java:790)
      com.perforce.p4java.impl.mapbased.rpc.func.client.ClientSystemFileMatchCommands.reconcileAdd(ClientSystemFileMatchCommands.java:588)
      com.perforce.p4java.impl.mapbased.rpc.func.client.ClientFunctionDispatcher.dispatch(ClientFunctionDispatcher.java:220)
      com.perforce.p4java.impl.mapbased.rpc.packet.RpcPacketDispatcher.dispatch(RpcPacketDispatcher.java:160)
      com.perforce.p4java.impl.mapbased.rpc.OneShotServerImpl.execMapCmdList(OneShotServerImpl.java:363)
      com.perforce.p4java.impl.mapbased.rpc.OneShotServerImpl.execStreamingMapCommand(OneShotServerImpl.java:428)
      com.perforce.p4java.impl.mapbased.client.Client.reconcileFiles(Client.java:1806)
      org.jenkinsci.plugins.p4.client.ClientHelper.tidyClean(ClientHelper.java:570)
      org.jenkinsci.plugins.p4.client.ClientHelper.tidyAutoCleanImpl(ClientHelper.java:492)
      org.jenkinsci.plugins.p4.client.ClientHelper.tidyWorkspace(ClientHelper.java:436)
      org.jenkinsci.plugins.p4.tasks.CheckoutTask.task(CheckoutTask.java:163)
      org.jenkinsci.plugins.p4.tasks.AbstractTask.retryTask(AbstractTask.java:202)
      org.jenkinsci.plugins.p4.tasks.AbstractTask.tryTask(AbstractTask.java:185)
      org.jenkinsci.plugins.p4.tasks.CheckoutTask.invoke(CheckoutTask.java:157)
      org.jenkinsci.plugins.p4.tasks.CheckoutTask.invoke(CheckoutTask.java:32)
      hudson.FilePath$FileCallableWrapper.call(FilePath.java:3052)
      hudson.remoting.UserRequest.perform(UserRequest.java:212)
      hudson.remoting.UserRequest.perform(UserRequest.java:54)
      hudson.remoting.Request$2.run(Request.java:369)
      hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
      java.util.concurrent.FutureTask.run(FutureTask.java:264)
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:93)
      hudson.remoting.Engine$1$$Lambda$68/0x00000008001cd840.run(Unknown Source)
      java.lang.Thread.run(Thread.java:834)
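
The top frames of this trace are a recursive directory walk (traverseDirs calling File.listFiles), which returns only once the whole tree has been visited. As a minimal sketch of how such a walk could be made cancellable, the hypothetical InterruptibleWalk class below checks the thread's interrupt flag once per directory. This is not p4java's implementation, and it assumes that cancelling the Jenkins job actually interrupts the agent thread, which this report does not confirm.

```java
import java.io.File;
import java.util.List;

// Hypothetical illustration only; not code from p4java or p4-plugin.
class InterruptibleWalk {
    /**
     * Recursively collects regular files under dir into out.
     * Aborts with InterruptedException if the current thread has been interrupted.
     */
    static void walk(File dir, List<File> out) throws InterruptedException {
        // Check (and clear) the interrupt flag once per directory visited.
        if (Thread.interrupted()) {
            throw new InterruptedException("walk cancelled at " + dir);
        }
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, or an I/O error occurred
        }
        for (File child : children) {
            if (child.isDirectory()) {
                walk(child, out);
            } else {
                out.add(child);
            }
        }
    }
}
```

With a check like this, the delay between a cancel request and the walk stopping is bounded by the time needed to list one directory rather than the whole workspace.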

      Starting a new job on this agent would then cause a conflict, because the agent still held handles on some of the files while the next job also attempted to perform a reconcile:
      ERROR: P4: Task Exception: com.perforce.p4java.exception.P4JavaException: com.perforce.p4java.exception.P4JavaException: hudson.AbortException: P4JAVA: Error(s):11:41:32 operating system will not allow deletion of file e:\jwrk\stg\workspace\XXXXX-Win64-Mono\Assets_Game\Environments\Architecture\Mesh\Keyhole_Tower_01\Base\Floor_Base_Interior_01\.Sources\Floor_Base_Interior_01.ZTL on client.
      When we looked for this handle on the node, we found that the agent process itself was holding on to that file.

      Lastly, we noticed that some tmp files were left in the source tree, which suggests a Perforce operation that did not clean up after itself.

      We worked around the problem by no longer doing reconciles at all; we now force sync systematically, which is faster anyway for large projects.

          Activity

          rpetti Rob Petti added a comment -

          perforce-plugin is deprecated, and it looks like you are using p4-plugin anyway. Please double check the plugin name before filing tickets in the future.

          p4karl Karl Wirth added a comment -

          Hi Eric Daigneault - Thanks for letting us know about this. Please let me know which version of the plugin you are using and which version of P4D it is connected to.

          newtopian Eric Daigneault added a comment -

          Of course:

          Jenkins 2.190.3 on CentOS 7 with OpenJDK Runtime Environment, 1.8.0_232-b09

          Agent on Windows 10 on openjdk-hotspot-win64-11.0.4-11

          P4 plugin 1.10.7

          p4d :

          Server date: 2019/12/11 09:50:08 -0500 EST
          Server uptime: 936:31:10
          Server version: P4D/LINUX26X86_64/2019.1/1876401 (2019/10/30)

          p4karl Karl Wirth added a comment -

          Hi Eric Daigneault - Thanks.

          Regarding the reconcile not stopping: I have done some testing here, and I think this is more a problem on the server than with p4-plugin. Perforce commands run in a loop, and when the client-side connection drops they need to reach a safe point in the command before they check whether the network connection is still there and die if needed. This is usually when the database locks are released or at the end of processing an argument.
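
The safe-point behavior described above can be illustrated with a small sketch. The hypothetical SafePointLoop class below is not Perforce server code; it only assumes, for illustration, that the connection state is checked once at the end of each argument, so a drop noticed mid-argument takes effect only after that argument finishes.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; not code from the Perforce server or p4-plugin.
class SafePointLoop {
    /**
     * Runs each unit of work in order and returns how many completed
     * before a dropped connection was noticed.
     */
    static int run(List<Runnable> args, AtomicBoolean connected) {
        int done = 0;
        for (Runnable arg : args) {
            arg.run(); // work on one argument is never abandoned midway
            done++;
            // Safe point: the connection is only checked here.
            if (!connected.get()) {
                break;
            }
        }
        return done;
    }
}
```

Under this model, the lag between killing the client and the command disappearing is roughly the time left on the argument currently being processed, which fits a delay of many seconds when a single argument covers a large tree.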

          If I kill (CTRL+C) a reconcile that takes 30 seconds when run with P4 at the command line, I still see the command in p4 monitor for about 25 seconds in total.

          If I kill the Jenkins job I see similar timings (approximately 21 seconds). However, it is easier to be accurate about when you killed a command at the command line and harder to be consistent via Jenkins.

          So from this testing I think the command line and p4-plugin behavior is comparable.

           

          For the file handle, do you know if the file handle was released when the reconcile completed?

          Also, why did you need to work around it? Was killing the jobs a frequent need?

          I have tested here and was not able to reproduce the problem on Windows 10 but it may be related to the types/sizes of files etc. Was .Sources\Floor_Base_Interior_01.ZTL a special file? For example a symlink/junction?

           

          For the p4j*.tmp files, is it possible they were from an earlier plugin version? They used to be created and sometimes not cleaned up when using symlinks. That should have been fixed in 1.10.6:

              https://github.com/jenkinsci/p4-plugin/blob/master/RELEASE.md

          p4karl Karl Wirth added a comment -

          Hi Eric Daigneault - I was going through my old cases and saw that this one is still open. Are you able to answer the questions above? Thanks in advance.

          Karl

          newtopian Eric Daigneault added a comment -

          Hi Karl,

          For the file handle, do you know if the file handle was released when the reconcile completed?

          The handle was not released, and it caused the next job to fail on sync because the next reconcile was trying to repair the file.

           

          Also, why did you need to work around it? Was killing the jobs a frequent need?

          It is not a frequent need, no, but anything that affects the next build is considered a blocking issue. Build jobs must be independent of each other unless explicitly specified in the job's config, hence the workaround. Besides, the clean option is really not practical beyond the simplest hello-world project. On very large projects a reconcile takes longer than a wipe and re-sync, or than a plain force sync, which is the option we are currently using as a replacement (force sync with the occasional wipe and re-sync). It is not as clean, but we gained some precious minutes in the build process.

           

          For the p4j*.tmp files, is it possible they were from an earlier plugin version?

          It's possible, but that would mean that the p4clean is not doing its job, as a great many jobs have run on these machines since the last update.

           

          Was .Sources\Floor_Base_Interior_01.ZTL a special file? For example a symlink/junction?

          No, it was a normal file (ZBrush, I believe) weighing around 50MB. As we are mostly on Windows, we stay clear of symlinks and such; they are such a pain on Windows!


            People

            • Assignee:
              Unassigned
              Reporter:
              newtopian Eric Daigneault
            • Votes:
              0
              Watchers:
              2
