Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-51377

High Thread contention around WorkflowRun

    Details

    • Similar Issues:

      Description

      Builds and Queue maintenance are slowed down because operations require a lock on WorkflowRun to obtain the execution due to lazy-loading, and that lock can be contended heavily as a result. Normally the lock is only briefly held, but the save operation can hold it for a longer period.

      On normal systems the impact should be fairly small but with highly concurrent Pipelines it may result in many threads blocked due to usages like the below (and other within-step operations):

      java.lang.Thread.State: BLOCKED (on object monitor)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun.getExecution(WorkflowRun.java:844)
      	- waiting to lock <0x00000004ef316030> (a org.jenkinsci.plugins.workflow.job.WorkflowRun)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.get(WorkflowRun.java:1100)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getExecution(CpsStepContext.java:213)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getExecution(CpsStepContext.java:95)
      	at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:89)
      	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.run(ExecutorStepExecution.java:409)
      	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.runForDisplay(ExecutorStepExecution.java:418)
      
      at hudson.util.XStream2.toXMLUTF8(XStream2.java:310)
      	at org.jenkinsci.plugins.workflow.support.PipelineIOUtils.writeByXStream(PipelineIOUtils.java:34)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun.save(WorkflowRun.java:1256)
      	- locked <0x00000004ef316030> (a org.jenkinsci.plugins.workflow.job.WorkflowRun)
      	at hudson.BulkChange.commit(BulkChange.java:98)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1447)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:417)
      

        Attachments

          Activity

          svanoort Sam Van Oort created issue -
          svanoort Sam Van Oort made changes -
          Field Original Value New Value
          Status Open [ 1 ] In Progress [ 3 ]
          Hide
          svanoort Sam Van Oort added a comment - - edited

          Proposed solution: switch to using a ReadWriteLock internally within WorkflowRun to ensure thread-safety of access , and find a way to avoid normal synchronization while obtaining the transient Lock object – perhaps initializing it at deserialization-time to ensure we don't need to rely on a synchronized lazily-initializing getter.

          This should replace the "copyLogsGuard" synchronization object and general synchronization on the WorkflowRun (except for general synchronization during the save operation).

          The catch: we need to verify there are no deadlocks between synchronization on the WorkflowRun and the guard object – we need to ensure synchronization has one consistent order (either run-first or readwriteLock-first).

          Synchronization in the Run superclass:

          • #pickArtifactManager()
          • #deleteArtifacts()
          • #delete() – partial
          • #save()
          • #doSubmitDescription(StaplerRequest, StaplerResponse)
          Show
          svanoort Sam Van Oort added a comment - - edited Proposed solution: switch to using a ReadWriteLock internally within WorkflowRun to ensure thread-safety of access , and find a way to avoid normal synchronization while obtaining the transient Lock object – perhaps initializing it at deserialization-time to ensure we don't need to rely on a synchronized lazily-initializing getter. This should replace the "copyLogsGuard" synchronization object and general synchronization on the WorkflowRun (except for general synchronization during the save operation). The catch: we need to verify there are no deadlocks between synchronization on the WorkflowRun and the guard object – we need to ensure synchronization has one consistent order (either run-first or readwriteLock-first). Synchronization in the Run superclass: #pickArtifactManager() #deleteArtifacts() #delete() – partial #save() #doSubmitDescription(StaplerRequest, StaplerResponse)
          svanoort Sam Van Oort made changes -
          Status In Progress [ 3 ] Open [ 1 ]
          rsandell rsandell made changes -
          Assignee rsandell [ rsandell ]
          svanoort Sam Van Oort made changes -
          Issue Type Improvement [ 4 ] Bug [ 1 ]
          svanoort Sam Van Oort made changes -
          Summary Improve thread contention around WorkflowRun High Thread contention around WorkflowRun
          svanoort Sam Van Oort made changes -
          Labels performance
          Hide
          svanoort Sam Van Oort added a comment -

          Noted to result in 10+ threads blocked for one of the major users.

          Show
          svanoort Sam Van Oort added a comment - Noted to result in 10+ threads blocked for one of the major users.
          Hide
          svanoort Sam Van Oort added a comment -

          WorkflowRun#save() will probably need to grab both a whole-Run lock (for safety if other things are mutating it) and a writeLock

          Show
          svanoort Sam Van Oort added a comment - WorkflowRun#save() will probably need to grab both a whole-Run lock (for safety if other things are mutating it) and a writeLock
          svanoort Sam Van Oort made changes -
          Labels performance performance regression
          cloudbees CloudBees Inc. made changes -
          Remote Link This issue links to "CloudBees Internal CD-555 (Web Link)" [ 20704 ]
          Hide
          svanoort Sam Van Oort added a comment -

          Alternative simpler option for rsandell: make the execution field (and the internal lock object, etc) volatile and use The Only Safe Form of double-checked locking in Java – since accesses are 99% read/1% write that gives us most of the benefits of a ReadWriteLock but with the excess complexity.

          Bonus: it only increases complexity in the initializing getter methods (i.e. getExecution etc).

          Show
          svanoort Sam Van Oort added a comment - Alternative simpler option for rsandell : make the execution field (and the internal lock object, etc) volatile and use The Only Safe Form of double-checked locking in Java – since accesses are 99% read/1% write that gives us most of the benefits of a ReadWriteLock but with the excess complexity. Bonus: it only increases complexity in the initializing getter methods (i.e. getExecution etc).
          svanoort Sam Van Oort made changes -
          Assignee rsandell [ rsandell ] Sam Van Oort [ svanoort ]
          Hide
          svanoort Sam Van Oort added a comment -

          Taking this one on because I have a lot of in-progress for it, it turns out

          Show
          svanoort Sam Van Oort added a comment - Taking this one on because I have a lot of in-progress for it, it turns out
          Hide
          petehayes Peter Hayes added a comment -

          Any progress update? 

          Show
          petehayes Peter Hayes added a comment - Any progress update? 
          svanoort Sam Van Oort made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          svanoort Sam Van Oort made changes -
          Status In Progress [ 3 ] In Review [ 10005 ]
          Hide
          svanoort Sam Van Oort added a comment -

          Peter Hayes Yes, it's going into review now – I'm satisfied enough with the testing.

          Show
          svanoort Sam Van Oort added a comment - Peter Hayes Yes, it's going into review now – I'm satisfied enough with the testing.
          svanoort Sam Van Oort made changes -
          Remote Link This issue links to "workflow-job PR #97 (Web Link)" [ 20836 ]
          Hide
          petehayes Peter Hayes added a comment -

          Thanks!

          Show
          petehayes Peter Hayes added a comment - Thanks!
          petehayes Peter Hayes made changes -
          Attachment image-2018-06-18-09-38-33-398.png [ 42906 ]
          Hide
          petehayes Peter Hayes added a comment -

          We are experiencing this issue pretty severely today.  Here is a screenshot of the currently running web requests all blocked on this issue causing very high latency to our users.  Also our Jenkins rebooted today which could be related if things got further and further backed up.

           

          Show
          petehayes Peter Hayes added a comment - We are experiencing this issue pretty severely today.  Here is a screenshot of the currently running web requests all blocked on this issue causing very high latency to our users.  Also our Jenkins rebooted today which could be related if things got further and further backed up.  
          Hide
          svanoort Sam Van Oort added a comment -

          Released as workflow-job 2.22

          Show
          svanoort Sam Van Oort added a comment - Released as workflow-job 2.22
          svanoort Sam Van Oort made changes -
          Status In Review [ 10005 ] Closed [ 6 ]
          Resolution Done [ 10000 ]
          Hide
          svanoort Sam Van Oort added a comment -

          Peter Hayes  It is in release now – we had a couple extra holdups to resolve.   Please give it a try and let us know how it performs – we've done some benchmarking but for complex changes like this it is not always easy to determine what the impact will be for individual setups.

          Show
          svanoort Sam Van Oort added a comment - Peter Hayes   It is in release now – we had a couple extra holdups to resolve.   Please give it a try and let us know how it performs – we've done some benchmarking but for complex changes like this it is not always easy to determine what the impact will be for individual setups.
          Hide
          petehayes Peter Hayes added a comment -

          Great.  Thanks Sam.

          Show
          petehayes Peter Hayes added a comment - Great.  Thanks Sam.
          Hide
          petehayes Peter Hayes added a comment -

          We have installed this and we are not seeing the high contention any longer.  Thanks!

          Show
          petehayes Peter Hayes added a comment - We have installed this and we are not seeing the high contention any longer.  Thanks!

            People

            • Assignee:
              svanoort Sam Van Oort
              Reporter:
              svanoort Sam Van Oort
            • Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: