  1. Jenkins
  2. JENKINS-54974

Jenkins does not start due to a deadlock after upgrade from 2.121.2.2 to 2.138.2.2


    Details

    • Released As:
      Jenkins Core 2.163

      Description

      Jenkins does not start due to a deadlock
      The issue we are facing is very similar to JENKINS-49038. We have upgraded our Jenkins instance from 2.121.2.2 to 2.138.2.2. The service starts normally, but the UI keeps loading indefinitely. At startup we get the following deadlock:

      "PreventRefreshFilter.initAutoRefreshFilter" #57 daemon prio=5 os_prio=0 tid=0x00007fdb5c02f800 nid=0x58ad waiting for monitor entry [0x00007fdb20193000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at hudson.ExtensionList.ensureLoaded(ExtensionList.java:317)
              - waiting to lock <0x00000006c0120260> (a hudson.ExtensionList$Lock)
              at hudson.ExtensionList.getComponents(ExtensionList.java:183)
              at hudson.DescriptorExtensionList.load(DescriptorExtensionList.java:192)
              at hudson.ExtensionList.ensureLoaded(ExtensionList.java:318)
              - locked <0x00000006c37c7680> (a hudson.DescriptorExtensionList)
              at hudson.ExtensionList.iterator(ExtensionList.java:172)
              at hudson.ExtensionList.get(ExtensionList.java:149)
              at hudson.plugins.claim.ClaimConfig.get(ClaimConfig.java:202)
              at hudson.plugins.claim.http.PreventRefreshFilter.initAutoRefreshFilter(PreventRefreshFilter.java:43)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:104)
              at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:175)
              at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:296)
              at jenkins.model.Jenkins$5.runTask(Jenkins.java:1069)
              at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:214)
              at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)

      The deadlock seems to be intermittent, i.e. when stopping and starting the instance, it may finally start maybe 2 times out of 10. The issue cannot be reproduced on a clean instance without custom plugins (only default plugins installed).

        Attachments

          Issue Links

            Activity

            oleg_nenashev Oleg Nenashev added a comment -

            Not sure it is specifically related to the Claim Plugin; the API usage looks valid.

            Kirill Gostaf, please provide a full thread dump.
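            For reference, a full dump of all JVM threads can also be captured programmatically via the standard `ThreadMXBean` API, in addition to the usual `jstack <pid>` against the Jenkins process. A minimal sketch (the class name `ThreadDump` is just an illustration):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDump {
    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Dump every live thread with stack traces and lock info.
        // Note: ThreadInfo.toString() truncates very deep stacks,
        // so jstack output is preferable for a bug report.
        for (ThreadInfo info : bean.dumpAllThreads(true, true)) {
            System.out.print(info.toString());
        }
        // findDeadlockedThreads() reports threads stuck in a cycle
        // waiting on monitors or ownable synchronizers, or null if none.
        long[] deadlocked = bean.findDeadlockedThreads();
        System.out.println(deadlocked == null
                ? "No deadlock detected"
                : deadlocked.length + " deadlocked thread(s)");
    }
}
```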

             

             

            schneeheld Kirill Gostaf added a comment -

            Hi Oleg,

            As per your comment, I've attached report_threads1.log

            In addition, removing the Claim plugin does not solve the issue; startup then complained about the Radiator View plugin's dependency on Claim.

            Disabling the Radiator View plugin does not help either: there was still a deadlock after the Claim plugin was removed and the Radiator View plugin disabled.

            greybird Arnaud TAMAILLON added a comment - - edited

            Hi Oleg Nenashev (and Daniel Beck, as Oleg is currently stepping back from Core maintenance).

            From my analysis, and from other reported issues about deadlocks (JENKINS-20988, JENKINS-21034, JENKINS-31622, JENKINS-44564, JENKINS-49038, JENKINS-50663), the issue lies in DescriptorExtensionList, specifically in the way it acquires its load lock.
            The documentation of DescriptorExtensionList's getLoadLock method indicates that it takes part in the real load activity, and that as such it can lock on *this* rather than on the *singleton Lock* used by ExtensionList.

            However, many plugins rely on a GlobalConfiguration object, which is acquired through code similar to the following (a pattern explicitly recommended in the GlobalConfiguration documentation):

            public static SpecificPluginConfig get() {
                return GlobalConfiguration.all().get(SpecificPluginConfig.class);
            }
            

            (the all() method of GlobalConfiguration returns a DescriptorExtensionList)

            As a plugin's configuration can be requested from many places (plugin initialization, HTTP requests, ...), it is very easy to end up with a DescriptorExtensionList being instantiated, which in turn needs the ExtensionList lock, while at the same time some injection code holds the ExtensionList lock and requires the DescriptorExtensionList one.
            Of course, other uses of DescriptorExtensionList not related to GlobalConfiguration can create the same kind of issue.
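            The inversion described above can be sketched with two plain locks standing in for the ExtensionList singleton Lock and the DescriptorExtensionList monitor (hypothetical names, not Jenkins code; tryLock with a timeout is used so the demo can observe the cycle instead of hanging):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockInversionDemo {
    // Hypothetical stand-ins for hudson.ExtensionList$Lock and the
    // DescriptorExtensionList instance monitor.
    static final ReentrantLock extensionListLock = new ReentrantLock();
    static final ReentrantLock descriptorListLock = new ReentrantLock();

    public static void main(String[] args) throws InterruptedException {
        // Ensures both threads hold their first lock before either
        // attempts its second one, forcing the inverted ordering.
        CountDownLatch bothHeld = new CountDownLatch(2);

        // Plugin-init path: descriptor list lock first, then the
        // extension list lock (as in ensureLoaded).
        Thread t1 = new Thread(() -> {
            descriptorListLock.lock();
            try {
                bothHeld.countDown();
                awaitQuietly(bothHeld);
                // Would block forever in a real deadlock; the timeout
                // typically expires here because t2 holds the lock.
                if (!tryLockBriefly(extensionListLock)) {
                    System.out.println("t1 blocked: cycle detected");
                } else {
                    extensionListLock.unlock();
                }
            } finally {
                descriptorListLock.unlock();
            }
        });

        // Injection path: extension list lock first, then the
        // descriptor list lock — the opposite order.
        Thread t2 = new Thread(() -> {
            extensionListLock.lock();
            try {
                bothHeld.countDown();
                awaitQuietly(bothHeld);
                if (!tryLockBriefly(descriptorListLock)) {
                    System.out.println("t2 blocked: cycle detected");
                } else {
                    descriptorListLock.unlock();
                }
            } finally {
                extensionListLock.unlock();
            }
        });

        t1.start(); t2.start();
        t1.join(); t2.join();
    }

    static void awaitQuietly(CountDownLatch latch) {
        try { latch.await(); }
        catch (InterruptedException e) { throw new RuntimeException(e); }
    }

    static boolean tryLockBriefly(ReentrantLock lock) {
        try { return lock.tryLock(200, TimeUnit.MILLISECONDS); }
        catch (InterruptedException e) { return false; }
    }
}
```

            With plain lock()/synchronized instead of tryLock, both threads would block forever, which is exactly the BLOCKED state shown in the thread dump above.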

            Taking into account that the lock is only taken when the list is initialized for the first time (in ensureLoaded()), I would say that removing the override of getLoadLock in DescriptorExtensionList should solve the issue at very minimal cost, or at least make the lock the same as the one ExtensionList uses for Descriptor.class.
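            Under that proposal, both loading paths would synchronize on a single shared lock object, so no ordering cycle can form. A minimal sketch, assuming the fix amounts to sharing one (reentrant) load lock; the names here are illustrative, not the actual Jenkins implementation:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class SharedLockDemo {
    // With getLoadLock() no longer overridden, both list types would
    // guard loading with the same lock object (hypothetical stand-in).
    static final ReentrantLock sharedLoadLock = new ReentrantLock();

    static void loadPath(String name, CountDownLatch done) {
        sharedLoadLock.lock();
        try {
            // A nested load on the other list re-enters the same lock,
            // so the two-lock cycle from the thread dump cannot occur.
            sharedLoadLock.lock();
            try {
                System.out.println(name + " loaded");
            } finally {
                sharedLoadLock.unlock();
            }
        } finally {
            sharedLoadLock.unlock();
            done.countDown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(2);
        new Thread(() -> loadPath("DescriptorExtensionList", done)).start();
        new Thread(() -> loadPath("ExtensionList", done)).start();
        done.await(); // both paths complete; one lock means no deadlock
        System.out.println("both paths completed");
    }
}
```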

            What do you think about this proposal? Do you see any other unintended consequences?

            oleg_nenashev Oleg Nenashev added a comment -

            It should be fixed by the patch from Arnaud TAMAILLON in 2.163.


              People

              • Assignee:
                greybird Arnaud TAMAILLON
                Reporter:
                schneeheld Kirill Gostaf
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: