Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-23271

Intermittent Invalid Object ID in remoting module

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Resolved (View Workflow)
    • Priority: Blocker
    • Resolution: Fixed
    • Component/s: remoting
    • Environment:
      1.544.2, 1.563, 1.625.1 - Windows or Linux Master, various slaves
    • Similar Issues:

      Description

      For the last two weeks, I have been seeing an intermittent failure during the clearcase stage of our builds. It throws a Java exception along the lines of

      FATAL: Invalid object ID 18649 iuota=18470
      java.lang.IllegalStateException: Invalid object ID 18469 iota=18470
      at hudson.remoting.ExportTable.diagnoseInvalidId(ExportTable.java:277)
      etc...

      It always happens during the Clearcase segment of the build, but on a different step each time. Sometimes after lsview, sometimes after lshistory, sometimes after mkview ...

      It happens a very low percentage of the time, and it happens on a different project each time. I have around 300 projects running between a number of matrix jobs- we are seeing one or two failures on each runthrough of this, always on different jobs and on different slaves. It started a couple weeks ago- our clearcase server crashed, and jenkins was having some trouble interacting with the server. I upgraded to 1.563 to try and solve that problem (I wound up having to manually delete all the views that had been created in the period between the backup and the crash). Once that issue had been resolved though, we started seeing this problem.

      I upgraded the clearcase plugin to 1.5, and then tried downgrading Jenkins to 1.554.2. I cannot swear to it, but I thought I was on 1.558 before the first problem popped up, and had been very stable for a long time with all build failures being attributed to bad code, so suddenly having Jenkins fail some is kind of jarring.

      At this point, I am not sure what I changed to cause the problem. I remember I had to upgrade slave.jar to version 2.41 from whatever it was before when I did the first upgrade to 1.563. Other then that, nothing else substantial has changed. The Jenkins is on a restricted network with no internet access- I am going to type the stack trace below, so forgive any typos.

      FATAL: Invalid object ID 18649 iuota=18470
      java.lang.IllegalStateException: Invalid object ID 18469 iota=18470
      at hudson.remoting.ExportTable.diagnoseInvalidId(ExportTable.java:277)
      at hudson.remoting.ExportTable.get(ExportTable.java:261)
      at hudson.remoting.Channel.getExportedObject(Channel.java:596)
      at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:292
      at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:281)
      at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:240)
      at hudson.remoting.UserRequest.perform(UserRequest.java:118)
      at hudson.remoting.UserRequest.perform(UserRequest.java:48)
      at hudson.remoting.Request$2.run(Request.java:328)
      at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
      at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
      at java.util.concurrent.FutureTask.run(UNKNOWN SOURCE)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at hudson.remoting.Engine$1$1.run(Engine.java:63)
      at java.lang.Thread.run(Unknown Source)
      Cased by: java.lang.Exception: Object was recently deallocated
      #18649 (ref.0) : hudson.Launcher$RemoteLaunchCallable$1
      created at Fri May 30 19:27:39 EDT 2014
      at hudson.remoting.ExportTable$Entry.<init>(ExportTable.java:86)
      at hudson.remoting.ExportTable.export(ExportTable.java:239)
      at hudson.remoting.Channel.export(Channel.java:592)
      at hudson.remoting.Channel.export(Channel.java:583)
      at hudson.remoting.Channel.export(Channel.java:553)
      at hudson.Launcher$RemoteLaunchCallable.call(Launcher.java:1025)
      at hudson.Launcher$RemoteLaunchCallable.call(Launcher.java:990)
      at hudson.remoting.UserRequest.perform(UserRequest.java:118)
      at hudson.remoting.UserRequest.perform(UserRequest.java:48)
      at hudson.remoting.Request$2.run(Request.java:328)
      at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
      at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
      at java.util.concurrent.FutureTask.run(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at hudson.remoting.Engine$1$1.run(Engine.java:63)
      at java.lang.Thread.run(Unknown Source)
      Released at Fri May 30 19:27:39 EDT 2014
      at hudson.remoting.ExportTable$Entry.release(ExportTable.java:115)
      at hudson.remoting.ExportTable.unexportByOid(ExportTable.java:303)
      at hudson.remoting.Channel.unexport(Channel.java:600)
      at hudson.remoting.UnexportCommand.execute(UnexportCommand.java:38)
      at hudson.remoting.Channel$2.handle(Channel.java:475)
      at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:60)

      at hudson.remoting.ExportTable.diagnoseInvalidId(ExportTable.java:270)
      ... 15 more
      Caused by: Released at Fri May 30 19:27:39 EDT 2014
      at hudson.remoting.ExportTable$Entry.release(ExportTable.java:115)
      at hudson.remoting.ExportTable.unexportByOid(ExportTable.java:303)
      at hudson.remoting.Channel.unexport(Channel.java:600)
      at hudson.remoting.UnexportCommand.execute(UnexportCommand.java:38)
      at hudson.remoting.Channel$2.handle(Channel.java:475)
      at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:60)

        Attachments

          Issue Links

            Activity

            Hide
            apgray Andrew Gray added a comment -

            Should this now be resolved with release of 2.35?

            Show
            apgray Andrew Gray added a comment - Should this now be resolved with release of 2.35?
            Hide
            oleg_nenashev Oleg Nenashev added a comment -

            At least more failure cases should be solved

            Show
            oleg_nenashev Oleg Nenashev added a comment - At least more failure cases should be solved
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Oleg Nenashev
            Path:
            core/src/main/java/hudson/Launcher.java
            http://jenkins-ci.org/commit/jenkins/fcfd271ad5ad5947075052d37177249c760f6184
            Log:
            JENKINS-23271 - Prevent early deallocation of the Proc instance by GC in ProcStarter#join() (#2635)

            • JENKINS-23271 - Prevent the prelimimary deallocation of the Proc instance by GC

            It is a hackish way, which likely prevents a preliminary deallocation of the spawned RemoteProc instance, which we see in JENKINS-23271. Proc instance was not actually required in the original code since we were creating and using RemoteInvocationHandler wrapper only, and the theory discussed with @stephenc was that object gets removed by Java8 garbage collector before we get into join().

            This fix enforces the persistency of ProcStarter#start() result by adding logging and the enforced volatile field (maybe the last one is not really required, but JIT compiler in Java implementations may be smart enough to skip unused loggers)

            This is a pretty old fix from August, which has been soak tested on my instance for several weeks (mid-August => Jenkins World). On the reference instance (just a small Jenkins instance with 4 agents and very frequent builds with CommandInterpreter steps) I saw 2 failures over the period. On the fixed instance - 0. It does not proof anything, but at least the fix was soak tested a bit

            • JENKINS-23271 - Also put the check into the finally statement as @stephenc proposed
            • Remove assert

            (cherry picked from commit fd6c6aff929be9818f4eb4b84ed6b4593356853f)

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: core/src/main/java/hudson/Launcher.java http://jenkins-ci.org/commit/jenkins/fcfd271ad5ad5947075052d37177249c760f6184 Log: JENKINS-23271 - Prevent early deallocation of the Proc instance by GC in ProcStarter#join() (#2635) JENKINS-23271 - Prevent the prelimimary deallocation of the Proc instance by GC It is a hackish way, which likely prevents a preliminary deallocation of the spawned RemoteProc instance, which we see in JENKINS-23271 . Proc instance was not actually required in the original code since we were creating and using RemoteInvocationHandler wrapper only, and the theory discussed with @stephenc was that object gets removed by Java8 garbage collector before we get into join(). This fix enforces the persistency of ProcStarter#start() result by adding logging and the enforced volatile field (maybe the last one is not really required, but JIT compiler in Java implementations may be smart enough to skip unused loggers) This is a pretty old fix from August, which has been soak tested on my instance for several weeks (mid-August => Jenkins World). On the reference instance (just a small Jenkins instance with 4 agents and very frequent builds with CommandInterpreter steps) I saw 2 failures over the period. On the fixed instance - 0. It does not proof anything, but at least the fix was soak tested a bit JENKINS-23271 - Get rid of the procHolderForJoin field JENKINS-23271 - Also put the check into the finally statement as @stephenc proposed Remove assert (cherry picked from commit fd6c6aff929be9818f4eb4b84ed6b4593356853f)
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Oleg Nenashev
            Path:
            core/src/main/java/hudson/Launcher.java
            core/src/main/java/hudson/Proc.java
            http://jenkins-ci.org/commit/jenkins/5500e5c00bc19e6c421c9aaa88f6dbb57c666db3
            Log:
            JENKINS-23271 - Process statuses of Remote process join() operations directly inside methods (#2653)

            • JENKINS-23271 - Process statuses of Remote process join() operations directly inside methods
            • JENKINS-23271 - Also prevent the issue when the kill() command is the last call in the usage sequence

            (cherry picked from commit 29893354a52efac20efcb4e0e91723f3bd889059)

            Compare: https://github.com/jenkinsci/jenkins/compare/dacbe03e0d4e...5500e5c00bc1

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: core/src/main/java/hudson/Launcher.java core/src/main/java/hudson/Proc.java http://jenkins-ci.org/commit/jenkins/5500e5c00bc19e6c421c9aaa88f6dbb57c666db3 Log: JENKINS-23271 - Process statuses of Remote process join() operations directly inside methods (#2653) JENKINS-23271 - Process statuses of Remote process join() operations directly inside methods JENKINS-23271 - Also prevent the issue when the kill() command is the last call in the usage sequence (cherry picked from commit 29893354a52efac20efcb4e0e91723f3bd889059) Compare: https://github.com/jenkinsci/jenkins/compare/dacbe03e0d4e...5500e5c00bc1
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Oleg Nenashev
            Path:
            docs/configuration.md
            src/main/java/hudson/remoting/Base64.java
            src/main/java/hudson/remoting/Channel.java
            src/main/java/hudson/remoting/ErrorPropagatingOutputStream.java
            src/main/java/hudson/remoting/ExportTable.java
            src/main/java/hudson/remoting/LocalChannel.java
            src/main/java/hudson/remoting/RemoteInvocationHandler.java
            src/main/java/hudson/remoting/VirtualChannel.java
            http://jenkins-ci.org/commit/remoting/27bb51dd14db44952348539d15ee004c90c248b2
            Log:
            Document and annotate object Export/Unexport logic

            During last releases we have fixed several issues related to the object export/unexport operations (JENKINS-23271, JENKINS-41852). I decided that it makes sense to improve the method documentation just to make Javadocs more explicit.

            The change also adds missing documentation of hudson.remoting.ExportTable.unexportLogSize

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: docs/configuration.md src/main/java/hudson/remoting/Base64.java src/main/java/hudson/remoting/Channel.java src/main/java/hudson/remoting/ErrorPropagatingOutputStream.java src/main/java/hudson/remoting/ExportTable.java src/main/java/hudson/remoting/LocalChannel.java src/main/java/hudson/remoting/RemoteInvocationHandler.java src/main/java/hudson/remoting/VirtualChannel.java http://jenkins-ci.org/commit/remoting/27bb51dd14db44952348539d15ee004c90c248b2 Log: Document and annotate object Export/Unexport logic During last releases we have fixed several issues related to the object export/unexport operations ( JENKINS-23271 , JENKINS-41852 ). I decided that it makes sense to improve the method documentation just to make Javadocs more explicit. The change also adds missing documentation of hudson.remoting.ExportTable.unexportLogSize

              People

              • Assignee:
                oleg_nenashev Oleg Nenashev
                Reporter:
                pbecotte Paul Becotte
              • Votes:
                83 Vote for this issue
                Watchers:
                102 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: