Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-31914

Jenkins execution slave crashes when two jobs attempt to checkout the same file from SVN

    Details

    • Similar Issues:

      Description

      Hi,

      I am seeing a reproducible issue where my Jenkins execution slave will crash if two jobs try to checkout the same file from SVN at the same time (to their own, individual workspaces).

      My setup:
      I have an environment with a Jenkins Master and single Jenkins Slave. The Master has 0 executors and the Slave has 50 executors. We use SVN for our SCM and have ~100 jobs.

      Our 100 jobs are divided into products where each product has about 10 jobs that are nearly identical, but go to a different build folder within the checkout to build a different configuration of the product. This means that the same repository (and all of its externals) is checked out 10 times on a commit to build 10 different configures of very similar code.

      Recently we have been busy and see a lot of commits, each one launching 10 nearly identical jobs, which is exactly what we want. The issue is that (as far as I can tell) two of the jobs attempt to checkout the same file at the same time, and then the slave crashes with a message like this:

      FATAL: java.io.IOException: Unexpected termination of the channel
      hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      	at hudson.remoting.Request.abort(Request.java:297)
      	at hudson.remoting.Channel.terminate(Channel.java:847)
      	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
      	at ......remote call to Umbreon(Native Method)
      	at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1416)
      	at hudson.remoting.Request.call(Request.java:172)
      	at hudson.remoting.Channel.call(Channel.java:780)
      	at hudson.FilePath.act(FilePath.java:979)
      	at hudson.FilePath.act(FilePath.java:968)
      	at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:848)
      	at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:786)
      	at hudson.model.AbstractProject.checkout(AbstractProject.java:1276)
      	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:607)
      	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
      	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
      	at hudson.model.Run.execute(Run.java:1738)
      	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
      	at hudson.model.ResourceController.execute(ResourceController.java:98)
      	at hudson.model.Executor.run(Executor.java:410)
      Caused by: java.io.IOException: Unexpected termination of the channel
      	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
      Caused by: java.io.EOFException
      	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2332)
      	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2801)
      	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
      	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
      	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
      	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
      	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
      

      Case1:
      I set up a test environment where I have a master and slave, the slave with 10 executors. I create a test job that just checks out a large repository and then exits. Then I rapidly launch the same job 10 times. The idea is that each job will get its own workspace, checkout the repository and exit, but most of the time (but not always) the slave process is killed with a SIGABRT and then the master restarts the slave. I could see that at least two jobs were trying to checkout the same file at the same time.

      Case2:
      I set up another test environment where I have a master and 10 slaves, each slave with 1 executor. I create a test job that just checks out a large repository and then exits. Then I rapidly launch the same job 10 times. The idea is that each job will get its own executor, own workspace, checkout the repository and exit. This works 100% of the times that I tried it and I saw no failures.

      Case3:
      I set up another test environment where I have a master and 10 slaves, each slave with 10 executors. I create a test job that just checks out a large repository and then exits. Then I rapidly launch the same job 100 times. The idea is that each executor will get 10 jobs, each job will get its own workspace, checkout the repository and exit. This was a good test case because only 5/10 of the slaves failed and restarted the slave.jar, the rest were ok. Of the slaves that failed, I could see that at least two jobs were trying to checkout the same file at the same time.

        Attachments

          Activity

          Hide
          recena Manuel Recena Soto added a comment -

          Daniel Beck I'm not sure if the problem is related to Subversion Plugin.

          Show
          recena Manuel Recena Soto added a comment - Daniel Beck I'm not sure if the problem is related to Subversion Plugin.
          Hide
          danielbeck Daniel Beck added a comment -

          Maybe Oleg Nenashev can tell us whether this looks more like a remoting issue?

          Show
          danielbeck Daniel Beck added a comment - Maybe Oleg Nenashev can tell us whether this looks more like a remoting issue?
          Hide
          tobeychris Chris Tobey added a comment -

          Updating the SVN plug from r1.54 to r2.5.7 seems to fix the issue.

          I set up a test environment with a fresh install and could not reproduce (everything at the latest version).
          Then I upgraded my main environment (and had to deal with the credential changes) and it seems ok now too. Ran 20 checkouts simultaneously and no failures.

          Show
          tobeychris Chris Tobey added a comment - Updating the SVN plug from r1.54 to r2.5.7 seems to fix the issue. I set up a test environment with a fresh install and could not reproduce (everything at the latest version). Then I upgraded my main environment (and had to deal with the credential changes) and it seems ok now too. Ran 20 checkouts simultaneously and no failures.
          Hide
          danielbeck Daniel Beck added a comment -

          Only affects obsolete versions.

          Show
          danielbeck Daniel Beck added a comment - Only affects obsolete versions.
          Hide
          recena Manuel Recena Soto added a comment -

          Chris Tobey Thanks for your feedback.

          Show
          recena Manuel Recena Soto added a comment - Chris Tobey Thanks for your feedback.

            People

            • Assignee:
              recena Manuel Recena Soto
              Reporter:
              tobeychris Chris Tobey
            • Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: