Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-31914

Jenkins execution slave crashes when two jobs attempt to checkout the same file from SVN

    Details

    • Similar Issues:

      Description

      Hi,

      I am seeing a reproducible issue where my Jenkins execution slave will crash if two jobs try to checkout the same file from SVN at the same time (to their own, individual workspaces).

      My setup:
      I have an environment with a Jenkins Master and single Jenkins Slave. The Master has 0 executors and the Slave has 50 executors. We use SVN for our SCM and have ~100 jobs.

      Our 100 jobs are divided into products where each product has about 10 jobs that are nearly identical, but go to a different build folder within the checkout to build a different configuration of the product. This means that the same repository (and all of its externals) is checked out 10 times on a commit to build 10 different configures of very similar code.

      Recently we have been busy and see a lot of commits, each one launching 10 nearly identical jobs, which is exactly what we want. The issue is that (as far as I can tell) two of the jobs attempt to checkout the same file at the same time, and then the slave crashes with a message like this:

      FATAL: java.io.IOException: Unexpected termination of the channel
      hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      	at hudson.remoting.Request.abort(Request.java:297)
      	at hudson.remoting.Channel.terminate(Channel.java:847)
      	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
      	at ......remote call to Umbreon(Native Method)
      	at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1416)
      	at hudson.remoting.Request.call(Request.java:172)
      	at hudson.remoting.Channel.call(Channel.java:780)
      	at hudson.FilePath.act(FilePath.java:979)
      	at hudson.FilePath.act(FilePath.java:968)
      	at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:848)
      	at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:786)
      	at hudson.model.AbstractProject.checkout(AbstractProject.java:1276)
      	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:607)
      	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
      	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
      	at hudson.model.Run.execute(Run.java:1738)
      	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
      	at hudson.model.ResourceController.execute(ResourceController.java:98)
      	at hudson.model.Executor.run(Executor.java:410)
      Caused by: java.io.IOException: Unexpected termination of the channel
      	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
      Caused by: java.io.EOFException
      	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2332)
      	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2801)
      	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
      	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
      	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
      	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
      	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
      

      Case1:
      I set up a test environment where I have a master and slave, the slave with 10 executors. I create a test job that just checks out a large repository and then exits. Then I rapidly launch the same job 10 times. The idea is that each job will get its own workspace, checkout the repository and exit, but most of the time (but not always) the slave process is killed with a SIGABRT and then the master restarts the slave. I could see that at least two jobs were trying to checkout the same file at the same time.

      Case2:
      I set up another test environment where I have a master and 10 slaves, each slave with 1 executor. I create a test job that just checks out a large repository and then exits. Then I rapidly launch the same job 10 times. The idea is that each job will get its own executor, own workspace, checkout the repository and exit. This works 100% of the times that I tried it and I saw no failures.

      Case3:
      I set up another test environment where I have a master and 10 slaves, each slave with 10 executors. I create a test job that just checks out a large repository and then exits. Then I rapidly launch the same job 100 times. The idea is that each executor will get 10 jobs, each job will get its own workspace, checkout the repository and exit. This was a good test case because only 5/10 of the slaves failed and restarted the slave.jar, the rest were ok. Of the slaves that failed, I could see that at least two jobs were trying to checkout the same file at the same time.

        Attachments

          Activity

          Hide
          iambaskar Baskaran D added a comment - - edited

          Am also facing the same issue in our environment. But we are having the issue in "Case2".

          We have 15 slaves each having 1 executor. 1 Job, which can run concurrent builds. Many times we are facing this issue, when the multiple builds are started at the same time.

          Jenkins ver. 1.609

          Any quick workaround for this issue?

          Show
          iambaskar Baskaran D added a comment - - edited Am also facing the same issue in our environment. But we are having the issue in "Case2". We have 15 slaves each having 1 executor. 1 Job, which can run concurrent builds. Many times we are facing this issue, when the multiple builds are started at the same time. Jenkins ver. 1.609 Any quick workaround for this issue?
          Hide
          tobeychris Chris Tobey added a comment -

          What is the last thing that a failing job is doing before the slave dies?

          What version of the SVN plugin are you using?

          Show
          tobeychris Chris Tobey added a comment - What is the last thing that a failing job is doing before the slave dies? What version of the SVN plugin are you using?
          Hide
          iambaskar Baskaran D added a comment -

          GIT Checkout was the last that that the slave was doing.

          GIT Plugin V 2.3.5

          00:00:01.659  > git rev-parse --is-inside-work-tree # timeout=10
          00:00:01.688 Fetching changes from the remote Git repository
          00:00:01.691  > git config remote.origin.url git@gitlab.ax.com:ax/nasa.git # timeout=10
          00:00:01.698 Cleaning workspace
          00:00:01.700  > git rev-parse --verify HEAD # timeout=10
          00:00:01.721 Resetting working tree
          00:00:01.722  > git reset --hard # timeout=10
          00:00:02.983 FATAL: java.io.IOException: Unexpected termination of the channel
          00:00:02.983 hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
          00:00:02.985  at hudson.remoting.Request.abort(Request.java:296)
          00:00:02.985  at hudson.remoting.Channel.terminate(Channel.java:815)
          00:00:02.991  at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
          00:00:02.992  at ......remote call to terraform005(Native Method)
          00:00:02.992  at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1360)
          00:00:02.992  at hudson.remoting.Request.call(Request.java:171)
          00:00:02.993  at hudson.remoting.Channel.call(Channel.java:752)
          00:00:02.993  at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:179)
          00:00:02.993  at com.sun.proxy.$Proxy70.clean(Unknown Source)
          00:00:02.993  at org.jenkinsci.plugins.gitclient.RemoteGitImpl.clean(RemoteGitImpl.java:444)
          00:00:02.994  at hudson.plugins.git.extensions.impl.CleanBeforeCheckout.decorateFetchCommand(CleanBeforeCheckout.java:32)
          00:00:02.994  at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:731)
          00:00:02.995  at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:983)
          00:00:02.995  at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1016)
          00:00:03.000  at org.jenkinsci.plugins.multiplescms.MultiSCM.checkout(MultiSCM.java:122)
          00:00:03.000  at hudson.scm.SCM.checkout(SCM.java:484)
          00:00:03.001  at hudson.model.AbstractProject.checkout(AbstractProject.java:1270)
          00:00:03.001  at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:609)
          00:00:03.002  at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
          00:00:03.003  at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:531)
          00:00:03.007  at hudson.model.Run.execute(Run.java:1741)
          00:00:03.008  at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
          00:00:03.008  at hudson.model.ResourceController.execute(ResourceController.java:98)
          00:00:03.008  at hudson.model.Executor.run(Executor.java:374)
          00:00:03.009 Caused by: java.io.IOException: Unexpected termination of the channel
          00:00:03.009  at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
          00:00:03.010 Caused by: java.io.EOFException
          00:00:03.010  at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2332)
          00:00:03.010  at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2801)
          00:00:03.011  at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
          00:00:03.011  at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
          00:00:03.011  at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:40)
          00:00:03.012  at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
          00:00:03.012  at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
          
          Show
          iambaskar Baskaran D added a comment - GIT Checkout was the last that that the slave was doing. GIT Plugin V 2.3.5 00:00:01.659 > git rev-parse --is-inside-work-tree # timeout=10 00:00:01.688 Fetching changes from the remote Git repository 00:00:01.691 > git config remote.origin.url git@gitlab.ax.com:ax/nasa.git # timeout=10 00:00:01.698 Cleaning workspace 00:00:01.700 > git rev-parse --verify HEAD # timeout=10 00:00:01.721 Resetting working tree 00:00:01.722 > git reset --hard # timeout=10 00:00:02.983 FATAL: java.io.IOException: Unexpected termination of the channel 00:00:02.983 hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel 00:00:02.985 at hudson.remoting.Request.abort(Request.java:296) 00:00:02.985 at hudson.remoting.Channel.terminate(Channel.java:815) 00:00:02.991 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69) 00:00:02.992 at ......remote call to terraform005(Native Method) 00:00:02.992 at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1360) 00:00:02.992 at hudson.remoting.Request.call(Request.java:171) 00:00:02.993 at hudson.remoting.Channel.call(Channel.java:752) 00:00:02.993 at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:179) 00:00:02.993 at com.sun.proxy.$Proxy70.clean(Unknown Source) 00:00:02.993 at org.jenkinsci.plugins.gitclient.RemoteGitImpl.clean(RemoteGitImpl.java:444) 00:00:02.994 at hudson.plugins.git.extensions.impl.CleanBeforeCheckout.decorateFetchCommand(CleanBeforeCheckout.java:32) 00:00:02.994 at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:731) 00:00:02.995 at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:983) 00:00:02.995 at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1016) 00:00:03.000 at org.jenkinsci.plugins.multiplescms.MultiSCM.checkout(MultiSCM.java:122) 00:00:03.000 at hudson.scm.SCM.checkout(SCM.java:484) 00:00:03.001 at hudson.model.AbstractProject.checkout(AbstractProject.java:1270) 00:00:03.001 at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:609) 00:00:03.002 at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86) 00:00:03.003 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:531) 00:00:03.007 at hudson.model.Run.execute(Run.java:1741) 00:00:03.008 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43) 00:00:03.008 at hudson.model.ResourceController.execute(ResourceController.java:98) 00:00:03.008 at hudson.model.Executor.run(Executor.java:374) 00:00:03.009 Caused by: java.io.IOException: Unexpected termination of the channel 00:00:03.009 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50) 00:00:03.010 Caused by: java.io.EOFException 00:00:03.010 at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2332) 00:00:03.010 at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2801) 00:00:03.011 at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801) 00:00:03.011 at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299) 00:00:03.011 at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:40) 00:00:03.012 at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34) 00:00:03.012 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
          Hide
          danielbeck Daniel Beck added a comment -

          Slave side logs would be interesting as well.

          What version of the Subversion plugin is this? Did your tests also point all the builds at the same working copy in some way, or were they completely independent? What workspace version is this?

          Show
          danielbeck Daniel Beck added a comment - Slave side logs would be interesting as well. What version of the Subversion plugin is this? Did your tests also point all the builds at the same working copy in some way, or were they completely independent? What workspace version is this?
          Hide
          tobeychris Chris Tobey added a comment - - edited

          Hi Daniel,

          Subversion plugin v1.54 and I am not sure how to tell the workspace version.

          All of the builds would get their own workspace: ie job, job@1, job@2, ..., job@10, etc.

          On the slave side there isn't much info, just that they were killed with SIGABRT.

          Show
          tobeychris Chris Tobey added a comment - - edited Hi Daniel, Subversion plugin v1.54 and I am not sure how to tell the workspace version. All of the builds would get their own workspace: ie job, job@1, job@2, ..., job@10, etc. On the slave side there isn't much info, just that they were killed with SIGABRT.
          Hide
          recena Manuel Recena Soto added a comment -

          Chris Tobey I recommend you to upgrade Subversion Plugin.

          Show
          recena Manuel Recena Soto added a comment - Chris Tobey I recommend you to upgrade Subversion Plugin.
          Hide
          danielbeck Daniel Beck added a comment -

          For backward compatibility reasons, Jenkins still ships with Subversion plugin 1.x, but it's about two years outdated. Update it.

          Show
          danielbeck Daniel Beck added a comment - For backward compatibility reasons, Jenkins still ships with Subversion plugin 1.x, but it's about two years outdated. Update it.
          Hide
          danielbeck Daniel Beck added a comment -

          Why was this issue even assigned to KK to begin with?

          /me shakes head, walks away…

          Show
          danielbeck Daniel Beck added a comment - Why was this issue even assigned to KK to begin with? /me shakes head, walks away…
          Hide
          recena Manuel Recena Soto added a comment -

          Daniel Beck I believe he is the project lead of this component in JIRA.

          Show
          recena Manuel Recena Soto added a comment - Daniel Beck I believe he is the project lead of this component in JIRA.
          Hide
          danielbeck Daniel Beck added a comment -

          Right, filed against SSH slaves. We probably need to find a different maintainer for that one…

          Show
          danielbeck Daniel Beck added a comment - Right, filed against SSH slaves. We probably need to find a different maintainer for that one…
          Hide
          recena Manuel Recena Soto added a comment -

          Daniel Beck I'm not sure if the problem is related to Subversion Plugin.

          Show
          recena Manuel Recena Soto added a comment - Daniel Beck I'm not sure if the problem is related to Subversion Plugin.
          Hide
          danielbeck Daniel Beck added a comment -

          Maybe Oleg Nenashev can tell us whether this looks more like a remoting issue?

          Show
          danielbeck Daniel Beck added a comment - Maybe Oleg Nenashev can tell us whether this looks more like a remoting issue?
          Hide
          tobeychris Chris Tobey added a comment -

          Updating the SVN plug from r1.54 to r2.5.7 seems to fix the issue.

          I set up a test environment with a fresh install and could not reproduce (everything at the latest version).
          Then I upgraded my main environment (and had to deal with the credential changes) and it seems ok now too. Ran 20 checkouts simultaneously and no failures.

          Show
          tobeychris Chris Tobey added a comment - Updating the SVN plug from r1.54 to r2.5.7 seems to fix the issue. I set up a test environment with a fresh install and could not reproduce (everything at the latest version). Then I upgraded my main environment (and had to deal with the credential changes) and it seems ok now too. Ran 20 checkouts simultaneously and no failures.
          Hide
          danielbeck Daniel Beck added a comment -

          Only affects obsolete versions.

          Show
          danielbeck Daniel Beck added a comment - Only affects obsolete versions.
          Hide
          recena Manuel Recena Soto added a comment -

          Chris Tobey Thanks for your feedback.

          Show
          recena Manuel Recena Soto added a comment - Chris Tobey Thanks for your feedback.

            People

            • Assignee:
              recena Manuel Recena Soto
              Reporter:
              tobeychris Chris Tobey
            • Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: