Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-5760

Locked/hanged remote channel causing freezed job and lots of blocked threads.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Component/s: core
    • Labels:
      None
    • Environment:
      SUSE
      JDK 6 Latest
      ~50 slaves

      Description

      This has happen a few times. Apparently the "Free Disk Space"-monitoring thread can cause a job to freeze so bad that it cannot be aborted in anyway.

      You can do the following without being able to abort the job:

      • Cancel the job through the Web UI
      • Disconnect/reconnect the slave running the job
      • Do a hard reboot on the slave running the job
      • Delete the slave from Hudson.

      Regardless if the slave is deleted, the job will be shown as running in Hudson (and the slave will still be visible in the executors list)

      The only way to unfreeze the job is to restart Hudson.

      More so, the freeze seems to generate a pretty bad thread leak/build up.

      Now, to the details (I'm sorry, but I can not post a full thread stack atm):

      The Free Disk Space-job that caused the lock on the remote channel:

      "Monitoring thread for Free Disk Space started on Wed Feb 24 17:33:12 CET 2010" prio=10 tid=0x3d623400 nid=0x2020 runnable [0x402eb000]
      java.lang.Thread.State: RUNNABLE
      at java.io.FileOutputStream.writeBytes(Native Method)
      at java.io.FileOutputStream.write(FileOutputStream.java:260)
      at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
      at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)

      • locked <0x5d1cd630> (a java.io.BufferedOutputStream)
        at java.io.ObjectOutputStream$BlockDataOutputStream.flush(ObjectOutputStream.java:1784)
        at java.io.ObjectOutputStream.flush(ObjectOutputStream.java:691)
        at hudson.remoting.Channel.send(Channel.java:418)
      • locked <0x5c7a8ef0> (a hudson.remoting.Channel)
        at hudson.remoting.Request.call(Request.java:149)
      • locked <0x5c7a8ef0> (a hudson.remoting.Channel)
        at hudson.remoting.Channel.call(Channel.java:551)
        at hudson.FilePath.act(FilePath.java:683)
        at hudson.FilePath.act(FilePath.java:676)
        at hudson.node_monitors.DiskSpaceMonitor$1.getFreeSpace(DiskSpaceMonitor.java:71)
        at hudson.node_monitors.DiskSpaceMonitorDescriptor.monitor(DiskSpaceMonitorDescriptor.java:135)
        at hudson.node_monitors.DiskSpaceMonitorDescriptor.monitor(DiskSpaceMonitorDescriptor.java:49)
        at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:200

      It has now been stuck at the writeBytes()-method for about 20h.

      The stuck job:

      "Handling GET /view/XXXXXXX/job/HANGED_JOB/descriptorByName/CheckStylePublisher/checkPattern : http-8080-1" daemon prio=10 tid=0x0839d000 nid=0x1549 waiting for monitor entry [0x431fd000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at hudson.remoting.Request.call(Request.java:100)

      • waiting to lock <0x5c7a8ef0> (a hudson.remoting.Channel)
        at hudson.remoting.Channel.call(Channel.java:551)
        at hudson.FilePath.act(FilePath.java:683)
        at hudson.FilePath.act(FilePath.java:676)
        at hudson.FilePath.exists(FilePath.java:944)
        at hudson.util.FormFieldValidator$WorkspaceFileMask.check(FormFieldValidator.java:377)
        at hudson.util.FormFieldValidator.process(FormFieldValidator.java:140)
        at hudson.plugins.checkstyle.util.PluginDescriptor.doCheckPattern(PluginDescriptor.java:93)
        at sun.reflect.GeneratedMethodAccessor685.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.kohsuke.stapler.Function$InstanceFunction.invoke(Function.java:169)
        at org.kohsuke.stapler.Function.bindAndInvoke(Function.java:101)
        at org.kohsuke.stapler.Function.bindAndInvokeAndServeResponse(Function.java:54)
        at org.kohsuke.stapler.MetaClass$1.doDispatch(MetaClass.java:74)
        at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:30)
        at org.kohsuke.stapler.Stapler.invoke(Stapler.java:519)
        at org.kohsuke.stapler.MetaClass$6.doDispatch(MetaClass.java:180)
        at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:30)
        at org.kohsuke.stapler.Stapler.invoke(Stapler.java:519)
        at org.kohsuke.stapler.MetaClass$6.doDispatch(MetaClass.java:180)
        at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:30)
        at org.kohsuke.stapler.Stapler.invoke(Stapler.java:519)
        at org.kohsuke.stapler.MetaClass$6.doDispatch(MetaClass.java:180)
        at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:30)
        at org.kohsuke.stapler.Stapler.invoke(Stapler.java:519)
        at org.kohsuke.stapler.Stapler.invoke(Stapler.java:435)
        at org.kohsuke.stapler.Stapler.service(Stapler.java:123)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:94)
        at hudson.plugins.greenballs.GreenBallFilter.doFilter(GreenBallFilter.java:51)
        at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:97)
        at hudson.util.PluginServletFilter.doFilter(PluginServletFilter.java:86)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at hudson.security.csrf.CrumbFilter.doFilter(CrumbFilter.java:47)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:84)
        at hudson.security.UnwrapSecurityExceptionFilter.doFilter(UnwrapSecurityExceptionFilter.java:51)
        at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        at org.acegisecurity.ui.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:166)
        at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        at org.acegisecurity.providers.anonymous.AnonymousProcessingFilter.doFilter(AnonymousProcessingFilter.java:125)
        at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        at org.acegisecurity.ui.rememberme.RememberMeProcessingFilter.doFilter(RememberMeProcessingFilter.java:142)
        at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        at org.acegisecurity.ui.AbstractProcessingFilter.doFilter(AbstractProcessingFilter.java:271)
        at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        at org.acegisecurity.ui.basicauth.BasicProcessingFilter.doFilter(BasicProcessingFilter.java:173)
        at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        at org.acegisecurity.context.HttpSessionContextIntegrationFilter.doFilter(HttpSessionContextIntegrationFilter.java:249)
        at hudson.security.HttpSessionContextIntegrationFilter2.doFilter(HttpSessionContextIntegrationFilter2.java:66)
        at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        at hudson.security.ChainedServletFilter.doFilter(ChainedServletFilter.java:76)
        at hudson.security.HudsonFilter.doFilter(HudsonFilter.java:164)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:433)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
        at java.lang.Thread.run(Thread.java:619)

      "Executor #0 for lnx0391 : executing HANGED_JOB #197" prio=10 tid=0x465f6000 nid=0x15b3 waiting for monitor entry [0x40ad4000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at hudson.remoting.Request.call(Request.java:148)

      • waiting to lock <0x5c7a8ef0> (a hudson.remoting.Channel)
        at hudson.remoting.Channel.call(Channel.java:551)
        at hudson.FilePath.act(FilePath.java:683)
        at hudson.FilePath.act(FilePath.java:676)
        at hudson.plugins.checkstyle.CheckStylePublisher.perform(CheckStylePublisher.java:99)
        at hudson.plugins.checkstyle.util.HealthAwarePublisher.perform(HealthAwarePublisher.java:136)
        at hudson.tasks.BuildStepMonitor$3.perform(BuildStepMonitor.java:36)
        at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:582)
        at hudson.model.AbstractBuild$AbstractRunner.performAllBuildStep(AbstractBuild.java:563)
        at hudson.model.AbstractBuild$AbstractRunner.performAllBuildStep(AbstractBuild.java:550)
        at hudson.model.Build$RunnerImpl.post2(Build.java:152)
        at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:528)
        at hudson.model.Run.run(Run.java:1221)
        at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
        at hudson.model.ResourceController.execute(ResourceController.java:88)
        at hudson.model.Executor.run(Executor.java:122)

      And the rest of the locked threads (building up each time Hudson tries to collect info from the slaves):

      "pool-3-thread-532" daemon prio=10 tid=0x43592000 nid=0x3668 waiting for monitor entry [0x3a0e8000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at hudson.remoting.Channel.close(Channel.java:720)

      • waiting to lock <0x5c7a8ef0> (a hudson.remoting.Channel)
        at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:450)
        at hudson.slaves.SlaveComputer.access$800(SlaveComputer.java:74)
        at hudson.slaves.SlaveComputer$4.run(SlaveComputer.java:390)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

      "Monitoring thread for Clock Difference started on Thu Feb 25 08:33:12 CET 2010" prio=10 tid=0x08172800 nid=0x35be waiting for monitor entry [0x39ff5000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at hudson.remoting.Request.call(Request.java:100)

      • waiting to lock <0x5c7a8ef0> (a hudson.remoting.Channel)
        at hudson.remoting.Channel.call(Channel.java:551)
        at hudson.model.Slave.getClockDifference(Slave.java:230)
        at hudson.node_monitors.ClockMonitor$1.monitor(ClockMonitor.java:53)
        at hudson.node_monitors.ClockMonitor$1.monitor(ClockMonitor.java:49)
        at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:200)

      "Monitoring thread for Architecture started on Thu Feb 25 08:33:12 CET 2010" prio=10 tid=0x08bdd000 nid=0x35bd waiting for monitor entry [0x3a046000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at hudson.remoting.Request.call(Request.java:100)

      • waiting to lock <0x5c7a8ef0> (a hudson.remoting.Channel)
        at hudson.remoting.Channel.call(Channel.java:551)
        at hudson.node_monitors.ArchitectureMonitor$DescriptorImpl.monitor(ArchitectureMonitor.java:44)
        at hudson.node_monitors.ArchitectureMonitor$DescriptorImpl.monitor(ArchitectureMonitor.java:41)
        at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:200)

      "Monitoring thread for Free Temp Space started on Thu Feb 25 08:33:12 CET 2010" prio=10 tid=0x08c01000 nid=0x35bc waiting for monitor entry [0x3a097000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at hudson.remoting.Request.call(Request.java:100)

      • waiting to lock <0x5c7a8ef0> (a hudson.remoting.Channel)
        at hudson.remoting.Channel.call(Channel.java:551)
        at hudson.FilePath.act(FilePath.java:683)
        at hudson.FilePath.act(FilePath.java:676)
        at hudson.node_monitors.TemporarySpaceMonitor$1.getFreeSpace(TemporarySpaceMonitor.java:73)
        at hudson.node_monitors.DiskSpaceMonitorDescriptor.monitor(DiskSpaceMonitorDescriptor.java:135)
        at hudson.node_monitors.DiskSpaceMonitorDescriptor.monitor(DiskSpaceMonitorDescriptor.java:49)
        at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:200)

      "Monitoring thread for Free Disk Space started on Thu Feb 25 08:33:12 CET 2010" prio=10 tid=0x08bfb000 nid=0x35bb waiting for monitor entry [0x3a1f9000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at hudson.remoting.Request.call(Request.java:100)

      • waiting to lock <0x5c7a8ef0> (a hudson.remoting.Channel)
        at hudson.remoting.Channel.call(Channel.java:551)
        at hudson.FilePath.act(FilePath.java:683)
        at hudson.FilePath.act(FilePath.java:676)
        at hudson.node_monitors.DiskSpaceMonitor$1.getFreeSpace(DiskSpaceMonitor.java:71)
        at hudson.node_monitors.DiskSpaceMonitorDescriptor.monitor(DiskSpaceMonitorDescriptor.java:135)
        at hudson.node_monitors.DiskSpaceMonitorDescriptor.monitor(DiskSpaceMonitorDescriptor.java:49)
        at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:200)

      "Monitoring thread for Free Swap Space started on Thu Feb 25 08:33:12 CET 2010" prio=10 tid=0x08bda400 nid=0x35ba waiting for monitor entry [0x3a1a8000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at hudson.remoting.Request.call(Request.java:100)

      • waiting to lock <0x5c7a8ef0> (a hudson.remoting.Channel)
        at hudson.remoting.Channel.call(Channel.java:551)
        at hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:83)
        at hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:81)
        at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:200)

      "Monitoring thread for Response Time started on Thu Feb 25 08:33:12 CET 2010" prio=10 tid=0x08de0c00 nid=0x35b8 waiting for monitor entry [0x3a430000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at hudson.remoting.Channel.send(Channel.java:411)

      • waiting to lock <0x5c7a8ef0> (a hudson.remoting.Channel)
        at hudson.remoting.Request.callAsync(Request.java:170)
        at hudson.remoting.Channel.callAsync(Channel.java:578)
        at hudson.node_monitors.ResponseTimeMonitor$1.monitor(ResponseTimeMonitor.java:58)
        at hudson.node_monitors.ResponseTimeMonitor$1.monitor(ResponseTimeMonitor.java:52)
        at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:200)

      "Monitoring thread for Clock Difference started on Thu Feb 25 07:33:12 CET 2010" prio=10 tid=0x091fe400 nid=0x33eb waiting for monitor entry [0x3a24a000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at hudson.remoting.Request.call(Request.java:100)

      • waiting to lock <0x5c7a8ef0> (a hudson.remoting.Channel)
        at hudson.remoting.Channel.call(Channel.java:551)
        at hudson.model.Slave.getClockDifference(Slave.java:230)
        at hudson.node_monitors.ClockMonitor$1.monitor(ClockMonitor.java:53)
        at hudson.node_monitors.ClockMonitor$1.monitor(ClockMonitor.java:49)
        at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:200)

      "Monitoring thread for Architecture started on Thu Feb 25 07:33:12 CET 2010" prio=10 tid=0x08663c00 nid=0x33ea waiting for monitor entry [0x4695c000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at hudson.remoting.Request.call(Request.java:100)

      • waiting to lock <0x5c7a8ef0> (a hudson.remoting.Channel)
        at hudson.remoting.Channel.call(Channel.java:551)
        at hudson.node_monitors.ArchitectureMonitor$DescriptorImpl.monitor(ArchitectureMonitor.java:44)
        at hudson.node_monitors.ArchitectureMonitor$DescriptorImpl.monitor(ArchitectureMonitor.java:41)
        at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:200)

      "Monitoring thread for Free Temp Space started on Thu Feb 25 07:33:12 CET 2010" prio=10 tid=0x08de3800 nid=0x33e9 waiting for monitor entry [0x469ad000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at hudson.remoting.Request.call(Request.java:100)

      • waiting to lock <0x5c7a8ef0> (a hudson.remoting.Channel)
        at hudson.remoting.Channel.call(Channel.java:551)
        at hudson.FilePath.act(FilePath.java:683)
        at hudson.FilePath.act(FilePath.java:676)
        at hudson.node_monitors.TemporarySpaceMonitor$1.getFreeSpace(TemporarySpaceMonitor.java:73)
        at hudson.node_monitors.DiskSpaceMonitorDescriptor.monitor(DiskSpaceMonitorDescriptor.java:135)
        at hudson.node_monitors.DiskSpaceMonitorDescriptor.monitor(DiskSpaceMonitorDescriptor.java:49)
        at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:200)

      "Monitoring thread for Free Disk Space started on Thu Feb 25 07:33:12 CET 2010" prio=10 tid=0x083dec00 nid=0x33e8 waiting for monitor entry [0x469fe000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at hudson.remoting.Request.call(Request.java:100)

      • waiting to lock <0x5c7a8ef0> (a hudson.remoting.Channel)
        at hudson.remoting.Channel.call(Channel.java:551)
        at hudson.FilePath.act(FilePath.java:683)
        at hudson.FilePath.act(FilePath.java:676)
        at hudson.node_monitors.DiskSpaceMonitor$1.getFreeSpace(DiskSpaceMonitor.java:71)
        at hudson.node_monitors.DiskSpaceMonitorDescriptor.monitor(DiskSpaceMonitorDescriptor.java:135)
        at hudson.node_monitors.DiskSpaceMonitorDescriptor.monitor(DiskSpaceMonitorDescriptor.java:49)
        at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:200)

      "Monitoring thread for Free Swap Space started on Thu Feb 25 07:33:12 CET 2010" prio=10 tid=0x0815f800 nid=0x33e7 waiting for monitor entry [0x3b8c1000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at hudson.remoting.Request.call(Request.java:100)

      • waiting to lock <0x5c7a8ef0> (a hudson.remoting.Channel)
        at hudson.remoting.Channel.call(Channel.java:551)
        at hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:83)
        at hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:81)
        at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:200)

      "Monitoring thread for Response Time started on Thu Feb 25 07:33:12 CET 2010" prio=10 tid=0x082ba800 nid=0x33e5 waiting for monitor entry [0x3b402000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at hudson.remoting.Channel.send(Channel.java:411)

      • waiting to lock <0x5c7a8ef0> (a hudson.remoting.Channel)
        at hudson.remoting.Request.callAsync(Request.java:170)
        at hudson.remoting.Channel.callAsync(Channel.java:578)
        at hudson.node_monitors.ResponseTimeMonitor$1.monitor(ResponseTimeMonitor.java:58)
        at hudson.node_monitors.ResponseTimeMonitor$1.monitor(ResponseTimeMonitor.java:52)
        at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:200)

      "Monitoring thread for Clock Difference started on Thu Feb 25 06:33:12 CET 2010" prio=10 tid=0x0890b800 nid=0x32f8 waiting for monitor entry [0x3a29b000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at hudson.remoting.Request.call(Request.java:100)

      • waiting to lock <0x5c7a8ef0> (a hudson.remoting.Channel)
        at hudson.remoting.Channel.call(Channel.java:551)
        at hudson.model.Slave.getClockDifference(Slave.java:230)
        at hudson.node_monitors.ClockMonitor$1.monitor(ClockMonitor.java:53)
        at hudson.node_monitors.ClockMonitor$1.monitor(ClockMonitor.java:49)
        at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:200)

      "Monitoring thread for Architecture started on Thu Feb 25 06:33:12 CET 2010" prio=10 tid=0x0865a000 nid=0x32f7 waiting for monitor entry [0x3a3df000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at hudson.remoting.Request.call(Request.java:100)

      • waiting to lock <0x5c7a8ef0> (a hudson.remoting.Channel)
        at hudson.remoting.Channel.call(Channel.java:551)
        at hudson.node_monitors.ArchitectureMonitor$DescriptorImpl.monitor(ArchitectureMonitor.java:44)
        at hudson.node_monitors.ArchitectureMonitor$DescriptorImpl.monitor(ArchitectureMonitor.java:41)
        at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:200)

      + Another 80 (or so) other locked threads.

        Issue Links

          Activity

          Hide
          lkishalmi lkishalmi added a comment -

          I guess (not sure enough) that JENKINS-5413 is just another aspect of this problem.

          Show
          lkishalmi lkishalmi added a comment - I guess (not sure enough) that JENKINS-5413 is just another aspect of this problem.
          Hide
          lkishalmi lkishalmi added a comment -

          I was watching the other side of the channel and turned my sshd loglevel to DEBUG2. It seems the slave could not write to the channel as it seems to be closed:

          Aug 6 01:46:32 EVHUBUDSD0190 sshd[27822]: debug2: channel 0: rcvd adjust 15117
          Aug 6 01:46:33 EVHUBUDSD0190 sshd[27822]: debug2: channel 0: rcvd adjust 15426
          Aug 6 01:50:43 EVHUBUDSD0190 sshd[27822]: debug2: channel 0: read 25 from efd 12
          Aug 6 01:50:43 EVHUBUDSD0190 sshd[27822]: debug2: channel 0: rwin 20092 elen 25 euse 1
          Aug 6 01:50:43 EVHUBUDSD0190 sshd[27822]: debug2: channel 0: sent ext data 25
          Aug 6 01:50:44 EVHUBUDSD0190 sshd[27822]: debug1: Received SIGCHLD.
          Aug 6 01:50:44 EVHUBUDSD0190 sshd[27822]: debug1: session_by_pid: pid 27838
          Aug 6 01:50:44 EVHUBUDSD0190 sshd[27822]: debug1: session_exit_message: session 0 channel 0 pid 27838
          Aug 6 01:50:44 EVHUBUDSD0190 sshd[27822]: debug2: channel 0: request exit-status confirm 0
          Aug 6 01:50:44 EVHUBUDSD0190 sshd[27822]: debug1: session_exit_message: release channel 0
          Aug 6 01:50:44 EVHUBUDSD0190 sshd[27822]: debug2: channel 0: write failed
          Aug 6 01:50:44 EVHUBUDSD0190 sshd[27822]: debug2: channel 0: close_write
          Aug 6 01:50:44 EVHUBUDSD0190 sshd[27822]: debug2: channel 0: send eow
          Aug 6 01:50:44 EVHUBUDSD0190 sshd[27822]: debug2: channel 0: output open -> closed
          Aug 6 01:50:44 EVHUBUDSD0190 sshd[27822]: debug2: channel 0: read<=0 rfd 10 len 0
          Aug 6 01:50:44 EVHUBUDSD0190 sshd[27822]: debug2: channel 0: read failed
          Aug 6 01:50:44 EVHUBUDSD0190 sshd[27822]: debug2: channel 0: close_read
          Aug 6 01:50:44 EVHUBUDSD0190 sshd[27822]: debug2: channel 0: input open -> drain
          Aug 6 01:50:44 EVHUBUDSD0190 sshd[27822]: debug2: channel 0: read 0 from efd 12
          Aug 6 01:50:44 EVHUBUDSD0190 sshd[27822]: debug2: channel 0: closing read-efd 12
          Aug 6 01:50:44 EVHUBUDSD0190 sshd[27822]: debug2: channel 0: ibuf empty
          Aug 6 01:50:44 EVHUBUDSD0190 sshd[27822]: debug2: channel 0: send eof
          Aug 6 01:50:44 EVHUBUDSD0190 sshd[27822]: debug2: channel 0: input drain -> closed
          Aug 6 01:50:44 EVHUBUDSD0190 sshd[27822]: debug2: channel 0: send close
          Aug 6 01:50:44 EVHUBUDSD0190 sshd[27822]: debug2: notify_done: reading
          Aug 6 01:55:44 EVHUBUDSD0190 sshd[27822]: debug2: channel 0: request keepalive@openssh.com confirm 1
          Aug 6 01:55:44 EVHUBUDSD0190 sshd[27822]: Connection closed by 10.0.6.105
          Aug 6 01:55:44 EVHUBUDSD0190 sshd[27822]: debug1: channel 0: free: server-session, nchannels 1
          Aug 6 01:55:44 EVHUBUDSD0190 sshd[27822]: debug1: session_close: session 0 pid 0
          Aug 6 01:55:44 EVHUBUDSD0190 sshd[27822]: debug1: do_cleanup
          Aug 6 01:55:44 EVHUBUDSD0190 sshd[27822]: Transferred: sent 346344792, received 25263488 bytes
          Aug 6 01:55:44 EVHUBUDSD0190 sshd[27822]: Closing connection to 10.0.6.105 port 33050
          Aug 6 01:55:44 EVHUBUDSD0190 sshd[27806]: debug1: PAM: cleanup
          Aug 6 01:55:44 EVHUBUDSD0190 sshd[27806]: debug1: PAM: closing session
          Aug 6 01:55:44 EVHUBUDSD0190 sshd[27806]: pam_unix(sshd:session): session closed for user hudson

          Show
          lkishalmi lkishalmi added a comment - I was watching the other side of the channel and turned my sshd loglevel to DEBUG2. It seems the slave could not write to the channel as it seems to be closed: Aug 6 01:46:32 EVHUBUDSD0190 sshd [27822] : debug2: channel 0: rcvd adjust 15117 Aug 6 01:46:33 EVHUBUDSD0190 sshd [27822] : debug2: channel 0: rcvd adjust 15426 Aug 6 01:50:43 EVHUBUDSD0190 sshd [27822] : debug2: channel 0: read 25 from efd 12 Aug 6 01:50:43 EVHUBUDSD0190 sshd [27822] : debug2: channel 0: rwin 20092 elen 25 euse 1 Aug 6 01:50:43 EVHUBUDSD0190 sshd [27822] : debug2: channel 0: sent ext data 25 Aug 6 01:50:44 EVHUBUDSD0190 sshd [27822] : debug1: Received SIGCHLD. Aug 6 01:50:44 EVHUBUDSD0190 sshd [27822] : debug1: session_by_pid: pid 27838 Aug 6 01:50:44 EVHUBUDSD0190 sshd [27822] : debug1: session_exit_message: session 0 channel 0 pid 27838 Aug 6 01:50:44 EVHUBUDSD0190 sshd [27822] : debug2: channel 0: request exit-status confirm 0 Aug 6 01:50:44 EVHUBUDSD0190 sshd [27822] : debug1: session_exit_message: release channel 0 Aug 6 01:50:44 EVHUBUDSD0190 sshd [27822] : debug2: channel 0: write failed Aug 6 01:50:44 EVHUBUDSD0190 sshd [27822] : debug2: channel 0: close_write Aug 6 01:50:44 EVHUBUDSD0190 sshd [27822] : debug2: channel 0: send eow Aug 6 01:50:44 EVHUBUDSD0190 sshd [27822] : debug2: channel 0: output open -> closed Aug 6 01:50:44 EVHUBUDSD0190 sshd [27822] : debug2: channel 0: read<=0 rfd 10 len 0 Aug 6 01:50:44 EVHUBUDSD0190 sshd [27822] : debug2: channel 0: read failed Aug 6 01:50:44 EVHUBUDSD0190 sshd [27822] : debug2: channel 0: close_read Aug 6 01:50:44 EVHUBUDSD0190 sshd [27822] : debug2: channel 0: input open -> drain Aug 6 01:50:44 EVHUBUDSD0190 sshd [27822] : debug2: channel 0: read 0 from efd 12 Aug 6 01:50:44 EVHUBUDSD0190 sshd [27822] : debug2: channel 0: closing read-efd 12 Aug 6 01:50:44 EVHUBUDSD0190 sshd [27822] : debug2: channel 0: ibuf empty Aug 6 01:50:44 EVHUBUDSD0190 sshd [27822] : debug2: channel 0: send eof Aug 6 01:50:44 EVHUBUDSD0190 sshd [27822] : debug2: channel 0: input drain -> closed Aug 6 01:50:44 EVHUBUDSD0190 sshd [27822] : debug2: channel 0: send close Aug 6 01:50:44 EVHUBUDSD0190 sshd [27822] : debug2: notify_done: reading Aug 6 01:55:44 EVHUBUDSD0190 sshd [27822] : debug2: channel 0: request keepalive@openssh.com confirm 1 Aug 6 01:55:44 EVHUBUDSD0190 sshd [27822] : Connection closed by 10.0.6.105 Aug 6 01:55:44 EVHUBUDSD0190 sshd [27822] : debug1: channel 0: free: server-session, nchannels 1 Aug 6 01:55:44 EVHUBUDSD0190 sshd [27822] : debug1: session_close: session 0 pid 0 Aug 6 01:55:44 EVHUBUDSD0190 sshd [27822] : debug1: do_cleanup Aug 6 01:55:44 EVHUBUDSD0190 sshd [27822] : Transferred: sent 346344792, received 25263488 bytes Aug 6 01:55:44 EVHUBUDSD0190 sshd [27822] : Closing connection to 10.0.6.105 port 33050 Aug 6 01:55:44 EVHUBUDSD0190 sshd [27806] : debug1: PAM: cleanup Aug 6 01:55:44 EVHUBUDSD0190 sshd [27806] : debug1: PAM: closing session Aug 6 01:55:44 EVHUBUDSD0190 sshd [27806] : pam_unix(sshd:session): session closed for user hudson
          Hide
          lkishalmi lkishalmi added a comment -

          I found something that could be related to this on jsch mail archives:
          http://www.mail-archive.com/jsch-users@lists.sourceforge.net/msg00829.html

          Also we suspect that this could be some synchronization issue as it mostly happens with our fast multicore slaves.

          Show
          lkishalmi lkishalmi added a comment - I found something that could be related to this on jsch mail archives: http://www.mail-archive.com/jsch-users@lists.sourceforge.net/msg00829.html Also we suspect that this could be some synchronization issue as it mostly happens with our fast multicore slaves.
          Hide
          lkishalmi lkishalmi added a comment -

          This issue seems to be duplicate of: JENKINS-5977
          Please reopen if you experience it again in version 1.380+

          Show
          lkishalmi lkishalmi added a comment - This issue seems to be duplicate of: JENKINS-5977 Please reopen if you experience it again in version 1.380+

            People

            • Assignee:
              kohsuke Kohsuke Kawaguchi
              Reporter:
              glundh glundh
            • Votes:
              2 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: