Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-13330

Jenkins slave hangs in post build phase

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Critical Critical
    • None
    • RHEL 5, both master and all slaves.
      Jenkins is running inside of Tomcat

      We have an intermittent problem with slaves hanging AFTER the job itself is finished. In the post processing step what we see is that the console log has this line:

      Description set: vap_current_iter-2012_03_29_19_01_03

      And then nothing. Usually, it will look like this:

      Description set: prod_pull-2012_03_28_19_01_03
      Notifying upstream build armada_Launch_prod_pull #13 of job completion
      Project armada_Launch_prod_pull still waiting for 1 builds to complete
      Notifying upstream projects of job completion
      Notifying upstream of completion: armada_Launch_prod_pull #13
      Finished: SUCCESS

      I setup a logger for hudson.model.Run, and it currently has this :

          at java.lang.Thread.run(Thread.java:619)
      
      Mar 30, 2012 12:44:00 PM hudson.model.Run run
      INFO: galleon_allUnit #1134 main build action completed: SUCCESS
      Mar 30, 2012 12:44:00 PM hudson.model.Run setResult
      FINE: galleon_allUnit #1134 : result is set to SUCCESS
      java.lang.Exception
          at hudson.model.Run.setResult(Run.java:352)
          at hudson.model.Run.run(Run.java:1410)
          at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          at hudson.model.ResourceController.execute(ResourceController.java:88)
          at hudson.model.Executor.run(Executor.java:238)
      

      Repeated for every hung slave.

      The main hudson log doesn't have any additional information.

      Disconnecting the slave has no effect.

      Trying to do an orderly shutdown of Jenkins has no effect (jenkins actually appears to hang on shutdown).

      The only way we have found to recover is to kill -9 the tomcat process.

      The tread dump for one of the slaves (they are all the same) is:

      Thread Dump
      Channel reader thread: channel
      
      "Channel reader thread: channel" Id=9 Group=main RUNNABLE (in native)
          at java.io.FileInputStream.readBytes(Native Method)
          at java.io.FileInputStream.read(FileInputStream.java:199)
          at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
          at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
          -  locked java.io.BufferedInputStream@1ae615a
          at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
          at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
          at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
          at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
          at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
          at hudson.remoting.Channel$ReaderThread.run(Channel.java:1030)
      
      
      main
      
      "main" Id=1 Group=main WAITING on hudson.remoting.Channel@e1d5ea
          at java.lang.Object.wait(Native Method)
          -  waiting on hudson.remoting.Channel@e1d5ea
          at java.lang.Object.wait(Object.java:485)
          at hudson.remoting.Channel.join(Channel.java:766)
          at hudson.remoting.Launcher.main(Launcher.java:420)
          at hudson.remoting.Launcher.runWithStdinStdout(Launcher.java:366)
          at hudson.remoting.Launcher.run(Launcher.java:206)
          at hudson.remoting.Launcher.main(Launcher.java:168)
      
      
      Ping thread for channel hudson.remoting.Channel@e1d5ea:channel
      
      "Ping thread for channel hudson.remoting.Channel@e1d5ea:channel" Id=10 Group=main TIMED_WAITING
          at java.lang.Thread.sleep(Native Method)
          at hudson.remoting.PingThread.run(PingThread.java:86)
      
      
      Pipe writer thread: channel
      
      "Pipe writer thread: channel" Id=12 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14263ed
          at sun.misc.Unsafe.park(Native Method)
          -  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14263ed
          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
          at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
          at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
          at java.lang.Thread.run(Thread.java:619)
      
      
      pool-1-thread-267
      
      "pool-1-thread-267" Id=285 Group=main RUNNABLE
          at sun.management.ThreadImpl.dumpThreads0(Native Method)
          at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:374)
          at hudson.Functions.getThreadInfos(Functions.java:872)
          at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:93)
          at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:89)
          at hudson.remoting.UserRequest.perform(UserRequest.java:118)
          at hudson.remoting.UserRequest.perform(UserRequest.java:48)
          at hudson.remoting.Request$2.run(Request.java:287)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
          at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          at java.lang.Thread.run(Thread.java:619)
      
          Number of locked synchronizers = 1
          - java.util.concurrent.locks.ReentrantLock$NonfairSync@1186f88
      
      
      Finalizer
      
      "Finalizer" Id=3 Group=system WAITING on java.lang.ref.ReferenceQueue$Lock@1798fdd
          at java.lang.Object.wait(Native Method)
          -  waiting on java.lang.ref.ReferenceQueue$Lock@1798fdd
          at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
          at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
          at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
      
      
      Reference Handler
      
      "Reference Handler" Id=2 Group=system WAITING on java.lang.ref.Reference$Lock@1d40442
          at java.lang.Object.wait(Native Method)
          -  waiting on java.lang.ref.Reference$Lock@1d40442
          at java.lang.Object.wait(Object.java:485)
          at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
      
      
      Signal Dispatcher
      
      "Signal Dispatcher" Id=4 Group=system RUNNABLE
      

      Any ideas on how to better recover or prevent this would be greatly appreciated.

            Unassigned Unassigned
            cwright Clark Wright
            Votes:
            7 Vote for this issue
            Watchers:
            10 Start watching this issue

              Created:
              Updated: