Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-53028

Deadlock in Pipeline jobs

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Cannot Reproduce
    • Icon: Major Major
    • workflow-cps-plugin
    • None
    • Jenkins (major versions)
      Pipeline (major versions)

      1. One thread called CpsFlowExecution.onProgramEnd, in which it will lock current object (CpsFlowExecution.this) since it's a synchronized block:
      synchronized /*ACQUIRED THE SYNC ROOT*/ void onProgramEnd(Outcome outcome) {
      
      ...
         this.getStorage().flush();  
      ...

      in this function, it will call a method: this.getStorage().flush(), in which it will go to acquire the write lock of readWriteLock in TimingFlowNodeStorage object:

      @Override
      public void flush() throws IOException {
          try (Timing t = time(TimingKind.flowNode)) {
              readWriteLock.writeLock().lock(); //WAITING THE READ LOCK HERE
              try {
              ...
          }
      }

      here, the thread will park to wait other threads who own the read locks of the unique readWriteLock for releasing all read locks then to acquire the write lock.

          2. On the other thread, the read locks which above thread waits are owned by another thread, which is doing "copyLogs". But the problem is the thread running WorkflowRun.copyLogs is waiting for above thread who waits for the write lock and owns the synchronized block of CpsFlowExecution.this object to release the synchronized block of CpsFlowExecution.this object.

      So, the deadlock happens here.

      @Override
      public FlowNode getNode(String string) throws IOException {
          try (Timing t = time(TimingKind.flowNode)) {
              readWriteLock.readLock().lock();  //ACQUIRED THE READ LOCK
              try {
                  return delegate.getNode(string); 
              } finally {
                  readWriteLock.readLock().unlock();
             }
         }
      }
      
      @Override public void close() { 
      synchronized /*WAITING THE SYNC ROOT*/ (CpsFlowExecution.this) { 
      ...
      
      

      and after checking the master branch of source code, it seems the bug still exists.

      the jstack dump attached.

       

      the suggestion here is:

      in the close() method of Timing class it locked the CpsFlowExecution.tshi class and it can be considered to lock another sync root to avoid this issue.

       

      class Timing implements AutoCloseable {
              private final TimingKind kind;
              private final long start;
              private Timing(TimingKind kind) {
                  this.kind = kind;
                  start = System.nanoTime();
              }
              @Override public void close() {
      
      
      /* HERE, CONSIDER TO LOCK ANOTHER OJBECT TO AVOID THIS ISSUE */
      
                  synchronized (CpsFlowExecution.this /*BUGGY*/) {
                      if (timings == null) {
                          timings = new HashMap<>();
                      }
                      Long orig = timings.get(kind.name());
                      if (orig == null) {
                          orig = 0L;
                      }
                      timings.put(kind.name(), orig + System.nanoTime() - start);
                  }
              }
          }
      

      suggested fix:

       

      class Timing implements AutoCloseable {
              private final TimingKind kind;
              private final long start;
      
              private final object lock
      
              private Timing(TimingKind kind) {
                  this.kind = kind;
                  start = System.nanoTime();
      
                  lock = new Object();
              }
              @Override public void close() {
                  synchronized (lock) {
                      if (timings == null) {
                          timings = new HashMap<>();
                      }
                      Long orig = timings.get(kind.name());
                      if (orig == null) {
                          orig = 0L;
                      }
                      timings.put(kind.name(), orig + System.nanoTime() - start);
                  }
              }
          }
      

            Unassigned Unassigned
            yangye Ye Yang
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Resolved: