Jenkins / JENKINS-40578

Matrix flyweight job crashes with NPE if its triggered jobs are in the queue for a long time

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Component/s: matrix-project-plugin
    • Labels: None
    • Environment: Jenkins 2.36
      Matrix Project Plugin 1.7.1

      I have a Matrix project with 50 configurations that run against a label served by 25 machines, so a run of this job immediately creates a build queue. The job has "execute concurrent builds if necessary" enabled, although I believe I have also seen this issue occur when only one instance of the job is running.

      The build queue on this server can sometimes grow very large, preventing these jobs from running for a long time. After its matrix configuration jobs have been in the queue for some time, the flyweight job fails with the following null pointer exception, which causes Jenkins to interrupt all of the configurations in the job so that they fail as well:

      Interrupting #1003
      FATAL: null
      java.lang.NullPointerException
      at hudson.matrix.DefaultMatrixExecutionStrategyImpl.waitForCompletion(DefaultMatrixExecutionStrategyImpl.java:288)
      at hudson.matrix.DefaultMatrixExecutionStrategyImpl.run(DefaultMatrixExecutionStrategyImpl.java:162)
      at hudson.matrix.MatrixBuild$MatrixBuildExecution.doRun(MatrixBuild.java:364)
      at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:534)
      at hudson.model.Run.execute(Run.java:1729)
      at hudson.matrix.MatrixBuild.run(MatrixBuild.java:313)
      at hudson.model.ResourceController.execute(ResourceController.java:98)
      at hudson.model.Executor.run(Executor.java:404)
      Finished: FAILURE

      Looking at the source in https://github.com/jenkinsci/matrix-project-plugin/blob/master/src/main/java/hudson/matrix/DefaultMatrixExecutionStrategyImpl.java, the code does appear to check whether the queue item is null at line 288. Is it possible that this is a race condition, where the job is assigned to a build machine after the code has checked that the queue item is not null but before the print statement has executed?
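
      To make the suspected race concrete, here is a minimal, self-contained sketch of the check-then-act pattern described above. It is not the plugin's code: Item, FlakyConfig, unsafeDescribe and safeDescribe are hypothetical names invented for illustration, and FlakyConfig simply simulates a queue item that disappears between two reads. Whether line 288 actually behaves this way is unconfirmed.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class QueueRaceSketch {
    /** Hypothetical stand-in for a queue entry; NOT the Jenkins Queue.Item API. */
    static final class Item {
        private final String why;
        Item(String why) { this.why = why; }
        String getWhy() { return why; }
    }

    /**
     * Hypothetical configuration whose queue item is visible only on the first
     * read. This simulates the scheduler handing the job to an executor between
     * two consecutive calls to getQueueItem().
     */
    static final class FlakyConfig {
        private final Item item;
        private final AtomicInteger reads = new AtomicInteger();
        FlakyConfig(String why) { this.item = new Item(why); }
        Item getQueueItem() {
            return reads.getAndIncrement() == 0 ? item : null;
        }
    }

    /** Check-then-act: the state is read twice, so the second read can be null. */
    static String unsafeDescribe(FlakyConfig c) {
        if (c.getQueueItem() != null) {
            return c.getQueueItem().getWhy(); // throws NullPointerException here
        }
        return "not queued";
    }

    /** Read the value once into a local variable and use only that snapshot. */
    static String safeDescribe(FlakyConfig c) {
        Item qi = c.getQueueItem();
        return qi != null ? qi.getWhy() : "not queued";
    }

    public static void main(String[] args) {
        try {
            unsafeDescribe(new FlakyConfig("waiting for executor"));
        } catch (NullPointerException e) {
            System.out.println("unsafe: NullPointerException");
        }
        System.out.println("safe: " + safeDescribe(new FlakyConfig("waiting for executor")));
    }
}
```

      If the real code already reads the queue item into a local variable before the null check, then this particular pattern would not explain the failure, and the null would have to come from something the item returns rather than from the item itself.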

      All I see in the main log of Jenkins is the server logging that it aborts all of the associated matrix jobs that the flyweight job created.

            Assignee: Kohsuke Kawaguchi (kohsuke)
            Reporter: Gabriel Ash (gabrielbash)