Jenkins / JENKINS-29902

Unexpected executor death - java.lang.IllegalStateException: <build> already existed

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Component/s: core
    • Labels:
      None
    • Environment:
      Jenkins 1.609.1
      Master on RHEL 6.6, in Tomcat 8.

      Description

      Issue:

      • On a few slaves, executor threads die.
      • The dead executors seem to be caused by builds of specific jobs.
      • In these cases, the build number picked for the next run is incorrect (already used), so the executor dies.

      A couple of stack trace excerpts from the Jenkins log (there are others):
      Aug 11, 2015 12:18:15 PM SEVERE hudson.model.Executor run
      Unexpected executor death
      java.lang.IllegalStateException: /mnt/jenkins/jenkins/data/jobs/MY_JOB2/builds/653 already existed; will not overwite with MY_JOB2 #653
      at hudson.model.RunMap.put(RunMap.java:187)
      at jenkins.model.lazy.LazyBuildMixIn.newBuild(LazyBuildMixIn.java:178)
      at hudson.model.AbstractProject.newBuild(AbstractProject.java:1006)
      at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1205)
      at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
      at hudson.model.Executor$1.call(Executor.java:328)
      at hudson.model.Executor$1.call(Executor.java:310)
      at hudson.model.Queue._withLock(Queue.java:1246)
      at hudson.model.Queue.withLock(Queue.java:1184)
      at hudson.model.Executor.run(Executor.java:310)

      Aug 11, 2015 12:59:00 PM SEVERE hudson.model.Executor run
      Unexpected executor death
      java.lang.IllegalStateException: /mnt/jenkins/jenkins/data/jobs/MY_JOB/builds/629 already existed; will not overwite with MY_JOB #629
      at hudson.model.RunMap.put(RunMap.java:187)
      at jenkins.model.lazy.LazyBuildMixIn.newBuild(LazyBuildMixIn.java:178)
      at hudson.model.AbstractProject.newBuild(AbstractProject.java:1006)
      at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1205)
      at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
      at hudson.model.Executor$1.call(Executor.java:328)
      at hudson.model.Executor$1.call(Executor.java:310)
      at hudson.model.Queue._withLock(Queue.java:1246)
      at hudson.model.Queue.withLock(Queue.java:1184)
      at hudson.model.Executor.run(Executor.java:310)
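
      Both traces fail at the same point: RunMap.put refuses to register a build whose directory already exists. A minimal Python sketch of that kind of guard follows. It is a simplified model, not Jenkins's actual code; the names put_build and BuildAlreadyExists are hypothetical, and a temporary directory stands in for jobs/MY_JOB/builds.

```python
import os
import tempfile


class BuildAlreadyExists(RuntimeError):
    """Stand-in for the IllegalStateException seen in the log (hypothetical name)."""


def put_build(builds_dir, job_name, number):
    """Create the directory for build `number`, refusing to reuse an existing one."""
    target = os.path.join(builds_dir, str(number))
    if os.path.exists(target):
        # A stale "next build number" ends up here: the record is already on disk.
        raise BuildAlreadyExists(
            f"{target} already existed; will not overwrite with {job_name} #{number}"
        )
    os.mkdir(target)
    return target


# Demo: build 633 is on disk, but the stale counter hands out 633 again.
with tempfile.TemporaryDirectory() as builds_dir:
    put_build(builds_dir, "MY_JOB", 633)
    try:
        put_build(builds_dir, "MY_JOB", 633)
    except BuildAlreadyExists as error:
        print("executor would die:", error)
```

      In this model the second call fails for the same reason the executor does: the counter is behind the records already written to disk.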

      • Odd behaviour of job build records:
        Two attached screenshots illustrate the behaviour: 'odd_build_records_1.609.1.png' and 'odd_build_records_1.609.1_builds_dir'.
        The last build that ran is 633, but the last build record shown on the Job page is 626.
      • Workaround used so far:
        Restart the dead executors once for each build record that has piled up on disk without showing on the Job page / GUI.
        A Jenkins restart brings the build records back in the GUI as expected.

      One case:
      Build 633 exists.
      New build number picked up by Jenkins is 633 again.
      Triggering a new build causes a thread on the slave to go 'Dead'.
      Corresponding message is:
      java.lang.IllegalStateException: /mnt/jenkins/jenkins/data/jobs/MY_JOB/builds/633 already existed; will not overwite with MY_JOB #633
      at hudson.model.RunMap.put(RunMap.java:187)
      at jenkins.model.lazy.LazyBuildMixIn.newBuild(LazyBuildMixIn.java:178)
      at hudson.model.AbstractProject.newBuild(AbstractProject.java:1006)
      at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1205)
      at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
      at hudson.model.Executor$1.call(Executor.java:328)
      at hudson.model.Executor$1.call(Executor.java:310)
      at hudson.model.Queue._withLock(Queue.java:1246)
      at hudson.model.Queue.withLock(Queue.java:1184)
      at hudson.model.Executor.run(Executor.java:310)

      Fix: Manually 'Set Next Build Number' to the next unused number, then trigger the build (see attached image 'odd_build_records_restart_thread.png').
      Also restart the dead executors (see attached image 'odd_build_records_set_build_number.png').
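
      To pick a value for 'Set Next Build Number', one option is to look at the job's builds directory and take the highest numbered entry plus one. The following is a rough sketch, assuming build records live in numerically named directories as in the paths logged above; the function names are hypothetical, and the script only reads the directory without changing anything in Jenkins.

```python
import re
from pathlib import Path


def highest_build_number(builds_dir):
    """Return the largest numeric directory name under builds_dir, or 0 if none."""
    numbers = [
        int(entry.name)
        for entry in Path(builds_dir).iterdir()
        # Skip non-numeric entries and plain files.
        if entry.is_dir() and re.fullmatch(r"[0-9]+", entry.name)
    ]
    return max(numbers, default=0)


def suggested_next_build_number(builds_dir):
    """Smallest build number not yet used by any record on disk."""
    return highest_build_number(builds_dir) + 1


# Example call (path taken from the log above; adjust for your installation):
# print(suggested_next_build_number("/mnt/jenkins/jenkins/data/jobs/MY_JOB/builds"))
```

      If the value printed is higher than the number Jenkins is about to use, the counter is stale and the next build would hit the error above.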

            Activity

            jglick Jesse Glick added a comment -

            Reload Configuration from Disk is a plausible trigger for the bug.

            sbreitbach Steffen Breitbach added a comment - edited

            This could be true. We have a job that reloads the configuration (scripted) but also triggers a downstream job. The downstream job ended with a dead executor the last time it ran.

            Would using a quiet period help? Are the build numbers determined/created before or after the quiet period?

            jglick Jesse Glick added a comment -

            "Are the build numbers determined/created before or after the quiet period?"

            After.

            "We have a job that reloads the configuration (scripted)"

            Probably a bad idea, but at any rate, if you know how to reproduce from scratch, please provide details so the bug can be fixed.

            tsniatowski Tomasz Śniatowski added a comment -

            I managed to reproduce this on a test instance running Jenkins ver. 1.631:

            • Create a freestyle project that takes a while to complete (execute shell: sleep 60)
            • Schedule a build
            • While it is building, schedule another so one is building (1) and one is queued (2)
            • While the first job is still building, trigger 'reload configuration from disk'
            • Wait for (1) to complete: OK
            • Wait for (2) to complete: seems off: build completes, but is not visible in build list of the job
            • Trigger another build: executor dies with a "2 already existed" error

            Reproduced 2/2 times for me and seems to match the logs I have from the "real" instance of the bug I hit.
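
            The trigger part of these steps could be scripted against a throwaway test instance. The sketch below assumes the slow freestyle job already exists, uses Jenkins's remote build trigger (POST /job/<name>/build) and reload (POST /reload) URLs, and leaves out authentication and CSRF crumb handling, which many instances will require. The function names and the 10-second wait are illustrative assumptions.

```python
import time
import urllib.parse
import urllib.request


def repro_plan(base_url, job_name, wait_for_start=10):
    """List the requests as (method, url, seconds to pause afterwards)."""
    base = base_url.rstrip("/")
    build_url = f"{base}/job/{urllib.parse.quote(job_name)}/build"
    return [
        ("POST", build_url, wait_for_start),  # build (1): let it leave the queue and start
        ("POST", build_url, 0),               # build (2): stays queued behind (1)
        ("POST", f"{base}/reload", 0),        # reload configuration from disk
    ]


def run_plan(plan):
    """Send each request in order. Only point this at a disposable test instance."""
    for method, url, pause in plan:
        request = urllib.request.Request(url, data=b"", method=method)
        with urllib.request.urlopen(request) as response:
            print(method, url, response.status)
        time.sleep(pause)


# Example (not executed here):
# run_plan(repro_plan("http://localhost:8080", "sleep-job"))
```

            After both builds finish, triggering one more build by hand should show whether the executor dies as described.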

            jglick Jesse Glick added a comment -

            Tomasz Śniatowski: yes, these steps form the basis of my upcoming functional test.


              People

              • Assignee:
                Unassigned
                Reporter:
                lata lata kopalle
              • Votes:
                5
                Watchers:
                12
