Jenkins / JENKINS-11962

Symlinking lastSuccessful build shouldn't fail with concurrent jobs

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: core
    • Labels:
    • Environment:
      Jenkins 1.438
      Concurrent job
      3 builds running

      Description

      I had three builds running at the same time, and two of them finished during the same second. This led to the following message on one of them:

      ln -s builds/2011-12-01_21-35-22 /var/lib/jenkins/jobs/my_job/builds/../lastSuccessful failed: 17 File exists
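      The "17 File exists" in that message is errno 17 (EEXIST on Linux), which symlink(2) returns when the link path already exists. The sketch below, in Python purely for illustration, replays step by step one interleaving that produces it when two builds each remove and then recreate `lastSuccessful`, in line with the reporter's guess of a race between removing the old link and creating the new one. The step order is an assumption about how the race happens, not a trace of Jenkins's own code, and the `builds/A` / `builds/B` targets and the function name are hypothetical.

```python
import errno
import os
import tempfile


def replay_remove_then_create_race():
    """Replay two builds that each delete the link and then recreate it.

    Returns the errno raised for the second build's create, or None.
    """
    with tempfile.TemporaryDirectory() as job_dir:
        link = os.path.join(job_dir, "lastSuccessful")
        # Link left behind by an earlier build (a dangling target is fine for a symlink).
        os.symlink("builds/old", link)

        # Build A removes the old link.
        os.unlink(link)
        # Build B also tries to remove it, but A already did.
        try:
            os.unlink(link)
        except FileNotFoundError:
            pass
        # Build A creates its new link first...
        os.symlink("builds/A", link)
        # ...so Build B's create finds the path already taken.
        try:
            os.symlink("builds/B", link)
        except FileExistsError as exc:
            return exc.errno
        return None


if __name__ == "__main__":
    code = replay_remove_then_create_race()
    print(code, os.strerror(code))  # on Linux: 17 File exists
```

      Any interleaving where both removals happen before both creates ends the same way: whichever build creates second gets EEXIST.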

      The job still succeeded, so it's not a big deal.

      This seems like a race condition between rm-ing the old symlink and creating the new one. Maybe ln -sf would work better? I assume it does its operations atomically.
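      Whether `ln -sf` is atomic is the reporter's assumption and is not checked here. A commonly used way to swap a symlink with no window where the path is missing is to create the new link under a temporary name in the same directory and then rename it over the old one: on POSIX filesystems rename(2) replaces the destination atomically, so concurrent updaters never hit EEXIST and readers always find some link. This is a minimal Python sketch of that technique, not Jenkins's implementation; `replace_symlink_atomically`, `hammer`, and the `.<name>.<uuid>.tmp` naming scheme are hypothetical. Note that the last build to rename wins, so this removes the error but does not by itself decide which concurrent build should be "last successful".

```python
import os
import tempfile
import threading
import uuid


def replace_symlink_atomically(target, link_path):
    """Point link_path at target without ever deleting link_path first.

    The new link is created under a unique temporary name in the same
    directory (rename cannot cross filesystems), then renamed over the
    old link. On POSIX, rename(2) replaces the link itself atomically.
    """
    directory = os.path.dirname(os.path.abspath(link_path))
    tmp_path = os.path.join(
        directory, ".%s.%s.tmp" % (os.path.basename(link_path), uuid.uuid4().hex)
    )
    os.symlink(target, tmp_path)
    try:
        os.replace(tmp_path, link_path)
    except BaseException:
        # Do not leave the temporary link behind if the rename fails.
        os.unlink(tmp_path)
        raise


def hammer(link_path, n_threads=8, rounds=50):
    """Run many concurrent updates of link_path; return any OSErrors raised."""
    errors = []

    def worker(i):
        for r in range(rounds):
            try:
                replace_symlink_atomically("builds/%d-%d" % (i, r), link_path)
            except OSError as exc:
                errors.append(exc)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as job_dir:
        link = os.path.join(job_dir, "lastSuccessful")
        print(hammer(link))       # expected on Linux: []
        print(os.readlink(link))  # whichever update renamed last
```

      In Java 7+, the same idea could plausibly be expressed with `Files.createSymbolicLink` followed by `Files.move(..., StandardCopyOption.ATOMIC_MOVE)`, although the `Files.move` documentation leaves it implementation-specific whether an existing target is replaced in that mode.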

        Attachments

        1. bugchugger_cleanup_1098516.txt
          0.6 kB
        2. bugchugger_cleanup_1098517.txt
          0.5 kB
        3. bugchugger_cleanup_1098518.txt
          0.5 kB
        4. error
          15 kB
        5. ScreenShot.png
          213 kB

          Activity

          rdkchrom Radek Chromy added a comment -

          Found the same problem on Jenkins version: 1.496
          4 jobs were running in parallel. The first one finished successfully (with "ln -s ... failed: 17 File exists" in its console), but the others got stuck forever.

          rdrabens Rebecca Drabenstott added a comment -

          I am also seeing this issue (version 1.451). I have up to 7 jobs running concurrently and I'm getting the "ln -s ... failed: 17 File exists" error message in about 5% of the jobs. That means that there is almost always a group of jobs stuck because of a job that is unable to successfully create the symlink. The stuck jobs eventually finish, sometimes in a few seconds, sometimes up to 20 minutes later.

          danielbeck Daniel Beck added a comment -

          Can this be reproduced in more recent versions of Jenkins (no older than 8-10 weeks or so)? If so, what OS and what version and vendor of Java are you using? Please include log excerpts and the content of the /systemInfo URL in your comment. Also relevant:
          https://wiki.jenkins-ci.org/display/JENKINS/How+to+report+an+issue

          rdrabens Rebecca Drabenstott added a comment -

          I have reproduced this issue in Jenkins 1.588 running on Red Hat Enterprise Linux Server release 5.5 (Tikanga). It is using Java 1.6.0_21-b06 from Sun (Oracle). I’m not sure my company would be happy if I publicly posted the entire contents of /systemInfo, but if there are specific sections of interest, I could possibly remove the sensitive data and post them.

          It seems that one job gets stuck trying (and ultimately failing) to create a symlink. While it is stuck, other jobs get stuck too. The instant the first job finishes, the other jobs finish too. The job in this example runs very quickly and also gets stuck for a fairly short period of time, but we have seen the same behavior in other jobs that normally run on the order of many seconds to many minutes and they can get stuck for up to 20 minutes.

          In the screenshot that I have attached, you can see that jobs 1098516, 1098517, and 1098518 took longer than the others. Job 1098516 failed to create the symlink. I'm attaching the console output of the three jobs. I'm also attaching part of the Jenkins error log. There is an error in the error log, but it is difficult to tell whether it is related. It is a frequent error, and in other examples of this issue it does not always occur immediately before the stuck jobs finish.

          danielbeck Daniel Beck added a comment -

          I think the Java 1.6 implementation needs JNA to work or something similar. Maybe try running Jenkins on Java 7.

          Also, it doesn't look like you need the builds to run in parallel, as they are all done within milliseconds. Disabling concurrent builds would likely prevent issues like this one.

          rdrabens Rebecca Drabenstott added a comment -

          Thanks for the suggestions, Daniel. We plan to move to Java 7 in the near future, and possibly that will fix the issue. Good point about the parallel runs being unnecessary. However, the other job we have that runs for much longer (and stalls for much longer) does require parallel runs.

          kenpoole Ken Poole added a comment -

          This just happened to us on Jenkins 1.644 running from the "official" Docker image.


            People

            • Assignee:
              Unassigned
              Reporter:
              jorgenpt Jørgen Tjernø
            • Votes:
              4
              Watchers:
              5
