Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-50599

Nodes#addNode does not fail atomically

    Details

    • Type: Bug
    • Status: Resolved (View Workflow)
    • Priority: Minor
    • Resolution: Fixed
    • Component/s: core
    • Labels:
    • Environment:
      More likely in Jenkins 2.107 and newer, but possible since at least Jenkins 2.0
    • Similar Issues:

      Description

      If an exception is thrown by persistNode in Nodes#addNode, then the node will still exist in memory, but it may not exist on disk (or the disk form may be corrupted). This could happen for various reasons, but in particular JEP-200 issues when serializing the node will throw an exception from persistNode and cause this problem.

      The issue was originally noticed by Jesse Glick while investigating JENKINS-50480. The result of the error in that case was that a test logged a severe error because of a failure to add the node in Nodes#addNode, but the test actually passed because the node could be accessed in memory. It would have been better for that test to fail so that the JEP-200 error did not go unnoticed.

      I think it makes sense to remove the node from memory if the call to persistNode throws an exception, so that the Nodes#addNode call does not fail in a half-finished state.

        Attachments

          Issue Links

            Activity

            dnusbaum Devin Nusbaum created issue -
            dnusbaum Devin Nusbaum made changes -
            Field Original Value New Value
            Environment Most likely in Jenkins 2.107 and newer, but possible since at least Jenkins 2.0 More likely in Jenkins 2.107 and newer, but possible since at least Jenkins 2.0
            dnusbaum Devin Nusbaum made changes -
            Description If an exception is thrown by {{persistNode}} in [Nodes#addNode|https://github.com/jenkinsci/jenkins/blob/54325b155401eba369f42857f812f20c3656d5bc/core/src/main/java/jenkins/model/Nodes.java#L129], then the node will still exist in memory, but it may not exist on disk (or the disk form may be corrupted). This could happen for various reasons, but in particular JEP-200 issues when serializing the node will throw an exception from {{persistNode}} and cause this problem. The issue was originally noticed by [~jglick] while investigating JENKINS-50480.

            I think it makes sense to remove the node from memory if the call to persistNode throws an exception, so that the {{Nodes#addNode}} call does not fail in a half-finished state.
            If an exception is thrown by {{persistNode}} in [Nodes#addNode|https://github.com/jenkinsci/jenkins/blob/54325b155401eba369f42857f812f20c3656d5bc/core/src/main/java/jenkins/model/Nodes.java#L129], then the node will still exist in memory, but it may not exist on disk (or the disk form may be corrupted). This could happen for various reasons, but in particular JEP-200 issues when serializing the node will throw an exception from {{persistNode}} and cause this problem. The issue was originally noticed by [~jglick] while investigating JENKINS-50480. The result of the error in that case was that a test logged a severe error because of a failure to add the node in {{Nodes#addNode}}, but the test actually passed because the node could be accessed in memory.

            I think it makes sense to remove the node from memory if the call to persistNode throws an exception, so that the {{Nodes#addNode}} call does not fail in a half-finished state.
            dnusbaum Devin Nusbaum made changes -
            Description If an exception is thrown by {{persistNode}} in [Nodes#addNode|https://github.com/jenkinsci/jenkins/blob/54325b155401eba369f42857f812f20c3656d5bc/core/src/main/java/jenkins/model/Nodes.java#L129], then the node will still exist in memory, but it may not exist on disk (or the disk form may be corrupted). This could happen for various reasons, but in particular JEP-200 issues when serializing the node will throw an exception from {{persistNode}} and cause this problem. The issue was originally noticed by [~jglick] while investigating JENKINS-50480. The result of the error in that case was that a test logged a severe error because of a failure to add the node in {{Nodes#addNode}}, but the test actually passed because the node could be accessed in memory.

            I think it makes sense to remove the node from memory if the call to persistNode throws an exception, so that the {{Nodes#addNode}} call does not fail in a half-finished state.
            If an exception is thrown by {{persistNode}} in [Nodes#addNode|https://github.com/jenkinsci/jenkins/blob/54325b155401eba369f42857f812f20c3656d5bc/core/src/main/java/jenkins/model/Nodes.java#L129], then the node will still exist in memory, but it may not exist on disk (or the disk form may be corrupted). This could happen for various reasons, but in particular JEP-200 issues when serializing the node will throw an exception from {{persistNode}} and cause this problem.

            The issue was originally noticed by [~jglick] while investigating JENKINS-50480. The result of the error in that case was that a test logged a severe error because of a failure to add the node in {{Nodes#addNode}}, but the test actually passed because the node could be accessed in memory. It would have been better for that test to fail so that the JEP-200 error did not go unnoticed.

            I think it makes sense to remove the node from memory if the call to persistNode throws an exception, so that the {{Nodes#addNode}} call does not fail in a half-finished state.
            dnusbaum Devin Nusbaum made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            dnusbaum Devin Nusbaum made changes -
            Status In Progress [ 3 ] In Review [ 10005 ]
            dnusbaum Devin Nusbaum made changes -
            Remote Link This issue links to "jenkinsci/jenkins#3383 (Web Link)" [ 20389 ]
            oleg_nenashev Oleg Nenashev made changes -
            Labels JEP-200
            oleg_nenashev Oleg Nenashev made changes -
            Link This issue relates to JENKINS-50480 [ JENKINS-50480 ]
            danielbeck Daniel Beck made changes -
            Status In Review [ 10005 ] Resolved [ 5 ]
            Resolution Fixed [ 1 ]
            jamesdumay James Dumay made changes -
            Remote Link This issue links to "CloudBees Internal OSS-2687 (Web Link)" [ 20541 ]

              People

              • Assignee:
                Unassigned
                Reporter:
                dnusbaum Devin Nusbaum
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: