Details

    • Similar Issues:

      Description

      The implementation of AtomicFileWriter is not actually atomic.

      There are several issues with it.

      1. the rename first deletes the target giving a window of opportunity for no file to exist.
      2. the rename of the file will rename the file but the data in the file may not have been flushed to disk. In XFS this is a biggy as the metadata and filedata are written separatly - so despite the file being closed there is not fsync call - so if the system crashes you end up with zero length files.
        This has been observed in the wild when the OS hosting jenkins crashed, on restart several of Jenkins files were zero length.

      Jenkins should make use of java.nio to make sure

      1. data is actually synced
      2. it uses atomic operations for move on platforms where it is available.

        Attachments

          Issue Links

            Activity

            teilo James Nord created issue -
            rtyler R. Tyler Croy made changes -
            Field Original Value New Value
            Workflow JNJira [ 171052 ] JNJira + In-Review [ 184143 ]
            jglick Jesse Glick made changes -
            Link This issue relates to JENKINS-36088 [ JENKINS-36088 ]
            jglick Jesse Glick made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            jglick Jesse Glick made changes -
            Assignee Steven Christou [ schristou ]
            jglick Jesse Glick made changes -
            Remote Link This issue links to "PR 2548 (Web Link)" [ 14872 ]
            jglick Jesse Glick made changes -
            Status In Progress [ 3 ] In Review [ 10005 ]
            teilo James Nord made changes -
            Link This issue relates to JENKINS-47324 [ JENKINS-47324 ]
            teilo James Nord made changes -
            Status In Review [ 10005 ] In Progress [ 3 ]
            teilo James Nord made changes -
            Status In Progress [ 3 ] Open [ 1 ]
            teilo James Nord made changes -
            Assignee Steven Christou [ schristou ] James Nord [ teilo ]
            teilo James Nord made changes -
            Assignee James Nord [ teilo ]
            teilo James Nord made changes -
            Labels data
            teilo James Nord made changes -
            Labels data
            teilo James Nord made changes -
            Labels dataloss
            batmat Baptiste Mathus made changes -
            Assignee Baptiste Mathus [ batmat ]
            batmat Baptiste Mathus made changes -
            Description The implementation of AtomicFile is not actually atomic.

            There are several issues with it.

            1) the rename first deletes the target giving a window of opertunity for no file to exist.

            2) the rename of the file will rename the file but the data in the file may not have been flushed to disk. In XFS this is a biggy as the metadata and filedata are written separatly - so despite the file being closed there is not fsync call - so if the system crashes you end up with zero length files.

            This has been observed in the wild when the OS hosting jenkins crashed, on restart several of Jenkins files where zero length.

            Jenkins should make use of java.nio to make sure
            1) data is actually synced
            2) it uses atomic operations for move on platforms where it is available.
            The implementation of {{AtomicFile}} is not actually atomic.

            There are several issues with it.
             # the rename first deletes the target giving a window of opportunity for no file to exist.
             # the rename of the file will rename the file but the data in the file may not have been flushed to disk. In XFS this is a biggy as the metadata and filedata are written separatly - so despite the file being closed there is not fsync call - so if the system crashes you end up with zero length files.
            This has been observed in the wild when the OS hosting jenkins crashed, on restart several of Jenkins files were zero length.

            Jenkins should make use of {{java.nio}} to make sure
             # data is actually synced
             # it uses atomic operations for move on platforms where it is available.
            batmat Baptiste Mathus made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            batmat Baptiste Mathus made changes -
            Description The implementation of {{AtomicFile}} is not actually atomic.

            There are several issues with it.
             # the rename first deletes the target giving a window of opportunity for no file to exist.
             # the rename of the file will rename the file but the data in the file may not have been flushed to disk. In XFS this is a biggy as the metadata and filedata are written separatly - so despite the file being closed there is not fsync call - so if the system crashes you end up with zero length files.
            This has been observed in the wild when the OS hosting jenkins crashed, on restart several of Jenkins files were zero length.

            Jenkins should make use of {{java.nio}} to make sure
             # data is actually synced
             # it uses atomic operations for move on platforms where it is available.
            The implementation of {{AtomicFileWriter}} is not actually atomic.

            There are several issues with it.
             # the rename first deletes the target giving a window of opportunity for no file to exist.
             # the rename of the file will rename the file but the data in the file may not have been flushed to disk. In XFS this is a biggy as the metadata and filedata are written separatly - so despite the file being closed there is not fsync call - so if the system crashes you end up with zero length files.
             This has been observed in the wild when the OS hosting jenkins crashed, on restart several of Jenkins files were zero length.

            Jenkins should make use of {{java.nio}} to make sure
             # data is actually synced
             # it uses atomic operations for move on platforms where it is available.
            batmat Baptiste Mathus made changes -
            Summary AtomicFile isn't Atomic AtomicFileWrite isn't Atomic
            batmat Baptiste Mathus made changes -
            Summary AtomicFileWrite isn't Atomic AtomicFileWriter isn't Atomic
            jamesdumay James Dumay made changes -
            Remote Link This issue links to "CloudBees Internal OSS-2561 (Web Link)" [ 18249 ]
            batmat Baptiste Mathus made changes -
            Remote Link Cette demande est liée à "PR-3170 (Lien Web)" [ 18304 ]
            batmat Baptiste Mathus made changes -
            Status In Progress [ 3 ] In Review [ 10005 ]
            oleg_nenashev Oleg Nenashev made changes -
            Status In Review [ 10005 ] Resolved [ 5 ]
            Resolution Fixed [ 1 ]
            oleg_nenashev Oleg Nenashev made changes -
            Labels dataloss dataloss lts-candidate
            batmat Baptiste Mathus made changes -
            Resolution Fixed [ 1 ]
            Status Resolved [ 5 ] Reopened [ 4 ]
            batmat Baptiste Mathus made changes -
            Status Reopened [ 4 ] Open [ 1 ]
            batmat Baptiste Mathus made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            batmat Baptiste Mathus made changes -
            Status In Progress [ 3 ] In Review [ 10005 ]
            batmat Baptiste Mathus made changes -
            Comment [ [https://github.com/jenkinsci/jenkins/pull/3170] now has a successful build and is ready for review. So, yes, ideally it would be great once it's reviewed and merged if we can have it in the .2, or maybe .3 for whatever reason.

            With .2, with the christmas release break (.2 ETA January 17th), it would make it 4+ weeks of soak testing. So iff we don't see bad feedback or alerts, I think it would be worth it to get it backported indeed. It's a pretty nasty issue, when it shows up.  ]
            dnusbaum Devin Nusbaum made changes -
            Link This issue relates to JENKINS-48407 [ JENKINS-48407 ]
            batmat Baptiste Mathus made changes -
            Status In Review [ 10005 ] In Progress [ 3 ]
            batmat Baptiste Mathus made changes -
            Status In Progress [ 3 ] Resolved [ 5 ]
            Resolution Fixed [ 1 ]
            olivergondza Oliver Gondža made changes -
            Labels dataloss lts-candidate 2.89.4-rejected dataloss lts-candidate
            danielbeck Daniel Beck made changes -
            Labels 2.89.4-rejected dataloss lts-candidate 2.89.4-rejected dataloss
            jamesdumay James Dumay made changes -
            Remote Link This issue links to "CloudBees Internal TIGER-2833 (Web Link)" [ 20538 ]

              People

              • Assignee:
                batmat Baptiste Mathus
                Reporter:
                teilo James Nord
              • Votes:
                1 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: