Details

    • Similar Issues:

      Description

      The implementation of AtomicFileWriter is not actually atomic.

      There are several issues with it.

      1. the rename first deletes the target giving a window of opportunity for no file to exist.
      2. the rename of the file will rename the file but the data in the file may not have been flushed to disk. In XFS this is a biggy as the metadata and filedata are written separatly - so despite the file being closed there is not fsync call - so if the system crashes you end up with zero length files.
        This has been observed in the wild when the OS hosting jenkins crashed, on restart several of Jenkins files were zero length.

      Jenkins should make use of java.nio to make sure

      1. data is actually synced
      2. it uses atomic operations for move on platforms where it is available.

        Attachments

          Issue Links

            Activity

            Hide
            michaelneale Michael Neale added a comment -

            yes this would be very good. Good catch James Nord - I believe this has been the underlying problem of a lot of things (I might have to take back the bad things I said about xfs, as this seems more likely to be the problem of a block consistent snapshot without xfs_freeze).

            Very nice!

            Show
            michaelneale Michael Neale added a comment - yes this would be very good. Good catch James Nord - I believe this has been the underlying problem of a lot of things (I might have to take back the bad things I said about xfs, as this seems more likely to be the problem of a block consistent snapshot without xfs_freeze). Very nice!
            Hide
            jglick Jesse Glick added a comment -

            I have also noticed that Jenkins.getInstance().createProject(Whatever.class, "aux") will fail on Windows in AtomicFileWriter.<init> with Failed to create a temporary file in …\jobs\aux, which is not very meaningful, because an earlier File.mkdirs returned false but this was ignored. If you switch to Files.createDirectories you at least get a java.nio.file.FileSystemException: …\jobs\aux: The directory name is invalid. which is also a bit cryptic but at least points to the issue: the aux part.

            Show
            jglick Jesse Glick added a comment - I have also noticed that Jenkins.getInstance().createProject(Whatever.class, "aux") will fail on Windows in AtomicFileWriter.<init> with Failed to create a temporary file in …\jobs\aux , which is not very meaningful, because an earlier File.mkdirs returned false but this was ignored. If you switch to Files.createDirectories you at least get a java.nio.file.FileSystemException: …\jobs\aux: The directory name is invalid. which is also a bit cryptic but at least points to the issue: the aux part.
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: christ66
            Path:
            core/src/main/java/hudson/util/AtomicFileWriter.java
            http://jenkins-ci.org/commit/jenkins/0a7b656af6ac5c70a408abb082f95ecd51ee8bd0
            Log:
            JENKINS-34855 Make atomic file write more atomic by using JDK 7 apis.

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: christ66 Path: core/src/main/java/hudson/util/AtomicFileWriter.java http://jenkins-ci.org/commit/jenkins/0a7b656af6ac5c70a408abb082f95ecd51ee8bd0 Log: JENKINS-34855 Make atomic file write more atomic by using JDK 7 apis.
            Show
            batmat Baptiste Mathus added a comment - https://github.com/jenkinsci/jenkins/pull/3170  in review
            Hide
            oleg_nenashev Oleg Nenashev added a comment -

            The fix has been integrated towards 2.93.

            Baptiste Mathus Steven Christou please assign the label if you want the fix to be backported to 2.89.2/3. I am not 100% sure it is safe to backport it without lots of soak testing

            Show
            oleg_nenashev Oleg Nenashev added a comment - The fix has been integrated towards 2.93. Baptiste Mathus Steven Christou please assign the label if you want the fix to be backported to 2.89.2/3. I am not 100% sure it is safe to backport it without lots of soak testing
            Hide
            oleg_nenashev Oleg Nenashev added a comment -

            Actually, added the label. But the discussion is required

            Show
            oleg_nenashev Oleg Nenashev added a comment - Actually, added the label. But the discussion is required
            Hide
            batmat Baptiste Mathus added a comment -

            Oleg Nenashev the issue is only partly fixed.  And probably only improved should we say for now. We agreed to merge the first shot so that we could split the risk and indeed let it soak separately to possibly better understand and narrow down potential issues later on.

            https://github.com/jenkinsci/jenkins/pull/3170 contains the final fix.

            That is why I had kept this JIRA open for now. 

            Show
            batmat Baptiste Mathus added a comment - Oleg Nenashev the issue is only partly fixed.  And probably only improved should we say for now. We agreed to merge the first shot so that we could split the risk and indeed let it soak separately to possibly better understand and narrow down potential issues later on. https://github.com/jenkinsci/jenkins/pull/3170  contains the final fix. That is why I had kept this JIRA open for now. 
            Hide
            batmat Baptiste Mathus added a comment -

            https://github.com/jenkinsci/jenkins/pull/3170 now has a successful build and is ready for review. So, yes, ideally it would be great once it's reviewed and merged if we can have it in the .2, or maybe .3 for whatever reason.

            With .2, with the christmas release break (.2 ETA January 17th), it would make it 4+ weeks of soak testing. So iff we don't see bad feedback or alerts, I think it would be worth it to get it backported indeed. It's a pretty nasty issue, when it shows up. 

            Show
            batmat Baptiste Mathus added a comment - https://github.com/jenkinsci/jenkins/pull/3170  now has a successful build and is ready for review. So, yes, ideally it would be great once it's reviewed and merged if we can have it in the .2, or maybe .3 for whatever reason. With .2, with the christmas release break (.2 ETA January 17th), it would make it 4+ weeks of soak testing. So iff we don't see bad feedback or alerts, I think it would be worth it to get it backported indeed. It's a pretty nasty issue, when it shows up. 
            Hide
            batmat Baptiste Mathus added a comment -

            FYI an opt-out flag has been added to the ongoing PR: https://github.com/jenkinsci/jenkins/pull/3170 as per Sam Van Oort good points. Reviews welcome.

            Show
            batmat Baptiste Mathus added a comment - FYI an opt-out flag has been added to the ongoing PR: https://github.com/jenkinsci/jenkins/pull/3170  as per Sam Van Oort good points. Reviews welcome.
            Hide
            batmat Baptiste Mathus added a comment -

            Once merged & released, this should probably be picked up in next LTS only once/if JENKINS-48407 is fixed too.

            Show
            batmat Baptiste Mathus added a comment - Once merged & released, this should probably be picked up in next LTS only once/if  JENKINS-48407 is fixed too.
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Baptiste Mathus
            Path:
            core/src/main/java/hudson/util/FileChannelWriter.java
            core/src/test/java/hudson/util/FileChannelWriterTest.java
            http://jenkins-ci.org/commit/jenkins/22efbad9450fc95748e0bc3bee04ec49702d7ed2
            Log:
            JENKINS-34855 Create a FileChannelWriter

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Baptiste Mathus Path: core/src/main/java/hudson/util/FileChannelWriter.java core/src/test/java/hudson/util/FileChannelWriterTest.java http://jenkins-ci.org/commit/jenkins/22efbad9450fc95748e0bc3bee04ec49702d7ed2 Log: JENKINS-34855 Create a FileChannelWriter
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Baptiste Mathus
            Path:
            core/src/main/java/hudson/util/AtomicFileWriter.java
            core/src/main/java/hudson/util/FileChannelWriter.java
            core/src/test/java/hudson/util/FileChannelWriterTest.java
            test/src/test/java/hudson/util/AtomicFileWriterPerfTest.java
            http://jenkins-ci.org/commit/jenkins/438e9296f0646d920e19e05c7696d8cec5307a88
            Log:
            Merge pull request #3170 from batmat/JENKINS-34855

            [FIX JENKINS-34855] Create a FileChannelWriter and use it for AtomicFileWriter "enforced" flushing

            Compare: https://github.com/jenkinsci/jenkins/compare/02d0531add35...438e9296f064

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Baptiste Mathus Path: core/src/main/java/hudson/util/AtomicFileWriter.java core/src/main/java/hudson/util/FileChannelWriter.java core/src/test/java/hudson/util/FileChannelWriterTest.java test/src/test/java/hudson/util/AtomicFileWriterPerfTest.java http://jenkins-ci.org/commit/jenkins/438e9296f0646d920e19e05c7696d8cec5307a88 Log: Merge pull request #3170 from batmat/ JENKINS-34855 [FIX JENKINS-34855] Create a FileChannelWriter and use it for AtomicFileWriter "enforced" flushing Compare: https://github.com/jenkinsci/jenkins/compare/02d0531add35...438e9296f064
            Hide
            batmat Baptiste Mathus added a comment -

            Fixed towards 2.102, expected to be released in the next ~12 hours.

            Show
            batmat Baptiste Mathus added a comment - Fixed towards 2.102, expected to be released in the next ~12 hours.
            Hide
            batmat Baptiste Mathus added a comment -

            Reminder: we must pick up JENKINS-48407 with this one if accepted for next LTS.

            Show
            batmat Baptiste Mathus added a comment - Reminder: we must pick up  JENKINS-48407 with this one if accepted for next LTS.

              People

              • Assignee:
                batmat Baptiste Mathus
                Reporter:
                teilo James Nord
              • Votes:
                1 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: