JENKINS-60167

AtomicFileWriter performance issue on CephFS in case of Empty File creation


    • Jenkins 2.206

      Hello. During migration from NFS to CephFS file storage, we ran into a performance degradation of server startup caused by RunIdMigrator.

       

      After analyzing traces, we figured out the following:

      AtomicFileWriter creates its FileChannelWriter with only one OpenOption: StandardOpenOption.WRITE.

      When AtomicFileWriter is used to create a new, empty file (e.g. jenkins.model.RunIdMigrator#save), this leads to a full filesystem sync instead of an fsync of only the dirty inodes.

      As a result, this operation took up to 5 seconds on CephFS.

      As a fix, we added the StandardOpenOption.CREATE OpenOption. Pull request: https://github.com/jenkinsci/jenkins/pull/4357
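      To make the change concrete, here is a minimal sketch of opening a channel with both WRITE and CREATE and then forcing it to storage, using only java.nio. It is not the AtomicFileWriter or FileChannelWriter code; the class name EmptyFileSyncSketch, the method createEmptyAndSync and the file name empty-marker are hypothetical and exist only for illustration. The measured time depends entirely on the underlying filesystem.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class, not part of Jenkins.
class EmptyFileSyncSketch {

    /**
     * Opens (creating if needed) the given file with WRITE and CREATE,
     * writes nothing, forces the channel to storage, and returns the
     * elapsed time in nanoseconds.
     */
    static long createEmptyAndSync(Path file) throws IOException {
        long start = System.nanoTime();
        try (FileChannel channel = FileChannel.open(
                file,
                StandardOpenOption.WRITE,
                StandardOpenOption.CREATE)) {
            // force(true) asks for content and metadata to be flushed,
            // which is where the fsync cost shows up.
            channel.force(true);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: point this at a directory on the filesystem under test.
        Path dir = args.length > 0
                ? Path.of(args[0])
                : Files.createTempDirectory("sync-sketch");
        Path file = dir.resolve("empty-marker");
        long nanos = createEmptyAndSync(file);
        System.out.println("create + fsync took " + (nanos / 1_000_000) + " ms");
    }
}
```

      Running it with a directory on the CephFS mount as the first argument gives a rough per-file timing that can be compared with the fsync timestamps in the Ceph logs.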

       

      Ceph logs before the fix:

       [Wed Nov 13 16:17:26 2019] ceph: alloc_inode 000000000aeb2b5f
       [Wed Nov 13 16:17:26 2019] ceph: fsync 000000000aeb2b5f
       [Wed Nov 13 16:17:26 2019] ceph: fsync dirty caps are -
       [Wed Nov 13 16:17:30 2019] ceph: fsync 000000000aeb2b5f result=0

       

      Ceph logs after the fix:

       [Wed Nov 13 16:05:43 2019] ceph: alloc_inode 000000001442a671
       [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !dirty
       [Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671
       [Wed Nov 13 16:05:43 2019] ceph: fsync dirty caps are Fw
       [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now !flushing
       [Wed Nov 13 16:05:43 2019] ceph: inode 000000001442a671 now clean
       [Wed Nov 13 16:05:43 2019] ceph: fsync 000000001442a671 result=0

      Server startup with 2k jobs that required migration:

      • before the fix, startup took ~30 min
      • after the fix, startup took 2 min

            bulanovk Konstantin Bulanov