  1. Jenkins
  2. JENKINS-41932

libzfs native API changed for OpenZFS - causes crashes on newer illumos distributions


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: core
    • None

      Context: Jenkins.WAR ships with libzfs-0.5.jar, which is enabled and loaded on operating systems detected as Solaris or similar. That is currently a broad family: it includes the proprietary Oracle Solaris as well as various open-source distributions descended from OpenSolaris. The latter are now based on the illumos core, which is closely tied to the OpenZFS spin-off project and shares code with the *BSD, ZoL and perhaps (unofficial) MacOS ports of ZFS.

      Problem: libzfs.jar provides JNA wrappers around the internal (not public, not committed) API/ABI of the native libzfs.so, which is de facto the best (indeed the only) binding there is. Unfortunately, these uncommitted APIs changed after the Oracle and FOSS codebases split: the change was made in the OpenZFS codebase about 5 years ago and was picked up by illumos-gate about 9 months ago. The issue therefore began manifesting around June 2016, in rolling-release OSes that deploy a new illumos core as commits land. When the existing JAR calls a native function with the wrong signature, the JVM segfaults. Currently libzfs.jar assumes that ZFS is only present on Solaris-like OSes, so the issue has not manifested on other platforms that in fact support ZFS; on those platforms, however, the ZFS support goes unused as well (this is a separate issue though).

      As discussed with ci_jenkinsci_org at FOSDEM, there are in fact several codebases for libzfs.jar itself: one on his GitHub account, https://github.com/kohsuke/libzfs4j (which already has the fix for the newer ZFS API in https://github.com/kohsuke/libzfs4j/commit/05067e754e56e7249e320d86cea769c3b878aeeb), and another at java.net (https://java.net/projects/zfs), which has the older API and so seems to be the one shipped in Jenkins.

      It seems feasible and safe to detect the new API by querying libzfs.so for routines that support ZFS feature flags (something that Oracle ZFS will likely never have). Their presence distinguishes OpenZFS from Oracle ZFS; assuming further that the OpenZFS in question is a recent one, the code can set a Java boolean flag and select the new function signature in a Java if-clause.
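
A minimal sketch of that detection idea follows. It is an illustration under stated assumptions, not the libzfs4j implementation: the class and enum names are hypothetical, and the probe symbol `zpool_prop_get_feature` is only assumed here to be a suitable feature-flag routine. The symbol lookup is passed in as a predicate so the decision logic can be exercised without a native library.

```java
import java.util.function.Predicate;

/** Which libzfs ABI flavor the loaded native library appears to expose. */
enum ZfsAbi { LEGACY, OPENZFS }

/** Hypothetical helper: decides the ABI flavor from the presence of one symbol. */
final class ZfsAbiProbe {
    // Assumption: a feature-flag routine that only OpenZFS-derived libzfs.so exports.
    static final String FEATURE_FLAG_SYMBOL = "zpool_prop_get_feature";

    private ZfsAbiProbe() {}

    /**
     * @param hasSymbol answers whether libzfs.so exports the given symbol name
     * @return OPENZFS if the feature-flag routine is present, LEGACY otherwise
     */
    static ZfsAbi detect(Predicate<String> hasSymbol) {
        return hasSymbol.test(FEATURE_FLAG_SYMBOL) ? ZfsAbi.OPENZFS : ZfsAbi.LEGACY;
    }

    /** The "Java if-clause": callers branch on the flag before picking a signature. */
    static String signatureFor(ZfsAbi abi) {
        if (abi == ZfsAbi.OPENZFS) {
            return "new (OpenZFS) signature";
        }
        return "old (legacy) signature";
    }
}
```

With JNA, the predicate could plausibly wrap `NativeLibrary.getInstance("zfs").getFunction(name)` and treat an `UnsatisfiedLinkError` as "symbol absent"; that wiring is left out here to keep the sketch free of external dependencies.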

      As a fallback for cases where the automatic detection guesses wrongly, and/or as the initial implementation, the same toggle can be set through an environment variable in the application server's init script or service definition. At the least, such a setting will hold for the lifetime of the server, across updates of Jenkins.war (currently a custom build of libzfs.jar has to be substituted as part of each upgrade).
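
The environment-variable fallback could look roughly like the sketch below. The variable name `LIBZFS4J_ABI` and its values `openzfs` and `legacy` are hypothetical choices made for illustration, as is the class name. An explicit setting wins, and anything else defers to automatic detection.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: lets an environment variable override ABI auto-detection. */
final class ZfsAbiOverride {
    // Assumption: variable name and accepted values are illustrative only.
    static final String ENV_NAME = "LIBZFS4J_ABI";

    private ZfsAbiOverride() {}

    /**
     * @param env        environment to consult, e.g. System.getenv()
     * @param autoDetect fallback used when the variable is unset or unrecognized
     * @return true if the new (OpenZFS) function signatures should be used
     */
    static boolean useOpenZfsAbi(Map<String, String> env, BooleanSupplier autoDetect) {
        String raw = env.get(ENV_NAME);
        String value = (raw == null) ? "" : raw.trim().toLowerCase(Locale.ROOT);
        switch (value) {
            case "openzfs":
                return true;   // administrator forces the new ABI
            case "legacy":
                return false;  // administrator forces the old ABI
            default:
                return autoDetect.getAsBoolean(); // unset or unknown: guess
        }
    }
}
```

A caller would use it as `ZfsAbiOverride.useOpenZfsAbi(System.getenv(), () -> probeResult)`, where `probeResult` stands for whatever the automatic detection reported. Because the variable lives in the service configuration rather than in the WAR, it survives upgrades.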

      Another option to explore is to integrate with libzfs_core instead (https://github.com/openzfs/openzfs/tree/master/usr/src/lib/libzfs_core, https://www.illumos.org/issues/2882), which is AFAIK the attempt at a stable and public API/ABI for tools that use and manage ZFS in OpenZFS and illumos. libzfs_jni (https://github.com/openzfs/openzfs/tree/master/usr/src/lib/libzfs_jni) might be another candidate, but it seems stale, and AFAIK there is a desire to remove it and drop the build-time dependency of OpenZFS and illumos-gate on Java.

            Assignee: kohsuke (Kohsuke Kawaguchi)
            Reporter: jimklimov (Jim Klimov)