Jenkins / JENKINS-49406

Design (JEP) the Evergreen snapshotting data safety system

    Details

    • Sprint: Evergreen - Milestone 1

      Description

      I need to explore the idea suggested by Sam Gleske of using an on-disk Git repository to check in .xml files before we run an upgrade process.

      In addition, the approach should roll back if there is a failure, such as Jenkins failing to come up properly.
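A minimal sketch of the idea described above, assuming `git` is on the PATH and that only `*.xml` files are snapshotted. The function names (`snapshot`, `rollback`, `upgrade_with_rollback`), the committer identity, and the `upgrade`/`is_healthy` callables are hypothetical placeholders for illustration, not part of Evergreen.

```python
import subprocess
from pathlib import Path


def git(home: Path, *args: str) -> str:
    """Run a git command inside `home` and return its trimmed stdout."""
    result = subprocess.run(
        ["git", "-C", str(home), *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def snapshot(home: Path, message: str) -> str:
    """Commit the current *.xml files and return the new commit id."""
    if not (home / ".git").exists():
        git(home, "init", "--quiet")
    xml_files = [
        str(path.relative_to(home))
        for path in sorted(home.rglob("*.xml"))
        if ".git" not in path.relative_to(home).parts
    ]
    if xml_files:
        git(home, "add", "--", *xml_files)
    # --all also records deletions of tracked files; --allow-empty keeps the
    # call safe when nothing changed. The identity below is a placeholder.
    git(home, "-c", "user.name=evergreen",
        "-c", "user.email=evergreen@example.invalid",
        "commit", "--quiet", "--all", "--allow-empty", "-m", message)
    return git(home, "rev-parse", "HEAD")


def rollback(home: Path, commit: str) -> None:
    """Restore tracked files to `commit`; untracked files are left alone."""
    git(home, "reset", "--hard", "--quiet", commit)


def upgrade_with_rollback(home, upgrade, is_healthy) -> bool:
    """Snapshot, run the upgrade, and roll back if it fails or looks unhealthy."""
    commit = snapshot(home, "Pre-upgrade snapshot")
    try:
        upgrade()
        if is_healthy():
            return True
    except Exception:
        pass  # any upgrade error is treated as a failed upgrade
    rollback(home, commit)
    return False
```

Passing the upgrade step and the health check in as callables keeps the snapshot logic independent of how Jenkins is actually restarted and probed.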

            Activity

            sag47 Sam Gleske added a comment - - edited

            Example from my Jenkins RPM package:

            • preUninstall.sh script running dailycommit.sh to save a copy of configuration before package upgrade.
            • Example gitignore used for my JENKINS_HOME.
            • Contents of dailycommit.sh.

            That repository supports packaging Jenkins and plugins into multiple formats.

            ./gradlew buildRpm
            ./gradlew buildDeb
            ./gradlew buildTar
            # or package all three with ./gradlew packages

            # docker requires buildTar
            docker build -t jenkins .

            Additional notes

            • One of the challenges I discussed with R. Tyler Croy was placing the workspaces of jobs that build on the master outside of JENKINS_HOME. Otherwise, you encounter weird issues with Git repositories nested inside other Git repositories when they're not submodules. In general, we know it's bad practice for people to build on the master, but it still gets done.
            • The gitignore file I linked intentionally does not track secret.key or the secrets directory. The intention here is that secrets get backed up separately from the encrypted configuration. However, this may not matter to some organizations.
            • Eventually, I want to completely rewrite the service scripts I copied from jenkins-packaging, mainly because I have a different style of writing Bash, and I will propose my changes back.
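To illustrate the notes above about leaving secrets and job workspaces out of the snapshot, here is a small sketch of a path filter. The exclusion lists are assumptions made for this example (the linked gitignore is not reproduced here), and `should_track` is a hypothetical helper name.

```python
from pathlib import PurePosixPath

# Assumed exclusions, loosely following the notes above: secrets are backed
# up separately, and workspaces may contain nested Git repositories.
EXCLUDED_FILES = {"secret.key"}
EXCLUDED_DIRS = {"secrets", "workspace"}


def should_track(relative_path: str) -> bool:
    """Return True if a path relative to JENKINS_HOME belongs in the snapshot."""
    path = PurePosixPath(relative_path)
    if path.name in EXCLUDED_FILES:
        return False
    # Reject anything that lives below an excluded directory.
    return not any(part in EXCLUDED_DIRS for part in path.parts[:-1])
```

For example, `should_track("jobs/demo/config.xml")` is true, while `should_track("secrets/master.key")` is false.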
            rtyler R. Tyler Croy added a comment -

            I'm going to assign this to Baptiste Mathus. Feel free to spin up some separate tickets as necessary to explore additional avenues of experimentation.

            I would expect that the end-result of the prototype/experiment phase would be a JEP document.

            jglick Jesse Glick added a comment -

            For inspiration: etckeeper

            jglick Jesse Glick added a comment -

            Also think carefully about compatibleSinceVersion.

            batmat Baptiste Mathus added a comment -

            "Also think carefully about compatibleSinceVersion."

            Jesse Glick, to be honest I didn't plan anything specific using this metadata. Yes, we could probably do some optimizations on this front, for instance not reverting to the previous state if compatibleSinceVersion stayed the same. But as you also said yesterday, IIUC, I suspect this practice is not currently used often or carefully enough to be really usable automatically?

            But agreed, this might be something we can improve over time while defining the efforts and requirements a given plugin has to comply with to enter the set of plugins delivered/used in Essentials. WDYT?

            (Should we rather take this to a dedicated thread on the ML, BTW? I plan one anyway, so maybe we'll get back to it there very soon.)

            batmat Baptiste Mathus added a comment -

            As discussed yesterday, first draft submitted for review to the dev list: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/jenkinsci-dev/XdXuMFLXKPw (=> https://github.com/batmat/jep/pull/1)

            batmat Baptiste Mathus added a comment -

            We had a chat today with Raul Arabaolaza and he raised a quite important point we might want to act on, IMO: to reduce the risk of creating more things than necessary when Jenkins starts again after an upgrade, putting it in quiet mode at startup could help.

            Only once the evergreen client has performed the upgrade and judged Jenkins healthy would it automatically cancel quiet mode.

            rtyler R. Tyler Croy added a comment -

            That's an interesting idea Baptiste Mathus!

            I wonder if starting in quiet mode would result in us missing any potential errors? If not, then I say let's do it!

            batmat Baptiste Mathus added a comment - - edited

            "I wonder if starting in quiet mode would result in us missing any potential errors? If not, then I say let's do it!"

            Definitely. And Jesse Glick already made a similar comment when reviewing https://github.com/batmat/jep/pull/1
            But I think it would still be interesting to triage the potential causes of issues with a slightly more progressive process.

            Roughly, would/could be:

            • set to start in quiet mode next time, and restart
            • check Jenkins is healthy [1]
            • if yes, cancel quiet [EDIT: or better, write some plugin that would *only* allow our smoke testing job, on the next bullet point, to run]
            • start some kind of smoke testing build
            • if success, then \o/, if not, roll back.

            [1] R. Tyler Croy, about that: for a few days I have been thinking we probably need a dedicated JIRA/JEP to design how "evergreen-client decides if Jenkins is healthy [enough] or not", i.e. whether to trigger a rollback or not... Do we have something like this? WDYT?
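The progressive process listed above can be sketched as a small driver function. Everything here is hypothetical: the `UpgradeSteps` fields stand in for whatever the evergreen client would really do, and rolling back when the health check fails is an assumption, since the list only spells out the rollback after a failed smoke test.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UpgradeSteps:
    """Injected actions, so the flow can be exercised without a real Jenkins."""
    restart_in_quiet_mode: Callable[[], None]
    is_healthy: Callable[[], bool]
    cancel_quiet_mode: Callable[[], None]
    run_smoke_test: Callable[[], bool]
    roll_back: Callable[[], None]


def finish_upgrade(steps: UpgradeSteps) -> str:
    """Walk through the quiet-mode flow and report the outcome."""
    steps.restart_in_quiet_mode()
    if not steps.is_healthy():
        # Assumption: an unhealthy instance is rolled back immediately.
        steps.roll_back()
        return "rolled back: unhealthy"
    steps.cancel_quiet_mode()
    if not steps.run_smoke_test():
        steps.roll_back()
        return "rolled back: smoke test failed"
    return "upgraded"
```

Keeping the steps as injected callables also makes it straightforward to swap "cancel quiet mode" for the alternative mentioned in the edit, a plugin that only lets the smoke testing job run.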

            rtyler R. Tyler Croy added a comment -

            Baptiste Mathus, regarding a JEP for determining Jenkins healthiness for Jenkins Essentials, I think that's a good idea and will be a useful design document to discuss with the broader development community.

            Will you file a ticket for that and drop it into Milestone 1?

            rarabaolaza Raul Arabaolaza added a comment -

            I fully agree also.

            Just for openness, and even if this has already been added to other sources, these are the meeting notes of my conversation with Baptiste Mathus yesterday:

            RAUL: This is intended for development time, not for deployment validation
            The idea is: try an upgrade, test that everything works properly, perform a rollback, and test again that everything is working

            BAPTISTE: We are likely to be able to reuse the “health check” logic that will have to be developed for evergreen-client itself in production, to check if Jenkins is running fine.
            RAUL: critical: we need to test the health check

            QUESTION: Should we try to implement synthetic transactions (ST) here or go with the ATH (Acceptance Test Harness), which already exists?

            PROPOSALS for Rollback testing:

            • Make sure there is enough coverage that all possible rollback paths are covered
            • Create a quality bar for rollbacks
              • Make sure you are including some failing scenarios in the quality bar
              • Not only test the happy path, for example:
                • Perform a failed upgrade, test that we are able to detect the upgrade as a failure, roll back, and test that the instance is working perfectly
                • Perform a failed upgrade, test that we are able to detect the upgrade as a failure, perform a failed rollback, and test that we are able to detect that the rollback failed
              • Make sure that in case of different chained rollback strategies we test each and every one of them
            • Create a health check URL to be invoked via curl, for example
              • We can create a plugin that provides that healthcheck url and integrate with ST
              • Maybe some work from metrics plugin can be reused
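As a rough illustration of the health check URL proposal above, the sketch below polls an endpoint and decides whether the instance counts as healthy. The JSON shape (`{"checks": {name: {"healthy": bool}}}`) and the function names are assumptions invented for this example; no such endpoint is defined in this ticket.

```python
import json
import urllib.request
from typing import Callable


def fetch_json(url: str, timeout: float = 5.0):
    """Fetch `url` and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def is_healthy(url: str, fetch: Callable[[str], object] = fetch_json) -> bool:
    """Return True only if every reported check is explicitly healthy."""
    try:
        report = fetch(url)
    except (OSError, ValueError):
        # Connection errors, HTTP errors, and malformed JSON all count as unhealthy.
        return False
    if not isinstance(report, dict):
        return False
    checks = report.get("checks")
    if not isinstance(checks, dict) or not checks:
        return False
    return all(
        isinstance(check, dict) and check.get("healthy") is True
        for check in checks.values()
    )
```

The `fetch` parameter exists so the decision logic can itself be tested with canned responses, in line with the note that the health check needs testing too.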

            Some possible testing flows:

            • Upgrade, run health check (ST), rollback, ST again, and ATH?
              • No work yet on ST that I am aware of, but ST can later be reused for deployment testing
            • Run ATH, rollback, ATH again
              • Some work already done, but ATH may be too heavy, and its coverage is pretty poor and based on individual plugins rather than coherent sets of them
                This should be done in the “pre canary, staging, or whatever is named” instances because we want to catch any possible degradation or problems in long running instances
            batmat Baptiste Mathus added a comment -

            FTR, meeting added in the repo as we'll do for all of them in the future: https://github.com/jenkins-infra/evergreen/tree/master/docs/meetings/2018-03-18-JENKINS-49406-quality-bar

            batmat Baptiste Mathus added a comment -

             Tried to start implementing the root separation by changing the "builds" and "workspace" directories as described in https://github.com/batmat/jep/blob/a3d70917b1095ee27c292c029593f79913ff186a/jep/302/README.adoc#segregate-job-configuration-and-build-data using CasC to also test/prototype this part of the proposal, but this proved impossible. See https://github.com/jenkinsci/configuration-as-code-plugin/issues/151

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Baptiste Mathus
            Path:
            jep/0000/README.adoc
            http://jenkins-ci.org/commit/jep/6773edbc06488de4c2fa7371f54c79df38672861
            Log:
            JENKINS-49406 Evergreen snapshotting data safety system JEP

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: R. Tyler Croy
            Path:
            jep/302/README.adoc
            jep/README.adoc
            http://jenkins-ci.org/commit/jep/949cbdb6bb2823a0a780e1005cf86a9b815f48b6
            Log:
            Merge pull request #67 from batmat/JENKINS-49406-JEP-submission

            JENKINS-49406 Evergreen snapshotting data safety system JEP

            Compare: https://github.com/jenkinsci/jep/compare/b5b57a9f1c93...949cbdb6bb28

            batmat Baptiste Mathus added a comment -

            See JENKINS-50958 for usage of this specification


              People

              • Assignee: batmat Baptiste Mathus
              • Reporter: rtyler R. Tyler Croy
              • Votes: 0
              • Watchers: 7
