Infrastructure / INFRA-1205

Infra Documentation and Contributor Experience Update - 2019Q4

    Details

    • Epic Name:
      Infra Contributor UX Revamp 2019Q4
      Description

      Original list from R. Tyler Croy, which needs to be reviewed and implemented.

      Incidents/alerts that need a documented "right way to handle":

      • Jenkins
      • not responding to requests/high CPU
      • Inspecting for slow requests, restarting Jenkins properly
      • Upgrading plugins/restarting to pick up new core changes
      • ideally also how image creation for the jenkinsci Docker org works on trusted.ci, as a precondition for core security updates
      • trusted-ci
      • Agents have stuck pipelines and don't appear to do anything: Docker daemon stuck, needs manual reboot
      • Disk space issues:
      • LDAP - prune old transaction logs
      • eggplant - truncate old Apache logs
      • celery, or other Jenkins agents
      • ci.jenkins.io - the master has /var/lib/jenkins filling up
      • also needs to be made into a proper alert (perhaps metrics plugin based?) rather than an admin monitor in the UI
      • Confluence
      • Dealing with spammers:
      • Delete the user
      • Delete the pages
      • Delete the cached pages
      • undo the edits, etc
      • Letsencrypt certificates expire 'soon': can be fixed by /etc/init.d/apache2 reload
      • Mapping AWS instances from Datadog to actual hostnames that are usable
      • Release/distribution architecture documentation
      • Defining all the moving components related to the release and distribution of core and plugins
      • How to perform a manual sync of mirrors
      • Manual syncing for plugin specific updates
      • Blacklist some mirrors
      • Puppet
      • Where is the Puppet dashboard?
      • Figuring out when a Puppet agent is not responding properly (from Datadog)
      • Running puppet manually 
      • Manually running an r10k deployment in case the webhooks from GitHub to puppet.jenkins.io fail
      • Accounts App
      • Processing account signup rejections
      • Deleting spammers' accounts
      • Kubernetes
      • Where does it live
      • how do you know it's healthy
      • what do you do if a service (account-app, plugins, etc.) is not working properly
      • How do you manually renew a Letsencrypt certificate for a Kubernetes-based application
      • It would perhaps be interesting to know what's not documented (known unknowns)
      • Document black boxes from the infra, e.g. the release process in KK's basement
      • Legacy services which are "not managed"
      • jenkins.ci.cloudbees.com
      • Any "tyler-only" jobs?
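
      As a starting point for the "/var/lib/jenkins filling up" item above, here is a minimal sketch of a standalone disk-usage check. It is not the metrics-plugin-based alert suggested in the list, only a plain script that an external monitor could call. The function names and the 90 percent threshold are hypothetical and chosen for illustration; only the /var/lib/jenkins path comes from this ticket.

```python
import shutil


def usage_percent(path: str) -> float:
    """Return the used share (0-100) of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a zero size; treat them as empty.
        return 0.0
    return 100.0 * usage.used / usage.total


def is_over_threshold(path: str, threshold_percent: float) -> bool:
    """True when usage of the filesystem holding `path` reaches the threshold."""
    return usage_percent(path) >= threshold_percent
```

      For example, `is_over_threshold("/var/lib/jenkins", 90.0)` could be evaluated from a cron job or a monitoring agent check, with the result forwarded to whatever alerting channel the team settles on.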

            Activity

            oleg_nenashev Oleg Nenashev added a comment -

            R. Tyler Croy, Olivier Vernin: I will take over this epic, if you do not mind.

            olblak Olivier Vernin added a comment -

            Sure, the initial reason for this epic was to do knowledge transfer from R. Tyler Croy to me and document it.

            We maintain private documentation, available only to a subset of people, in jenkins-infra/runbooks.


              People

              • Assignee:
                oleg_nenashev Oleg Nenashev
                Reporter:
                rtyler R. Tyler Croy
              • Votes:
                0
                Watchers:
                5
