Dl/Product Release Checklist
From stonehomewiki
Logging
- Your application should emit logs that help diagnose problems in your product.
- You should use a centralized logging system so it is easy to search logs from the different services/components of your product.
- Your logs should not contain sensitive information or offensive words.
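The logging items above can be sketched with Python's standard logging module. This is a minimal illustration, not a prescribed setup: the service name "billing-api", the SENSITIVE_PATTERNS list, and the helper names are assumptions made up for the example. Each line carries a service field so a central system can filter by component, and a filter masks sensitive values before they are written.

```python
import io
import logging
import re

# Hypothetical patterns for sensitive values; a real product would keep its own list.
SENSITIVE_PATTERNS = [
    re.compile(r"(password|token|secret)=\S+", re.IGNORECASE),
]


class RedactingFilter(logging.Filter):
    """Mask sensitive key=value pairs before a record is written."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        for pattern in SENSITIVE_PATTERNS:
            message = pattern.sub(lambda m: f"{m.group(1)}=***", message)
        # Store the already formatted, redacted message on the record.
        record.msg = message
        record.args = None
        return True


def build_logger(name: str, stream) -> logging.Logger:
    """Create a logger whose lines carry the service name for central search."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s service=%(name)s %(message)s")
    )
    handler.addFilter(RedactingFilter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()  # stands in for the transport to a central log system
log = build_logger("billing-api", buffer)
log.info("login failed for user=%s password=%s", "alice", "hunter2")
print(buffer.getvalue().strip())
```

In a real deployment the stream handler would be replaced by whatever handler ships records to the central logging system.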
Metrics
- Your application should emit telemetry (time series) to a metrics system (e.g. AWS CloudWatch).
- You should have a centralized UI to watch telemetry from the different services/components of your product.
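As a rough sketch of what emitting time-series telemetry looks like in application code, the example below records named datapoints with a timestamp and a value. The MetricsEmitter class and its sink callback are hypothetical; the sink stands in for the client of a real metrics backend, which is not shown here.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class DataPoint:
    name: str         # metric name, e.g. "request_latency_ms"
    timestamp: float  # seconds since the epoch
    value: float


class MetricsEmitter:
    """Hypothetical emitter; `sink` stands in for a real metrics backend client."""

    def __init__(self, sink: Callable[[DataPoint], None]):
        self._sink = sink

    def emit(self, name: str, value: float) -> None:
        self._sink(DataPoint(name, time.time(), value))

    @contextmanager
    def timed(self, name: str):
        """Emit the elapsed wall-clock time of the wrapped block in milliseconds."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.emit(name, (time.perf_counter() - start) * 1000.0)


collected: List[DataPoint] = []
metrics = MetricsEmitter(collected.append)

metrics.emit("requests_total", 1)
with metrics.timed("request_latency_ms"):
    sum(range(1000))  # stand-in for real work

print([p.name for p in collected])
```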
Alarms
- You should define alarms based on your telemetry.
- An alarm should be able to notify your on-call DevOps engineer, for example via PagerDuty.
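One way to picture an alarm defined on telemetry is a threshold rule over the most recent datapoints. The sketch below is an assumption about how such a rule could work, not the behavior of any particular alarm system: it fires when the last few datapoints all exceed a threshold, and the notify callback is a placeholder for a paging integration.

```python
from typing import Callable, List, Sequence


def should_alarm(values: Sequence[float], threshold: float, periods: int) -> bool:
    """Return True when the last `periods` datapoints all exceed `threshold`."""
    if periods <= 0 or len(values) < periods:
        return False
    return all(v > threshold for v in values[-periods:])


def evaluate(
    name: str,
    values: Sequence[float],
    threshold: float,
    periods: int,
    notify: Callable[[str], None],
) -> bool:
    """Check the rule and call `notify` (a stand-in for a pager) when it fires."""
    fired = should_alarm(values, threshold, periods)
    if fired:
        notify(f"ALARM {name}: last {periods} datapoints above {threshold}")
    return fired


pages: List[str] = []
error_rate = [0.01, 0.02, 0.09, 0.11, 0.12]

# Fires: the last three datapoints are all above 0.05.
evaluate("error_rate", error_rate, threshold=0.05, periods=3, notify=pages.append)
# Does not fire: only one of the last three datapoints is above 0.05.
evaluate("error_rate", error_rate[:3], threshold=0.05, periods=3, notify=pages.append)
print(pages)
```

Requiring several consecutive breaches is a common way to avoid paging on a single noisy datapoint.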
Security
- Security vulnerability assessment
- Make sure your product does not have known security vulnerabilities.
- Access Control
- Prevent unauthorized access to protected information.
- Access types include "read", "write", "delete", "list", etc.
- Access Audit
- Make sure access to the product is tracked. The tracked information should include:
- Who is accessing?
- What kind of access? (read/write/delete/list/etc.)
- When did the access happen?
- What has been accessed?
- The access audit log should be organized in a way that is easy to search.
- The access audit log should be retained for a reasonable time, and the retention period should comply with government regulations.
- SSO Authentication
- Your Web UI should use SSO to authenticate users. An anti-pattern is for your product to maintain its own usernames and passwords (e.g. the current Airflow for Tier-1 and Tier-2).
- Having 4~5 products that each maintain their own usernames and passwords is a nightmare!
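The Access Control and Access Audit items above can be sketched together: every access decision is checked against a grant table and recorded with who, what kind of access, what resource, and when. The class names, the grant table layout, and the sample user and resource are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class AuditRecord:
    who: str        # identity of the caller
    action: str     # "read", "write", "delete", "list", ...
    resource: str   # what was accessed
    when: datetime  # when the access happened (UTC)
    allowed: bool   # whether the access was granted


@dataclass
class AccessController:
    # Hypothetical grant table: (user, resource) -> set of permitted actions.
    grants: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)
    audit_log: List[AuditRecord] = field(default_factory=list)

    def check(self, who: str, action: str, resource: str) -> bool:
        """Decide the request and record it, whether allowed or denied."""
        allowed = action in self.grants.get((who, resource), set())
        self.audit_log.append(
            AuditRecord(who, action, resource, datetime.now(timezone.utc), allowed)
        )
        return allowed

    def search(self, who: str) -> List[AuditRecord]:
        """Simple lookup by user; a real audit store would index more fields."""
        return [r for r in self.audit_log if r.who == who]


acl = AccessController(grants={("alice", "reports/q1"): {"read", "list"}})
print(acl.check("alice", "read", "reports/q1"))    # True
print(acl.check("alice", "delete", "reports/q1"))  # False
print(len(acl.search("alice")))                    # 2
```

Denied attempts are recorded as well, since they are often the entries an auditor most wants to find.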
Service Availability
- Highly Available
- Your service should be highly available. A common pattern is having redundancy, so that if your active server goes down, your standby server can take over. We expect the switch to be automatic.
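The active/standby pattern described above can be illustrated with a small selection function. This is a simplified sketch under stated assumptions: the server names are made up, and is_healthy stands in for a real health check such as a heartbeat or probe.

```python
from typing import Callable, Dict, List, Optional


def pick_active(
    servers: List[str],
    is_healthy: Callable[[str], bool],
    current: Optional[str] = None,
) -> Optional[str]:
    """Keep the current active server while it is healthy; otherwise promote
    the first healthy server in priority order. Returns None if all are down."""
    if current is not None and is_healthy(current):
        return current
    for server in servers:
        if is_healthy(server):
            return server
    return None


# Hypothetical health table standing in for real health checks.
health: Dict[str, bool] = {"primary": True, "standby": True}
servers = ["primary", "standby"]

active = pick_active(servers, health.__getitem__)
print(active)  # primary

health["primary"] = False  # simulate the active server going down
active = pick_active(servers, health.__getitem__, current=active)
print(active)  # standby
```

Keeping the current server while it remains healthy means the service does not flip back as soon as the old primary recovers, which avoids unnecessary switches.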
Capacity
- You should provision your service above its day-to-day capacity. For example, your service should be prepared to handle 200% of your normal traffic.
- You should hold capacity reviews regularly. A common practice is to review capacity every year and book capacity for the entire year, including predicted growth.
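The capacity guidance above amounts to simple arithmetic, sketched below. The traffic figures, the 25% growth estimate, and the per-server throughput are invented numbers for illustration; only the 200% headroom comes from the checklist.

```python
import math


def required_capacity(
    normal_peak_rps: float,
    headroom_factor: float = 2.0,  # 200% of normal traffic, as in the checklist
    yearly_growth: float = 0.0,    # predicted growth for the year, e.g. 0.25 for 25%
) -> float:
    """Capacity to book: normal traffic, grown by the forecast, times the headroom."""
    return normal_peak_rps * (1.0 + yearly_growth) * headroom_factor


def servers_needed(capacity_rps: float, rps_per_server: float) -> int:
    """Round up, since a fraction of a server cannot be provisioned."""
    return math.ceil(capacity_rps / rps_per_server)


# Hypothetical inputs: 1000 requests/s today, 25% growth expected this year.
target = required_capacity(normal_peak_rps=1000, yearly_growth=0.25)
print(target)                       # 2500.0
print(servers_needed(target, 400))  # 7
```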
Beta Environment
- For any product, you should have a beta environment
- You should always deploy your change to the beta environment first and verify that nothing is broken before deploying to production.
User-facing document
- Every product should have a user-facing document.
- The user-facing document should be kept in sync as the product evolves.
Design document
- Make sure you document your design.
- Make sure your design doc stays in sync when you change your design.
CI/CD Pipeline
- Your product's development environment should support CI/CD.
- Any product using Python should have at least 80% code coverage (line-based, branch-based).
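A pipeline can enforce the coverage item above with a simple gate. The sketch below assumes that the 80% minimum applies to line coverage and branch coverage separately, which is one reading of the checklist; the function names and the sample counts are hypothetical.

```python
def coverage_percent(covered: int, total: int) -> float:
    """Percentage covered; an empty total counts as fully covered."""
    return 100.0 if total == 0 else 100.0 * covered / total


def passes_gate(
    lines_covered: int,
    lines_total: int,
    branches_covered: int,
    branches_total: int,
    minimum: float = 80.0,
) -> bool:
    """Require both line and branch coverage to reach `minimum` percent."""
    line_pct = coverage_percent(lines_covered, lines_total)
    branch_pct = coverage_percent(branches_covered, branches_total)
    return line_pct >= minimum and branch_pct >= minimum


print(passes_gate(850, 1000, 160, 200))  # True: 85% lines, 80% branches
print(passes_gate(850, 1000, 150, 200))  # False: branches at 75%
```

In practice a tool such as coverage.py can enforce a threshold directly, with `coverage run --branch` to measure branches and `coverage report --fail-under=80` to fail the build. That check applies to the tool's combined total rather than to lines and branches separately.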
Data Safety
- To guard against physical or logical data loss, you need to back up data periodically.
- A certain percentage of data loss is tolerable, since backups do not happen continuously.
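Because backups are periodic, the data written between the last backup and a failure is what gets lost. The sketch below makes that window explicit so it can be compared against what the product can tolerate. The six-hour schedule, the failure time, and the tolerance values are made-up examples.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def data_loss_window(backups: List[datetime], failure: datetime) -> Optional[timedelta]:
    """Time span of writes that cannot be restored: from the latest backup
    taken at or before the failure up to the failure itself.
    Returns None when no usable backup exists."""
    usable = [b for b in backups if b <= failure]
    if not usable:
        return None
    return failure - max(usable)


def meets_target(
    backups: List[datetime], failure: datetime, max_tolerable_loss: timedelta
) -> bool:
    """True when the lost window is within the tolerated amount."""
    window = data_loss_window(backups, failure)
    return window is not None and window <= max_tolerable_loss


# Hypothetical schedule: a backup every 6 hours at 00:00, 06:00, 12:00, 18:00.
start = datetime(2024, 1, 1, 0, 0)
backups = [start + timedelta(hours=6 * i) for i in range(4)]
failure = datetime(2024, 1, 1, 16, 30)

print(data_loss_window(backups, failure))                  # 4:30:00
print(meets_target(backups, failure, timedelta(hours=6)))  # True
print(meets_target(backups, failure, timedelta(hours=1)))  # False
```

The worst case approaches the full backup interval, so the interval should be chosen to be no longer than the loss the product can accept.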