Observability sandbox
What is Observability?
The ability to understand the health of a system by observing its key signals. For humans, the most basic signal is a heartbeat.
Signals vary from system to system. For a high-traffic online store, the latency users experience, or the time it takes to confirm a purchase, can be a key signal. For a batch workload, the number of records it processes over a specific duration can be a key signal.
Instrumenting the system is a key component of Observability.
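To make "instrumenting" concrete, here is a minimal sketch that records the latency signal from the online store example using only the Python standard library. The names `DURATIONS`, `timed`, `confirm_purchase` and `percentile` are hypothetical, and the in-memory store stands in for whatever metrics backend is actually used.

```python
import math
import time
from collections import defaultdict
from functools import wraps

# In-memory store of observed durations, keyed by signal name.
# Assumption: a real setup would export these to a metrics backend instead.
DURATIONS = defaultdict(list)


def timed(signal_name):
    """Decorator that records how long each call to the wrapped function takes."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even if the call raised.
                DURATIONS[signal_name].append(time.perf_counter() - start)
        return wrapper
    return decorator


@timed("confirm_purchase_seconds")
def confirm_purchase(order_id):
    # Placeholder for the real purchase confirmation logic.
    time.sleep(0.01)
    return f"order {order_id} confirmed"


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    for i in range(20):
        confirm_purchase(i)
    p95 = percentile(DURATIONS["confirm_purchase_seconds"], 95)
    print(f"p95 confirm_purchase latency: {p95:.4f}s")
```

A percentile is used rather than an average because a handful of slow purchases can hide behind a healthy-looking mean.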
Metrics
Prometheus
Thanos
Logs
Use structured logging. Plenty of language-specific libraries are available to make it easy. Evaluating an app’s logging during performance tests can surface actionable improvements: too many or too few entries, important info getting swallowed, wrong classification of severity, etc.
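As an illustration of structured logging, the sketch below emits one JSON object per log entry using only Python's standard `logging` module. `JsonFormatter`, `build_logger` and the `context` key are hypothetical names chosen for this example; a dedicated library would typically replace the hand-rolled formatter.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        entry = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed as extra={"context": {...}} become top-level keys.
        entry.update(getattr(record, "context", {}))
        return json.dumps(entry)


def build_logger(name, stream=sys.stdout):
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("store")
    log.info(
        "purchase confirmed",
        extra={"context": {"order_id": "A-1001", "latency_ms": 182}},
    )
    log.error("payment declined", extra={"context": {"order_id": "A-1002"}})
```

Because every entry has the same machine-readable shape, severity and fields such as `order_id` can be filtered on directly instead of being parsed out of free text.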
Loki
Searches through labels describing the source of the logs rather than through the log entry itself (free-text search)
- Avoid high-cardinality labels; the label best practices guide is a good place to start
- Put high-cardinality fields (e.g. timestamp, transactionid) into structured metadata rather than labels. Promtail and Alloy can do this while scraping the logs
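The split between labels and structured metadata can be sketched as a payload for Loki's HTTP push endpoint. This assumes the JSON shape of `/loki/api/v1/push`, where each entry in `values` may carry an optional third element holding structured metadata, and that structured metadata is enabled on the Loki side. `build_loki_push_payload` and the field names are hypothetical, and nothing is sent over the network here.

```python
import json
import time


def build_loki_push_payload(labels, line, metadata, ts_ns=None):
    """Build a push payload with low-cardinality labels and
    high-cardinality fields kept in structured metadata."""
    if ts_ns is None:
        ts_ns = time.time_ns()
    return {
        "streams": [
            {
                # Labels describe the source: few distinct values, indexed.
                "stream": dict(labels),
                # Each value: [timestamp in ns as a string, log line, metadata].
                "values": [
                    [
                        str(ts_ns),
                        line,
                        # Assumption: metadata values are sent as strings.
                        {key: str(value) for key, value in metadata.items()},
                    ]
                ],
            }
        ]
    }


if __name__ == "__main__":
    payload = build_loki_push_payload(
        labels={"app": "store", "env": "prod"},
        line="purchase confirmed",
        metadata={"transactionid": "tx-8f3a21", "latency_ms": 182},
    )
    print(json.dumps(payload, indent=2))
```

Putting `transactionid` in a label instead would create a new stream for every transaction, which is the cardinality problem the bullets above warn about.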
Visualization
Grafana
Alerts
Avoid alert fatigue! If no action is needed, there is no need for an alert.