Observability sandbox

Observability: what, why and how?

What is Observability?

The ability to understand the health of a system by observing its key signals. For humans, the most basic signal is a heartbeat.

Signals vary from system to system. For a high-traffic online store, the latency users experience, or the time it takes to confirm a purchase, can be a key signal. For a batch workload, the number of records it processes over a given duration can be a key signal.
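To make the two example signals concrete, here is a minimal Python sketch that turns raw observations into numbers you could watch: a nearest-rank percentile of purchase confirmation latency, and records processed per second for a batch run. The sample data and function names are hypothetical; in practice these values are usually computed by the metrics backend rather than in application code.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


def throughput(records_processed, duration_seconds):
    """Records processed per second over the observation window."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return records_processed / duration_seconds


# Hypothetical purchase confirmation latencies, in milliseconds.
checkout_latencies_ms = [120, 95, 180, 2400, 110, 130, 105, 98, 160, 140]
print("p50 checkout latency:", percentile(checkout_latencies_ms, 50), "ms")
print("p99 checkout latency:", percentile(checkout_latencies_ms, 99), "ms")

# Hypothetical batch run: 1.2 million records in a 10-minute window.
print("batch throughput:", throughput(1_200_000, 600), "records/s")
```

Percentiles are used rather than an average because a single slow request (the 2400 ms sample) leaves the median at 120 ms but dominates the tail, which is exactly what users of a busy store notice.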

Instrumenting the system, so that it actually emits these signals, is a key component of Observability.
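As a rough illustration of what instrumentation produces, the sketch below keeps a labeled counter in memory and renders it in the Prometheus text exposition format that a Prometheus server scrapes. The class `LabeledCounter` and the metric `purchases_total` are hypothetical; a real service would use an official client library (such as `prometheus_client` for Python) instead of hand-rolling this.

```python
from collections import Counter


class LabeledCounter:
    """Tiny stand-in for a Prometheus counter with a single label (illustration only)."""

    def __init__(self, name, help_text, label):
        self.name = name
        self.help_text = help_text
        self.label = label
        self.values = Counter()

    def inc(self, label_value, amount=1):
        # Counters are monotonic: they only ever go up.
        if amount < 0:
            raise ValueError("counters can only increase")
        self.values[label_value] += amount

    def expose(self):
        """Render HELP/TYPE lines plus one sample per label value (values are not escaped here)."""
        lines = [f"# HELP {self.name} {self.help_text}", f"# TYPE {self.name} counter"]
        for value in sorted(self.values):
            lines.append(f'{self.name}{{{self.label}="{value}"}} {self.values[value]}')
        return "\n".join(lines) + "\n"


purchases = LabeledCounter("purchases_total", "Number of purchase attempts by outcome.", "status")
for outcome in ["ok", "ok", "failed", "ok"]:
    purchases.inc(outcome)
print(purchases.expose(), end="")
```

After the four increments, the output is the `# HELP` and `# TYPE` lines followed by `purchases_total{status="failed"} 1` and `purchases_total{status="ok"} 3`.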

Metrics

Prometheus

Thanos

Logs

Use structured logging; plenty of language-specific libraries make it easy. Evaluating an app's logging during performance tests can surface actionable improvements: too many or too few entries, swallowed important information, misclassified severity, etc.
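A minimal sketch of structured logging using only Python's standard `logging` module: a custom formatter emits each record as a single JSON object, so fields like an order ID stay machine-readable instead of being buried in free text. Passing extra fields under a `fields` key via `extra` is a convention assumed for this example, not part of the `logging` API; dedicated libraries offer richer versions of the same idea.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Assumed convention: extra={"fields": {...}} becomes top-level JSON keys.
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


handler = logging.StreamHandler()  # writes to stderr by default
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("purchase confirmed", extra={"fields": {"order_id": "A-1001", "latency_ms": 182}})
```

Because every entry has the same keys, a log backend can filter on `level` or `order_id` directly, which also makes it easier to spot the too-many, too-few or misclassified entries mentioned above during a performance test.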

Loki

Indexes and searches the labels describing the source of the logs rather than the log entry itself (as a free-text search engine would).

  • Avoid high-cardinality labels; Loki's label best practices guide is a good place to start
  • Put high-cardinality fields (e.g. timestamp, transaction ID) into structured metadata rather than labels. Promtail and Alloy can do this while scraping the logs
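One way to sanity-check label choices before shipping logs to Loki is to count how many distinct values each field takes across a sample. The sketch below treats fields at or below a hypothetical threshold as label candidates and routes the rest to structured metadata; the function name, threshold and sample data are illustrative only, and the actual routing would be configured in Promtail or Alloy.

```python
from collections import defaultdict


def split_by_cardinality(entries, max_distinct=10, line_key="message"):
    """Return (label_keys, metadata_keys) based on distinct values seen per field.

    Fields with at most `max_distinct` distinct values are label candidates;
    the rest belong in structured metadata. `line_key` is the log line itself.
    """
    seen = defaultdict(set)
    for entry in entries:
        for key, value in entry.items():
            if key != line_key:
                seen[key].add(str(value))
    labels = sorted(k for k, v in seen.items() if len(v) <= max_distinct)
    metadata = sorted(k for k, v in seen.items() if len(v) > max_distinct)
    return labels, metadata


# Hypothetical sample: 100 structured entries from one service.
sample = [
    {
        "app": "checkout",
        "env": "prod",
        "level": "error" if i % 10 == 0 else "info",
        "transactionid": f"tx-{i:05d}",
        "message": "purchase processed",
    }
    for i in range(100)
]
labels, metadata = split_by_cardinality(sample)
print("labels:", labels)
print("structured metadata:", metadata)
```

For this sample it keeps `app`, `env` and `level` as labels and moves `transactionid` (100 distinct values) to structured metadata. A sample can underestimate cardinality, so treat the result as a hint rather than a rule.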

Visualization

Grafana

Alerts

Avoid alert fatigue! If an alert requires no action, there is no need for it.

Alertmanager