Monitoring serverless apps
When client code has a bug, you can tell. Go through the flow, click around, see what happens.
But code on the server is invisible. And always broken. A distributed system is never 100% error-free.
A good architecture lets you ignore many errors. The system recovers on its own.
What about the bad errors? And how do you debug code you can't see?
Observability
Observability is the art of understanding the internal state of a system based on its outputs. It's a continuous process.
A good system lets you:
- understand what's going on
- see trends
- figure out what happened after an error
- predict errors
- know when there's an emergency
- understand how to fix an emergency
Those are design goals. There is no right answer. Observability is an art and getting it right takes practice.
But there are guidelines you can follow. You'll need three kinds of data:
- logs are immutable records of events that happened in your system. They follow a structured format and tell you what happened, where, and when.
- metrics are aggregates of events over time. They tell you how much of what is happening and how long it takes, and they help you understand trends.
- traces are journeys through the system: sequences of events that contributed to a bigger result.
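
To make the three ideas concrete, here is a minimal sketch of how they can relate to each other. The names (`LogEvent`, `makeLog`, `summarize`, `traceOf`) and the field layout are assumptions made up for illustration, not the API of any logging or tracing library. A real system would likely hand this work to a dedicated tool.

```typescript
// Hypothetical event shape for illustration; not taken from any library.
interface LogEvent {
  timestamp: string; // when it happened (ISO 8601)
  level: "info" | "error";
  service: string; // where it happened
  traceId: string; // ties the event to one journey through the system
  message: string; // what happened
  durationMs?: number; // optional measurement attached to the event
}

// Logs: build a structured event and freeze it so it can't be changed later.
function makeLog(
  fields: Omit<LogEvent, "timestamp">,
  now: Date = new Date()
): Readonly<LogEvent> {
  return Object.freeze({ timestamp: now.toISOString(), ...fields });
}

// Metrics: collapse many events into a few numbers you can watch over time.
function summarize(events: readonly LogEvent[]) {
  const durations = events
    .map((e) => e.durationMs)
    .filter((d): d is number => d !== undefined);
  const errors = events.filter((e) => e.level === "error").length;
  const total = durations.reduce((sum, d) => sum + d, 0);
  return {
    count: events.length,
    errorRate: events.length === 0 ? 0 : errors / events.length,
    avgDurationMs: durations.length === 0 ? 0 : total / durations.length,
  };
}

// Traces: the events that share one trace id, in the order they happened.
function traceOf(events: readonly LogEvent[], traceId: string): LogEvent[] {
  return events
    .filter((e) => e.traceId === traceId)
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
}

// Example usage with made-up services and messages.
const exampleEvents = [
  makeLog({ level: "info", service: "checkout", traceId: "order-1", message: "order received", durationMs: 120 }),
  makeLog({ level: "error", service: "payments", traceId: "order-1", message: "charge failed", durationMs: 340 }),
];
console.log(JSON.stringify(exampleEvents[0])); // one structured log line
console.log(summarize(exampleEvents)); // aggregate view
console.log(traceOf(exampleEvents, "order-1").map((e) => e.message)); // one journey
```

The design choice worth noticing is that all three views come from the same structured events. A shared `traceId` field is what lets you stitch individual logs into a trace, and consistent fields such as `level` and `durationMs` are what make aggregation into metrics possible.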