As the application stack moves from hardware-centric to cloud-centric and now to container-centric infrastructure, much is changing. Processing power is on the rise with more powerful compute instances like GPUs becoming more accessible, and compute and data storage costs are on the decline as the cloud makes resources cheaper. The application stack is becoming more complex and dynamic. With all these changes, monitoring has become more important than ever. The only way to operate in a world of distributed apps, teams, infrastructure, and cloud tooling is to monitor the entire stack end-to-end.
Monitoring used to be limited to a single tool like Nagios. These tools monitored closed systems whose components didn't change regularly and which exposed relatively few metrics to keep track of. Then came tools like New Relic, which covered application performance. Slowly, as more tools were added to the mix, monitoring took on a best-of-breed approach. Integrations became important, as data is more valuable when viewed in different contexts and across multiple tools. And then Docker happened, and the world of software delivery has never been the same.
With the advent of containers, the importance of monitoring and the number of monitoring tools required shot up overnight. Containers enable the microservices approach to application architecture, which means that in order to monitor the entire application, you need to monitor tens or hundreds of microservices individually, and in context with every other microservice.
Let's look at the various types of monitoring tools required to gain visibility across the entire development pipeline today.
Infrastructure Monitoring
Today, infrastructure is predominantly cloud-based, and the tools to monitor cloud instances are similarly cloud-based. Typically, cloud vendors like AWS provide robust tools to monitor their own resources; AWS CloudWatch and CloudTrail are essential if you're deeply invested in the Amazon ecosystem. Depending on which cloud vendor's tools you use, you can either extend those same tools to your on-premises instances or adopt separate tools for them. Nagios falls into this category, and though considered outdated today, it is still widely used for old-school server monitoring.
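As a concrete sketch of this kind of infrastructure monitoring, the snippet below builds a CloudWatch `GetMetricStatistics` request for EC2 CPU utilization and flags datapoints above an assumed 80% threshold. The instance ID and threshold are hypothetical; the actual AWS call (shown commented out) requires boto3 and configured credentials.

```python
# Sketch: query CloudWatch for average EC2 CPU and flag threshold breaches.
# The 80% threshold and the instance ID below are illustrative assumptions.
import datetime

def build_cpu_query(instance_id, minutes=60, period=300):
    """Build GetMetricStatistics parameters for EC2 CPUUtilization."""
    now = datetime.datetime.utcnow()
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": now - datetime.timedelta(minutes=minutes),
        "EndTime": now,
        "Period": period,
        "Statistics": ["Average"],
    }

def breaches(datapoints, threshold=80.0):
    """Return timestamps of datapoints whose average CPU exceeded the threshold."""
    return [d["Timestamp"] for d in datapoints if d["Average"] > threshold]

# With credentials configured, you would run something like:
# import boto3
# cw = boto3.client("cloudwatch")
# points = cw.get_metric_statistics(**build_cpu_query("i-0123456789abcdef0"))["Datapoints"]
# print(breaches(points))
```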
Application Performance Monitoring
With web apps being predominantly about page loads, APM focuses on metrics like page load times, response times, the file size of elements on the page, page load errors, and traffic load. The strength of an APM tool lies in how well it integrates these various numbers so the user can gain deeper context. Data visualization and interactive features like drilldowns, filtering, zooming, and labeling are essential to a successful APM experience. Leading tools like New Relic and AppDynamics offer mature APM platforms.
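To make the "integrating the numbers" point concrete, here is a minimal sketch of the kind of aggregation an APM tool performs: summarizing raw page-load samples into percentiles, since an average hides the slow outliers that ruin user experience. The sample latencies are made up for illustration.

```python
# Sketch: percentile aggregation over raw page-load samples, the kind of
# summary an APM dashboard shows. Sample data is hypothetical.
def percentile(samples, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(samples)
    k = max(0, int(round(pct / 100.0 * len(ordered))) - 1)
    return ordered[k]

load_times_ms = [120, 135, 180, 95, 2400, 140, 160, 110, 150, 130]
print("p50:", percentile(load_times_ms, 50))  # the typical experience
print("p95:", percentile(load_times_ms, 95))  # the tail that hurts users
```

Note how one slow sample (2,400 ms) barely moves the median but dominates the p95, which is why APM tools lean on percentiles rather than averages.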
Log Analysis
Logs are essential for troubleshooting: they present data that's close to the source of truth and are full of the context required to resolve issues. Logging tools need to crunch numbers as a basic necessity, but they also need to be adept at full-text analysis. Much of the log data is semi-structured text, and mining insight from this data is key to success with logging. This is why open source tools like Elasticsearch have gained popularity for logging.
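The snippet below sketches what "mining semi-structured text" means in practice: a logging pipeline parses free-form lines into fields before indexing them into a store like Elasticsearch. The log format and service names here are assumptions for illustration.

```python
# Sketch: extracting structured fields from semi-structured log lines and
# aggregating errors by service. The log format is a hypothetical example.
import re
from collections import Counter

LINE = re.compile(r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<service>\S+): (?P<msg>.*)$")

logs = [
    "2024-05-01T10:00:01Z ERROR checkout: payment gateway timeout",
    "2024-05-01T10:00:02Z INFO cart: item added",
    "2024-05-01T10:00:05Z ERROR checkout: payment gateway timeout",
]

errors_by_service = Counter(
    m.group("service")
    for m in (LINE.match(line) for line in logs)
    if m and m.group("level") == "ERROR"
)
print(errors_by_service)  # Counter({'checkout': 2})
```

Once lines are broken into fields like this, both numeric aggregation and full-text search over the message become straightforward.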
Thus far, we've discussed traditional monitoring methods that are still vital for monitoring modern cloud-native applications despite having been practiced for over a decade. Now, we jump to more recent trends in monitoring. Much of this has to do with the rise of Docker containers and container orchestration platforms like Kubernetes.
Container Monitoring
Containers need to be isolated from each other to remain secure in production. This requires access controls and communication protocols that follow the principle of least privilege: containers should be allowed access to exactly the resources they need to complete their assigned tasks, and no more. Similarly, in a microservices system, each service should be allowed to communicate with other services only for the purpose of executing a job. In a service mesh, where every service can potentially reach every other service, this level of control is achieved with policies enforced on services. This is why Kubernetes networking tools like Linkerd, Istio, and Calico operate on a policy-based networking model.
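The essence of policy-based, least-privilege communication can be sketched in a few lines: every service-to-service call is denied unless a policy explicitly allows it. The service names and the policy table below are hypothetical; real tools like Calico express this as declarative policy resources rather than application code.

```python
# Minimal sketch of default-deny, policy-based service communication.
# Service names and the allow-list are illustrative assumptions.
ALLOWED = {
    ("frontend", "cart"),
    ("cart", "checkout"),
    ("checkout", "payments"),
}

def is_allowed(src, dst):
    """Deny by default; permit only explicitly whitelisted service pairs."""
    return (src, dst) in ALLOWED

print(is_allowed("frontend", "cart"))      # True: a policy grants this call
print(is_allowed("frontend", "payments"))  # False: no policy grants it
```

The important design choice is the default: absence of a policy means denial, which is what keeps a compromised service from roaming the mesh.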
This new way of managing resources is set up differently and is monitored differently. Tools like Prometheus excel at monitoring container systems: they integrate deeply with the platform and can automatically discover changes occurring in the system in real time.
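One reason Prometheus fits container systems is its pull model: each service simply exposes its current metric values in a plain-text exposition format, and Prometheus scrapes and discovers them. The sketch below renders that text format by hand to show its shape; in practice you would use the official prometheus_client library, and the metric name and value here are illustrative.

```python
# Sketch: rendering metrics in the Prometheus text exposition format that a
# scraper pulls over HTTP. Metric name and value are hypothetical examples.
def exposition(metrics):
    """Render {name: (help_text, value)} as Prometheus-style text output."""
    lines = []
    for name, (help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

print(exposition({"http_requests_total": ("Total HTTP requests served.", 1024)}))
```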
API Monitoring
At the heart of a distributed cloud-native application are the APIs it uses for communication, both internally between its services and externally with other applications. APIs operate at every level of the application stack, and monitoring API calls brings deep visibility into the health of the system, the performance of the application, and the user experience.
API monitoring is the quickest way to track downtime, as endpoint failures can be reported in real time. API monitoring can be passive, where you look at aggregate data over time and track how your application performs against SLAs. It can also be proactive, where you run functional tests against vital parts of the system to routinely check that mission-critical operations are running as expected. Alternatively, you can leverage API metrics to triage and troubleshoot issues in real time, or better yet, to look for anomalies and trends and catch issues well before they escalate.
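A proactive check like the one described above boils down to comparing each probe of an endpoint against an SLA. The sketch below classifies probe results using an assumed SLA of a 2xx status in under 500 ms; the endpoints and probe data are hardcoded placeholders, whereas a real monitor would issue the HTTP requests on a schedule.

```python
# Sketch of a proactive API check: classify each endpoint probe against an
# assumed SLA (2xx status, under 500 ms). Probe data is hypothetical.
SLA_MAX_LATENCY_MS = 500

def evaluate(probe):
    """Classify one endpoint probe as 'ok' or a specific failure mode."""
    if not (200 <= probe["status"] < 300):
        return f"error: HTTP {probe['status']}"
    if probe["latency_ms"] > SLA_MAX_LATENCY_MS:
        return f"slow: {probe['latency_ms']} ms"
    return "ok"

probes = [
    {"endpoint": "/api/orders", "status": 200, "latency_ms": 120},
    {"endpoint": "/api/pay",    "status": 503, "latency_ms": 40},
    {"endpoint": "/api/search", "status": 200, "latency_ms": 900},
]
for p in probes:
    print(p["endpoint"], "->", evaluate(p))
```

Distinguishing "error" from "slow" at the check level is what lets alerts pinpoint the type of failure rather than just reporting that something is wrong.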
API monitoring enables all these use cases because of its real-time nature, but also because it is very specific in pinpointing the type of error, where it occurred, and the extent of the damage. When it comes to user experience, identifying and resolving issues before customers or the support team notice is the holy grail, and API monitoring tools like Runscope are your best bet for achieving this level of responsiveness.
Monitoring has changed for good, and for the better. Today's applications can deliver outstanding user experiences, but those experiences don't just happen: they require careful planning and a strategy that ties together a powerful kit of monitoring tools so you can gain visibility across the breadth and depth of your system. Traditional approaches like infrastructure monitoring, APM, and log analysis are still relevant, but they're not enough. To deliver an outstanding user experience, you also need to leverage container-specific monitoring and API monitoring, both byproducts of today's distributed application architecture.