Managing today’s complex, dynamic IT infrastructures isn’t always easy. It requires a powerful set of capabilities to give you the real-time visibility you need to resolve issues quickly, scale rapidly, and deploy intelligently.
To help anyone considering or evaluating dynamic infrastructure monitoring solutions, we have assembled 28 critical capabilities that any effective solution must provide. To make the list easier to grasp, we’ve grouped them into six categories, from change-management visibility to alerting, and followed each category with a short analysis of what to look for in that area.
Category 1: Change-management visibility
- The dynamic infrastructure monitoring solution must be able to inventory the configuration state of each server.
- The solution must be able to track common configuration changes to servers historically and as they occur.
- The solution must be able to correlate common configuration changes with their impact on the applications hosted on the server.
- The solution must be able to provide a timeline view of common configuration changes.
- The solution must be able to report the packages installed on each server and flag any inconsistencies or abnormalities.
Configuration changes are always risky. Companies can reduce this risk with testing and peer review, but they can never completely eliminate the chance that a change will go wrong. Research suggests that the vast majority of major IT service outages stem from people and process problems, often caused by poor coordination among change, release, and configuration management. A robust monitoring tool must therefore not only let you view changes over time but also let you quickly correlate those changes with their potential impact on host and application performance.
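To make the inventory and change-tracking requirements more concrete, here is a minimal Python sketch. It assumes each host’s package inventory has already been collected as a mapping of package name to installed version; the function names and data shapes are hypothetical illustrations, not any vendor’s API. It shows two building blocks: diffing two snapshots of one host to record what changed, and comparing hosts against the fleet’s most common version to flag inconsistencies.

```python
from collections import Counter
from typing import Dict

Inventory = Dict[str, str]  # package name -> installed version (hypothetical shape)

MISSING = "<missing>"  # placeholder used when a host lacks a package


def diff_snapshots(before: Inventory, after: Inventory) -> Dict[str, dict]:
    """Compare two inventory snapshots of the same host."""
    return {
        "added": {p: v for p, v in after.items() if p not in before},
        "removed": {p: v for p, v in before.items() if p not in after},
        "changed": {
            p: (before[p], after[p])
            for p in before.keys() & after.keys()
            if before[p] != after[p]
        },
    }


def find_package_inconsistencies(fleet: Dict[str, Inventory]) -> Dict[str, Dict[str, str]]:
    """For each package, list the hosts whose version differs from the most
    common version across the fleet (a host lacking the package counts as MISSING)."""
    packages = sorted({p for inv in fleet.values() for p in inv})
    report: Dict[str, Dict[str, str]] = {}
    for pkg in packages:
        versions = {host: inv.get(pkg, MISSING) for host, inv in fleet.items()}
        baseline, _ = Counter(versions.values()).most_common(1)[0]
        outliers = {h: v for h, v in versions.items() if v != baseline}
        if outliers:
            report[pkg] = outliers
    return report
```

Timestamping and storing each `diff_snapshots` result would provide the raw material for a timeline view of changes. Using the most common version as the baseline is only one simple heuristic; a real tool might instead compare against a declared desired state.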
Category 2: Data collection and analysis
- The dynamic infrastructure monitoring solution must be able to collect host metrics (CPU, memory, load, disk, network, process, etc.) at a granularity of 5 seconds or finer.
- The solution must let users dynamically filter metrics and correlate data sets together for deep analysis.
- The solution must provide a mechanism to tag infrastructure components with automatically applied tags (such as hostname) and must also support user-specified custom tags.
- The solution must let users apply tags to every metric set so they can filter the data shown.
- The solution must store data for more than a year to enable long-term trending, reporting, and analysis.
- The solution must provide dynamic dashboards and support the ability to customize dashboards.
- The solution must be able to dashboard infrastructure data in the same view as application and third-party data.
- The solution must be able to correlate infrastructure resource utilization hotspots with application performance, and assist in determining the true impact on hosted applications.
- The solution must allow end users to specify the reporting period with natural-language phrases (“since last week”) as well as with explicit times (“12 p.m.”).
Many tools can collect basic stats on server resource utilization. But in today’s dynamic host environments, which span on-premises, cloud, and hybrid deployment models, it’s critical to be able to analyze that data efficiently alongside other events and metrics that can change minute to minute.
Collecting the data isn’t enough. A modern infrastructure monitoring tool should be easy to use and provide up-to-date information, so that you can act on the data to improve the health and performance of the environment. Advanced features like dynamic tagging and dashboards bring the relevant information into view so that potential problems can be fixed before they affect end users.
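The tagging, filtering, and reporting-period requirements in this category can be illustrated with a small Python sketch. It assumes samples arrive every 5 seconds with a tag dictionary attached, and that a hostname tag is applied automatically while other tags are user-defined. The `Sample` class, the helper functions, and the phrase table in `PERIODS` are all hypothetical; in particular, treating “since last week” as a rolling seven-day window is an assumption, not a definition taken from any product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, Iterable, List, Optional


@dataclass
class Sample:
    metric: str                    # e.g. "cpu.percent" (hypothetical name)
    value: float
    at: datetime                   # collection time; 5-second spacing assumed
    tags: Dict[str, str] = field(default_factory=dict)


def host_tags(hostname: str, custom: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Merge user-defined custom tags with the automatically applied hostname tag."""
    tags = dict(custom or {})
    tags["hostname"] = hostname    # automated tag wins over a custom tag of the same name
    return tags


# Hypothetical phrase table; "since last week" is treated as a rolling 7 days.
PERIODS = {
    "last hour": timedelta(hours=1),
    "last 24 hours": timedelta(days=1),
    "since last week": timedelta(weeks=1),
}


def window_start(phrase: str, now: datetime) -> datetime:
    """Resolve a natural-language reporting period to an absolute start time."""
    return now - PERIODS[phrase.strip().lower()]


def average_by(samples: Iterable[Sample], metric: str, selector: Dict[str, str],
               group_key: str, since: datetime) -> Dict[str, float]:
    """Average `metric` over samples collected at or after `since` whose tags
    match every key/value in `selector`, grouped by the `group_key` tag."""
    groups: Dict[str, List[float]] = {}
    for s in samples:
        if s.metric != metric or s.at < since:
            continue
        if all(s.tags.get(k) == v for k, v in selector.items()):
            groups.setdefault(s.tags.get(group_key, "<untagged>"), []).append(s.value)
    return {key: mean(values) for key, values in groups.items()}
```

An explicit time such as 12 p.m. needs no phrase table: the caller can pass the corresponding `datetime` directly as `since`.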
Category 3: Native integrations
- The dynamic infrastructure monitoring solution must be fully integrated with a proven, enterprise-ready APM solution that supports frontend, backend, synthetic, and mobile monitoring.
- The solution must be able to support integration with Amazon Web Services without requiring an additional component within AWS.
- The solution must be able to retrieve data from key AWS services and report AWS performance alongside APM and infrastructure data.
IT infrastructure is not an end in itself. Infrastructure exists to support the applications that run on top of it. If you can’t see the impact your hosts have on the applications they support, it’s like packing for a trip without knowing where you are going, how you are going to get there, or what you will do when you arrive. A modern infrastructure monitoring tool can’t stand alone; it must work alongside application, browser, mobile, and cloud monitoring tools to give you a more complete picture that helps you fix problems faster.
Category 4: Management and ease-of-use
- The dynamic infrastructure monitoring solution must be 100% software-as-a-service (SaaS), requiring no additional hardware or management software to be deployed within the data center.
- The solution must employ security best practices and be able to demonstrate sound security measures with the appropriate security documentation and certifications.
- The solution must be dynamically scalable and able to support environments from 1 host to more than 100,000 hosts without additional configuration.
- The solution must be able to support custom CA certificates to allow inspection of all traffic between the agent and the vendor.
- The solution must be able to run concurrently with other monitoring and non-monitoring packages on each host.
Systems that are dynamically scalable and secure are preferable to systems that require additional space, effort, time, and money to support and maintain. SaaS tools can deliver both scalability and security while keeping a small installation footprint that minimizes the impact on other applications running on the same hosts.
Category 5: Container monitoring
- The dynamic infrastructure monitoring solution must support Docker and other key container technologies. It must be able to collect metric data on containers without additional modules or software installations.
- The solution must support all major container orchestration solutions without additional required software.
- The solution must be able to track container performance by image name, image version, container ID, and user-defined tags.
Container technology has changed the way many development teams test and deploy their applications. As with cloud and virtualization technologies, the ability to monitor the health of a container ecosystem, in addition to the underlying host, should be considered essential for DevOps organizations. When thousands of containers are deployed, the ability to quickly home in on a particular container or group of containers can vastly simplify reporting and analysis.
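Tracking containers by image name, image version, and user-defined tags mostly comes down to parsing the image reference and grouping on it. The Python sketch below assumes container records (ID, image reference, labels, CPU usage) have already been collected; the `Container` class and helper functions are hypothetical. The parser follows Docker’s convention that an untagged image means `latest`, ignores any `@sha256:…` digest, and treats a colon as a tag separator only when it appears after the last slash, so a registry port such as `host:5000` is not misread as a version.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional, Tuple


@dataclass
class Container:
    container_id: str
    image: str                                            # e.g. "nginx:1.25"
    labels: Dict[str, str] = field(default_factory=dict)  # user-defined tags
    cpu_percent: float = 0.0


def split_image(image: str) -> Tuple[str, str]:
    """Split an image reference into (name, version)."""
    ref = image.split("@", 1)[0]                 # drop any "@sha256:..." digest
    slash, colon = ref.rfind("/"), ref.rfind(":")
    if colon > slash:                            # a colon after the last "/" marks a tag
        return ref[:colon], ref[colon + 1:]
    return ref, "latest"                         # untagged images default to latest


def summarize_by_image(
    containers: List[Container], label_filter: Optional[Dict[str, str]] = None
) -> Dict[Tuple[str, str], Tuple[List[str], float]]:
    """Group containers matching every label in label_filter by (name, version),
    returning the sorted container IDs and mean CPU percent for each group."""
    wanted = label_filter or {}
    groups: Dict[Tuple[str, str], List[Container]] = defaultdict(list)
    for c in containers:
        if all(c.labels.get(k) == v for k, v in wanted.items()):
            groups[split_image(c.image)].append(c)
    return {
        key: (sorted(c.container_id for c in members), mean(c.cpu_percent for c in members))
        for key, members in groups.items()
    }
```

Grouping on the (name, version) pair makes it easy to spot, for example, whether containers on an older image version behave differently from those already rolled forward.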
Category 6: Alerting
- The dynamic infrastructure monitoring solution must let users create alerts directly from data charts on demand.
- The solution must be able to report and alert based on existing AWS tags.
- The solution must support tag-driven alerting and allow new servers to be added to alerts based on rules and patterns without human intervention.
When something goes wrong, or just looks out of the ordinary, you need to know right away. In modern environments, alerting is complicated by cloud instances and containers that spin up and down hour by hour. To deal with the dynamic nature of these systems, your monitoring tool should automatically add systems to, and remove them from, the relevant alerts as they join or leave your host environment. This eliminates the need to spend valuable time configuring alerts for each system individually.
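Tag-driven alerting can be sketched as a reconciliation loop: each alert policy holds a tag rule, and whenever the host inventory changes, membership is recomputed so new hosts are picked up and terminated ones are dropped without anyone editing the alert. Everything here (`AlertPolicy`, `reconcile`, `breaching`, the glob-style rule syntax) is a hypothetical Python illustration. The one real-world detail is `aws_tag_dict`, which converts tags in the list of `{"Key": ..., "Value": ...}` entries that the AWS EC2 API returns for instance tags into a plain dictionary.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Dict, List, Set, Tuple


def aws_tag_dict(tag_list: List[Dict[str, str]]) -> Dict[str, str]:
    """Convert EC2-style tags ([{"Key": ..., "Value": ...}]) into a plain dict."""
    return {t["Key"]: t["Value"] for t in tag_list}


@dataclass
class AlertPolicy:
    name: str
    rule: Dict[str, str]          # tag key -> glob pattern, e.g. {"role": "web*"}
    threshold: float              # alert when the metric exceeds this value
    members: Set[str] = field(default_factory=set)

    def applies_to(self, tags: Dict[str, str]) -> bool:
        # A host qualifies only if it carries every tag in the rule and each
        # tag value matches its pattern (case-sensitive glob matching).
        return all(
            key in tags and fnmatchcase(tags[key], pattern)
            for key, pattern in self.rule.items()
        )


def reconcile(policy: AlertPolicy,
              inventory: Dict[str, Dict[str, str]]) -> Tuple[Set[str], Set[str]]:
    """Recompute membership from the current inventory (host -> tags).

    Returns (added, removed) so the change can be logged or audited."""
    wanted = {host for host, tags in inventory.items() if policy.applies_to(tags)}
    added = wanted - policy.members
    removed = policy.members - wanted
    policy.members = wanted
    return added, removed


def breaching(policy: AlertPolicy, values: Dict[str, float]) -> Set[str]:
    """Hosts in the policy whose latest metric value exceeds the threshold."""
    return {h for h in policy.members if values.get(h, 0.0) > policy.threshold}
```

In practice, `reconcile` would run on every inventory refresh or on instance start and stop events, which is what lets an alert follow autoscaled hosts without manual edits.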
To learn more about the power and promise of dynamic infrastructure monitoring, check out the home page for New Relic Infrastructure, read our blog posts on the importance of infrastructure monitoring, and get started today with a 30-day free trial to experience the power of full-stack visibility.