Best practices for monitoring Kubernetes with Azure Monitor

Azure Monitor provides a set of services for monitoring the health and performance of your Azure Kubernetes Service (AKS) and Azure Arc-enabled Kubernetes clusters. Without proper configuration, your monitoring setup might miss critical issues or collect unnecessary data, leading to gaps in visibility or increased costs.

This article provides best practices based on the five pillars of architecture excellence in the Azure Well-Architected Framework to help you configure reliable, secure, and cost-effective monitoring for your Kubernetes clusters.

Reliability best practices for Kubernetes monitoring

Ensure the reliability of your Azure Kubernetes Service (AKS) and Azure Arc-enabled Kubernetes cluster monitoring by enabling Prometheus metrics, Container insights, control plane diagnostic settings, and alert rules. These recommendations help you detect and respond to failures in your cluster and its monitoring components.

Design checklist

  • Enable scraping of Prometheus metrics for your cluster.
  • Enable Container insights to collect logs and performance data from your cluster.
  • Create diagnostic settings to collect control plane logs for AKS clusters.
  • Enable recommended Prometheus alerts.
  • Ensure the availability of the Log Analytics workspace supporting Container insights.

Configuration recommendations

Recommendation Benefit
Enable scraping of Prometheus metrics for your cluster. Enable Prometheus on your cluster with Azure Monitor managed service for Prometheus if you don't already have a Prometheus environment. Use Azure Managed Grafana to analyze the collected Prometheus data. See Customize scraping of Prometheus metrics in Azure Monitor managed service for Prometheus to collect additional metrics beyond the default configuration.
Enable collection of logs and performance data from your cluster. Container insights collects stdout/stderr logs, performance metrics, and Kubernetes events from each node in your cluster. It provides dashboards and reports for analyzing this data, including the availability of your nodes and other components. Use Log Analytics to identify any availability errors in your collected logs.
Create diagnostic settings to collect control plane logs for AKS clusters. AKS implements control plane logs as resource logs in Azure Monitor. Create a diagnostic setting to send these logs to your Log Analytics workspace so you can use log queries to identify errors and issues affecting availability.
Enable recommended Prometheus alerts. Alerts in Azure Monitor proactively notify you when issues are detected. Start with a set of recommended Prometheus alert rules that detect the most common availability and performance issues with your cluster. Consider adding log search alerts by using data collected by Container insights.
Ensure the availability of the Log Analytics workspace supporting Container insights. Container insights relies on a Log Analytics workspace. See Best practices for Azure Monitor Logs for recommendations to ensure the reliability of the workspace.
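
As a rough illustration of the diagnostic setting recommendation above, the following Python sketch assembles a request body in the general shape used by Azure Monitor diagnostic settings (a workspace ID plus a list of enabled log categories). The function name, the default category list, and the placeholder workspace ID are assumptions made for this example; confirm the categories your cluster exposes before relying on them.

```python
import json

# Assumption: a starting set of AKS control plane log categories.
# Verify the list against the categories available on your cluster.
DEFAULT_CATEGORIES = (
    "kube-apiserver",
    "kube-controller-manager",
    "kube-scheduler",
    "kube-audit-admin",
    "guard",
)


def build_diagnostic_setting(workspace_id, categories=DEFAULT_CATEGORIES):
    """Return a diagnostic setting body that sends the given log
    categories to a Log Analytics workspace (hypothetical helper)."""
    if not workspace_id:
        raise ValueError("workspace_id is required")
    return {
        "properties": {
            "workspaceId": workspace_id,
            "logs": [{"category": c, "enabled": True} for c in categories],
        }
    }


# Placeholder resource ID; substitute your own workspace.
example_workspace = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>"
)
print(json.dumps(build_diagnostic_setting(example_workspace), indent=2))
```

The sketch only builds and prints the body. Submitting it (for example through the Azure CLI, an ARM template, or the REST API) is left out because the right mechanism depends on your deployment tooling.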

Security best practices for Kubernetes monitoring

Azure Monitor supports least privilege access and defense-in-depth for Kubernetes clusters. These security recommendations cover managed identity authentication, private link connectivity, network observability, and Log Analytics workspace security.

Connect clusters to Container insights by using managed identity authentication

Managed identity authentication is the default authentication method for new clusters. If you're using legacy authentication, migrate to managed identity to remove the certificate-based local authentication.

Instructions: Migrate to managed identity authentication

Use Azure private link to connect clusters to your Azure Monitor workspace through a private endpoint

Azure Monitor managed service for Prometheus stores its data in an Azure Monitor workspace, which uses a public endpoint by default. Microsoft secures connections to public endpoints with end-to-end encryption. If you require a private endpoint, use Azure private link to allow your cluster to connect to the workspace through authorized private networks. You can also use private link to force workspace data ingestion through ExpressRoute or a VPN.

Instructions: See Enable private link for Kubernetes monitoring in Azure Monitor for details on configuring your cluster for private link. See Use private endpoints for Managed Prometheus and Azure Monitor workspace for details on querying your data by using private link.

Monitor network traffic to and from clusters by using traffic analytics

Traffic analytics analyzes Azure Network Watcher NSG flow logs to provide insights into traffic flow in your Azure cloud. Use this tool to ensure there's no data exfiltration for your cluster and to detect whether any unnecessary public IPs are exposed.

Enable network observability

The network observability add-on for AKS provides observability across multiple layers of the Kubernetes networking stack. Use it to monitor and observe access between services in the cluster (east-west traffic).

Instructions: Set up Container Network Observability for Azure Kubernetes Service (AKS)

Secure your Log Analytics workspace

Container insights sends data to a Log Analytics workspace. Make sure to secure log ingestion and storage in your Log Analytics workspace.

Instructions: Log ingestion and storage

Cost optimization for Kubernetes monitoring

Reduce your Azure Monitor costs for Kubernetes monitoring by optimizing data collection, configuring appropriate pricing tiers, and eliminating redundant metric collection. See Azure Monitor cost and usage to learn how Azure Monitor charges and how to view your monthly bill.

Note

See Optimize costs in Azure Monitor for cost optimization recommendations across all features of Azure Monitor.

Design checklist

  • Enable collection of metrics through the Azure Monitor managed service for Prometheus.
  • Configure agent collection to modify data collection in Container insights.
  • Modify settings for collection of metric data by Container insights.
  • Disable Container insights collection of metric data if you don't use the Container insights experience in the Azure portal.
  • If you don't query the container logs table regularly or use it for alerts, configure it as basic logs.
  • Limit collection of resource logs you don't need.
  • Use resource-specific logging for AKS resource logs and configure tables as basic logs.
  • Use OpenCost to collect details about your Kubernetes costs.

Configuration recommendations

Recommendation Benefit
Enable collection of metrics through the Azure Monitor managed service for Prometheus. Be sure you don't also send Prometheus metrics to a Log Analytics workspace. You can use Azure Monitor managed service for Prometheus to scrape Prometheus metrics from your cluster by enabling Managed Prometheus. You can also configure Container insights to collect Prometheus metrics in your Log Analytics workspace, but this approach isn't recommended because it's redundant with Managed Prometheus data and results in extra cost. For details, see Managed Prometheus pricing.
Configure agent collection to modify data collection in Container insights. Analyze the data collected by Container insights as described in Optimize monitoring costs for Container insights and adjust your configuration to stop collection of data you don't need.
Modify settings for collection of metric data by Container insights. See Enable cost optimization settings for details on modifying both the frequency of metric data collection and the namespaces that Container insights collects.
Disable Container insights collection of metric data if you don't use the Container insights experience in the Azure portal. Container insights collects many of the same metric values as Managed Prometheus. You can disable collection of these metrics by configuring Container insights to collect only Logs and events, as described in Enable cost optimization settings in Container insights. This configuration disables the Container insights experience in the Azure portal, but you can still use Grafana to visualize Prometheus metrics and Log Analytics to analyze log data collected by Container insights.
If you don't query the container logs table regularly or use it for alerts, configure it as basic logs. Convert your Container insights schema to ContainerLogV2, which is compatible with Basic logs and can provide significant cost savings as described in Optimize monitoring costs for Container insights.
Limit collection of resource logs you don't need. Control plane logs for AKS clusters are implemented as resource logs in Azure Monitor. Create a diagnostic setting to send this data to a Log Analytics workspace. See Collect control plane logs for AKS clusters for recommendations on which categories you should collect.
Use resource-specific logging for AKS resource logs and configure tables as basic logs. AKS supports either Azure diagnostics mode or resource-specific mode for resource logs. Specify resource-specific mode to enable the option to configure the tables for basic logs, which provide a reduced ingestion charge for logs that you only occasionally query and don't use for alerting.
Use OpenCost to collect details about your Kubernetes costs. OpenCost is an open-source, vendor-neutral CNCF sandbox project for understanding your Kubernetes costs and supporting AKS cost visibility. It exports detailed cost data and customer-specific Azure pricing to Azure storage to help you analyze and categorize costs.
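
To make the collection frequency and namespace filtering recommendations above more concrete, here is a minimal Python sketch that builds and validates a settings object in the shape described for Container insights cost optimization settings. The key names (`interval`, `namespaceFilteringMode`, `namespaces`, `enableContainerLogV2`), the 1 to 30 minute range, and the three filtering modes are recalled from that documentation and should be treated as assumptions to verify; the function name is hypothetical.

```python
import json

# Assumption: filtering modes accepted by the cost optimization settings.
VALID_MODES = {"Include", "Exclude", "Off"}


def build_data_collection_settings(interval_minutes, mode, namespaces,
                                   container_log_v2=True):
    """Return a data collection settings dict (hypothetical helper).

    interval_minutes: how often metric data is collected (assumed 1-30).
    mode: "Include", "Exclude", or "Off" for namespace filtering.
    namespaces: namespaces the mode applies to; ignored when mode is "Off".
    """
    if not 1 <= interval_minutes <= 30:
        raise ValueError("interval_minutes must be between 1 and 30")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    return {
        "interval": f"{interval_minutes}m",
        "namespaceFilteringMode": mode,
        "namespaces": [] if mode == "Off" else list(namespaces),
        "enableContainerLogV2": container_log_v2,
    }


# Example: collect every 5 minutes and skip the kube-system namespace.
settings = build_data_collection_settings(5, "Exclude", ["kube-system"])
print(json.dumps(settings, indent=2))
```

Validating the values before applying them is a design choice for this sketch: a longer interval or a narrower namespace list reduces ingested volume, so catching a typo early avoids silently collecting more data than intended.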

Operational excellence for Kubernetes monitoring

Streamline the operational management of your Kubernetes cluster monitoring with Azure Monitor. These recommendations cover monitoring guidance for all Kubernetes layers, Azure Arc integration for hybrid clusters, managed cloud-native tools, and automated policy-based data collection.

Design checklist

  • Review guidance for monitoring all layers of your Kubernetes environment.
  • Use Azure Arc-enabled Kubernetes to monitor your clusters outside of Azure.
  • Use Azure managed services for cloud native tools.
  • Integrate AKS clusters into your existing monitoring tools.
  • Use Azure Policy to enable data collection from your Kubernetes cluster.

Configuration recommendations

Recommendation Benefit
Review guidance for monitoring all layers of your Kubernetes environment. The article Monitor your Kubernetes cluster performance with Container insights provides guidance and best practices for monitoring your entire Kubernetes environment across the network, cluster, and application layers.
Use Azure Arc-enabled Kubernetes to monitor your clusters outside of Azure. Azure Arc-enabled Kubernetes lets you monitor Kubernetes clusters running in other clouds by using the same tools as your AKS clusters, including Container insights and Azure Monitor managed service for Prometheus.
Use Azure managed services for cloud native tools. Azure Monitor managed service for Prometheus and Azure Managed Grafana support all the features of the cloud native tools Prometheus and Grafana without requiring you to operate their underlying infrastructure. You can quickly provision these tools and onboard your Kubernetes clusters with minimal overhead. These services let you access an extensive library of community rules and dashboards to monitor your Kubernetes environment.
Integrate AKS clusters into your existing monitoring tools. If you have an existing investment in Prometheus and Grafana, integrate your AKS clusters and Azure managed services into your existing environment by using the guidance in Monitor Kubernetes clusters using Azure services and cloud native tools.
Use Azure Policy to enable data collection from your Kubernetes cluster. Use Azure Policy to enable Prometheus metrics, log collection, and diagnostic settings. This ensures that any new clusters are automatically monitored and enforces their monitoring configuration.

Performance efficiency for Kubernetes monitoring

Track Kubernetes cluster performance with Azure Monitor by enabling Prometheus metrics collection, Container insights, and performance alert rules. These recommendations help you identify performance bottlenecks and scale your clusters efficiently.

Design checklist

  • Enable collection of Prometheus metrics for your cluster.
  • Enable Container insights to track performance of your cluster.
  • Enable recommended Prometheus alerts.

Configuration recommendations

Recommendation Benefit
Enable collection of Prometheus metrics for your cluster. Prometheus is a cloud-native metrics solution from the Cloud Native Computing Foundation and the most common tool used for collecting and analyzing metric data from Kubernetes clusters. Enable Prometheus on your cluster with Azure Monitor managed service for Prometheus if you don't already have a Prometheus environment. Use Azure Managed Grafana to analyze the Prometheus data collected. See Customize scraping of Prometheus metrics in Azure Monitor managed service for Prometheus to collect additional metrics beyond the default configuration.
Enable Container insights to track performance of your cluster. When you enable Container insights for your Kubernetes cluster, you can use views and workbooks to track the performance of the components of your cluster. This data might overlap with data collected by Prometheus. See Cost optimization for cost recommendations.
Enable recommended Prometheus alerts. Alerts in Azure Monitor proactively notify you when issues are detected. Start with a set of recommended Prometheus alert rules that detect the most common availability and performance issues with your cluster. Consider adding log search alerts by using data collected by Container insights.
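
For readers who want to see what a Prometheus alert rule might look like alongside the recommended set, the following Python sketch assembles a rule group body with a single rule. The rule name, PromQL expression, and thresholds are illustrative assumptions and are not taken from the recommended alert rules; the property names (`scopes`, `clusterName`, `interval`, `rules`, `alert`, `expression`, `for`, `severity`) follow the general shape of Azure Monitor Prometheus rule groups and should be verified against the current schema.

```python
import json


def build_alert_rule(name, expression, for_minutes, severity):
    """Return one alert rule entry (hypothetical helper).

    for_minutes is converted to an ISO 8601 duration such as "PT15M".
    severity is assumed to range from 0 (most severe) to 4.
    """
    if severity not in range(0, 5):
        raise ValueError("severity must be an integer from 0 to 4")
    if for_minutes < 1:
        raise ValueError("for_minutes must be at least 1")
    return {
        "alert": name,
        "expression": expression,
        "for": f"PT{for_minutes}M",
        "severity": severity,
    }


def build_rule_group(workspace_id, cluster_name, rules):
    """Wrap rules in a rule group scoped to an Azure Monitor workspace."""
    return {
        "properties": {
            "scopes": [workspace_id],
            "clusterName": cluster_name,
            "interval": "PT1M",  # evaluation frequency; an assumed default
            "rules": list(rules),
        }
    }


# Illustrative rule: pods whose containers restarted more than 3 times
# in 15 minutes. The metric comes from kube-state-metrics.
restart_rule = build_alert_rule(
    name="PodRestartingOften",
    expression=(
        "sum by (namespace, pod) "
        "(increase(kube_pod_container_status_restarts_total[15m])) > 3"
    ),
    for_minutes=15,
    severity=3,
)
group = build_rule_group("<azure-monitor-workspace-id>", "<cluster-name>",
                         [restart_rule])
print(json.dumps(group, indent=2))
```

Starting from the recommended rules and adding a small number of custom rules like this one keeps alert volume manageable while covering conditions specific to your workloads.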