Resizing an Azure VM is a disruptive operation and must be treated as such in automation, especially for production workloads.
From the provided context, the following points are supported:
- Detecting whether applications are actively running inside a VM
The context does not define an Azure-native, generic way to detect “application activity” inside a VM (for example, which processes or business apps are running). That detection is application-specific and must be implemented using guest-level monitoring or custom health checks.
However, Azure Advisor does use VM-level performance metrics (CPU, memory, disk IOPS, bandwidth) over time to decide when a VM is under- or over-sized, which is a useful pattern for automation.
- Using performance metrics to infer activity/idle state
Azure Advisor’s “right-sizing” logic for highly utilized VMs is based on:
- CPU utilization
- Memory utilization
- VM Cached IOPS Consumed Percentage
- VM Uncached Bandwidth Consumed Percentage
Advisor evaluates these metrics as follows:
- Aggregates metrics over a minimum of seven days.
- Samples every 30 seconds, aggregates to 1 minute, then to 30 minutes.
- Identifies resize candidates when:
- Both CPU and memory are ≥ 90% of current SKU limits, or
- Disk metrics are ≥ 95% of limits under specific conditions.
This shows that Azure’s own guidance for resize decisions is based on sustained utilization of CPU, memory, and storage bandwidth/IOPS, not on a single instantaneous snapshot.
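The aggregation-then-threshold pattern above can be sketched in a few lines. This is an illustrative simplification, not Advisor's actual implementation: the window sizes and the 90% threshold are taken from the description above, and the helper names are hypothetical.

```python
from statistics import mean

def aggregate(samples, window):
    """Average raw samples into fixed-size windows, mirroring the
    30 s -> 1 min -> 30 min roll-up described above."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]

def is_resize_candidate(cpu_pct, mem_pct, threshold=90.0):
    """Flag a scale-up candidate when BOTH CPU and memory stay at or
    above the threshold in every aggregated window (Advisor-style rule
    for highly utilized VMs; simplified sketch)."""
    return (all(c >= threshold for c in cpu_pct)
            and all(m >= threshold for m in mem_pct))

# Hypothetical 30-minute aggregates over an observation period:
cpu = [95, 93, 97, 91]
mem = [92, 96, 90, 94]
print(is_resize_candidate(cpu, mem))  # True
```

A production version would pull these series from Azure Monitor over at least seven days rather than hard-coding them.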
For an “idle” check in automation, the same metric categories (CPU, memory, disk, network) are appropriate, but the context only confirms their use for identifying high utilization, not for guaranteeing that an application is idle; any “idle” threshold remains a design decision for the workload owner.
- Azure-native services/APIs for activity and configuration
From the context:
- Azure Advisor: Provides recommendations to resize VMs based on sustained high utilization across CPU, memory, and disk metrics. This can be used as an input signal for resize automation when VMs are consistently constrained.
- Change tracking and inventory using Azure Monitoring Agent: Tracks OS configuration drift, installed software, services/daemons, and key files on Azure VMs and Arc-enabled VMs. This is useful for understanding what is installed and running, but the context does not state that it directly exposes “application activity” or “request traffic” semantics.
No context is provided for an Azure-native API that directly tells whether a business application is actively serving traffic. That must be implemented via application health probes, logs, or custom metrics.
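One cheap, guest-agnostic building block for such a custom check is a TCP probe against the application's listening port. This is a hypothetical example of a signal the workload owner might implement, not an Azure-native API; a real deployment would more likely hit an application-level health endpoint.

```python
import socket

def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Liveness hint: does anything accept TCP connections on the
    application's port? A refused or timed-out connection suggests the
    app is not serving, though it cannot prove the app is idle."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

This only tells you a listener exists; combining it with logs or custom metrics is still required to infer real request traffic.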
- Whether the VM must be deallocated vs stopped for resize
The context distinguishes between VM states and resize behavior:
- Resizing behavior:
- A VM can be resized while running or deallocated.
- In some cases, the VM must be deallocated before resizing, particularly when the requested size is not available on the current hardware cluster.
- Changing the size of a running VM causes a restart and is disruptive.
- Stopped vs deallocated:
- “Stopped” (OS shutdown) keeps compute resources reserved and continues to incur compute charges.
- “Deallocated” (stopped via Azure control plane) releases compute resources and stops compute charges.
The resize article confirms this: after creation, a VM can be scaled up or down by changing its size, whether the VM is running or already deallocated, and deallocation is required when the new size is not available on the current hardware cluster.
From this, for automation:
- It is not universally mandatory to deallocate a VM before resizing; Azure supports resizing a running VM (with restart) or a deallocated VM.
- However, automation should be prepared for cases where deallocation is required to complete the resize, especially when changing to a size that may not be available on the current cluster.
The context does not state that a “stopped but not deallocated” state is sufficient or supported for resize; it only explicitly mentions running and deallocated states for resize operations.
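The "try in place, fall back to deallocate" behavior described above can be expressed as a small control-flow sketch. The `client` object here is a hypothetical wrapper around the Azure compute API (the method names are illustrative, not real SDK calls), and `AllocationFailure` stands in for the allocation error Azure raises when the size is unavailable on the current cluster.

```python
class AllocationFailure(Exception):
    """Stand-in for Azure's allocation error when the requested size
    is unavailable on the VM's current hardware cluster."""

def resize_vm(client, vm_name: str, new_size: str) -> str:
    """Attempt an in-place resize first (this restarts the VM);
    if allocation fails, deallocate, resize, then start again."""
    try:
        client.update_size(vm_name, new_size)
        return "resized-running"
    except AllocationFailure:
        client.deallocate(vm_name)          # releases compute resources
        client.update_size(vm_name, new_size)
        client.start(vm_name)               # may land on a new cluster
        return "resized-after-deallocate"
```

The key design point is that both paths are disruptive; the fallback path merely trades a restart for a full stop/start cycle.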
- Recommended validation steps for safe resize automation
Based on the context, the following validations and behaviors are supported and recommended for a safe automation workflow:
- Treat resize as disruptive:
- Resizing a running VM causes a restart and should be considered disruptive, especially for stateful workloads.
- Automation should only proceed when the workload can tolerate a restart or when the VM is intentionally taken offline.
- Power state validation:
- Confirm the VM power state before resize.
- Decide a policy:
- Either resize while running (accepting a restart), or
- Explicitly deallocate the VM via the Azure control plane (not merely an OS-level shutdown), then resize.
- Be aware that deallocation releases dynamic IP addresses; automation must handle IP changes if dynamic IPs are used.
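The power-state policy above can be captured as a pure decision function. The step names and policy flags are illustrative; the power-state strings follow Azure's instance-view convention.

```python
def resize_plan(power_state: str, restart_tolerated: bool,
                uses_dynamic_ip: bool) -> list[str]:
    """Derive the resize steps from the current power state and the
    workload owner's policy. Illustrative sketch, not an Azure API."""
    if power_state == "PowerState/deallocated":
        return ["resize", "start"]          # already offline; safe path
    if restart_tolerated:
        return ["resize"]                   # in-place resize restarts the VM
    steps = ["deallocate", "resize", "start"]
    if uses_dynamic_ip:
        # Deallocation releases dynamic IPs, so downstream references
        # (DNS, firewall rules) must be refreshed afterwards.
        steps.append("refresh-ip-references")
    return steps
```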
- Capacity and allocation considerations:
- If resizing within an availability set, capacity constraints on the original cluster can cause allocation failures.
- Workarounds include:
- Choosing a different VM size with better availability.
- Stopping (deallocating) all VMs in the availability set and starting them together to allow allocation from all available clusters.
- Automation should handle allocation failures gracefully and possibly fall back to alternative sizes or retry strategies.
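A minimal sketch of the fallback strategy: try each acceptable size in preference order until one allocates. `try_resize` is a caller-supplied seam for the actual Azure call (the SKU names below are examples, not guaranteed availability).

```python
def resize_with_fallback(try_resize, candidate_sizes):
    """Attempt each candidate size in preference order; return the
    first one that allocates successfully. `try_resize` returns True
    on success and False on an allocation failure."""
    for size in candidate_sizes:
        if try_resize(size):
            return size
    raise RuntimeError(
        "no candidate size could be allocated; retry later or "
        "deallocate the whole availability set and restart it together")
```

The terminal error deliberately surfaces the availability-set workaround described above rather than retrying silently forever.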
- Metric-based validation (patterned after Azure Advisor):
- Use Azure metrics for CPU, memory, disk IOPS, and bandwidth as inputs to resize decisions.
- For “scale up” decisions, follow the Azure Advisor pattern of looking at sustained high utilization over time.
- For “safe to resize now” checks, the same metrics can be used to ensure utilization is below chosen thresholds, but the context does not prescribe specific “idle” thresholds.
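A "safe to resize now" gate can then be a simple all-metrics-below-threshold check. The metric names and threshold values below are entirely a workload-owner design decision, as noted above, not values prescribed by Azure documentation.

```python
def is_idle(metrics: dict[str, list[float]],
            thresholds: dict[str, float]) -> bool:
    """Declare the VM idle only if EVERY metric stays strictly below
    its owner-chosen threshold across all samples in the window."""
    return all(max(samples) < thresholds[name]
               for name, samples in metrics.items())

# Hypothetical recent samples and thresholds:
metrics = {"cpu_pct": [3, 5, 2], "mem_pct": [20, 22, 19],
           "disk_iops_pct": [1, 0, 2], "net_mbps": [0.1, 0.3, 0.2]}
thresholds = {"cpu_pct": 10, "mem_pct": 30,
              "disk_iops_pct": 5, "net_mbps": 1.0}
```

Using `max()` rather than the mean makes the gate conservative: a single spike within the window blocks the resize.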
- Cost and state management:
- Use deallocation when VMs are not needed to avoid compute charges, as recommended in FinOps best practices.
- Ensure automation differentiates between “stopped” (OS-level) and “deallocated” (Azure control plane) and uses the appropriate API to achieve the desired state.
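The stopped/deallocated distinction can be made explicit by classifying the instance-view power-state codes, which Azure reports in the `PowerState/...` format. The mapping logic below is a sketch; the billing notes in the comments restate the context above.

```python
def classify_power_state(statuses: list[str]) -> str:
    """Map instance-view status codes to the billing-relevant state."""
    for code in statuses:
        if code == "PowerState/deallocated":
            return "deallocated"  # compute released, compute billing stopped
        if code == "PowerState/stopped":
            return "stopped"      # OS shut down, compute still reserved and billed
        if code == "PowerState/running":
            return "running"
    return "unknown"
```

Automation that wants the no-charge state must then call the deallocate API when it sees "stopped", rather than treating the two states as equivalent.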
Summary relative to the example validations:
- CPU utilization threshold check: Supported as a key signal (used by Azure Advisor for resize recommendations).
- Memory utilization threshold check: Supported and used by Azure Advisor.
- Network traffic activity: Not explicitly called out in the context for resize decisions, but is a reasonable additional signal; not mandated by the documentation.
- Disk read/write activity: Supported via disk IOPS and bandwidth metrics, used by Azure Advisor.
- VM power state validation: Supported and important; resize is disruptive and may require deallocation depending on size and cluster capacity.
The context does not define a single “recommended” full workflow for safe resize automation, but it clearly establishes that:
- Resizing is disruptive and may require deallocation.
- VM metrics (CPU, memory, disk) are the primary signals Azure uses for resize recommendations.
- Stopped vs deallocated states have different billing and resource implications, and deallocation is sometimes required for resize.
- Capacity and allocation behavior in availability sets must be considered when resizing.