Monitor your servers' CPU usage using Azure Log Analytics

At one of my Azure user groups, someone asked me about monitoring servers using Azure. I thought more people might like to know about this too, hence this blog post. This post focuses on monitoring CPU usage. I will go through the query I use and show you how to pin it to a dashboard, but first we have some prerequisites to get out of the way.

The prerequisites

You will need to have the Log Analytics agent installed on the VMs you want to monitor. You can monitor non-Azure servers too. You can find out more about installing the agent at https://docs.microsoft.com/en-us/azure/azure-monitor/platform/log-analytics-agent#install-and-configure-agent

You will also need to enable extra data sources. For this guide, that means performance counters. You can read more about this at https://docs.microsoft.com/en-us/azure/azure-monitor/platform/data-sources-performance-counters. Basically, if you have not enabled performance counters in your Log Analytics Workspace, you will need to. To do this, go to your Log Analytics Workspace, click Advanced Settings, then Data, then Windows Performance Counters, and finally click Add the selected performance counters.
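
Before moving on, it may be worth checking that the CPU counter is actually reaching the workspace. The following is a minimal sketch, assuming the same Perf table and counter names used in the query later in this post; the column names Samples and LastSample are only illustrative.

```kusto
// Sanity check: which computers have reported the CPU counter in the last hour?
Perf
| where TimeGenerated > ago(1h)
| where ObjectName == "Processor" and CounterName == "% Processor Time" and InstanceName == "_Total"
// Count the samples and note the most recent one for each computer
| summarize Samples = count(), LastSample = max(TimeGenerated) by Computer
| order by LastSample desc
```

If a server you expect is missing from the results, the agent or the performance counter setting on that workspace is the first place to look.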

Now that you have that out of the way, let's get to it.

The Query

// Chart CPU if it is over nn% during the past nn days/hours
let setpctValue = 85;
// enter a % value to check
let startDate = ago(5h);
// enter how many days/hours to look back on
Perf
| where TimeGenerated > startDate
| where ObjectName == "Processor" and CounterName == "% Processor Time" and InstanceName == "_Total" and Computer in ((Heartbeat
| where OSType == "Windows"
| distinct Computer))
| summarize PCT95CPUPercentTime = percentile(CounterValue, 95) by Computer
| where PCT95CPUPercentTime > setpctValue
| summarize max(PCT95CPUPercentTime) by Computer
| join
(
Perf
| where TimeGenerated > startDate
| where ObjectName == "Processor" and CounterName == "% Processor Time" and InstanceName == "_Total" and Computer in ((Heartbeat
| where OSType == "Windows"
| distinct Computer))
)
on Computer
| make-series PCT95CPUPercentTime = percentile(CounterValue, 95) on TimeGenerated from startDate to now() step 10m by Computer
| render timechart

Explaining the query

So this bit:

// Chart CPU if it is over nn% during the past nn days/hours
let setpctValue = 85;
// enter a % value to check
let startDate = ago(5h);
// enter how many days/hours to look back on

These are the only bits you will need to change. setpctValue is the percentage you want to use as the threshold; in this case, I am using 85%. startDate is how far back you want the chart to go; I have used 5 hours. You could choose the whole working day of 7h+30m if you like. Totally up to you.
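
As an illustration of changing these two values, here is a small sketch that uses a 90% threshold and a 7.5 hour window. The numbers are arbitrary examples rather than recommendations.

```kusto
// Example values only: a 90% threshold over a 7.5 hour working day
let setpctValue = 90;
// Timespan literals can be added together, so this looks back 7 hours 30 minutes
let startDate = ago(7h + 30m);
// To look back over days instead, something like ago(2d) works the same way
print Threshold = setpctValue, WindowStart = startDate
```

The final print line is only there so the snippet runs on its own and shows what the two values resolve to; in the real query the Perf pipeline follows the let statements instead.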

Perf
| where TimeGenerated > startDate
| where ObjectName == "Processor" and CounterName == "% Processor Time" and InstanceName == "_Total" and Computer in ((Heartbeat
| where OSType == "Windows"
| distinct Computer))
| summarize PCT95CPUPercentTime = percentile(CounterValue, 95) by Computer
| where PCT95CPUPercentTime > setpctValue
| summarize max(PCT95CPUPercentTime) by Computer

Here you run the initial query, which looks at the performance counter % Processor Time for the instance _Total. It only queries servers with the OS type Windows, and only keeps those whose 95th percentile CPU over the period is above the percentage you set before.
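
If you want to see which servers pass this filter before charting anything, the first half can be run on its own. This is a sketch built from the lines above, with the final summarize swapped for an order by so the busiest servers come first; that ordering step is my addition and not part of the original query.

```kusto
let setpctValue = 85;
let startDate = ago(5h);
Perf
| where TimeGenerated > startDate
| where ObjectName == "Processor" and CounterName == "% Processor Time" and InstanceName == "_Total"
// Only keep computers that have reported a Windows heartbeat
| where Computer in ((Heartbeat | where OSType == "Windows" | distinct Computer))
// One row per computer: its 95th percentile CPU over the whole window
| summarize PCT95CPUPercentTime = percentile(CounterValue, 95) by Computer
| where PCT95CPUPercentTime > setpctValue
| order by PCT95CPUPercentTime desc
```

An empty result here means no server crossed the threshold in the window, in which case the full query will produce an empty chart as well.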

| join
(
Perf
| where TimeGenerated > startDate
| where ObjectName == "Processor" and CounterName == "% Processor Time" and InstanceName == "_Total" and Computer in ((Heartbeat
| where OSType == "Windows"
| distinct Computer))
)
on Computer
| make-series PCT95CPUPercentTime = percentile(CounterValue, 95) on TimeGenerated from startDate to now() step 10m by Computer
| render timechart

You then join the initial query with this query, which is basically the same but without the summarize, as the chart needs the raw samples rather than one row per server. You then make a series of the 95th percentile of the data over TimeGenerated, from the time you set above to now (as in when the query runs), stepped by 10 minutes. You can change the step if you like; I found 10 minutes to be a good fit for me. The last line then renders a nice chart for you.
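
Because both halves of the join start from the same filtered Perf data, one possible variation is to name that data once with let and reuse it. This is a sketch of an alternative layout, not the query used in this post; the names cpu and busy are hypothetical, and the step is set to 5m purely to show where it can be changed.

```kusto
let setpctValue = 85;
let startDate = ago(5h);
// The shared filter: CPU samples from Windows servers inside the window
let cpu = Perf
    | where TimeGenerated > startDate
    | where ObjectName == "Processor" and CounterName == "% Processor Time" and InstanceName == "_Total"
    | where Computer in ((Heartbeat | where OSType == "Windows" | distinct Computer));
// Servers whose 95th percentile CPU over the window is above the threshold
let busy = cpu
    | summarize PCT95CPUPercentTime = percentile(CounterValue, 95) by Computer
    | where PCT95CPUPercentTime > setpctValue
    | project Computer;
// Chart the raw samples for just those servers
cpu
| where Computer in (busy)
| make-series PCT95CPUPercentTime = percentile(CounterValue, 95) on TimeGenerated from startDate to now() step 5m by Computer
| render timechart
```

This should chart the same set of servers as the join version; which layout reads better is a matter of taste.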

Running the query

If you run the query in your Log Analytics Logs window and have servers over 85% CPU usage, you should see a time chart with one line per server.

You will notice that even though a server is not currently over the percentage, it still shows in the graph. This is by design, to help you look for issues or patterns over the time period set.

Dashboards?

It is very easy to pin this chart to a dashboard. In fact, all you have to do is click the Pin button at the top right of the query window. This will then ask you which dashboard to save the chart to. One thing to note is that you need to have created and shared a dashboard first. Unfortunately, you cannot create a dashboard from the Pin drop-down.

There you have it. You can now monitor CPU usage on both Azure and non-Azure servers using Azure Log Analytics and Azure Dashboards.

I hope you found this article helpful. If you have any questions or comments please reach out in the usual ways.

Greg · June 19, 2019 at 1:10 pm

Cheers Richard, useful post! Im currently looking into this myself combined with a grafana dashboard. Do you have queries for server storage by anychance?

Pixel Robots. · June 20, 2019 at 7:59 am

Hi Greg. Have a look at https://pixelrobots.co.uk/2019/01/azure-alert-on-server-disk-space-below-xxgb/ it might help.

Greg · June 21, 2019 at 10:18 am

Perfect mate, cheers!

Robenildo Oliveira · July 30, 2020 at 3:33 pm

Hi richard,

Congratulations for you blog, time and advanced support.

Cheers,
Robenildo Oliveira

Published by Pixel Robots. on June 19, 2019 June 24, 2019
