Got a site to site Azure VPN gateway, trying to write KQL query that will alert when there is a microsoft downtime affecting VPN gateway, we had an application experience network issues during a VPN gateway resource health issue. Although with the following queries, I have not been able to get the time to tally up with when the application experienced network issues. Is there a way to fine tune the query to only return results when there is a genuine downtime with the VPN gateway. From my research when there is a failover, there is a disconnect/reconnect, so merely using the disconnect query will bring back a lot of false alerts.
This is what I have so far.
AzureDiagnostics
| where ResourceType == "VIRTUALNETWORKGATEWAYS"
//| where Category == "VpnGatewayDiagnosticLog"
//| where Level != "Informational"
| where Level in ("Error", "Critical") // Filter for errors
//| where TimeGenerated > ago(1h) // Adjust the time window as needed
//| project TimeGenerated, Resource, OperationName, ResultDescription, CorrelationId, Level
| order by TimeGenerated asc
and
AzureDiagnostics
| where Category == "TunnelDiagnosticLog"
| where OperationName == "TunnelDisconnected"
//| extend Message1=Message
//| where Level in ("Error", "Critical") // Filter for errors
//| parse Message with * "Remote " RemoteIP ":" * "500: Local " LocalIP ":" * "500: " Message2
//| extend Event = iif(Message has "SESSION_ID",Message2,Message1)
//| project TimeGenerated, RemoteIP, LocalIP, Event, Level
| sort by TimeGenerated asc