Is there a way to automatically retry failed stages in an Azure DevOps YAML pipeline?

I have a multi-stage YAML pipeline, and sometimes a stage fails due to transient issues (e.g., network timeouts or flaky tests). Instead of manually re-running the stage, I want to configure an automatic retry mechanism for failed stages in the same build.

Approach Tried: I attempted to implement a monitoring stage that tracks and retries failed stages dynamically. This stage consists of two tasks:

Check the status of each stage using an API call and store the results in a variable.
Rerun failed stages based on the stored statuses.

However, when calling the API to get stage statuses, I encountered the following issue:

"Stage 'Build' is still running or undetermined. Skipping..."

Additionally, I have tried the YAML property retryCountOnTaskFailure, but it is not a viable solution in our case. Our pipeline provisions a Kubernetes cluster, onboards it to Azure Arc, and runs Sonobuoy tests for various extension plugins.

Using retryCountOnTaskFailure causes resource leaks because retries leave many resources undeleted, leading to inconsistencies.

Questions: How can I reliably fetch the status of completed or failed stages in a YAML pipeline?

What is the best way to dynamically rerun only the failed stages while ensuring proper resource cleanup?

asked Mar 28 at 11:49 by user30090146; edited Mar 31 at 10:08 by Mark Rotteveel

Can you share more details on how you're obtaining the status of the stages? – bryanbcook Commented Mar 28 at 15:34
Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. – Community Bot Commented Mar 31 at 19:00

1 Answer

Currently, Azure DevOps does not provide an out-of-the-box feature to automatically retry failed stages in a YAML pipeline. If this is a critical need for your workflow, you can submit a feature request to the Azure DevOps product group.

The current workaround is to use a separate monitoring pipeline. Because the REST API for rerunning stages requires the build to be completed, failed stages cannot be rerun from within the same pipeline execution. Instead, another pipeline can listen for failed-build events and trigger the reruns. Here's how:

Create a Service Hook:
- Configure an Azure DevOps Service Hook to send a payload when a build fails.
- Use a WebHook URL like: https://dev.azure.com/<YourADOOrgName>/_apis/public/distributedtask/webhooks/<WebHookName>?api-version=6.0-preview

Set up an Incoming WebHook service connection:
- Use the <WebHookName> from the URL to create an Incoming WebHook service connection.
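Before wiring up the real Service Hook, it can help to exercise the incoming webhook by hand. The following is a minimal sketch, assuming the service connection has no secret configured and that the listener only reads `resource.url` from the payload. The organization, project, and build values are placeholders, and the trimmed payload is illustrative rather than the full body that Azure DevOps sends.

```powershell
# Placeholder values: substitute your own organization and webhook name.
$org         = '<YourADOOrgName>'
$webHookName = '<WebHookName>'

# Incoming webhook endpoint, in the same form as the URL given above.
$uri = "https://dev.azure.com/$org/_apis/public/distributedtask/webhooks/${webHookName}?api-version=6.0-preview"

# Trimmed, illustrative payload: only the fields a listener is assumed to need.
$payload = @{
    eventType = 'build.complete'
    resource  = @{
        url    = "https://dev.azure.com/$org/<ProjectId>/_apis/build/Builds/<BuildId>"
        result = 'failed'
    }
} | ConvertTo-Json -Depth 5

Write-Host "POST $uri"
Write-Host $payload

# Uncomment to send the test payload (assumes the connection has no secret):
# Invoke-RestMethod -Method Post -Uri $uri -Body $payload -ContentType 'application/json'
```

If the service connection is configured with a secret, the request would also need the matching signature header, which this sketch does not cover.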

Create a YAML Pipeline to Listen for Failures

This pipeline will be triggered by the WebHook, extract the failed build URL from the payload, fetch failed stages and rerun them via APIs.

trigger: none

resources:
  webhooks:
  - webhook: adowebhookrerun # WebHook name
    connection: WebHookSvcCnnRerun # Incoming WebHook service connection name

variables:
  payload: ${{convertToJson(parameters.adowebhookrerun)}}
  failedBuildURL: ${{parameters.adowebhookrerun.resource.url}}

pool:
  vmImage: windows-latest

steps:
- pwsh: |
    Write-Host "================ Failed build payload: ================"
    Write-Host '$(payload)'
    Write-Host "================ Failed build URL: ================"
    Write-Host '$(failedBuildURL)'

    # Use $(System.AccessToken) of the pipeline service account for API authentication
    $headers = @{
      'Authorization' = 'Bearer ' + '$(System.AccessToken)'
      'Content-Type'  = 'application/json'
    }

    # Fetch the timeline of the failed build and keep only the failed stage records
    $timeline = Invoke-RestMethod -Uri "$(failedBuildURL)/timeline?api-version=7.1" -Headers $headers -Method Get
    $failedStages = $timeline.records | Where-Object { $_.result -eq "failed" -and $_.type -eq "Stage" }

    foreach ($stage in $failedStages) {
      # Request a retry of the stage, including all of its jobs and its dependencies
      $body = @{
        "state"             = "retry"
        "forceRetryAllJobs" = $true
        "retryDependencies" = $true
      } | ConvertTo-Json -Depth 10
      Write-Host "================ Rerun failed stage: $($stage.name) ($($stage.identifier)) ================"
      Invoke-RestMethod -Method Patch -Uri "$(failedBuildURL)/stages/$($stage.identifier)?api-version=7.1" -Headers $headers -Body $body
    }
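One thing to watch for with this approach: if a rerun stage fails again, the build completes as failed again, and the Service Hook fires once more. Below is a minimal sketch of a guard against endless reruns, assuming the timeline records expose an `attempt` counter that starts at 1. The helper name `Select-RetryableStage` and the `MaxAttempts` cap are hypothetical and not part of the original answer.

```powershell
# Hypothetical helper: keep only failed stages that have not used up their retry budget.
function Select-RetryableStage {
    param(
        [Parameter(Mandatory)] [object[]] $Records,
        [int] $MaxAttempts = 2   # assumed cap: the original run plus one retry
    )
    $Records | Where-Object {
        $_.type -eq 'Stage' -and $_.result -eq 'failed' -and $_.attempt -lt $MaxAttempts
    }
}

# Sample records shaped like timeline entries (illustrative data only).
$sample = @(
    [pscustomobject]@{ type = 'Stage'; identifier = 'Build';      result = 'failed';    attempt = 1 }
    [pscustomobject]@{ type = 'Stage'; identifier = 'Deploy';     result = 'failed';    attempt = 2 }
    [pscustomobject]@{ type = 'Stage'; identifier = 'Test';       result = 'succeeded'; attempt = 1 }
    [pscustomobject]@{ type = 'Job';   identifier = 'Build.Job1'; result = 'failed';    attempt = 1 }
)

# Only 'Build' qualifies: 'Deploy' is already on attempt 2, 'Test' succeeded,
# and 'Build.Job1' is a job rather than a stage.
$retryable = @(Select-RetryableStage -Records $sample -MaxAttempts 2)
$retryable.identifier
```

In the listener pipeline, the same filter could take the place of the `Where-Object` expression by passing `$timeline.records` as `-Records`.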

Since the script authenticates with $(System.AccessToken), ensure that either or both of the following pipeline service accounts have permission to queue builds:

Project Collection Build Service (<YourADOOrgName>)
<TheProjectName> Build Service (<YourADOOrgName>)
