Retry pipeline and/or task on failure #1572

I use the Python SDK to develop ML pipelines for Azure ML.

How do I get my `PythonScriptStep` tasks, or the encompassing `Pipeline` object, to simply rerun on failure?
I reckon it's pretty common for pipelines to break because of transient network, storage, etc. issues, so a simple rerun/retry seems like a basic feature for a task orchestration framework to provide (see e.g. Apache Airflow).
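To make the behaviour I'm after concrete: within a single step, the only stopgap I can think of is wrapping the flaky I/O in a retry loop myself. Below is a minimal sketch, assuming transient failures surface as specific exception types. `retry` and `download_inputs` are hypothetical helpers, not part of the Azure ML SDK.

```python
import functools
import random
import time


def retry(max_attempts=3, base_delay=1.0, exceptions=(Exception,)):
    """Retry the wrapped function on `exceptions`, with exponential backoff.

    Hypothetical helper for use inside a step script; not an Azure ML API.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: let the step fail as usual
                    # Wait base_delay, 2*base_delay, 4*base_delay, ... plus a little jitter.
                    delay = base_delay * 2 ** (attempt - 1)
                    time.sleep(delay + random.uniform(0, delay / 10))
        return wrapper
    return decorator


@retry(max_attempts=5, base_delay=2.0, exceptions=(ConnectionError, TimeoutError))
def download_inputs():
    # Hypothetical: fetch inputs from a datastore; replace with the real I/O call.
    ...
```

This only covers errors raised inside the script itself. If the compute node dies or the failure happens on the Azure ML side, the step still fails, which is why I'd expect retries to be handled by the orchestrator instead.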

I've spent a fair amount of time going over the Azure ML documentation, and I just can't figure out how to get "retry on failure" behaviour.

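The closest thing I've found is the `continue_on_step_failure` pipeline / task parameter, which doesn't really do what's needed: it only lets steps that don't depend on the failed step keep running, rather than rerunning the failed step.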
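Since `continue_on_step_failure` doesn't retry anything, the fallback I can see is resubmitting the whole pipeline from a driver script. Here is a minimal sketch, assuming the submit-and-wait part is passed in as a callable that returns the run's final status string. `run_with_retries` and `submit_and_wait` are hypothetical names. The mapping onto the v1 SDK shown in the comment (`experiment.submit`, `wait_for_completion`, `get_status`) is my assumption and hasn't been verified end to end.

```python
import time
from typing import Callable

SUCCESS_STATUS = "Completed"  # assumed final status string of a successful run


def run_with_retries(submit_and_wait: Callable[[], str], max_attempts: int = 3,
                     delay_seconds: float = 60.0) -> str:
    """Resubmit a pipeline until it completes or the attempts run out.

    Assumed wiring with the v1 SDK (hypothetical, unverified):
        def submit_and_wait():
            run = experiment.submit(pipeline)
            run.wait_for_completion(raise_on_error=False)
            return run.get_status()
    """
    status = ""
    for attempt in range(1, max_attempts + 1):
        status = submit_and_wait()
        if status == SUCCESS_STATUS:
            return status
        print(f"Attempt {attempt}/{max_attempts} ended with status {status!r}")
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # give transient issues time to clear
    raise RuntimeError(
        f"Pipeline still failing after {max_attempts} attempts (last status: {status!r})"
    )
```

If the steps keep `allow_reuse=True` (the default for `PythonScriptStep`), a resubmission should reuse the outputs of steps that already succeeded, so in effect only the failed steps rerun. It still means babysitting the pipeline from outside, though, which is exactly the overhead I'd like to avoid.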

Any advice, please?

I've looked for a solution on Stack Overflow, but the proposed answer there relies on external tools, which just adds more overhead:

https://stackoverflow.com/questions/68647922/azure-machine-learning-pipeline-how-to-retry-upon-failure
