I’ve been experimenting with an Action that can spawn more than 30 jobs in parallel; it’s not uncommon for one of these to fail for no obvious reason (e.g. while the Action is installing the tools it needs in the base VMs).
I’m not too concerned about the flaky failures, but it would be nice to be able to recover from them more cleanly.
(1) Is there a way (current or planned) to allow a job to restart itself upon failure? (Obviously this would need to be limited to unexpected failures, and a retry limit would be needed.)
(2) Currently there only seems to be a “Restart All Jobs” option in the UI, and it only appears once all jobs in an Action have finished (whether by success, failure, or cancellation). Is there going to be a way to selectively restart only the failed jobs, to avoid wasteful re-runs?
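
To make (1) more concrete, here is a rough Python sketch of the retry behaviour I have in mind. It is only an illustration of the semantics, not an existing GitHub Actions feature: `TransientError`, `run_with_retries`, `max_attempts`, and `delay_seconds` are all hypothetical names I made up for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a flaky tool install)."""


def run_with_retries(
    step: Callable[[], T],
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> T:
    """Run `step`, retrying only on TransientError, up to `max_attempts` times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except TransientError:
            # Retry limit reached: surface the failure instead of looping forever.
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)

    # The loop always returns or raises, so this line is never reached.
    raise AssertionError("unreachable")
```

Any other exception (a genuine test failure, say) is not caught, so it fails the job on the first attempt. That is the "only unexpected failures, with a retry limit" behaviour I am asking about.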