Prerequisite: 07 — Nav2 Localization Odom Amcl Ekf, 06 — Nav2 Local Control And Cmdvel, 03 — Nav2 Bt Navigator And Bt Xml

Unlocks: Safer recovery policy, fewer opaque mission failures, cleaner retry budgets, better differentiation between temporary blockage and real navigation breakdown
When navigation fails in production, the most important question is not “can the robot try something?” It is “what should the robot try next, how many times, and when should it stop?”
Bad recovery design creates recognizable AMR failures:
Recoveries and checkers are where technical navigation meets operational policy.
A recovery is a deliberate policy decision that says:
- the normal plan-follow loop is not succeeding
- try a bounded alternative behavior to regain progress or clarity
Common Nav2 recovery behaviors include clearing the costmap, waiting, backing up, and spinning.
Those are not equivalent. Each assumes a different story about why navigation failed.
| Suspected failure story | Good first recovery bias | Bad first recovery bias |
|---|---|---|
| temporary human or forklift obstruction | wait, maybe replan | aggressive backup or repeated spin |
| stale or phantom obstacle in costmap | clear relevant costmap, then replan | repeated wait with no model refresh |
| robot nose trapped in tight geometry | short backup, then replan | repeated clear-only loop |
| localization confusion | escalate, or run a relocalization workflow outside the default recoveries | endless spin/clear cycles |
If the recovery assumes the wrong story, the robot wastes time and operator trust.
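The table above can be read as a lookup from suspected failure story to a first recovery bias. The sketch below encodes it that way. The enum members and action labels are hypothetical names chosen for illustration; they are not Nav2 behavior tree node names.

```python
from enum import Enum, auto


class FailureStory(Enum):
    TEMPORARY_OBSTRUCTION = auto()
    STALE_COSTMAP_OBSTACLE = auto()
    TRAPPED_IN_TIGHT_GEOMETRY = auto()
    LOCALIZATION_CONFUSION = auto()


# Hypothetical action labels that mirror the "good first recovery bias" column.
FIRST_RECOVERY = {
    FailureStory.TEMPORARY_OBSTRUCTION: ["wait", "replan"],
    FailureStory.STALE_COSTMAP_OBSTACLE: ["clear_costmap", "replan"],
    FailureStory.TRAPPED_IN_TIGHT_GEOMETRY: ["short_backup", "replan"],
    # Localization problems are handed upward instead of retried locally.
    FailureStory.LOCALIZATION_CONFUSION: ["escalate"],
}


def first_recovery(story):
    """Return the ordered first-response actions for a suspected failure story."""
    return FIRST_RECOVERY[story]
```

Writing the mapping down explicitly makes the assumed story reviewable. If operators disagree with a row, that is a policy discussion and not a tuning one.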
The number of retries is not just a technical knob.
More retries mean:
Fewer retries mean:
That tradeoff depends on aisle width, human traffic, mission urgency, and operator expectations.
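One way to make the retry count an explicit policy value is to wrap it in a small budget object. This is a minimal sketch under stated assumptions: `RetryBudget`, `run_with_budget`, and the `attempt` callable are hypothetical names, and a real system would run navigation and a recovery inside `attempt`.

```python
from dataclasses import dataclass


@dataclass
class RetryBudget:
    max_retries: int
    used: int = 0

    def try_consume(self) -> bool:
        """Spend one retry if any remain; return False once the budget is gone."""
        if self.used >= self.max_retries:
            return False
        self.used += 1
        return True


def run_with_budget(attempt, budget):
    """Run one initial attempt, then retry until success or budget exhaustion.

    `attempt` is a hypothetical callable that returns True on success.
    """
    if attempt():
        return "succeeded"
    while budget.try_consume():
        if attempt():
            return "succeeded"
    # Out of retries: hand the decision to the mission layer.
    return "escalate"
```

With `RetryBudget(2)` the robot makes at most three attempts in total before escalating, so the worst-case time spent blocking an aisle is bounded and can be stated to operators.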
A progress checker asks a simple but operationally critical question:
has the robot made enough real progress within enough time to justify continuing?
That usually boils down to two thresholds: a minimum distance the robot must move, and a time allowance within which it must cover that distance.
Those values sound simple, but they are heavily dependent on the robot and its motion stack.
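The logic of a distance-and-time progress check can be sketched in a few lines. The parameter names below mirror Nav2's `SimpleProgressChecker` parameters (`required_movement_radius`, `movement_time_allowance`), but the class itself is a simplified illustration and not the Nav2 implementation.

```python
import math


class ProgressChecker:
    def __init__(self, required_movement_radius, movement_time_allowance):
        self.radius = required_movement_radius    # metres
        self.allowance = movement_time_allowance  # seconds
        self._anchor = None                       # (x, y, t) of last accepted progress

    def check(self, x, y, t):
        """Return True while the robot is still considered to be making progress."""
        if self._anchor is None:
            self._anchor = (x, y, t)
            return True
        ax, ay, at = self._anchor
        if math.hypot(x - ax, y - ay) > self.radius:
            # Moved far enough: accept progress and restart the clock here.
            self._anchor = (x, y, t)
            return True
        # Not far enough yet: still fine only while inside the time allowance.
        return (t - at) <= self.allowance
```

For example, with a radius of 0.5 m and an allowance of 10 s, a robot that creeps 0.2 m in 11 s is reported as not progressing even though it is moving.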
A progress checker can fire even when the controller is behaving rationally.
Typical causes:
If the checker says “stuck,” prove whether the robot was actually unable to move or only unable to satisfy the chosen threshold.
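A first step in that proof is to look at the poses logged during the window that triggered the checker. The sketch below separates "did not move at all" from "moved, but less than the threshold". The function name, the labels it returns, and the `motion_epsilon` noise floor are assumptions made for illustration.

```python
import math


def classify_stuck_event(poses, required_radius, motion_epsilon=0.02):
    """Classify a progress-checker failure from logged (x, y) samples.

    poses: list of (x, y) tuples recorded over the checker's time window.
    motion_epsilon: assumed displacement below which motion is treated as noise.
    """
    x0, y0 = poses[0]
    peak = max(math.hypot(x - x0, y - y0) for x, y in poses)
    if peak < motion_epsilon:
        return "no_motion"               # robot was physically not moving
    if peak < required_radius:
        return "moving_below_threshold"  # moving, but too little for the checker
    return "threshold_met"               # checker should not have fired on this window
```

A `moving_below_threshold` result points at threshold choice or low-speed behavior, while `no_motion` points at the controller, the command chain, or a real blockage.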
The same thresholds rarely fit all scenarios.
| Scenario | Risk if threshold too strict | Risk if threshold too loose |
|---|---|---|
| normal aisle following | false stuck detection | slow recognition of real failures |
| dense human environment | constant needless recoveries | robot waits too long in congestion |
| docking or final alignment | aborts during valid inching motion | masks real low-speed deadlock |
This is why high-quality AMR systems often treat docking and general navigation differently at the mission layer.
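One way to keep scenario-specific thresholds honest is to store them as named profiles and check them against the speeds the robot is expected to hold. The profile names and numbers below are illustrative assumptions, not recommended values. The speed check uses a straight-line approximation: a robot moving at constant speed passes only if it covers the radius within the allowance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProgressProfile:
    required_movement_radius: float  # metres
    movement_time_allowance: float   # seconds

    def min_average_speed(self):
        """Slowest straight-line speed (m/s) that still satisfies this profile."""
        return self.required_movement_radius / self.movement_time_allowance

    def tolerates(self, expected_speed):
        """True if motion at expected_speed would not be flagged as stuck."""
        return expected_speed > self.min_average_speed()


# Hypothetical profiles; the values are placeholders for illustration only.
PROFILES = {
    "aisle": ProgressProfile(required_movement_radius=0.5, movement_time_allowance=10.0),
    "docking": ProgressProfile(required_movement_radius=0.1, movement_time_allowance=20.0),
}
```

With these placeholder numbers, inching at 0.02 m/s is acceptable under the `docking` profile but would be flagged as stuck under the `aisle` profile.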
A goal checker determines whether the robot has satisfied positional and angular tolerances strongly enough to report success.
This matters because success triggers downstream actions:
If success is declared too early, the robot may be operationally in the wrong place even though Nav2 says done.
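The check itself is small. The sketch below tests position and heading against tolerances named after Nav2's `SimpleGoalChecker` parameters (`xy_goal_tolerance`, `yaw_goal_tolerance`). It is an illustration of the idea, not the Nav2 source, and the pose tuples `(x, y, yaw)` are an assumed representation.

```python
import math


def angle_diff(a, b):
    """Smallest signed difference a - b, wrapped to [-pi, pi]."""
    d = a - b
    return math.atan2(math.sin(d), math.cos(d))


def goal_reached(pose, goal, xy_goal_tolerance, yaw_goal_tolerance):
    """pose and goal are (x, y, yaw) tuples in the same frame; yaw in radians."""
    x, y, yaw = pose
    gx, gy, gyaw = goal
    within_xy = math.hypot(gx - x, gy - y) <= xy_goal_tolerance
    within_yaw = abs(angle_diff(gyaw, yaw)) <= yaw_goal_tolerance
    return within_xy and within_yaw
```

Note that a robot 0.24 m from a station with a 0.25 m tolerance counts as arrived. Whether that is acceptable depends on what the downstream action needs.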
Loose tolerances can help throughput in coarse navigation tasks.
Tight tolerances matter for:
Bad pattern:
That is not fixing the root cause. It is shifting the failure downstream.
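Some goal checkers keep state to avoid flapping (Nav2's `SimpleGoalChecker` has a `stateful` option): once the position tolerance has been met, they stop rechecking it and only wait for the heading tolerance.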
That can help stability, but it can also hide situations where the robot briefly enters tolerance and then drifts away.
Use it intentionally, especially if a downstream workflow assumes precise final pose.
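The effect of latching is easiest to see side by side. This sketch is a simplified model of a stateful goal checker, assuming `(x, y, yaw)` pose tuples. The class and method names are hypothetical and the code is not taken from Nav2.

```python
import math


class LatchingGoalChecker:
    def __init__(self, xy_tol, yaw_tol, stateful=True):
        self.xy_tol = xy_tol
        self.yaw_tol = yaw_tol
        self.stateful = stateful
        self._xy_latched = False

    def reset(self):
        """Call when a new goal is accepted."""
        self._xy_latched = False

    def is_goal_reached(self, pose, goal):
        x, y, yaw = pose
        gx, gy, gyaw = goal
        if not (self.stateful and self._xy_latched):
            if math.hypot(gx - x, gy - y) > self.xy_tol:
                return False
            self._xy_latched = True  # only consulted again when stateful
        d = gyaw - yaw
        yaw_err = abs(math.atan2(math.sin(d), math.cos(d)))
        return yaw_err <= self.yaw_tol
```

If the robot enters the position tolerance with the wrong heading and then drifts 0.4 m away while rotating, the stateful checker still reports success once the heading is right. The stateless one does not.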
This makes sense when the world model may be stale or polluted.
Good use cases:
Bad use cases:
If costmap clearing works repeatedly, do not celebrate. Ask why the stale obstacle keeps returning.
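To make that question answerable, it helps to record where clearing was needed. The sketch below counts clear-costmap recoveries per coarse grid cell and flags cells that keep recurring. `ClearCostmapAudit`, the cell size, and the alert threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


class ClearCostmapAudit:
    def __init__(self, cell_size_m=1.0, alert_threshold=3):
        self.cell_size_m = cell_size_m
        self.alert_threshold = alert_threshold
        self._counts = Counter()

    def record(self, x, y):
        """Log one clear-costmap recovery at map position (x, y).

        Returns True when this cell has reached the alert threshold.
        """
        cell = (math.floor(x / self.cell_size_m), math.floor(y / self.cell_size_m))
        self._counts[cell] += 1
        return self._counts[cell] >= self.alert_threshold

    def hotspots(self):
        """Cells where clearing has been needed at least alert_threshold times."""
        return sorted(c for c, n in self._counts.items() if n >= self.alert_threshold)
```

A hotspot is a prompt to inspect sensing and costmap configuration at that location, not a reason to add more clearing.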
Waiting is underused in human-heavy or forklift-heavy environments.
It is often the safest first move when:
Waiting is bad when the robot is geometrically trapped or the world model itself is wrong.
Backing up is useful when the robot needs space to replan or re-enter a corridor.
It is risky when:
In production AMRs, backup distance and conditions are policy decisions, not just defaults.
Spin can help with sensor coverage and local environment refresh.
It can also be a throughput killer in narrow aisles.
If the robot repeatedly spins in a place where turning in place is operationally awkward, the fix likely belongs at the BT or mission layer, not in more retries.
Recommended bias:
Why:
Recommended bias:
Do not let costmap clear become a permanent band-aid for perception integration defects.
Recommended bias:
This is the kind of incident where backup makes more sense than waiting because geometry, not traffic, is the main issue.
Recommended bias:
Default recoveries are often too generic for precise station work.
If the robot cannot settle because the pose estimate is noisy, loosening the goal checker may only move the failure to docking, manipulation, or task completion.
Treat loose tolerances as a task-level decision, not a universal fix.
If the base ignores small commands, increasing the movement time allowance may reduce false aborts, but it does not repair the command-chain mismatch.
First prove that commanded low-speed motion is physically executable.
Always validate on:
If one threshold set only works in one case, document the limitation instead of pretending it is universal.
Escalate when:
Nav2 should not carry all mission semantics alone.
Useful signals to expose upward:
This is what lets the mission layer make informed next-step decisions.
Strong answers explain that recoveries are:
That is what distinguishes production AMR policy from demo navigation.
Continue to 09 — Nav2 Waypoints Docking Zones And Missions. That lesson explains how waypoint flows, docking, costmap filters, and higher-level mission logic sit above these local retry and success contracts.