04 — Nav2 Costmaps and Layers
How Nav2 turns maps and sensor observations into a navigable world model, and why AMRs fail when that model drifts from reality
Prerequisites: 01 — Nav2 System Architecture, 02 — TF2 Time QoS, 03 — Nav2 BT Navigator and BT XML
Unlocks: Costmap debugging, inflation tuning, obstacle-layer reasoning, safer aisle behavior, clearer root-cause analysis for “no path” and oscillation failures
Why Should I Care? (Context)
When engineers say “the planner is broken,” they are often really saying:
- the costmap contains an obstacle that is not physically there
- the costmap is missing an obstacle that is physically there
- the inflation settings make a passable aisle look blocked
- the robot footprint makes a legal route impossible in the current configuration
- unknown space policy does not match the operating environment
Costmaps are not just an implementation detail. They are Nav2’s working picture of the world. If that picture is wrong, every server above it becomes predictably wrong too.
PART 1 — WHAT A COSTMAP IS
1.1 A Costmap Is a Grid of Navigation Risk
At a high level, a costmap is a 2D grid where each cell describes how safe or costly it is for the robot to occupy that space.
free space -> easy to traverse
inflated space -> near obstacle, allowed but penalized
lethal obstacle -> not traversable
unknown -> depends on policy
This grid is the world model used by planning and control.
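As a minimal sketch of the categories above, the snippet below maps a raw cell byte to a label. The numeric constants follow the cost values defined in nav2_costmap_2d; the function name classify_cell and the label strings are hypothetical and exist only for illustration.

```python
# Cost values as defined by nav2_costmap_2d.
FREE_SPACE = 0
INSCRIBED_INFLATED_OBSTACLE = 253
LETHAL_OBSTACLE = 254
NO_INFORMATION = 255


def classify_cell(cost: int) -> str:
    """Map a raw costmap byte to the categories used in this lesson."""
    if not 0 <= cost <= 255:
        raise ValueError(f"cost must fit in one byte, got {cost}")
    if cost == NO_INFORMATION:
        return "unknown"
    if cost == LETHAL_OBSTACLE:
        return "lethal"
    if cost == INSCRIBED_INFLATED_OBSTACLE:
        # A robot centre placed here overlaps an obstacle.
        return "inscribed"
    if cost == FREE_SPACE:
        return "free"
    # Values 1..252: allowed but penalized.
    return "inflated"


for value in (0, 120, 253, 254, 255):
    print(value, classify_cell(value))
```

The "inscribed" label is an extra category not listed above: it marks cells so close to an obstacle that the robot centre cannot sit there, which is why planners usually treat it like a lethal cell.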
1.2 Global vs Local Costmaps
Global costmap
Used mainly by the global planner.
Characteristics:
- map-wide or large-area representation
- stable enough for route computation
- often includes static map plus selected obstacle information
Local costmap
Used mainly by the controller.
Characteristics:
- rolling window around the robot
- higher tactical sensitivity to recent sensor data
- used for immediate collision avoidance and short-horizon feasibility
Operational insight:
- If the global costmap is wrong, you get bad routes or no routes.
- If the local costmap is wrong, you get hesitation, oscillation, near-collision behavior, or excessive recoveries.
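The contrast between the two costmaps shows up directly in configuration. The sketch below writes both as Python dicts, the shape a launch file could pass as parameters. The keys are standard Nav2 costmap parameters; every value (frames, sizes, rates, layer names) is an assumption chosen for illustration, and describe is a hypothetical helper.

```python
# Illustrative settings only; all numeric values and layer names are assumptions.
global_costmap = {
    "global_frame": "map",
    "robot_base_frame": "base_link",
    "rolling_window": False,   # covers the full static map
    "resolution": 0.05,        # metres per cell
    "update_frequency": 1.0,   # Hz
    "plugins": ["static_layer", "obstacle_layer", "inflation_layer"],
}

local_costmap = {
    "global_frame": "odom",
    "robot_base_frame": "base_link",
    "rolling_window": True,    # window stays centred on the robot
    "width": 3,                # metres
    "height": 3,               # metres
    "resolution": 0.05,
    "update_frequency": 5.0,
    "plugins": ["obstacle_layer", "inflation_layer"],
}


def describe(name, cfg):
    """Summarize the properties that distinguish the two costmaps."""
    if cfg["rolling_window"]:
        extent = f"{cfg['width']} m x {cfg['height']} m window"
    else:
        extent = "full map extent"
    return f"{name}: frame={cfg['global_frame']}, {extent}, {cfg['update_frequency']} Hz"


print(describe("global", global_costmap))
print(describe("local", local_costmap))
```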
PART 2 — THE LAYER MODEL
2.1 Layers Build the Final Master Costmap
The final costmap is usually composed from multiple layers.
master costmap
= static layer
+ obstacle/voxel layer
+ inflation layer
+ optional semantic filters or custom layers
Each layer contributes information differently.
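One way to picture the composition is a per-cell "keep the higher cost" merge, which is roughly how obstacle-style layers are commonly combined into the master grid. The sketch below is a simplified model, not Nav2's actual code: merge_with_max is a hypothetical helper, and inflation is left out because it runs afterwards on the merged result.

```python
from typing import List

Grid = List[List[int]]

NO_INFORMATION = 255


def merge_with_max(master: Grid, layer: Grid) -> Grid:
    """Merge one layer into the master grid, keeping the higher cost per cell.

    Cells where the layer reports NO_INFORMATION leave the master untouched,
    and a known layer value always replaces an unknown master value.
    """
    merged = [row[:] for row in master]
    for r, row in enumerate(layer):
        for c, value in enumerate(row):
            if value == NO_INFORMATION:
                continue  # the layer has nothing to say about this cell
            current = merged[r][c]
            if current == NO_INFORMATION or current < value:
                merged[r][c] = value
    return merged


static_layer = [[0, 254, 255]]    # free, wall, unmapped
obstacle_layer = [[254, 255, 0]]  # new pallet, no data, observed free
print(merge_with_max(static_layer, obstacle_layer))  # [[254, 254, 0]]
```

In this model a sensor layer can add an obstacle on top of mapped free space and can resolve unmapped cells, but it cannot erase a mapped wall.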
2.2 Static Layer
The static layer comes from the occupancy grid map.
It provides long-lived environmental structure such as:
- walls
- racks
- permanent machinery footprints
- mapped corridor geometry
This is the backbone of the global world model.
Failure mode: if the facility layout changed but the map did not, the static layer can make the planner confidently wrong.
2.3 Obstacle Layer or Voxel Layer
This layer ingests sensor observations such as lidar or depth data.
Its job is to:
- mark obstacle cells when hits are observed
- clear obstacle cells when free space is observed
- fuse dynamic environment evidence into the costmap
Key configuration ideas:
- marking vs clearing
- observation source names and topics
- obstacle height handling for 3D sensors
- persistence and buffering behavior
AMR reality: many phantom-obstacle incidents are obstacle-layer configuration problems, not planner problems.
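The configuration ideas above can be made concrete with a small review helper. The parameter names (observation_sources, topic, data_type, marking, clearing, max_obstacle_height, obstacle_max_range, raytrace_max_range, observation_persistence) are Nav2 obstacle layer parameters. The source name scan, the topic /scan, all numeric values, and the review_sources function are assumptions for illustration.

```python
# Illustrative obstacle layer settings; values are assumptions, not recommendations.
obstacle_layer = {
    "plugin": "nav2_costmap_2d::ObstacleLayer",
    "observation_sources": "scan",
    "scan": {
        "topic": "/scan",
        "data_type": "LaserScan",
        "marking": True,
        "clearing": True,
        "max_obstacle_height": 2.0,      # metres
        "obstacle_max_range": 2.5,       # mark hits up to this range
        "raytrace_max_range": 3.0,       # clear free space up to this range
        "observation_persistence": 0.0,  # seconds; 0 keeps only the latest reading
    },
}


def review_sources(layer):
    """Return human-readable findings for common phantom-obstacle misconfigurations."""
    findings = []
    for name in str(layer.get("observation_sources", "")).split():
        source = layer.get(name)
        if not isinstance(source, dict):
            findings.append(f"{name}: listed in observation_sources but not configured")
            continue
        if source.get("marking") and not source.get("clearing"):
            findings.append(f"{name}: marks obstacles but never clears them")
        mark_range = source.get("obstacle_max_range")
        clear_range = source.get("raytrace_max_range")
        if mark_range is not None and clear_range is not None and clear_range < mark_range:
            findings.append(
                f"{name}: raytrace_max_range {clear_range} is shorter than "
                f"obstacle_max_range {mark_range}"
            )
    return findings


print(review_sources(obstacle_layer))  # [] for the settings above
```

The range check reflects a common pitfall: if obstacles can be marked farther away than rays can clear, marks beyond the clearing range tend to linger.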
2.4 Inflation Layer
Inflation does not create obstacles. It expands their influence.
Why it exists:
- keep path planning away from walls and pallet corners
- account for localization uncertainty and control error
- encourage safer path margins
Why it causes trouble:
- too much inflation makes narrow but legal aisles look impossible
- too little inflation leads to aggressive paths that are hard to track safely
Inflation is one of the most operationally sensitive Nav2 settings for warehouse robots.
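The shape of the inflation curve explains much of this sensitivity. The sketch below follows the exponential decay used by the Nav2 inflation layer, governed by the inflation_radius and cost_scaling_factor parameters. The function name inflation_cost and the example numbers are assumptions, and the cutoff beyond inflation_radius is modelled here as zero cost.

```python
import math

INSCRIBED_INFLATED_OBSTACLE = 253
LETHAL_OBSTACLE = 254


def inflation_cost(distance, inscribed_radius, inflation_radius, cost_scaling_factor):
    """Cost of a cell at `distance` metres from the nearest lethal cell."""
    if distance <= 0.0:
        return LETHAL_OBSTACLE
    if distance <= inscribed_radius:
        return INSCRIBED_INFLATED_OBSTACLE
    if distance > inflation_radius:
        return 0  # outside the inflated band
    decay = math.exp(-cost_scaling_factor * (distance - inscribed_radius))
    return int((INSCRIBED_INFLATED_OBSTACLE - 1) * decay)


# Assumed robot: 0.30 m inscribed radius, 0.55 m inflation radius.
for d in (0.0, 0.2, 0.35, 0.5, 0.6):
    print(d, inflation_cost(d, 0.30, 0.55, 3.0))
```

A larger cost_scaling_factor makes the cost fall off faster, so paths may hug obstacles more closely. A larger inflation_radius widens the penalized band without changing the hard limit set by the inscribed radius.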
2.5 Costmap Filters and Semantic Layers
Modern AMR deployments often need semantics beyond raw occupancy.
Examples:
- keepout zones around forbidden areas
- speed zones near humans or workcells
- restricted approach zones near docks
These are not classical obstacle layers, but they shape motion policy through the costmap representation.
This matters because product requirements often enter navigation through map semantics, not through planner code.
PART 3 — FOOTPRINT, INFLATION, AND PASSABILITY
3.1 Robot Footprint
The robot footprint tells Nav2 how much space the robot occupies.
If it is wrong, the rest of the stack becomes misleading:
- too small -> planner/controller accept unsafe routes
- too large -> planner says no path in routes the physical robot could take
In warehouses, footprint accuracy matters because tolerances are often tight relative to aisle width.
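Two numbers derived from the footprint polygon drive most costmap reasoning: the inscribed radius (largest circle centred on the robot that fits inside the footprint) and the circumscribed radius (smallest centred circle that contains it). The sketch below computes both for a polygon given in the robot's base frame. The function names and the 0.8 m by 0.6 m rectangle are assumptions for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _distance_origin_to_segment(a: Point, b: Point) -> float:
    """Shortest distance from the origin (robot centre) to segment a-b."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(ax, ay)
    # Project the origin onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, (-ax * dx - ay * dy) / length_sq))
    return math.hypot(ax + t * dx, ay + t * dy)


def footprint_radii(polygon: List[Point]) -> Tuple[float, float]:
    """Return (inscribed_radius, circumscribed_radius) for a footprint polygon."""
    edges = zip(polygon, polygon[1:] + polygon[:1])
    inscribed = min(_distance_origin_to_segment(a, b) for a, b in edges)
    circumscribed = max(math.hypot(x, y) for x, y in polygon)
    return inscribed, circumscribed


# Assumed rectangular AMR, 0.8 m long and 0.6 m wide, centred on base_link.
rectangle = [(0.4, 0.3), (-0.4, 0.3), (-0.4, -0.3), (0.4, -0.3)]
print(footprint_radii(rectangle))
```

For the assumed rectangle the inscribed radius is half the width and the circumscribed radius is the half-diagonal. The gap between the two is the region where collision depends on the robot's heading.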
3.2 Aisle Passability Is Not Just Geometry
Whether a robot can pass an aisle depends on the combination of:
- map resolution
- footprint dimensions
- inflation radius
- localization uncertainty
- controller tracking quality
That means two robots using the same map can have different passability outcomes under different tuning.
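A back-of-envelope budget makes the comparison concrete. The sketch below is a deliberately simple model, not a Nav2 computation: it subtracts the robot radius, an assumed lateral error (localization plus tracking), and a worst-case one-cell discretization loss per wall from the aisle width. The function name, the two robots, and all numbers are hypothetical.

```python
def lateral_margin(aisle_width, robot_radius, lateral_error, resolution):
    """Remaining lateral slack in metres; a value <= 0 means the aisle is not passable.

    Assumes a circular robot, symmetric error on both sides, and that each
    wall may bleed into one extra grid cell.
    """
    occupied = 2.0 * (robot_radius + lateral_error)
    discretization_loss = 2.0 * resolution
    return aisle_width - occupied - discretization_loss


AISLE_WIDTH = 1.0  # metres, same map for both robots

robot_a = lateral_margin(AISLE_WIDTH, robot_radius=0.30, lateral_error=0.05, resolution=0.05)
robot_b = lateral_margin(AISLE_WIDTH, robot_radius=0.40, lateral_error=0.12, resolution=0.05)

print(f"robot A margin: {robot_a:+.2f} m")
print(f"robot B margin: {robot_b:+.2f} m")
```

Under these assumed numbers robot A keeps positive slack while robot B does not, even though both read the same map.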
3.3 Why an Apparently Open Corridor Can Still Produce “No Path”
Typical reasons:
- inflated obstacles overlap after accounting for footprint
- unknown space blocks planning under current policy
- stale obstacle marks remain from sensor history
- localization places the robot or goal slightly inside inflated/lethal space
- map resolution plus footprint discretization closes a narrow gap
This is one of the most common senior-level Nav2 debugging questions because the answer requires costmap reasoning, not intuition from looking at the physical aisle.
PART 4 — OBSERVATION SOURCES AND STALE WORLD MODELS
4.1 Marking and Clearing Must Both Work
A sensor layer is only useful if it can both detect new obstacles and remove old ones when space becomes free.
If marking works but clearing fails:
- pallets linger in the costmap after removal
- robot sees ghost walls
- planner repeatedly fails or routes around empty space
If clearing is too aggressive:
- real obstacles disappear too early
- controller behaves overconfidently
This is a balancing problem, not a binary one.
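The interplay of marking and clearing can be shown on a one-dimensional strip of cells with the sensor at one end. This is a toy model under stated assumptions (a single beam, no noise, integer cell indices), and apply_beam is a hypothetical helper rather than anything from Nav2.

```python
from typing import List, Optional

FREE_SPACE = 0
LETHAL_OBSTACLE = 254


def apply_beam(
    row: List[int],
    sensor_idx: int,
    hit_idx: Optional[int],
    raytrace_cells: int,
    marking: bool = True,
    clearing: bool = True,
) -> List[int]:
    """Update a strip of cells from one beam fired toward increasing index.

    hit_idx is the cell where the beam returned, or None if nothing was hit.
    raytrace_cells limits how far from the sensor clearing may reach.
    """
    updated = list(row)
    last_clearable = min(len(row) - 1, sensor_idx + raytrace_cells)
    if clearing:
        end = last_clearable if hit_idx is None else min(hit_idx - 1, last_clearable)
        for i in range(sensor_idx + 1, end + 1):
            updated[i] = FREE_SPACE
    if marking and hit_idx is not None:
        updated[hit_idx] = LETHAL_OBSTACLE
    return updated


empty = [FREE_SPACE] * 6
with_pallet = apply_beam(empty, sensor_idx=0, hit_idx=3, raytrace_cells=5)
print(with_pallet)  # [0, 0, 0, 254, 0, 0]

# The pallet is removed and the next beam returns nothing.
print(apply_beam(with_pallet, 0, None, 5))                  # cleared
print(apply_beam(with_pallet, 0, None, 5, clearing=False))  # ghost remains
print(apply_beam(with_pallet, 0, None, 2))                  # ray too short, ghost remains
```

The last two calls show two distinct ways a ghost survives: clearing is disabled, or the clearing ray never reaches the marked cell.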
4.2 Observation Timing Matters
Even with correct geometry, bad timing can poison the world model.
Examples:
- lidar data arrives late relative to TF
- sensor topic stalls under network load
- transform tolerance is too small for actual latency
- bag replay uses time settings that make costmap updates appear flaky
When timing goes wrong, the costmap can be syntactically valid and operationally misleading.
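Two timing checks capture most of this. The sketch below is a simplified model of how an observation buffer might judge staleness; the real Nav2 logic may differ in detail. The names expected_update_rate and observation_persistence echo Nav2 obstacle layer parameters, both treated here as durations in seconds, while the two functions are hypothetical.

```python
from typing import List


def source_is_current(now: float, last_update: float, expected_update_rate: float) -> bool:
    """True if the source delivered data recently enough.

    expected_update_rate is the maximum tolerated gap in seconds between
    readings; 0.0 disables the check.
    """
    if expected_update_rate == 0.0:
        return True
    return (now - last_update) <= expected_update_rate


def purge_stale(stamps: List[float], now: float, observation_persistence: float) -> List[float]:
    """Keep only observation timestamps that are still allowed to affect the costmap.

    A persistence of 0.0 keeps just the most recent observation.
    """
    if not stamps:
        return []
    if observation_persistence == 0.0:
        return [max(stamps)]
    return [s for s in stamps if (now - s) <= observation_persistence]


now = 10.0
print(source_is_current(now, last_update=9.8, expected_update_rate=0.5))  # True
print(source_is_current(now, last_update=9.0, expected_update_rate=0.5))  # False
print(purge_stale([1.0, 2.0, 9.5], now, observation_persistence=1.0))     # [9.5]
```

A stalled sensor topic fails the first check. An overly long persistence window makes the second check retain old marks, which links timing back to the phantom-obstacle symptoms.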
4.3 Rolling Window Behavior in the Local Costmap
The local costmap usually moves with the robot.
That makes it excellent for:
- nearby obstacle reaction
- tight maneuvering
- controller safety margin decisions
But it also means:
- stale obstacle marks stay inside the window until they are cleared or the window scrolls past them
- short observation horizons may miss upcoming geometry if sensor placement is poor
- tuning decisions affect how early the controller reacts near corners and aisle entries
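The rolling window can be modelled as a rectangle that is re-centred on the robot at every update. The sketch below computes its bounds and tests whether a world point falls inside. It assumes an axis-aligned window in the costmap's global frame; the function names and numbers are hypothetical.

```python
from typing import Tuple

Bounds = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def window_bounds(robot_x: float, robot_y: float, width_m: float, height_m: float) -> Bounds:
    """Bounds of a rolling window centred on the robot position."""
    origin_x = robot_x - width_m / 2.0
    origin_y = robot_y - height_m / 2.0
    return origin_x, origin_y, origin_x + width_m, origin_y + height_m


def in_window(bounds: Bounds, x: float, y: float) -> bool:
    """True if the world point (x, y) lies inside the window."""
    min_x, min_y, max_x, max_y = bounds
    return min_x <= x < max_x and min_y <= y < max_y


# Assumed 3 m x 3 m local costmap with the robot at (10, 5).
bounds = window_bounds(10.0, 5.0, 3.0, 3.0)
print(bounds)                        # (8.5, 3.5, 11.5, 6.5)
print(in_window(bounds, 11.0, 5.0))  # True: 1.0 m ahead is visible to the controller
print(in_window(bounds, 12.0, 5.0))  # False: 2.0 m ahead is outside the window
```

With a 3 m window the controller sees at most 1.5 m in any axis direction, which bounds how early it can react at corners and aisle entries.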
PART 5 — UNKNOWN SPACE POLICY
5.1 Unknown Does Not Mean the Same Thing Everywhere
Unknown cells can be treated differently depending on the planner and its configuration.
In some deployments, unknown space is effectively forbidden.
In others, it is traversable but risky.
This policy depends on environment type.
Structured warehouse with fixed navigation lanes
Conservative unknown policy is common because the robot should operate mostly in mapped space.
Semi-structured industrial floor with evolving layouts
Some tolerance for unknown exploration may be necessary, though often outside standard AMR production behavior.
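The two policies reduce to a single flag in a per-cell traversability test. The sketch below is a simplified model: the flag name allow_unknown matches a parameter exposed by some Nav2 planners, but the function is_traversable and the exact thresholds are assumptions for illustration.

```python
INSCRIBED_INFLATED_OBSTACLE = 253
NO_INFORMATION = 255


def is_traversable(cost: int, allow_unknown: bool) -> bool:
    """Decide whether a planner may route the robot centre through a cell."""
    if cost == NO_INFORMATION:
        return allow_unknown  # the policy decision lives here
    return cost < INSCRIBED_INFLATED_OBSTACLE


route_costs = [0, 40, 255, 255, 10]  # a route crossing two unknown cells

for policy in (False, True):
    ok = all(is_traversable(c, allow_unknown=policy) for c in route_costs)
    print(f"allow_unknown={policy}: route accepted={ok}")
```

The same route is rejected under the conservative policy and accepted under the permissive one, with no change to the map or the sensors.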
5.2 Why Unknown-Space Decisions Affect Throughput
Overly conservative unknown handling can:
- reject valid temporary routes
- increase deadlock frequency near partially observed areas
- cause unnecessary human intervention
Overly permissive handling can:
- send robots through poorly understood space
- create unacceptable safety or predictability problems
This is a product and operations decision, not just a technical parameter choice.
PART 6 — AMR FAILURE MODES ROOTED IN COSTMAPS
6.1 Phantom Pallet Problem
The costmap still shows a pallet after the pallet is gone.
Likely causes:
- clearing rays not configured correctly
- observation persistence too long
- sensor blind spot preventing clearing evidence
- TF/timestamp mismatch making clearing invalid
Visible symptom:
- planner reports no path or takes unexplained detours around obviously free space
6.2 Overinflated Corridor Problem
Two rack edges plus inflation plus footprint leave no legal free channel in the grid.
Visible symptom:
- corridor looks passable to humans
- planner refuses route
- operators blame map or planner without checking inflation math
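The inflation math can be checked on a single cross-section of the corridor. The sketch below counts the cells where the robot centre may sit, meaning cells farther from both walls than the inscribed radius. It is a simplified model with assumptions: integer millimetres to avoid floating-point edge cases, walls exactly one cell outside the corridor, and distances measured between cell centres. The function name and numbers are hypothetical.

```python
def free_centre_cells(corridor_mm: int, resolution_mm: int, inscribed_mm: int) -> int:
    """Count cross-section cells where the robot centre is not blocked.

    Interior cells are indexed 1..n; the wall cells sit at index 0 and n + 1.
    A cell is blocked when its distance to the nearest wall cell is less than
    or equal to the inscribed radius.
    """
    interior = corridor_mm // resolution_mm
    count = 0
    for i in range(1, interior + 1):
        nearest_wall_mm = min(i, interior + 1 - i) * resolution_mm
        if nearest_wall_mm > inscribed_mm:
            count += 1
    return count


# Assumed 0.80 m corridor on a 0.05 m grid.
print(free_centre_cells(800, 50, 350))  # 2 cells: a 0.10 m channel remains
print(free_centre_cells(800, 50, 400))  # 0 cells: 50 mm of extra padding closes it
```

In this example 50 mm of added padding removes the last free cells. Inflation beyond the inscribed radius raises cost across the channel but does not by itself make it lethal.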
6.3 Dynamic Obstacle Chatter
Humans or forklifts create rapid obstacle changes near the robot.
Visible symptoms:
- controller repeatedly slows and resumes
- robot appears indecisive at intersections
- BT recovery may trigger because local execution cannot make stable progress
This is often a local costmap behavior problem combined with controller policy.
6.4 Footprint Mismatch Problem
If the footprint is copied from a CAD bounding box without considering safety margins, sensor offsets, or control accuracy, the costmap may consistently misclassify viability.
Result:
- some aisles become impossible in software
- other near-collision routes look legal
This is why footprint tuning deserves real validation, not guesswork.
PART 7 — HOW TO TUNE AND REVIEW COSTMAPS
7.1 Practical Tuning Order
Start with this order:
- verify map correctness
- verify footprint accuracy
- verify obstacle source marking and clearing
- tune inflation for safe but passable margins
- review unknown-space policy
- add semantic zones only after the base world model is trustworthy
This order prevents a common anti-pattern: compensating for bad obstacle data with semantic or planner changes.
7.2 Review Checklist
- Does the costmap match the operating environment closely enough?
- Are obstacle sources timely and correctly framed?
- Is the footprint realistic for the actual AMR body and control behavior?
- Does inflation reflect safety needs without collapsing aisle passability?
- Are keepout and speed zones used for product policy rather than to hide map defects?
7.3 What You Should Be Able to Explain After This Lesson
You should now be able to explain:
- why global and local costmaps serve different roles
- how layers combine into the final world model
- why footprint and inflation settings strongly affect route feasibility
- how stale or mistimed observations create phantom navigation failures
- why many planner complaints are really costmap complaints upstream
7.4 Next Step
Continue to 05 — Nav2 Global Planning.
That lesson builds on the costmap model and explains planner selection, tradeoffs, and failure patterns for production AMRs.