Prerequisite: 03-measurement-models.md (line constraints), 05-failure-modes.md (failure taxonomy)
Source files: estimator_reflectance_bell.cpp (268 lines), ceres_solver.cpp (~200 lines)
Unlocks: Diagnosing cold-start REJECT, geometric parking failures, hardware-vs-algorithmic triage
Ticket #104718 (amr-robot-scl-site-a105, 2026-04-22) revealed a failure mode that the existing learning material didn’t cover: a robot parked at a tape intersection caused all 4 line-sensors to REJECT simultaneously on cold-start, with no hardware fault whatsoever.
The investigation required understanding the Ceres solver internals deeply enough to: - Explain why 3 sensors crossing tape triggers REJECT even with Cauchy loss active - Explain why the robot can never self-recover once it enters this state - Distinguish this geometric failure from the stale-data cascade pattern already documented
This note captures all of that so future investigations can reach the same conclusions in minutes.
The line-sensor estimator fits a bell curve (actually a super-Gaussian trough) to the reflectance profile:
y(x) = p0 + p1 × exp(-|p2 × (x - p3)|^p4)
| Parameter | Meaning | Init value | Notes |
|---|---|---|---|
p0 |
Baseline (floor ADC) | ~960 | High because black floor reflects little IR |
p1 |
Amplitude | Negative (~-878) | Negative — curve is a downward trough, not a peak |
p2 |
Sharpness | 50.0 | Wider init → narrower initial expected tape width |
p3 |
Tape centre (m) | from previous | The value we want to extract for pose update |
p4 |
Shape exponent | 2.0 | p4=2 = Gaussian; p4>2 = super-Gaussian (flat-top, steeper sides) |
Critical: the curve is a TROUGH (dip), not a peak. - Tape (white) → low IR reflection → transistor saturates → LOW ADC values (~65–150) - Floor (black) → high IR reflection → transistor barely conducts → HIGH ADC values (~900–960)
This is the opposite of what you’d intuitively expect. The pull-down circuit in the line-sensor hardware inverts the response: more light → lower voltage → lower ADC reading.
ceres_solver.cpp)p1 ≤ 0.7 × initial_p1 (amplitude can't shrink more than 30%)
p2 ≤ 100.0 (sharpness cap)
p4 ∈ [1.2, 8.0] (shape bounded between sub-Gaussian and steep super-Gaussian)
With a robot crossing a single tape line at moderate speed: - p3 converges to the tape centre position in metres - p4 gradually increases from 2.0 toward 3–5 as the solver finds a better shape - p2 adjusts to match actual tape width - Residuals for all 12 sensors are small; 0 sensors disabled
There are two completely different sensor disablement mechanisms in estimator_reflectance_bell.cpp.
Which path runs depends on the YAML config flag reflectance_use_loss_function.
reflectance_use_loss_function: true
loss_function_constant: 40.0
Disabled-sensor check uses a hardcoded constant:
constexpr double DISABLED_SENSOR_ERROR_PERCENTAGE = 0.25;
// Source: estimator_reflectance_bell.cpp
A sensor is disabled if:
|raw_residual| > |p1| × DISABLED_SENSOR_ERROR_PERCENTAGE
Where raw_residual = y_predicted(xi) - y_measured(xi), without applying the Cauchy loss
(the apply_loss_function = false flag is set during the disabled-check step).
With p1 ≈ -878 (amplitude):
threshold = 878 × 0.25 = 219.5 ADC units
This is the active path on all production robots. The reflectance_disabled_sensor_threshold: 0.15
YAML value is not used when reflectance_use_loss_function: true.
reflectance_use_loss_function: false
Uses _disabled_sensor_threshold = 0.15 from YAML. A sensor is disabled if its reflectance
value is below max_reflectance × 0.15. This was the original mechanism.
If you read the YAML and see reflectance_disabled_sensor_threshold: 0.15, do NOT assume that
is the active threshold. In loss-function mode, the threshold is hardcoded at 0.25.
The effective threshold ratio is 0.25 (not 0.15). This is 67% stricter — sensors are disabled only if their raw residual exceeds 25% of the fitted amplitude, not 15% of max reflectance.
During optimization, Ceres applies CauchyLoss(c=40.0):
ρ(r²) = c² × log(1 + r²/c²)
The effect on gradient weight:
Cauchy weight = c² / (c² + r²) = 1 / (1 + (r/c)²)
At residual r = 254 (a sensor 3 sensor-positions off the tape centre):
weight = 1 / (1 + (254/40)²) = 1 / (1 + 40.3) ≈ 0.024
That sensor contributes only 2.4% of its normal gradient influence. The optimizer effectively ignores sensors with large residuals.
On the initial Gaussian guess (p2=50, p4=2.0) with 3 sensors crossing tape:
|raw_residual| > 219.5 → all 3 disableddisabled_sensors_count = 3 > max_disabled_sensor_count = 2 → REJECTThe Cauchy loss intended to make the solver robust to outliers is working correctly — but the “outliers” here are the tape sensors, which are physically the most important ones.
Verification via Python IRLS simulation showed that for a robot moving over a single tape line:
Step 0: p2=50.0 p4=2.00 → 3 disabled → REJECT (initial Gaussian too narrow)
Step 1: p2=63.1 p4=2.62 → 0 disabled → VALID ← ONE step escapes!
Step 12: p2=72.1 p4=6.72 → 0 disabled → converged to super-Gaussian
The solver CAN escape in step 1 because after the first Gauss-Newton update, p2 and p4 adjust enough that the initial REJECT sensors now fit within the threshold.
The key difference: when moving, _previous_solutions has a valid cached solution. When
cold-starting on a tape intersection, there is no cached solution and the initial guess is
always the narrow Gaussian.
This is the core mechanism of the #104718 failure.
_previous_solutions cacheIn estimator_reflectance_bell.cpp:
std::map<std::string, Solution> _previous_solutions;
// Maps bar frame_id → last valid solution
When a solution is VALID: the solution is stored in _previous_solutions[frame_id].
Next call: initialization uses cached p3 (tape position). The solver starts close to the
answer and converges quickly.
When a solution is REJECT: the solution is NOT stored. The map entry is NOT updated. Next call: if the map has no entry, initialization falls back to the narrow Gaussian default.
[Robot restarts on tape intersection]
│
▼
Call solver → no cached solution → init with narrow Gaussian (p2=50, p4=2)
│
▼
3 tape sensors get large residuals at step 0 → disabled → REJECT
│
▼
REJECT → do NOT store solution → _previous_solutions still empty
│
└──────────────────────────────────────────────────────────────────► (loop forever)
The robot can NEVER self-recover from this state. Every call to the solver starts from the same narrow Gaussian and produces the same REJECT. The only exit is: - Robot moves (changes sensor geometry) - Manual intervention / restart with robot on clean floor
The tape intersection geometry at tile (28,0,0) places 3 sensors from EACH bar on white tape.
With max_disabled_sensor_count = 2 (designed for one broken sensor), 3 sensors on tape always
exceeds the limit. All 4 bars → REJECT simultaneously. The system has no fall-back.
| Bar | S00 | S01 | S02 | S03 | S04 | S05 | S06 | S07 | S08 | S09 | S10 | S11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| front | 959 | 930 | 898 | 569 | 71 | 66 | 76 | 635 | 878 | 924 | 944 | 956 |
| rear | 999 | 972 | 992 | 719 | 142 | 59 | 63 | 252 | 859 | 940 | 995 | 958 |
| left | 948 | 931 | 913 | 834 | 560 | 69 | 65 | 75 | 580 | 889 | 936 | 956 |
| right | 940 | 836 | 499 | 67 | 65 | 71 | 512 | 846 | 890 | 883 | 918 | 931 |
Bold = on white tape (ADC 59–142, well below 219.5 threshold)
Each bar has exactly 3 sensors on tape → exceeds max_disabled_sensor_count=2 → all 4 REJECT.
is_reliable dropout rate: 0.02% (175,487 frames/bar) — nominalConclusion: HARDWARE IS HEALTHY. Failure is entirely geometric + algorithmic.
| Time (UTC) | Event |
|---|---|
| ≤03:46 | Robot already parked at tile (28,0,0) on tape intersection |
| 05:21:29 | All nodes kill (roslaunch restart — someone triggered a restart) |
| 05:21:41 | EstimatorReflectanceBell: Curve fitting disabled sensors for line-sensor: 4 (max: 2) |
| 05:21:46 | Line-sensor calibration: 4/4 bars HEALTHY (not the issue) |
| 05:21:50 | Map loads, tile (28,0,0) already in state=3 (disabled) |
| 05:22:00 | InitializeProcessing::recoverErrors: Recovering error 1 by MCLBC_Navigation |
| 05:22:06.916 | RequestManualAssistance — MCLBC_Navigation code 1 ⛔ |
Use this checklist when you see MCLBC_Navigation code 1 or EstimatorReflectanceBell: Curve fitting disabled sensors for line-sensor: N (max: 2):
□ Does /rosout show "Curve fitting disabled sensors" for ALL 4 bars simultaneously?
→ If yes: suspect tape intersection geometry (not hardware)
→ If only 1-2 bars: more likely single line-sensor hardware degradation
□ Was there a recent restart/reboot before the failure?
→ Cold-start required to trigger this pattern (no cached solution)
□ Check the actual sensor values from /robot<N>/line-sensors at failure time
→ If 3+ sensors have values < 200 per bar → robot is on tape intersection
→ If random sensors have values < 200 → could be sensor hardware fault
□ Run line-sensor health check (is_reliable dropout, stuck sensors)
→ If dropout < 1% and no stuck sensors → hardware healthy → geometric failure
□ Check tile position from /robot<N>/localization or RFID
→ Tile intersections are at map grid cross-points
→ Compare tile position against warehouse grid map
| Signal | Tape Intersection | Stale Data Cascade | Hardware Fault |
|---|---|---|---|
| All 4 bars REJECT | Yes | Not usually | Sometimes |
| Failure on cold-start | Yes | No (occurs during motion) | No pattern |
| Hardware checks pass | Yes | Yes | No |
| Robot was stationary | Yes | No (requires motion) | Either |
| Sensor values | 3 sensors ~65–150 ADC | Normal pattern | Stuck/noisy |
is_reliable dropout |
Normal | Normal | Elevated (>5%) |
Immediate: Manually reposition robot 20-30cm off the intersection. Clear the tile disable.
Long-term options:
1. Increase reflectance_max_disabled_sensor_count from 2 to 3 (allows 3 sensors on tape)
2. Add warm-start fallback: if REJECT and no previous solution, try a wider initial p2 (e.g., p2=20)
3. Add parking-position constraint to prevent robots from stopping at tape intersections
4. Increase DISABLED_SENSOR_ERROR_PERCENTAGE from 0.25 to allow more sensor variation
Tradeoff note: Option 1 and 4 reduce sensitivity to actual broken sensors. The threshold was designed for detecting 1 broken sensor (noise floor, not geometry). Changing it affects all robots at all sites.
From config_default.yaml:
sensor_spacing_x: 0.0069 # 6.9mm between adjacent sensors (12 sensors = 82.8mm span)
reflectance_use_loss_function: true # ENABLES PATH A (loss function mode)
loss_function_constant: 40.0 # c parameter in CauchyLoss(c)
weight_reduction_error_threshold: 20.0 # Weight reduction starts above this residual (not DISABLED)
reflectance_disabled_sensor_threshold: 0.15 # NOT USED when loss_function=true
reflectance_max_disabled_sensor_count: 2 # REJECT if more than this many sensors disabled
Hardcoded in source (NOT configurable via YAML):
constexpr double DISABLED_SENSOR_ERROR_PERCENTAGE = 0.25; // estimator_reflectance_bell.cpp
| Topic | Key fact |
|---|---|
| Bell curve | Trough shape; p1 is negative; LOW ADC = white tape |
| Active disable threshold | 25% of fitted amplitude (~219 ADC) — NOT the 0.15 YAML value |
| Cauchy C=40 | Down-weights sensors with residual >40 ADC; creates trap for tape sensors |
| Cold-start failure | No cached solution → narrow Gaussian → tape sensors at large residual → REJECT |
| Self-perpetuating loop | REJECT doesn’t update cache → next call identical → REJECT forever |
| Tape intersection | 3 sensors/bar on tape at grid cross-points → always > max_disabled=2 |
| Hardware check | 0% stuck, 0.02% unreliable → failure is geometric, not hardware |
| Fix | Move robot off intersection OR increase max_disabled OR add warm-start fallback |