
01 — Type Annotations, Dataclasses, and Pydantic

Stop chasing runtime AttributeErrors — catch them before the script runs

Prerequisite: Python functions, dicts, classes. You’ve written scripts with argparse before.

Unlocks: mypy static analysis, validated config loading, self-documenting function signatures, Pydantic models


Why Should I Care? (Context)

Analysis scripts that process large log files or call external APIs contain dozens of implicit assumptions:

- “This function always returns a list, never None”
- “The config dict definitely has a confidence key”
- “Ticket ID is an int, robot ID is a string; don’t mix them up”

Without type annotations, every one of those assumptions is invisible until the script crashes at 2 AM on real data. With type annotations + mypy:

- A mismatched function argument is a red underline in the editor, not a runtime traceback
- Pydantic turns your .env file into a validated, typed Python object. If API_TOKEN is missing, the script crashes at startup with a clear error, not 40 lines into processing
- NewType makes ticket_id and robot_id incompatible types, so the type checker rejects passing one where the other is expected

This chapter is a force multiplier for every script in this repo.


PART 1 — PYTHON TYPE ANNOTATIONS BASICS


1.1 The Syntax

Type annotations are hints — they don’t change runtime behavior, but tools like mypy and your editor use them to catch errors before the code runs.

# Variable annotations
count: int = 0
name: str = "default"
ratio: float = 0.95
active: bool = True
result: None = None          # rarely useful as a variable annotation

# Function annotations
def greet(name: str) -> str:
    return f"Hello, {name}"

def process_log(path: str, verbose: bool = False) -> int:
    """Returns number of lines processed."""
    ...
    return 0

def log_event(message: str) -> None:   # -> None = does not return a value
    print(message)

Key insight: -> None is not the same as omitting the return type. Omitting means “I haven’t annotated this yet.” -> None says “I explicitly guarantee this returns nothing.”
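
To make the "hints only" point concrete, here is a minimal sketch. The function `double` is a hypothetical example, not part of this repo. Python runs the mis-typed call without complaint; only a checker such as mypy would reject it.

```python
from typing import get_type_hints


def double(value: int) -> int:
    """Annotated for ints, but Python does not check this at runtime."""
    return value * 2


as_annotated = double(21)       # matches the annotation
not_enforced = double("ab")     # mypy flags this call; Python still runs it

# The annotations are stored as plain metadata that tools can read.
hints = get_type_hints(double)
```

The second call "works" by repeating the string, which is exactly the kind of silent surprise a static check catches before the script runs.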


1.2 Collection Types (Python 3.9+)

Python 3.9 added built-in generic types — you no longer need to import from typing:

# Python 3.9+ (preferred)
scores: list[float] = [0.8, 0.9, 0.5]
lookup: dict[str, int] = {"alpha": 1, "beta": 2}
coords: tuple[float, float] = (1.0, 2.0)
ids: set[int] = {1, 2, 3}

# Python 3.8 and earlier (still works, just verbose)
from typing import List, Dict, Tuple, Set
scores: List[float] = []
lookup: Dict[str, int] = {}

Fixed-length tuples are annotated with each element’s type:

Point3D = tuple[float, float, float]        # exactly 3 floats
LogLine = tuple[float, str, str]            # (timestamp, level, message)

def parse_log_line(raw: str) -> LogLine:
    parts = raw.split("|", 2)
    return float(parts[0]), parts[1], parts[2]

Variable-length tuples of uniform type use ...:

Numbers = tuple[int, ...]    # any number of ints
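
A short sketch of the difference between the two tuple forms. The helper names `total` and `swap` are made up for illustration.

```python
Numbers = tuple[int, ...]    # any length, every element an int
Pair = tuple[int, int]       # exactly two ints


def total(values: Numbers) -> int:
    """Accepts tuples of any length, including the empty tuple."""
    return sum(values)


def swap(pair: Pair) -> Pair:
    """Only a 2-tuple type-checks here; mypy rejects swap((1, 2, 3))."""
    first, second = pair
    return second, first


empty_total = total(())
long_total = total((1, 2, 3, 4))
swapped = swap((1, 2))
```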

1.3 Optional and Union

from typing import Optional, Union

# Optional[X] means "X or None" — equivalent to X | None (Python 3.10+)
def find_session(session_id: str) -> Optional[dict]:
    ...

# Python 3.10+ syntax (cleaner)
def find_session(session_id: str) -> dict | None:
    ...

# Union: accept multiple types
def format_id(value: Union[int, str]) -> str:
    return str(value)

# Python 3.10+
def format_id(value: int | str) -> str:
    return str(value)

Common mistake: returning None from a function annotated -> str is a type error. Always annotate with str | None if None is possible.

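# BAD: mypy will flag this
def get_error_code(log: str) -> str:
    if "ERROR:" in log:
        return log.split("ERROR:", 1)[1].strip()
    return None  # Type error: None is not str

# GOOD
def get_error_code(log: str) -> str | None:
    # Test for the same marker we split on; checking only "ERROR" would
    # raise IndexError on a line like "ERROR code 7" that lacks the colon.
    if "ERROR:" in log:
        return log.split("ERROR:", 1)[1].strip()
    return None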

1.4 Worked Example: Annotating a Log Parser

from pathlib import Path


def parse_incident_log(
    path: Path,
    error_only: bool = False,
    max_lines: int | None = None,
) -> list[tuple[float, str, str]]:
    """
    Parse a structured log file into (timestamp, level, message) tuples.

    Args:
        path: Path to the log file.
        error_only: If True, only return ERROR-level entries.
        max_lines: Stop after this many lines (None = read all).

    Returns:
        List of (timestamp_seconds, level, message) tuples.
    """
    entries: list[tuple[float, str, str]] = []

    with path.open() as f:
        for i, line in enumerate(f):
            if max_lines is not None and i >= max_lines:
                break

            line = line.strip()
            if not line:
                continue

            parts = line.split("|", 2)
            if len(parts) != 3:
                continue

            ts_str, level, message = parts
            if error_only and level.strip() != "ERROR":
                continue

            entries.append((float(ts_str), level.strip(), message.strip()))

    return entries


# The return type makes it clear: callers know they get a list of 3-tuples.
# Without the annotation, they'd have to read the whole function to know.
results: list[tuple[float, str, str]] = parse_incident_log(
    Path("session.log"),
    error_only=True,
    max_lines=1000,
)

for ts, level, msg in results:
    print(f"[{ts:.3f}] {level}: {msg}")

PART 2 — ADVANCED TYPES


2.1 TypedDict: Typed Dictionary Keys

Plain dict[str, Any] loses all type information about what keys exist. TypedDict fixes this:

from typing import TypedDict, Any


class IncidentSummary(TypedDict):
    ticket_id: int
    robot_id: str
    error_code: str
    duration_s: float


class SearchResult(TypedDict):
    query: str
    matches: list[dict[str, Any]]
    confidence: float
    source: str


def summarise_incident(raw: dict) -> IncidentSummary:
    return IncidentSummary(
        ticket_id=int(raw["id"]),
        robot_id=str(raw["robot"]),
        error_code=str(raw.get("code", "UNKNOWN")),
        duration_s=float(raw.get("duration", 0.0)),
    )


# Now the editor knows summary["ticket_id"] is an int
summary = summarise_incident({"id": "123", "robot": "bot-01", "code": "NAV_ERR"})
print(summary["ticket_id"] + 1)   # OK — int + int
print(summary["robot_id"] + 1)   # mypy ERROR: str + int

TypedDict also supports optional keys via total=False:

class PartialResult(TypedDict, total=False):
    confidence: float    # this key may or may not be present
    source: str


class RequiredResult(TypedDict):
    query: str


class FullResult(RequiredResult, PartialResult):
    pass   # query is required, confidence/source are optional
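
As a quick check of how the required and optional keys combine, the sketch below repeats the three classes and inspects the key sets that `TypedDict` records on the class (available in Python 3.9+). At runtime a TypedDict value is still an ordinary dict.

```python
from typing import TypedDict


class RequiredResult(TypedDict):
    query: str


class PartialResult(TypedDict, total=False):
    confidence: float
    source: str


class FullResult(RequiredResult, PartialResult):
    pass


required = FullResult.__required_keys__
optional = FullResult.__optional_keys__

# Only "query" must be present for the type checker to accept the value.
minimal: FullResult = {"query": "motor stall"}
is_plain_dict = type(minimal) is dict
```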

2.2 Protocol: Structural Subtyping

A Protocol defines an interface by behaviour, not inheritance. Any class that has the right methods satisfies the protocol — no need to explicitly inherit.

from typing import Protocol


class Searchable(Protocol):
    def search(self, query: str) -> list[str]: ...


class Serialisable(Protocol):
    def to_dict(self) -> dict: ...

    @classmethod   # from_dict takes cls, so it must be declared a classmethod
    def from_dict(cls, data: dict) -> "Serialisable": ...


# Any class with a .search() method satisfies Searchable
class KbSearcher:
    def search(self, query: str) -> list[str]:
        return []    # real impl goes here


class MemorySearcher:
    def search(self, query: str) -> list[str]:
        return []    # different backend, same interface


def run_search(backend: Searchable, query: str) -> list[str]:
    return backend.search(query)


# Both work — no inheritance required
run_search(KbSearcher(), "motor stall")
run_search(MemorySearcher(), "slip event")

Key insight: Protocol lets you write generic functions that work with any object satisfying a contract. This is especially useful for dependency injection in tests — replace the real searcher with a fake one.
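
A minimal sketch of that test-double pattern. `FakeSearcher` and its canned result are hypothetical. The optional `@runtime_checkable` decorator additionally allows `isinstance()` checks, which only verify that the method exists, not its signature.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Searchable(Protocol):
    def search(self, query: str) -> list[str]: ...


class FakeSearcher:
    """Test double: records each query and returns canned results."""

    def __init__(self, canned: list[str]) -> None:
        self.canned = canned
        self.queries: list[str] = []

    def search(self, query: str) -> list[str]:
        self.queries.append(query)
        return self.canned


def run_search(backend: Searchable, query: str) -> list[str]:
    return backend.search(query)


fake = FakeSearcher(["KB-17: motor stall"])
hits = run_search(fake, "motor stall")

conforms = isinstance(fake, Searchable)         # has a .search attribute
rejects_other = isinstance(object(), Searchable)
```

Because `FakeSearcher` never inherits from `Searchable`, the production code needs no changes to become testable.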


2.3 Callable, Literal, Final, ClassVar

from typing import Callable, Literal, Final, ClassVar


# Callable[[ArgTypes...], ReturnType]
Scorer = Callable[[str, list[str]], float]

def apply_scorer(scorer: Scorer, query: str, candidates: list[str]) -> float:
    return scorer(query, candidates)


# Literal: restrict to specific values
LogLevel = Literal["DEBUG", "INFO", "WARN", "ERROR"]

def log(level: LogLevel, msg: str) -> None:
    print(f"[{level}] {msg}")

log("INFO", "started")       # OK
log("VERBOSE", "too much")   # mypy ERROR: "VERBOSE" not in Literal


# Final: a constant that cannot be reassigned
MAX_RETRIES: Final[int] = 3
KB_VERSION: Final[str] = "2.0"

MAX_RETRIES = 4   # mypy ERROR: Cannot assign to final name


# ClassVar: a class-level variable (not per-instance)
class SearchConfig:
    DEFAULT_LIMIT: ClassVar[int] = 20
    MIN_CONFIDENCE: ClassVar[float] = 0.5

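    # Inside the class body the name SearchConfig is not bound yet, so the
    # default refers to DEFAULT_LIMIT directly.
    def __init__(self, limit: int = DEFAULT_LIMIT) -> None: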
        self.limit = limit
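
Literal is only checked statically, so text that arrives at runtime (a CLI flag, a log field) still needs validating. One possible approach, sketched here with a hypothetical `parse_level` helper, is to reuse the Literal's values via `typing.get_args`.

```python
from typing import Literal, cast, get_args

LogLevel = Literal["DEBUG", "INFO", "WARN", "ERROR"]

# get_args() returns the allowed values as a tuple of strings.
VALID_LEVELS = get_args(LogLevel)


def parse_level(raw: str) -> LogLevel:
    """Validate untrusted text against the values listed in LogLevel."""
    candidate = raw.strip().upper()
    if candidate not in VALID_LEVELS:
        raise ValueError(f"unknown log level: {raw!r}")
    # The membership test above justifies the cast.
    return cast(LogLevel, candidate)


parsed = parse_level(" warn ")
```

This keeps the list of allowed values in one place: adding a level to `LogLevel` updates both the static check and the runtime check.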

2.4 Type Aliases and NewType

from typing import NewType


# Type alias: just a shorthand name
LogLine = tuple[float, str, str]
ScoreMap = dict[str, float]
Findings = list[dict[str, str]]

def score_results(lines: list[LogLine]) -> ScoreMap:
    ...


# NewType: create a *distinct* type that cannot be confused with its base
TicketId = NewType("TicketId", int)
RobotId = NewType("RobotId", str)

def fetch_ticket(ticket_id: TicketId) -> dict:
    ...

def fetch_robot_logs(robot_id: RobotId) -> list[str]:
    ...


ticket = TicketId(12345)
robot = RobotId("bot-07")

fetch_ticket(ticket)          # OK
fetch_ticket(robot)           # mypy ERROR: RobotId is not TicketId
fetch_ticket(12345)           # mypy ERROR: plain int is not TicketId

# Create NewType values explicitly
my_ticket = TicketId(67890)
my_robot = RobotId("bot-12")

Key insight: NewType costs nothing at runtime (it’s the identity function) but prevents entire classes of mix-up bugs. Use it whenever two values have the same underlying type but different meanings — IDs are the classic case.
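
The sketch below shows what "identity function" means in practice: at runtime a `TicketId` is just the `int` that was passed in, so the distinction exists only for the type checker.

```python
from typing import NewType

TicketId = NewType("TicketId", int)
RobotId = NewType("RobotId", str)

ticket = TicketId(12345)
robot = RobotId("bot-07")

same_runtime_type = type(ticket) is int    # no wrapper object is created
equal_to_plain = ticket == 12345
next_ticket = ticket + 1                   # arithmetic gives a plain int again
```

Note the last line: operations on a NewType value return the base type, so wrap the result in `TicketId(...)` again if it should keep its distinct type.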


PART 3 — @DATACLASS


3.1 Basics

@dataclass auto-generates __init__, __repr__, and __eq__ from your field annotations.

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Hypothesis:
    description: str
    confidence: float
    evidence: list[str]

# Generated __init__:
# def __init__(self, description: str, confidence: float, evidence: list[str])

h = Hypothesis("slip due to surface change", 0.8, ["velocity drop at t=12.3"])
print(h)
# Hypothesis(description='slip due to surface change', confidence=0.8,
#            evidence=['velocity drop at t=12.3'])
print(h == Hypothesis("slip due to surface change", 0.8, ["velocity drop at t=12.3"]))
# True — __eq__ compares all fields

3.2 field() — Defaults and Mutable Defaults

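from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SearchRequest:
    query: str
    limit: int = 20                         # simple default
    sources: list[str] = field(            # mutable default MUST use field()
        default_factory=lambda: ["kb", "memory"]
    )
    metadata: dict[str, str] = field(
        default_factory=dict
    )
    # datetime.utcnow() is deprecated since Python 3.12 and returns a naive
    # datetime; an explicit UTC timezone avoids both problems.
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )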

# BAD — will raise ValueError at class definition time:
# @dataclass
# class Bad:
#     items: list[str] = []   # Python: mutable default is dangerous

r1 = SearchRequest("motor stall")
r2 = SearchRequest("slip event")
r1.sources.append("ado")

print(r1.sources)  # ['kb', 'memory', 'ado']
print(r2.sources)  # ['kb', 'memory'] (r2 has its own list)
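
To see the rejection of a bare mutable default for yourself, the following sketch defines the bad class inside a `try` block and captures the error that the `@dataclass` decorator raises. The class names `Bad` and `Good` are illustrative.

```python
from dataclasses import dataclass, field

try:
    @dataclass
    class Bad:
        items: list[str] = []        # rejected while the class is being built
    error_message = ""
except ValueError as exc:
    error_message = str(exc)


@dataclass
class Good:
    items: list[str] = field(default_factory=list)   # fresh list per instance


first, second = Good(), Good()
first.items.append("kb")
```

The error text itself points at `default_factory`, and `second.items` stays empty because each instance gets its own list.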

3.3 frozen=True: Immutable Dataclasses

from dataclasses import dataclass


@dataclass(frozen=True)
class EventKey:
    """Immutable key identifying a specific event in a log."""
    session_id: str
    timestamp: float
    event_type: str

    def before(self, other: "EventKey") -> bool:
        return self.timestamp < other.timestamp


key = EventKey("sess-001", 1714300000.0, "NAV_ERR")
key.timestamp = 0.0   # raises FrozenInstanceError

# frozen dataclasses are hashable — can be used in sets and dict keys
seen: set[EventKey] = set()
seen.add(key)
cache: dict[EventKey, str] = {key: "delocalized"}
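
Since a frozen instance cannot be edited, the usual way to "change" one is to build a modified copy with `dataclasses.replace`. A small sketch, reusing the `EventKey` shape from above:

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class EventKey:
    session_id: str
    timestamp: float
    event_type: str


key = EventKey("sess-001", 1714300000.0, "NAV_ERR")

try:
    key.timestamp = 0.0  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    mutated = False

# replace() copies every field and overrides only the ones named.
later = replace(key, timestamp=key.timestamp + 5.0)

# Equal field values give equal hashes, which is what dict/set lookup needs.
same_hash = hash(key) == hash(EventKey("sess-001", 1714300000.0, "NAV_ERR"))
```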

3.4 __post_init__: Computed Fields

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class IncidentWindow:
    start_ts: float
    end_ts: float
    _duration_s: float = field(init=False)   # computed, not a constructor arg

    def __post_init__(self) -> None:
        if self.end_ts < self.start_ts:
            raise ValueError(
                f"end_ts ({self.end_ts}) must be >= start_ts ({self.start_ts})"
            )
        # Plain assignment is enough here; object.__setattr__ is only needed
        # when the dataclass is declared with frozen=True.
        self._duration_s = self.end_ts - self.start_ts

    @property
    def duration_s(self) -> float:
        return self._duration_s

    @property
    def duration_min(self) -> float:
        return self._duration_s / 60.0


window = IncidentWindow(start_ts=1714300000.0, end_ts=1714300300.0)
print(window.duration_min)   # 5.0

3.5 @dataclass vs NamedTuple

from dataclasses import dataclass, field
from typing import NamedTuple


# NamedTuple: immutable, supports positional access and unpacking
class LogEntry(NamedTuple):
    timestamp: float
    level: str
    message: str

entry = LogEntry(1714300000.0, "ERROR", "localization failed")
ts, level, msg = entry      # unpacking works
print(entry[0])             # positional access works
print(entry.timestamp)      # named access works

# dataclass: mutable by default, better for complex entities
@dataclass
class SessionState:
    session_id: str
    phase: str = "orient"
    hypotheses: list[str] = field(default_factory=list)
    # ... many more fields

state = SessionState("sess-001")
state.phase = "analyse"    # mutation is natural
state.hypotheses.append("slip event at t=12.3")

| Feature | NamedTuple | @dataclass |
|---|---|---|
| Immutable | Yes (always) | Only with frozen=True |
| Positional access | Yes (x[0]) | No |
| Unpacking | Yes | No |
| Default values | Yes | Yes |
| Mutable defaults | No (workaround needed) | Yes via field() |
| Inheritance | Limited | Full |
| Best for | Small value objects (points, keys) | Entities with many fields |
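
The unpacking and immutability rows can be seen side by side in the sketch below. `LogRecord` is a hypothetical dataclass twin of `LogEntry`, added only for the comparison.

```python
from dataclasses import astuple, dataclass
from typing import NamedTuple


class LogEntry(NamedTuple):
    timestamp: float
    level: str
    message: str


@dataclass
class LogRecord:
    timestamp: float
    level: str
    message: str


entry = LogEntry(1.5, "ERROR", "localization failed")
record = LogRecord(1.5, "ERROR", "localization failed")

ts, level, msg = entry                    # a NamedTuple unpacks directly
ts2, level2, msg2 = astuple(record)       # a dataclass needs astuple() first

warned = entry._replace(level="WARN")     # "edit" by creating a new tuple
record.level = "WARN"                     # a dataclass mutates in place
```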

3.6 Worked Example: SessionState

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Literal


Phase = Literal["orient", "scope", "analyse", "hypothesise", "close"]


def utc_now() -> datetime:
    """Timezone-aware replacement for datetime.utcnow() (deprecated in 3.12)."""
    return datetime.now(timezone.utc)


@dataclass
class Finding:
    description: str
    confidence: float
    evidence: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)


@dataclass
class SessionState:
    """Tracks progress of a root-cause analysis session."""
    session_id: str
    ticket_id: int
    title: str
    phase: Phase = "orient"
    hypotheses: list[str] = field(default_factory=list)
    findings: list[Finding] = field(default_factory=list)
    started_at: datetime = field(default_factory=utc_now)
    closed_at: datetime | None = None
    notes: dict[str, str] = field(default_factory=dict)

    def advance_phase(self, next_phase: Phase) -> None:
        self.phase = next_phase

    def add_hypothesis(self, h: str) -> None:
        # Skip duplicates so the list stays a set of distinct hypotheses.
        if h not in self.hypotheses:
            self.hypotheses.append(h)

    def add_finding(self, description: str, confidence: float,
                    evidence: list[str] | None = None) -> Finding:
        f = Finding(
            description=description,
            confidence=confidence,
            evidence=evidence or [],
        )
        self.findings.append(f)
        return f

    def close(self) -> None:
        self.phase = "close"
        self.closed_at = utc_now()

    def to_dict(self) -> dict:
        return {
            "session_id": self.session_id,
            "ticket_id": self.ticket_id,
            "phase": self.phase,
            "hypotheses": self.hypotheses,
            "findings": [
                {
                    "description": f.description,
                    "confidence": f.confidence,
                    "evidence": f.evidence,
                }
                for f in self.findings
            ],
        }


# Usage
state = SessionState(session_id="sess-042", ticket_id=99999, title="robot stopped mid-run")
state.advance_phase("scope")
state.add_hypothesis("localization diverged due to featureless corridor")
f = state.add_finding(
    "covariance xx exceeded threshold at t=12.3s",
    confidence=0.87,
    evidence=["cov_xx=0.018 > limit=0.010", "vel dropped from 0.8 to 0.0 in 200ms"],
)
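
The hand-written `to_dict()` above picks specific fields. When every field should be exported, `dataclasses.asdict` is an alternative worth knowing: it recurses into nested dataclasses and lists. The sketch below uses a trimmed, hypothetical `MiniSession` to keep it short.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Finding:
    description: str
    confidence: float
    evidence: list[str] = field(default_factory=list)


@dataclass
class MiniSession:
    session_id: str
    findings: list[Finding] = field(default_factory=list)


session = MiniSession("sess-042")
session.findings.append(
    Finding("cov_xx exceeded threshold", 0.87, ["cov_xx=0.018"])
)

# asdict() converts nested dataclasses too and copies the containers.
snapshot = asdict(session)
is_copy = snapshot["findings"][0]["evidence"] is not session.findings[0].evidence
```

One caveat: `asdict` leaves values such as `datetime` objects untouched, so the result is not automatically JSON-serialisable.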

PART 4 — PYDANTIC BASEMODEL


4.1 Why Pydantic?

@dataclass is great for pure Python objects. Pydantic goes further:

- Runtime validation: invalid types raise ValidationError instead of silently corrupting data
- Type coercion: "42" → 42, "true" → True
- Serialisation: .model_dump() → dict, .model_dump_json() → JSON string
- Parsing: .model_validate(raw_dict) validates a dict from an API or file

from pydantic import BaseModel, Field
from typing import Any, Literal


class SearchResult(BaseModel):
    query: str
    matches: list[dict[str, Any]]
    confidence: float = Field(ge=0.0, le=1.0, description="0–1 match confidence")
    source: Literal["kb", "ado", "memory"] = "kb"


# Pydantic coerces types and validates constraints
result = SearchResult(
    query="motor stall",
    matches=[{"title": "motor fault pattern", "score": 0.91}],
    confidence=0.91,
)

print(result.confidence)            # 0.91
print(result.model_dump())          # dict with all fields
print(result.model_dump_json())     # JSON string

# Validation failure — clear error, not a silent bug
try:
    bad = SearchResult(query="test", matches=[], confidence=1.5)
except Exception as e:
    print(e)
    # confidence: Input should be less than or equal to 1

4.2 Field() — Constraints, Defaults, Aliases

from pydantic import BaseModel, Field


class IncidentReport(BaseModel):
    ticket_id: int = Field(gt=0, description="Positive ticket identifier")
    title: str = Field(min_length=5, max_length=200)
    confidence: float = Field(ge=0.0, le=1.0, default=0.0)
    tags: list[str] = Field(default_factory=list, max_length=20)
    summary: str | None = Field(default=None, alias="exec_summary")

    # alias: when the source dict uses a different key name
    # model_validate({"exec_summary": "..."}) → summary = "..."

    model_config = {"populate_by_name": True}   # allow both alias and field name


report = IncidentReport(
    ticket_id=99999,
    title="Robot stopped at turn",
    confidence=0.75,
    exec_summary="Slip event at waypoint 12",
)
print(report.summary)  # "Slip event at waypoint 12"

4.3 Validators

Pydantic v2 uses @field_validator for per-field validation:

from pydantic import BaseModel, Field, field_validator


class KbSearchRequest(BaseModel):
    query: str
    limit: int = Field(default=10, ge=1, le=100)
    min_confidence: float = Field(default=0.5, ge=0.0, le=1.0)

    @field_validator("query")
    @classmethod
    def query_not_empty(cls, v: str) -> str:
        v = v.strip()
        if not v:
            raise ValueError("query must not be empty after stripping whitespace")
        if len(v) < 3:
            raise ValueError(f"query too short ({len(v)} chars); minimum 3")
        return v

    @field_validator("limit")
    @classmethod
    def limit_reasonable(cls, v: int) -> int:
        if v > 50:
            import warnings
            warnings.warn(f"limit={v} is large; consider reducing for performance")
        return v
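
A usage sketch for validators like these, assuming Pydantic v2 is installed. The model is a trimmed copy of `KbSearchRequest` so the block runs on its own. It shows that the validator's return value replaces the input, and that failures surface as a `ValidationError` naming the field.

```python
from pydantic import BaseModel, Field, ValidationError, field_validator


class KbSearchRequest(BaseModel):
    query: str
    limit: int = Field(default=10, ge=1, le=100)

    @field_validator("query")
    @classmethod
    def query_not_empty(cls, v: str) -> str:
        v = v.strip()
        if len(v) < 3:
            raise ValueError(f"query too short ({len(v)} chars); minimum 3")
        return v                     # the stripped value is what gets stored


# The validator normalises the query; "25" is coerced to int in default mode.
ok = KbSearchRequest(query="  motor stall  ", limit="25")

try:
    KbSearchRequest(query="  a ")
    failed_fields = []
except ValidationError as exc:
    # Each error entry carries the field location as a tuple.
    failed_fields = [err["loc"][0] for err in exc.errors()]
```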

4.4 model_validate(), model_dump(), Nested Models

from pydantic import BaseModel
from typing import Any


class Evidence(BaseModel):
    description: str
    timestamp_s: float | None = None
    data: dict[str, Any] = {}


class RcaFinding(BaseModel):
    root_cause: str
    confidence: float
    evidence: list[Evidence] = []
    recommended_fix: str | None = None


# Parse from dict (e.g., loaded from a JSON file)
raw = {
    "root_cause": "localization diverged due to reflective floor",
    "confidence": 0.88,
    "evidence": [
        {"description": "cov_xx spike", "timestamp_s": 1714300012.3},
        {"description": "velocity drop to zero"},
    ],
}
finding = RcaFinding.model_validate(raw)
print(finding.evidence[0].timestamp_s)  # 1714300012.3

# Serialize back to dict
d = finding.model_dump()
print(d)

# Serialize to JSON
json_str = finding.model_dump_json(indent=2)

# Auto-generate JSON schema
schema = RcaFinding.model_json_schema()
print(schema)

PART 5 — PYDANTIC SETTINGS (CONFIG FROM .env)


5.1 BaseSettings

pydantic-settings extends Pydantic to read values from environment variables and .env files automatically:

pip install pydantic-settings

from pydantic_settings import BaseSettings, SettingsConfigDict
from pydantic import Field


class Settings(BaseSettings):
    # Required — will raise ValidationError if not set in env or .env
    api_token: str

    # Optional — have defaults
    chat_token: str | None = None
    grafana_url: str = "http://localhost:3000"
    kb_min_confidence: float = 0.7
    max_search_results: int = 20
    debug: bool = False

    # Field with alias: env var ADO_ORG maps to .ado_org
    ado_org: str = Field(default="my-org", alias="ADO_ORG")

    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        case_sensitive=False,      # API_TOKEN and api_token both work
        extra="ignore",            # ignore unknown env vars
    )


# Usage: typically at module level or in main()
from functools import lru_cache


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Singleton-style loader: the first call builds Settings, later calls reuse it."""
    return Settings()   # raises ValidationError if API_TOKEN is missing

5.2 How it reads values

Priority order (highest wins):

1. Direct constructor argument: Settings(api_token="override")
2. Environment variable: export API_TOKEN=abc123
3. .env file: API_TOKEN=abc123
4. Field default

# .env file
# API_TOKEN=abc123
# CHAT_TOKEN=xoxb-...
# KB_MIN_CONFIDENCE=0.8
# DEBUG=true            ← "true" is coerced to True

settings = Settings()
print(settings.api_token)              # "abc123"
print(settings.kb_min_confidence)    # 0.8 (float, not string)
print(settings.debug)                # True (bool, not string)
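
The priority order can be pictured as a layered lookup. The sketch below is only a standard-library analogy using `collections.ChainMap` with made-up values; it is not how pydantic-settings is implemented, and it skips the type coercion that the real library performs.

```python
from collections import ChainMap

# All values below are invented for illustration.
init_kwargs = {"api_token": "override"}
environment = {"api_token": "from-env", "debug": "true"}
dotenv_file = {"api_token": "from-dotenv", "kb_min_confidence": "0.8"}
field_defaults = {
    "debug": "false",
    "kb_min_confidence": "0.7",
    "max_search_results": "20",
}

# ChainMap searches its maps left to right, so earlier layers win.
resolved = ChainMap(init_kwargs, environment, dotenv_file, field_defaults)

api_token = resolved["api_token"]                     # constructor argument
debug = resolved["debug"]                             # environment variable
kb_min_confidence = resolved["kb_min_confidence"]     # .env file
max_search_results = resolved["max_search_results"]   # field default
```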

5.3 Worked Example: Settings for an Analysis Tool

from pydantic_settings import BaseSettings, SettingsConfigDict
from pydantic import Field, field_validator
from pathlib import Path
import warnings


class AnalysisSettings(BaseSettings):
    """Settings for the incident analysis toolkit."""

    # Auth tokens
    api_token: str = Field(description="Azure DevOps Personal Access Token")
    chat_token: str | None = Field(default=None)

    # Service URLs
    grafana_url: str = "http://localhost:3000"
    monitoring_token: str | None = None
    ado_org: str = "my-org"
    ado_project: str = "my-project"

    # Search / KB settings
    kb_min_confidence: float = Field(default=0.7, ge=0.0, le=1.0)
    max_search_results: int = Field(default=20, ge=1, le=200)

    # Paths
    attachments_dir: Path = Path("attachments")
    kb_dir: Path = Path("scripts/kb")

    # Behaviour
    dry_run: bool = False
    verbose: bool = False

    @field_validator("grafana_url", "ado_org")
    @classmethod
    def not_empty(cls, v: str) -> str:
        if not v.strip():
            raise ValueError("must not be empty")
        return v.strip()

    @field_validator("attachments_dir", "kb_dir")
    @classmethod
    def dir_exists_warning(cls, v: Path) -> Path:
        if not v.exists():
            warnings.warn(f"Directory {v} does not exist yet")
        return v

    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        case_sensitive=False,
    )


# Fail fast at startup — better than failing 40 lines into processing
def load_settings() -> AnalysisSettings:
    try:
        return AnalysisSettings()
    except Exception as e:
        raise SystemExit(f"Configuration error: {e}") from e


# In main():
# settings = load_settings()
# print(f"KB confidence threshold: {settings.kb_min_confidence}")
# print(f"ADO org: {settings.ado_org}")

5.4 Testing Settings (Preview)

import pytest


def test_settings_loads_from_env(monkeypatch):
    monkeypatch.setenv("API_TOKEN", "test-token-123")
    monkeypatch.setenv("KB_MIN_CONFIDENCE", "0.9")

    settings = AnalysisSettings()
    assert settings.api_token == "test-token-123"
    assert settings.kb_min_confidence == 0.9


def test_settings_fails_without_api_token(monkeypatch):
    monkeypatch.delenv("API_TOKEN", raising=False)
    with pytest.raises(Exception):   # ValidationError
        # _env_file=None stops a local .env file from supplying API_TOKEN
        AnalysisSettings(_env_file=None)

PART 6 — mypy / pyright BASICS


6.1 Running mypy

# Install
pip install mypy

# Check a single file
mypy scripts/knowledge_search.py

# Strict mode — catches more (recommended for new files)
mypy --strict scripts/knowledge_search.py

# Check all scripts
mypy scripts/ --ignore-missing-imports

# With config file (mypy.ini or pyproject.toml)
mypy .

Example output:

scripts/knowledge_search.py:42: error: Argument 1 to "fetch_ticket" has incompatible
    type "str"; expected "TicketId"  [arg-type]
scripts/knowledge_search.py:67: error: Item "None" of "str | None" has no attribute
    "split"  [union-attr]

6.2 Common Errors and How to Fix Them

# Error: Item "None" of "str | None" has no attribute "split"
# Fix: guard against None before using the value
def process(value: str | None) -> list[str]:
    if value is None:
        return []
    return value.split(",")    # now mypy knows value is str


# Error: Incompatible return value type (got "str | int", expected "int")
# Fix: narrow with isinstance() before returning
def ensure_int(v: str | int) -> int:
    if isinstance(v, int):
        return v          # mypy knows v is int here
    return int(v)         # and str here


# Error: Function is missing a return type annotation
# Fix: add -> ReturnType
def compute(x: float) -> float:
    return x * 2.0


# Error: Need type annotation for ... (hint: use ...)
# Fix: annotate the variable
results: list[str] = []
lookup: dict[str, int] = {}

6.3 cast() and type: ignore

from typing import cast
import json

# cast(): tell mypy "trust me, this is X"
# Use sparingly — it's a lie that can hide real bugs
raw = json.loads('{"id": 1}')
# json.loads() returns Any, so mypy cannot verify anything about raw["id"]
ticket_id = cast(int, raw["id"])   # mypy now treats ticket_id as int


# type: ignore — suppress a specific error
# Include the error code so it's clear what you're ignoring
result = some_third_party_function()  # type: ignore[no-any-return]


# Better: use a proper annotation on the third-party call
from typing import Any
result2: Any = some_third_party_function()
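
Why is cast() called "a lie"? Because it performs no conversion and no check at runtime. The sketch below demonstrates this with a deliberately wrong payload, then shows one possible alternative: a hypothetical `read_ticket_id` helper that verifies the type instead of asserting it.

```python
import json
from typing import Any, cast

raw: Any = json.loads('{"id": "not-a-number"}')

# cast() returns its argument unchanged: mypy now believes this is an int.
ticket_id = cast(int, raw["id"])
still_a_str = isinstance(ticket_id, str)


def read_ticket_id(payload: dict[str, Any]) -> int:
    """Check the type at runtime so the annotation is actually true."""
    value = payload["id"]
    if not isinstance(value, int):
        raise TypeError(f"id must be an int, got {type(value).__name__}")
    return value


checked = read_ticket_id(json.loads('{"id": 1}'))
```

With the helper, bad input fails at the boundary with a clear message instead of flowing onward under a false type.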

6.4 mypy.ini / pyproject.toml Configuration

# mypy.ini
[mypy]
python_version = 3.11
warn_return_any = True
warn_unused_ignores = True
disallow_untyped_defs = True
ignore_missing_imports = True

[mypy-boto3.*]
ignore_missing_imports = True

# pyproject.toml
[tool.mypy]
python_version = "3.11"
warn_return_any = true
warn_unused_ignores = true
disallow_untyped_defs = true
ignore_missing_imports = true

6.5 Pre-commit Integration

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.8.0
    hooks:
      - id: mypy
        args: [--strict, --ignore-missing-imports]
        additional_dependencies:
          - pydantic
          - pydantic-settings

Summary — What to Remember

| Concept | Rule |
|---|---|
| Optional[X] / X \| None | Use whenever a function can return None or a parameter is optional |
| TypedDict | Use instead of dict[str, Any] when you know the exact keys |
| Protocol | Define interfaces without inheritance; great for testable dependency injection |
| NewType | Create distinct ID types (TicketId, RobotId) to prevent mix-ups |
| @dataclass | Auto-generates __init__/__repr__/__eq__; use field(default_factory=...) for mutable defaults |
| frozen=True | Makes dataclass immutable and hashable; use for dict keys and set members |
| BaseModel (Pydantic) | Runtime validation + coercion + serialisation; use for API responses and parsed data |
| BaseSettings | Reads from env/.env, coerces types, fails fast at startup if required vars missing |
| Literal[...] | Restricts a string param to specific allowed values |
| cast() / # type: ignore | Use sparingly; always include the error code in ignore comments |

QUICK REFERENCE CARD

┌──────────────────────────────── TYPE ANNOTATIONS CHEAT SHEET ───────────────────────────────┐
│                                                                                              │
│  BASIC:         x: int   y: str   z: float   b: bool   n: None                             │
│  COLLECTIONS:   list[int]   dict[str, float]   tuple[int, str]   set[str]                   │
│  OPTIONAL:      str | None    (or Optional[str] for 3.8)                                    │
│  UNION:         int | str     (or Union[int, str] for 3.8)                                  │
│                                                                                              │
│  FUNCTIONS:     def f(x: int, y: str = "a") -> float: ...                                  │
│                 def g() -> None: ...                                                         │
│                                                                                              │
│  ADVANCED:      TypedDict  — typed dict with known keys                                     │
│                 Protocol   — structural interface (duck typing)                              │
│                 NewType    — distinct type alias (zero runtime cost)                         │
│                 Literal    — value-restricted string/int                                     │
│                 Final      — constant, cannot be reassigned                                  │
│                 Callable[[A,B], R]  — function type                                          │
│                                                                                              │
│  DATACLASS:     @dataclass → auto __init__/__repr__/__eq__                                  │
│                 field(default_factory=list)  — mutable default                               │
│                 frozen=True  → immutable, hashable                                           │
│                 __post_init__  → computed fields after init                                  │
│                                                                                              │
│  PYDANTIC:      BaseModel → validated, serialisable model                                   │
│                 Field(ge=0, le=1)  → constraints                                             │
│                 .model_validate(dict) → parse from dict                                     │
│                 .model_dump()  → serialize to dict                                           │
│                 @field_validator → custom validation logic                                   │
│                                                                                              │
│  SETTINGS:      BaseSettings → reads .env + env vars automatically                          │
│                 SettingsConfigDict(env_file=".env")  → config                               │
│                 Settings() raises ValidationError if required field missing                  │
│                                                                                              │
│  mypy:          mypy --strict file.py                                                        │
│                 cast(Type, value)  — assert type (use sparingly)                             │
│                 # type: ignore[error-code]  — suppress specific error                        │
│                                                                                              │
└──────────────────────────────────────────────────────────────────────────────────────────────┘