Transfer Window Logic

Transfer window logic is the sub-problem inside the Fare Rule Validation & Calculation Engines pipeline that decides whether a rider’s second or third tap is a discounted continuation of one journey or the start of a new, separately-charged one. It sits immediately downstream of ingestion and immediately upstream of settlement: by the time a tap reaches the window evaluator it is already normalized and validated, and the decision the evaluator emits — free transfer, or fresh base fare — is what the ledger and every inter-agency payout are built on. For transit operations teams, revenue analysts, and the Python developers who own the engine, a window that fires inconsistently is not a rounding error; it silently leaks revenue on one tap and over-charges a rider on the next, and because the two failures net out in aggregate they can hide from dashboards for months. This page covers how to build that evaluator as a deterministic, memory-bounded stream processor rather than a set of heuristic time comparisons.

The scope here is the single-media, single-journey window: one card or account token, a sequence of taps, and the rule that grants a discounted or zero-fare transfer when the gap between taps falls inside an operator-defined threshold. The harder case — a transfer that crosses an operator boundary and triggers a settlement split — is handled in the dedicated guide on calculating cross-operator transfer windows with Python, which this component hands off to.

Architecture: Where the Window Evaluator Sits

A tap does not arrive at the window evaluator as raw hardware bytes. It has already been resolved to a canonical media_hash by Smart Card Schema Mapping and passed the structural gate enforced by the Schema Validation Pipelines. The evaluator’s only job is temporal: given an already-clean event and the open session (if any) for that media, decide whether the elapsed time clears the window and emit exactly one priced, auditable decision.

The timing logic below shows how a tap is resolved into a free transfer or a fresh base fare once it clears the validation gates:

The core eligibility test is a single inequality. Given an origin tap at time $t_{origin}$, a candidate tap at $t_{tap}$, a window length $W$, and a grace period $G$, the candidate is a transfer when:

$$\Delta t = t_{tap} - t_{origin} \le W + G$$

Everything else in this component exists to make that comparison reliable at scale: normalizing the clocks that produce $t_{tap}$, bounding the memory that holds $t_{origin}$, and guaranteeing the decision replays identically months later during a settlement dispute.
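
To make the inequality concrete, here is a minimal sketch of the comparison on its own, outside any session handling. The helper name `is_transfer` is hypothetical, and the 90-minute window and 30-second grace are only illustrative values that mirror the defaults used later on this page.

```python
from datetime import datetime, timedelta, timezone


def is_transfer(t_origin: datetime, t_tap: datetime,
                window: timedelta, grace: timedelta) -> bool:
    """Return True when t_tap - t_origin <= W + G."""
    delta_t = t_tap - t_origin
    return delta_t <= window + grace


W = timedelta(minutes=90)   # illustrative window length
G = timedelta(seconds=30)   # illustrative grace period
origin = datetime(2024, 5, 1, 8, 0, 0, tzinfo=timezone.utc)

on_edge = origin + W + G                      # exactly on the boundary
just_late = on_edge + timedelta(seconds=1)    # one second past it

print(is_transfer(origin, on_edge, W, G))     # True
print(is_transfer(origin, just_late, W, G))   # False
```

The boundary is inclusive, so a tap landing exactly at $W + G$ still counts as a transfer.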

Prerequisites & Environment

This component targets Python 3.11+ (for datetime.UTC and improved zoneinfo handling) and leans only on the standard library for the hot path — collections.deque, datetime, decimal, and hashlib — so it can run identically on a cloud reconciliation worker and an edge validator. Optional production dependencies are a distributed cache (redis>=5.0) for shared session state across a validator fleet and a stream client (confluent-kafka or pulsar-client) for the tap feed.

Assumptions the evaluator makes about its inputs:

| Assumption | Expectation | Why it matters |
| --- | --- | --- |
| Media identity | `media_hash` already canonical (SHA-256), never a raw PAN or card serial | Window state is keyed on it; two encodings of the same card would split the session |
| Timestamp | Timezone-aware UTC, ISO 8601, resolved from validator NTP | The eligibility inequality is meaningless on naive or local-time stamps |
| Fare amounts | `Decimal`, minor-unit precise, never `float` | Transfer discounts and base fares feed settlement; float drift is unrecoverable |
| Event ordering | Roughly monotonic per media, with bounded out-of-order jitter | The sliding window assumes the anchoring tap is the oldest in the session |
| Window policy | Externally supplied (60–120 min typical) plus operator grace offset | Thresholds change per agreement without a code deploy |
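
The timestamp and fare rows can be illustrated with a small sketch of what an upstream normalization step might look like. Both helpers (`normalize_timestamp`, `to_minor_units`) are hypothetical names, not part of the evaluator. The sketch assumes that offset-bearing stamps are converted to UTC at ingestion, while naive stamps are rejected outright.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal, ROUND_HALF_UP


def normalize_timestamp(raw: str) -> datetime:
    """Parse an ISO 8601 string into a UTC-aware datetime.

    Naive stamps are rejected rather than guessed at (hypothetical helper)."""
    ts = datetime.fromisoformat(raw)
    if ts.tzinfo is None or ts.utcoffset() is None:
        raise ValueError(f"naive timestamp rejected: {raw!r}")
    return ts.astimezone(timezone.utc)


def to_minor_units(amount: str) -> Decimal:
    """Build a fare from a string so no binary float ever enters the value."""
    return Decimal(amount).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# A local-offset stamp becomes the same instant expressed in UTC.
ts = normalize_timestamp("2024-05-01T10:00:00+02:00")
print(ts.isoformat())                       # 2024-05-01T08:00:00+00:00

# Float arithmetic drifts; Decimal arithmetic on string inputs does not.
print(0.1 + 0.2 == 0.3)                                      # False
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True
```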

Window boundaries themselves are not hard-coded here. They are supplied by the Threshold Tuning Frameworks, which let revenue analysts adjust grace periods, buffer tolerances, and operator-specific offsets against historical reconciliation logs without redeploying the calculation binary.
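
As a rough illustration of externally supplied thresholds, the sketch below builds a policy object from a plain dictionary. The `WindowPolicy` class, the `policy_from_config` helper, and every config key are assumptions made for this example; the real tuning store and its schema are not described here.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass(frozen=True)
class WindowPolicy:
    """Hypothetical container for thresholds fetched at runtime."""
    window: timedelta
    grace: timedelta
    operator_offsets: dict[str, timedelta] = field(default_factory=dict)

    def limit_for(self, operator_id: str) -> timedelta:
        """Total allowed gap: window + grace + any operator-specific offset."""
        extra = self.operator_offsets.get(operator_id, timedelta(0))
        return self.window + self.grace + extra


def policy_from_config(cfg: dict) -> WindowPolicy:
    """Build a policy from a dict; the key names are assumed for illustration."""
    offsets = cfg.get("operator_offset_seconds", {})
    return WindowPolicy(
        window=timedelta(minutes=cfg["window_minutes"]),
        grace=timedelta(seconds=cfg["grace_seconds"]),
        operator_offsets={op: timedelta(seconds=s) for op, s in offsets.items()},
    )


policy = policy_from_config({
    "window_minutes": 90,
    "grace_seconds": 30,
    "operator_offset_seconds": {"OP_A": 60},
})
print(policy.limit_for("OP_A"))   # 1:31:30
print(policy.limit_for("OP_B"))   # 1:30:30
```

Because the policy is plain data, a new threshold takes effect by loading a new dictionary rather than by shipping new code.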

Core Implementation

The following implementation models transfer window logic as a memory-efficient, streaming reconciliation processor. It uses collections.deque for O(1) window sliding, Decimal for every monetary value, explicit error boundaries, and generator-based processing so the full dataset is never materialized in RAM.

from __future__ import annotations

import hashlib
import logging
from collections import deque
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from decimal import Decimal, ROUND_HALF_UP
from typing import Iterator, Dict, Set

logger = logging.getLogger("transit.transfer_window")

MONEY = Decimal("0.01")  # minor-unit quantization target


class TransferProcessingError(Exception):
    """Base exception for transfer window validation failures."""


class ClockSkewViolation(TransferProcessingError):
    """Raised when a tap timestamp is not a usable UTC instant."""


@dataclass(frozen=True)
class TapEvent:
    media_id: str
    timestamp_utc: datetime
    validator_id: str
    operator_id: str
    route_id: str
    event_type: str          # 'IN' | 'OUT'
    base_fare: Decimal       # fare if this tap opens a new session
    raw_payload: dict = field(repr=False, default_factory=dict)


@dataclass
class TransferResult:
    media_id: str
    is_transfer: bool
    fare_charged: Decimal
    window_start: datetime
    window_end: datetime
    operator_handoff: bool
    audit_hash: str


class TransferWindowReconciler:
    """Deterministic, memory-bounded transfer-window evaluator for tap streams."""

    def __init__(
        self,
        window_minutes: int = 90,
        grace_seconds: int = 30,
        max_clock_skew_seconds: int = 5,
        max_active_windows: int = 500_000,
    ) -> None:
        self.window = timedelta(minutes=window_minutes)
        self.grace = timedelta(seconds=grace_seconds)
        self.max_skew = timedelta(seconds=max_clock_skew_seconds)
        self.max_windows = max_active_windows

        # Bounded state: media_id -> deque holding the single anchoring IN tap.
        self.active_sessions: Dict[str, deque[TapEvent]] = {}
        self.processed_hashes: Set[str] = set()

    def _event_hash(self, event: TapEvent) -> str:
        return hashlib.sha256(
            f"{event.media_id}:{event.timestamp_utc.isoformat()}:{event.validator_id}".encode()
        ).hexdigest()

    def _validate_timestamp(self, event: TapEvent) -> None:
        ts = event.timestamp_utc
        if ts.tzinfo is None or ts.utcoffset() != timedelta(0):
            raise ClockSkewViolation("Timestamp must be a UTC-aware instant.")
        # In production, also compare against the validator's last known NTP
        # sync offset; reject anything beyond self.max_skew.

    def _prune_expired_windows(self, current_utc: datetime) -> None:
        """Evict sessions whose anchoring tap has aged past window + grace,
        plus any empty deques left behind by taps that hold no open anchor."""
        cutoff = current_utc - self.window - self.grace
        expired = [
            mid for mid, dq in self.active_sessions.items()
            if not dq or dq[0].timestamp_utc < cutoff
        ]
        for mid in expired:
            del self.active_sessions[mid]

    def _quantize(self, amount: Decimal) -> Decimal:
        return amount.quantize(MONEY, rounding=ROUND_HALF_UP)

    def process_stream(self, tap_stream: Iterator[TapEvent]) -> Iterator[TransferResult]:
        """Memory-efficient streaming processor. Yields one reconciliation
        result per accepted tap without materializing the full dataset."""
        for event in tap_stream:
            try:
                self._validate_timestamp(event)
            except ClockSkewViolation as exc:
                logger.warning("Quarantining tap for %s: %s", event.media_id, exc)
                yield TransferResult(
                    media_id=event.media_id,
                    is_transfer=False,
                    fare_charged=Decimal("0.00"),
                    window_start=event.timestamp_utc,
                    window_end=event.timestamp_utc,
                    operator_handoff=False,
                    audit_hash="INVALID_TIMESTAMP",
                )
                continue

            evt_hash = self._event_hash(event)
            if evt_hash in self.processed_hashes:
                continue  # Idempotent skip on redelivery.
            self.processed_hashes.add(evt_hash)

            if len(self.active_sessions) >= self.max_windows:
                self._prune_expired_windows(event.timestamp_utc)

            session = self.active_sessions.setdefault(event.media_id, deque())

            is_transfer = False
            operator_handoff = False
            window_start = event.timestamp_utc
            fare_charged = self._quantize(event.base_fare)

            if session:
                anchor = session[0]  # only IN taps are ever stored as anchors
                elapsed = event.timestamp_utc - anchor.timestamp_utc
                if elapsed <= self.window + self.grace:
                    is_transfer = True
                    window_start = anchor.timestamp_utc
                    fare_charged = Decimal("0.00")  # discounted continuation
                    operator_handoff = event.operator_id != anchor.operator_id
                    # Keep the anchor in place: the window stays measured from
                    # the initial tap, so an intervening OUT tap cannot consume
                    # it and a later tap cannot extend it.
                else:
                    session.clear()               # window expired -> new base fare

            # Only an IN tap with no live session opens a new window.
            if event.event_type == "IN" and not session:
                session.append(event)

            yield TransferResult(
                media_id=event.media_id,
                is_transfer=is_transfer,
                fare_charged=fare_charged,
                window_start=window_start,
                window_end=event.timestamp_utc,
                operator_handoff=operator_handoff,
                audit_hash=evt_hash,
            )

Three design choices carry the correctness guarantees. _prune_expired_windows bounds dictionary growth by evicting expired sessions once the session count reaches max_active_windows, so a fleet-wide surge does not leave stale state accumulating without limit; in a distributed deployment, pair it with a Redis-backed LRU so state is shared across validators rather than siloed per node. The SHA-256 hash set makes processing idempotent within a worker even when an upstream Kafka or Pulsar consumer redelivers, which matters because a double-processed tap is a double-charge. And every accepted decision carries a deterministic audit_hash derived from (media_id, timestamp, validator_id), so the exact input tap behind a fare can be identified and replayed during an operator dispute.

Schema Validation & Transit-Specific Edge Cases

The evaluator inherits clean events, but “clean” upstream does not mean “well-behaved” temporally. The edges that break naive window code are all timing and identity edges.

Timezone normalization. The fundamental window — typically 60 to 120 minutes — is anchored to the initial tap, but validators rarely share a perfectly synchronized clock. All arithmetic runs in UTC; a naive or local-time timestamp is rejected at _validate_timestamp rather than silently compared, following the guarantees in the Python datetime documentation. A tap that survives with a local-time stamp will misfire the window across every DST boundary.
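Clock-skew rejection. Validator NTP drift can shift event timestamps by several seconds. Taps whose drift exceeds a configurable tolerance (for example ±5 s) should be quarantined with an INVALID_TIMESTAMP audit marker instead of poisoning the window, then repaired later against sync logs rather than charged on bad data. The reference implementation above enforces only the UTC check; the comparison against max_clock_skew_seconds is left as a production hook in _validate_timestamp.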
Temporal monotonicity. Hardware batch-writes can deliver a media’s taps slightly out of order. The sliding window assumes the anchoring tap is the oldest event, so bounded jitter correction has to run before the window test — otherwise a late-arriving earlier tap resets the anchor and grants a spurious transfer.
Idempotency via deterministic hashing. Deduplication keys on (media_id, timestamp, validator_id). This is what makes retries safe: the same physical tap, redelivered, is dropped rather than re-evaluated.
Missing tap-outs. When a rider taps in but never taps out — a lost telemetry event or a gate that failed to register the exit — the deque-based session simply ages out after window + grace. It is never charged twice, and a nightly job matches orphaned IN events against completed trips, delegating an estimated fare to the Fallback Calculation Chains when no exit telemetry exists.
GTFS-RT alignment. Cross-referencing tap timestamps against scheduled departures from the GTFS-Realtime sync flags phantom taps and missed boardings that would otherwise anchor a false window.
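
The temporal-monotonicity point can be sketched as a small reordering buffer that runs ahead of the window test. This is only one possible approach, using a heap, and the names `Tap` and `reorder_within_jitter` are hypothetical. It holds each tap back until a tap at least `max_jitter` newer has been seen, so any tap that arrives later than that bound will still come out of order and needs separate handling.

```python
import heapq
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator, NamedTuple


class Tap(NamedTuple):
    """Minimal stand-in for a tap event; ordering is by timestamp first."""
    timestamp_utc: datetime
    media_id: str


def reorder_within_jitter(taps: Iterable[Tap],
                          max_jitter: timedelta) -> Iterator[Tap]:
    """Yield taps in timestamp order, assuming disorder is bounded by max_jitter."""
    heap: list[Tap] = []
    newest: datetime | None = None
    for tap in taps:
        heapq.heappush(heap, tap)
        if newest is None or tap.timestamp_utc > newest:
            newest = tap.timestamp_utc
        # Release everything old enough that no in-bound straggler can precede it.
        while heap and heap[0].timestamp_utc <= newest - max_jitter:
            yield heapq.heappop(heap)
    # End of stream: flush whatever is still buffered, oldest first.
    while heap:
        yield heapq.heappop(heap)


t0 = datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc)
arrivals = [
    Tap(t0, "m1"),
    Tap(t0 + timedelta(seconds=10), "m1"),
    Tap(t0 + timedelta(seconds=5), "m1"),    # arrives late, belongs earlier
    Tap(t0 + timedelta(seconds=60), "m1"),
]
ordered = list(reorder_within_jitter(arrivals, timedelta(seconds=30)))
print([(t.timestamp_utc - t0).seconds for t in ordered])   # [0, 5, 10, 60]
```

The buffer adds up to `max_jitter` of latency to every decision, which is the cost of not letting a late earlier tap disturb the anchor.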

The session lifecycle below shows a media session moving from an open tap-in through transfer completion or window expiry — the two terminal states the reconciler must handle:

Integration Pattern

The window evaluator is one stage in a single-pass fare resolution, not a standalone service. Its output feeds two consumers, and its correctness depends on one sibling running in the same traversal.

Because concession status, daily capping, and any loyalty multiplier must all be resolved without a second pass — a second pass reintroduces the race conditions and double-counting that determinism is meant to eliminate — the transfer decision is computed alongside eligibility rather than before it. When a tap reaches the Discount Eligibility Engines, a fare_charged of Decimal("0.00") from a fired window must win over a concession discount that would otherwise apply, so the two evaluators share the same session context and the window result is authoritative for that leg.
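
The precedence rule described above can be sketched as a single resolution function. This is an illustration only: `resolve_leg_fare` and its `concession_rate` parameter are hypothetical, and the actual interface of the eligibility engine is not shown on this page.

```python
from decimal import Decimal, ROUND_HALF_UP

MONEY = Decimal("0.01")


def resolve_leg_fare(base_fare: Decimal, window_fired: bool,
                     concession_rate: Decimal) -> Decimal:
    """Resolve one leg's fare in a single pass (hypothetical helper).

    A fired transfer window is authoritative: the leg is free and the
    concession discount is not applied on top of it."""
    if window_fired:
        return Decimal("0.00")
    discounted = base_fare * (Decimal("1") - concession_rate)
    return discounted.quantize(MONEY, rounding=ROUND_HALF_UP)


base = Decimal("2.75")
half_fare = Decimal("0.50")   # assumed 50% concession, for illustration

print(resolve_leg_fare(base, True, half_fare))    # 0.00
print(resolve_leg_fare(base, False, half_fare))   # 1.38
```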

Downstream, the operator_handoff flag is the trigger for inter-agency settlement. A same-media transfer that crosses an operator boundary is a payout event: the origin operator collected the base fare, the destination operator carried a rider for free, and the clearinghouse owes a proration. That resolution — including the routing across jurisdictional boundaries — is the subject of calculating cross-operator transfer windows with Python. On an offline edge validator that cannot see fleet-wide state, the same window logic runs against a locally cached session table under the fallback routing strategies that store and forward taps until connectivity returns.

Performance & Scale Considerations

At production volume — millions of taps per hour across a metro fleet — the evaluator’s cost model is dominated by two structures: the active-session dictionary and the processed-hash set.

Bounded active state. Sessions are evicted lazily: _prune_expired_windows runs only when the session count reaches max_active_windows, and it removes every session whose anchor has aged past window + grace. Between prunes the dictionary can still hold expired entries. Peak memory therefore tracks the larger of that cap and the number of concurrent in-window journeys (typically tens of thousands of entries, not millions), not total daily volume.
Hash-set growth. The idempotency set grows unbounded within a processing window if left unmanaged. In practice, scope it to a rolling horizon (for example, drop hashes older than window + grace + settlement_lag) or back it with a TTL cache, since a tap can only ever be redelivered inside the redelivery window of the stream broker.
Sharding for parallelism. Partition the tap stream by media_hash so every tap for a given card lands on the same worker. This preserves per-media ordering — the one invariant the sliding window depends on — while letting throughput scale linearly across workers. Never round-robin taps across partitions; a split session double-charges.
Chunked replay. Nightly reconciliation over historical taps should stream in bounded chunks through the same generator interface, not load a day’s parquet into a DataFrame. The process_stream contract is identical for live and replay paths, which keeps the two from drifting.
Cold-storage flush. Write expired sessions to durable storage on eviction so a mid-shift worker restart replays from the ledger rather than losing in-flight windows. The reference _prune_expired_windows only deletes them, so the flush has to be added at that point.
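
One way to scope the idempotency set to a rolling horizon is sketched below. `RollingSeenSet` is a hypothetical name, and the sketch makes two assumptions: eviction is driven by event time rather than wall-clock time, and hashes arrive in roughly chronological order, so the oldest insertion is also close to the oldest timestamp.

```python
from collections import OrderedDict
from datetime import datetime, timedelta, timezone


class RollingSeenSet:
    """Dedup store that forgets hashes older than a fixed horizon."""

    def __init__(self, horizon: timedelta) -> None:
        self.horizon = horizon
        self._seen: OrderedDict[str, datetime] = OrderedDict()

    def __len__(self) -> int:
        return len(self._seen)

    def check_and_add(self, event_hash: str, event_time: datetime) -> bool:
        """Return True for a duplicate; otherwise record the hash and return False."""
        self._evict(event_time)
        if event_hash in self._seen:
            return True
        self._seen[event_hash] = event_time
        return False

    def _evict(self, now: datetime) -> None:
        cutoff = now - self.horizon
        # Oldest insertions sit at the front; stop at the first one still in range.
        while self._seen:
            _, oldest_time = next(iter(self._seen.items()))
            if oldest_time >= cutoff:
                break
            self._seen.popitem(last=False)


t0 = datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc)
seen = RollingSeenSet(horizon=timedelta(seconds=120))
print(seen.check_and_add("a", t0))                            # False (first sight)
print(seen.check_and_add("a", t0 + timedelta(seconds=10)))    # True  (redelivery)
print(seen.check_and_add("b", t0 + timedelta(seconds=200)))   # False, and "a" is evicted
print(len(seen))                                              # 1
```

The horizon has to be at least as long as the broker's redelivery window, otherwise a late redelivery is treated as a new tap.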

Operational Checklist

Externalize thresholds. Confirm window length, grace, and per-operator offsets are pulled from the tuning store at runtime, not compiled in — validate a threshold change propagates without a redeploy.
Pin the clock contract. Enforce UTC-aware timestamps at ingestion and alert on any validator whose NTP drift approaches the skew tolerance before it starts quarantining taps.
Shard by media. Verify the stream partitioner keys on media_hash so no card’s taps split across workers.
Cap the hash set. Set an explicit TTL or horizon on the idempotency store and monitor its size against the redelivery window.
Run shadow mode before promotion. Route live taps through a parallel instance on the candidate rule version and diff its output against production before switching.
Watch transfer-rate drift. Continuously compare the fired-transfer ratio against historical baselines; a sudden move signals a timestamp, GTFS-RT alignment, or threshold regression.
Retain audit hashes. Store every audit_hash alongside its settlement batch so decisions are forensically replayable during operator disputes; sign the offline ledger per the AFC system security boundaries.
Reconcile orphaned tap-ins nightly. Schedule the job that matches unclosed IN events against completed trips and applies fallback fares where exit telemetry is missing.
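
For the "shard by media" item, a stable key-to-partition mapping might look like the sketch below. The function name `partition_for` is hypothetical, and many stream clients already partition by message key, so this is only relevant where routing is done by hand. It avoids Python's built-in `hash()` for strings, which is randomized per process unless `PYTHONHASHSEED` is fixed.

```python
import hashlib


def partition_for(media_hash: str, num_partitions: int) -> int:
    """Map a media_hash to a partition index that is stable across processes."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    digest = hashlib.sha256(media_hash.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce to the range.
    return int.from_bytes(digest[:8], "big") % num_partitions


# Every tap for the same card lands on the same partition.
taps_for_card = ["card-abc"] * 5
partitions = {partition_for(m, 16) for m in taps_for_card}
print(len(partitions))   # 1
```

Changing `num_partitions` remaps most cards, so a resize should be done at a point where no sessions are open or with state migrated alongside it.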

Calculating Cross-Operator Transfer Windows with Python — resolving handoffs and settlement splits when a transfer crosses an operator boundary.

Discount Eligibility Engines — the concession resolver that shares session context with the window evaluator.
Threshold Tuning Frameworks — where window length, grace, and operator offsets are versioned and tuned.
Fallback Calculation Chains — how orphaned tap-ins and unresolved windows get an estimated fare.
Smart Card Schema Mapping — the normalization that produces the canonical media_hash this evaluator keys on.
GTFS-Realtime Sync — the schedule feed used to flag phantom taps and align window anchors.
