Phase 5: IMAP Bounce Processing
Overview
When the downstream MTA cannot deliver a message it returns a Delivery Status
Notification (DSN). This phase closes that loop automatically: bounces are
parsed, the message row is flipped to bounced with a transient/permanent
classification (RFC 3463), and — for permanent failures — the recipient is
inserted into the per-mailbox suppressions table. Future send attempts to a
suppressed address are rejected at the API layer with 422 before any SMTP
connection is made. The entire pipeline is VERP-authenticated: a forged DSN
cannot flip a legitimate message’s status, and unsigned bounces are silently
quarantined.
Design ADR: docs/adr/0002-phase5-bounce-processing.md is the binding design record. This spec is the implementor-facing checklist; where the two disagree, the ADR wins.
Goal
Flip outbound_messages rows to bounced when the downstream MTA
returns a Delivery Status Notification (DSN), classify the bounce as
transient vs. permanent (RFC 3463), and maintain a per-mailbox
suppressions table so recipients with permanent failures are blocked
from future /messages/send requests. The bounce identification path is
VERP-authenticated end-to-end — no header-heuristic trust, no
subject-keyword matching.
Process topology
The bounce processor is a separate, synchronous, one-shot CLI
command (cli bounces:poll-once), scheduled by an external periodic
mechanism (Docker Swarm periodic task / system cron) at an initial
cadence of every 5 minutes. It is not a long-running scheduler and
it is not a task inside the async outbound worker. This honors the
CONTRIBUTING.md §IMAP Parsing sync-def-for-imaplib invariant without
thread-pool carve-outs. See ADR 0002 §2.
TDD Acceptance Criteria
All must fail before implementation and pass after, in listed order:
- pytest tests/services/test_verp.py::test_sign_verp_round_trips
- pytest tests/services/test_verp.py::test_verify_rejects_unsigned_and_tampered
- pytest tests/services/test_bounce_parser.py::test_extracts_original_message_id_from_dsn
- pytest tests/services/test_bounce_parser.py::test_classifies_status_4xx_as_transient
- pytest tests/services/test_bounce_parser.py::test_classifies_status_5xx_as_permanent
- pytest tests/services/test_bounce_parser.py::test_classifies_missing_status_as_unknown
- pytest tests/workers/test_bounce_cron.py::test_verp_authenticated_bounce_flips_row_to_bounced_permanent
- pytest tests/workers/test_bounce_cron.py::test_transient_bounce_does_not_insert_suppression
- pytest tests/workers/test_bounce_cron.py::test_permanent_bounce_inserts_suppression_row
- pytest tests/workers/test_bounce_cron.py::test_forged_bounce_with_valid_message_id_is_rejected
- pytest tests/workers/test_bounce_cron.py::test_non_multipart_report_email_is_rejected
- pytest tests/workers/test_bounce_cron.py::test_bounce_email_is_moved_to_processed_folder_on_success
- pytest tests/workers/test_bounce_cron.py::test_bounce_email_is_moved_to_rejected_folder_on_hmac_failure
- pytest tests/api/test_outbound_send.py::test_422_rejects_suppressed_recipient
- pytest tests/commands/test_suppression_commands.py::test_cli_remove_clears_suppression_row
- pytest tests/workers/test_smtp_delivery.py::test_return_path_uses_verp_when_bounce_imap_configured
- pytest tests/workers/test_smtp_delivery.py::test_return_path_falls_back_to_mailbox_email_when_bounce_imap_unconfigured
- pytest tests/test_alembic.py::test_single_head (still green after adding 0004_bounce_imap_and_verp)
Technical Specifications
VERP addressing
- New module src/app/utils/verp.py:
  - sign_verp(message_id: UUID) -> str returns a bounce+{message_id}.{hmac16}@placeholder template; the caller substitutes the domain. Prefer a build_verp_return_path(mailbox, message_id) -> str helper that inspects mailbox.bounce_verp_domain (falling back to mailbox.email.split('@', 1)[1]) and returns the fully qualified Return-Path.
  - verify_verp(local_part: str) -> UUID | None performs a constant-time HMAC comparison. Returns the message_id on success, None on any of: missing bounce+ prefix, malformed, bad HMAC, wrong length. Never raises.
- HMAC input: b"bounce-verp:" + str(message_id).encode() signed with the existing TRACKING_HMAC_SECRET. The domain-separation prefix prevents a signed tracking token from being reinterpretable as a VERP token.
- HMAC output: SHA-256, hex, first 16 chars (64 bits). Matches the existing sign_message_id precedent in src/app/utils/tracking_token.py.
- Worker change (src/app/workers/smtp_delivery.py, _bind_sender_headers): when mailbox.bounce_imap_host is set, Return-Path and the SMTP MAIL FROM envelope sender become the VERP address. When it is NULL, both stay as <{mailbox.email}> (current behavior). From / Sender / Reply-To remain pinned to mailbox.email regardless: VERP does NOT weaken the sender-binding invariant (ADR 0001 §4, FEEDBACK.md §1.4).
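The token format above can be illustrated with a minimal sketch. It assumes a hard-coded demo key in place of TRACKING_HMAC_SECRET and uses a hypothetical helper name, sign_verp_local_part, that returns only the local part; the strict canonical-UUID check is likewise an assumption and not something the spec mandates.

```python
import hashlib
import hmac
from typing import Optional
from uuid import UUID

# Assumption: the real module reads TRACKING_HMAC_SECRET from configuration.
SECRET = b"demo-secret-for-illustration-only"
PREFIX = "bounce+"


def _mac16(message_id: UUID) -> str:
    # Domain-separated input, SHA-256, hex, truncated to 16 chars (64 bits).
    msg = b"bounce-verp:" + str(message_id).encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:16]


def sign_verp_local_part(message_id: UUID) -> str:
    # Hypothetical helper: builds "bounce+{message_id}.{hmac16}" without a domain.
    return f"{PREFIX}{message_id}.{_mac16(message_id)}"


def verify_verp(local_part: str) -> Optional[UUID]:
    # Returns None on every failure mode instead of raising.
    if not local_part.startswith(PREFIX):
        return None
    raw_id, sep, mac = local_part[len(PREFIX):].rpartition(".")
    if not sep or len(mac) != 16:
        return None
    try:
        message_id = UUID(raw_id)
    except ValueError:
        return None
    if str(message_id) != raw_id:
        # Assumption: only the canonical lowercase hyphenated form is accepted.
        return None
    # Compare as bytes so non-ASCII input cannot make compare_digest raise.
    supplied = mac.encode("utf-8", "replace")
    expected = _mac16(message_id).encode()
    return message_id if hmac.compare_digest(supplied, expected) else None
```

A caller would append the domain (bounce_verp_domain or the mailbox's own domain) to the signed local part, and pass only the local part of the delivered address to verify_verp.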
Schema — Alembic revision 0004_bounce_imap_and_verp
On mailboxes:
- bounce_imap_host text NULL
- bounce_imap_port int NULL (DB default 993 documented; no server default, because NULL means “feature not configured”)
- bounce_imap_username text NULL
- bounce_imap_password_encrypted bytea NULL
- bounce_imap_tls_mode mailboxtlsmode NULL (reuses the enum from 0003_smtp_tls_mode; default on the model layer is implicit because bounce IMAP is almost always 993)
- bounce_imap_folder text NULL (default string 'INBOX' at the model layer)
- bounce_verp_domain text NULL (explicit override; worker falls back to mailbox.email domain when NULL)

New enum bouncetype (transient, permanent, unknown).
On outbound_messages:
- bounce_type bouncetype NULL
- bounce_diagnostic text NULL (scrubbed via redact_pii before persist, same rules as error_log)
- bounced_at timestamptz NULL

New enum suppressionreason (hard_bounce, unsubscribe, complaint).
Phase 5 only writes hard_bounce; the other two are reserved for
§1.10 pre-GA.
New table suppressions:
    CREATE TABLE suppressions (
        id                uuid PRIMARY KEY DEFAULT gen_random_uuid(),
        mailbox_id        uuid NOT NULL REFERENCES mailboxes(id) ON DELETE CASCADE,
        recipient_email   citext NOT NULL,
        reason            suppressionreason NOT NULL,
        source_message_id uuid NULL REFERENCES outbound_messages(id) ON DELETE SET NULL,
        created_at        timestamptz NOT NULL DEFAULT now(),
        notes             text NULL,
        UNIQUE (mailbox_id, recipient_email)
    );
    CREATE INDEX suppressions_mailbox_idx ON suppressions(mailbox_id);
If citext is not already enabled, the migration adds
CREATE EXTENSION IF NOT EXISTS citext; before the table. The
email-address comparison must be case-insensitive (ALICE@EX.COM ==
alice@ex.com).
Guard: tests/test_alembic.py still asserts a single head after
0004 is merged.
Bounce parser (src/app/services/bounce_parser.py, sync)
- Input: raw .eml bytes (or a pathlib.Path to one, for tests).
- parse(bytes) -> ParsedBounce | None where ParsedBounce carries:
  - recipient_local_part (the bounce+{token} piece of the envelope To:, pulled from the IMAP-delivered message’s Delivered-To: / Envelope-To: / Return-Path:. IMAP servers vary; the parser tries all three in order and returns the first valid VERP match; no header-only heuristic match)
  - verp_message_id: UUID (HMAC-verified via verp.verify_verp; if verification fails, parse returns None)
  - bounce_type: Literal['transient', 'permanent', 'unknown']
  - diagnostic_raw: str (the Diagnostic-Code: line from the message/delivery-status part, before redact_pii; the caller runs redact_pii before persisting to bounce_diagnostic)
  - original_recipient_email: str (from Final-Recipient: / Original-Recipient: in the message/delivery-status part; this is what goes into suppressions.recipient_email for permanent bounces)
- Requires the message have Content-Type: multipart/report; report-type=delivery-status. Anything else → None.
- Classification: Status: line in the delivery-status part, first digit: 4 → transient, 5 → permanent, other / missing → unknown.
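The content-type gate and the Status: classification can be sketched with the standard library email package. The function names extract_status and classify_status are illustrative only; the real parser also performs VERP verification and recipient extraction, which are omitted here.

```python
from email import message_from_bytes, policy
from typing import Optional


def classify_status(status: Optional[str]) -> str:
    # RFC 3463 class digit: 4 = transient, 5 = permanent, anything else unknown.
    first = (status or "").strip()[:1]
    return {"4": "transient", "5": "permanent"}.get(first, "unknown")


def extract_status(raw: bytes) -> Optional[str]:
    msg = message_from_bytes(raw, policy=policy.default)
    # Gate: only multipart/report with report-type=delivery-status is accepted.
    if msg.get_content_type() != "multipart/report":
        return None
    if str(msg.get_param("report-type") or "").lower() != "delivery-status":
        return None
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        # The stdlib parser exposes this part's payload as a list of header
        # blocks: per-message fields first, then one block per recipient.
        for block in part.get_payload():
            if block["Status"] is not None:
                return str(block["Status"]).strip()
    return None
```

Returning the first Status: found is a simplification; a DSN that reports several recipients carries one Status: per recipient block.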
Bounce cron worker (src/app/workers/bounce_cron.py, sync)
- Entry point poll_once() called by cli bounces:poll-once.
- For each mailbox with bounce_imap_host IS NOT NULL:
  - Build a sync IMAPClient (reuse existing helpers in src/app/services/imap_*) against bounce_imap_host:bounce_imap_port, TLS per bounce_imap_tls_mode, creds decrypted from bounce_imap_password_encrypted.
  - SELECT bounce_imap_folder, search UNSEEN.
  - For each message UID:
    - Fetch full RFC822 (BODY.PEEK[]; DSNs are tiny, so the full fetch is cheap).
    - parsed = bounce_parser.parse(raw).
    - If parsed is None → move to Rejected-Bounces folder (create if missing), increment bounce_cron_rejected_total{reason=malformed|hmac} metric, continue.
    - Look up outbound_messages by id = parsed.verp_message_id. If absent → move to Rejected-Bounces, metric bounce_cron_rejected_total{reason=unknown_message_id}, continue. (A VERP token whose message_id does not match any row is either a pre-retention DSN or a forgery that happened to guess a valid HMAC; both are non-events.)
    - Authoritative bounce. In one DB transaction:
      - UPDATE outbound_messages SET status='bounced', bounce_type=$1, bounce_diagnostic=redact_pii($2), bounced_at=now() WHERE id=$msg.
      - If bounce_type == 'permanent': INSERT INTO suppressions (mailbox_id, recipient_email, reason, source_message_id) VALUES ($mbx, $rcpt, 'hard_bounce', $msg) ON CONFLICT (mailbox_id, recipient_email) DO NOTHING.
    - On commit success: move the IMAP message to Processed-Bounces (create if missing), mark \Seen.
    - On any DB error: leave the message unprocessed (UNSEEN), next poll retries. Do NOT move to Processed- or Rejected- until the DB transaction commits.
  - LOGOUT inside a try / finally (existing IMAP discipline).
- Exit code 0 on clean pass; non-zero on per-mailbox exception (caught, logged with redacted stderr line per mailbox, but bubbled up so the external scheduler alarms).
- DB session: sync SQLAlchemy engine. Factor src/app/db/sync_session.py alongside the existing async session factory, reading the same DATABASE_URL and translating postgresql+asyncpg → postgresql+psycopg2 (or postgresql:// bare) at engine build time.
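The ordering rule in the per-message steps (never move the message until the DB transaction commits) is easy to get wrong, so here is a minimal sketch of the control flow. The IMAP and database operations are injected as plain callables; handle_message, Parsed, and the callable names are hypothetical stand-ins, not names from the codebase.

```python
from dataclasses import dataclass
from typing import Callable, Optional
from uuid import UUID


@dataclass
class Parsed:
    # Stand-in for ParsedBounce, trimmed to the fields this flow needs.
    verp_message_id: UUID
    bounce_type: str
    original_recipient_email: str


def handle_message(
    raw: bytes,
    parse: Callable[[bytes], Optional[Parsed]],
    row_exists: Callable[[UUID], bool],
    commit_bounce: Callable[[Parsed], None],  # UPDATE + optional INSERT in one transaction
    move_to: Callable[[str], None],           # IMAP folder move
) -> str:
    parsed = parse(raw)
    if parsed is None:
        # Malformed or failed HMAC: quarantine, no DB write.
        move_to("Rejected-Bounces")
        return "rejected"
    if not row_exists(parsed.verp_message_id):
        # Valid token but no matching outbound_messages row: also a non-event.
        move_to("Rejected-Bounces")
        return "rejected"
    try:
        commit_bounce(parsed)
    except Exception:
        # DB failure: leave the message UNSEEN in place so the next poll retries.
        return "error"
    # Only after a successful commit is the message filed as processed.
    move_to("Processed-Bounces")
    return "processed"
```

Injecting the side effects keeps the ordering testable without a live IMAP server or database; the real worker would also record metrics at each branch.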
Send path (src/app/api/…/send) — suppressions enforcement
- Before row insertion in the /messages/send handler (Phase 2 already owns this handler), query suppressions by (mailbox_id, recipient_email) for every combined to / cc / bcc recipient.
- If any recipient matches: reject the whole request with 422 and error.details.suppressed: [<addr>, …] (not a partial success). No row is inserted for any recipient when the request is rejected. No MIME is built.
- The lookup is a single SELECT recipient_email FROM suppressions WHERE mailbox_id = $1 AND recipient_email = ANY($2::citext[]). No per-recipient round-trip.
- This check happens after the existing Pydantic caps and the sender-binding validation. Order does not matter for correctness (all of these paths reject the request), but running suppressions last means a caller sending a malformed payload gets the malformed error, not the suppressed-recipient error.
CLI surface
- cli bounces:poll-once runs the cron once (above). Prints a one-line redacted summary: bounces: processed=N(perm=X, trans=Y, unk=Z), rejected=R, errors=E.
- cli suppressions:list [--mailbox <email>] [--full] is paginated and defaults to redacted output (hash-prefix + domain only). --full requires interactive confirmation.
- cli suppressions:remove <mailbox-email> <recipient-email> deletes the (mailbox_id, recipient_email) row. Prints a one-line confirmation.
- cli mailboxes:create / cli mailboxes:update gain the bounce-IMAP prompt block. An empty response for bounce_imap_host disables bounce processing for that mailbox and the CLI prints: warning: bounce processing disabled for this mailbox — DSNs will be black-holed at the upstream MTA.
Deployment
- Docker Swarm service email.bounce-cron:
  - Image: same as email.worker (single-image repo).
  - Command: cli bounces:poll-once inside a wrapper that runs it on a schedule (either cron in-container, or a Swarm replicated-job with --detach=false fired by an external scheduler; chosen at deploy time, orthogonal to this phase).
  - Resource limits + health check + restart policy declared alongside email.api and email.worker per the §Docker Swarm rule in CONTRIBUTING.md.
- .env.example adds: # bounce cron polls mailboxes with bounce_imap_host configured; default cadence 5 minutes, set via Swarm scheduler.
Observability (ADR 0002 §Observability, routed through PLAN.md §3.4)
New Prometheus metrics (exported wherever the pre-GA §3.4 work eventually lands — Phase 5 exposes them in-process whether or not the scrape endpoint is wired):
- outbound_messages_bounced_total{type=permanent|transient|unknown}
- bounce_cron_polled_total{mailbox_hash}
- bounce_cron_rejected_total{reason=malformed|hmac|unknown_message_id}
- bounce_cron_suppressions_inserted_total
- bounce_cron_duration_seconds{mailbox_hash} (histogram)
mailbox_hash is sha256(mailbox.email)[:12] — operator can
correlate by running cli mailboxes:list --hash. Raw email
addresses as Prometheus labels would re-introduce the PII leak
that §1.5 closes.
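For reference, the label derivation might look like the following. The spec gives only sha256(mailbox.email)[:12]; treating the input as UTF-8 bytes and the output as the lowercase hex digest is an assumption.

```python
import hashlib


def mailbox_hash(email: str) -> str:
    # Assumption: UTF-8 bytes in, hex digest out, first 12 characters kept.
    return hashlib.sha256(email.encode("utf-8")).hexdigest()[:12]
```

Twelve hex characters carry 48 bits, which is enough to tell mailboxes apart on a dashboard without exposing the address.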
Testing discipline
Every DSN fixture under tests/fixtures/dsn/ is a real
RFC-3464-shaped .eml. Hand-crafted for coverage:
- valid_5xx_user_unknown.eml: permanent bounce, clean VERP.
- valid_4xx_mailbox_full.eml: transient, clean VERP.
- missing_status_line.eml: unknown classification.
- malformed_multipart.eml: not multipart/report, rejected.
- forged_from_mailer_daemon.eml: correct From: MAILER-DAEMON, correct subject, but unsigned VERP; must not flip any row. This is the regression test for §1.3.
- valid_verp_unknown_message_id.eml: HMAC is valid because the attacker computed it against a random UUID, but the UUID does not match any outbound_messages row. Must be rejected (moved to Rejected-Bounces), no DB write.
The IMAP interaction is mocked at the imaplib.IMAP4_SSL layer,
not at the helper layer, so the test exercises the real folder-move
sequence: COPY, then STORE +FLAGS \Deleted, then EXPUNGE.
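The folder-move sequence that the tests exercise can be sketched against the standard imaplib connection API. move_message is a hypothetical helper name, and the real helpers under src/app/services/imap_* may differ.

```python
import imaplib


def move_message(conn: imaplib.IMAP4, uid: str, folder: str) -> None:
    # CREATE answers NO when the folder already exists; imaplib returns that
    # status instead of raising, so the result is deliberately ignored.
    conn.create(folder)
    typ, _ = conn.uid("COPY", uid, folder)
    if typ != "OK":
        # Do not flag the original for deletion unless the copy succeeded.
        raise RuntimeError(f"COPY to {folder} failed")
    conn.uid("STORE", uid, "+FLAGS", r"(\Deleted)")
    # EXPUNGE removes every message flagged \Deleted in the selected folder.
    conn.expunge()
```

Checking the COPY result before flagging the message keeps a failed move from silently deleting a DSN.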
Required reading before coding
- ADR 0002 (docs/adr/0002-phase5-bounce-processing.md): binding design record.
- FEEDBACK.md §1.3, §2.1, §2.4, §2.6: the problems this phase closes.
- CONTRIBUTING.md §IMAP Parsing + §Outbound SMTP & Tracking + §Queue, Worker & Migrations: the invariants this phase must not break.
- src/app/workers/smtp_delivery.py::_bind_sender_headers: current Return-Path construction that the VERP rewrite slots into.
- src/app/utils/tracking_token.py: HMAC-signing precedent that the VERP token format mirrors (same secret, domain-separated prefix).
Anti-patterns (do NOT ship any of these)
- ANTI-PATTERN: matching From: MAILER-DAEMON / subject keyword anywhere in the bounce pipeline, even as a “helpful fallback”. An unsigned bounce is not a bounce. §1.3 is a CRITICAL finding.
- ANTI-PATTERN: calling imaplib from inside the async outbound worker via run_in_threadpool. ADR 0002 §2 chose the separate-process option; do not re-open that decision inside the worker.
- ANTI-PATTERN: silent-skip of suppressed recipients on /messages/send. The request is rejected with 422 and the suppressed addresses surface in the error response; partial successes hide deliverability problems.
- ANTI-PATTERN: logging raw recipient addresses or raw DSN Diagnostic-Code lines. Both go through redact_pii before persisting (§1.5 rule); Prometheus labels use the mailbox hash, never the address.
- ANTI-PATTERN: cross-mailbox suppression. The unique constraint is (mailbox_id, recipient_email) for a reason: each mailbox is a separate sending identity.