Architecture & Design Decisions

IMAP Integration Strategy

The me.squibble.email service is designed to act as a stateless, high-performance HTTP-to-IMAP bridge. To ensure stability and security, we employ the following architectural patterns:

1. Synchronous Execution in Thread Pools

Although FastAPI is an asynchronous framework, the underlying Python imaplib library relies on synchronous, blocking TCP sockets.

  • Decision: API route handlers interacting with IMAP (def index and def show) are declared as standard synchronous functions (def instead of async def).
  • Reasoning: FastAPI automatically executes synchronous route handlers in a background thread pool (via starlette.concurrency.run_in_threadpool). If they were declared as async def, the blocking IMAP operations would run directly on the asyncio event loop, so every other request served by that worker process would stall until the IMAP call returned.
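
The difference between the two declaration styles can be reproduced without FastAPI. The sketch below is a stdlib-only stand-in, not the service's code: blocking_imap_call is a hypothetical placeholder for an imaplib round trip, and asyncio.to_thread plays the role of the thread pool that FastAPI uses for def handlers. It counts how often a concurrent ticker task gets to run while the blocking call is in flight.

```python
import asyncio
import time


def blocking_imap_call() -> str:
    """Hypothetical stand-in for a blocking imaplib round trip."""
    time.sleep(0.2)
    return "OK"


async def count_ticks(stop: asyncio.Event) -> int:
    """Count how often the event loop gets a chance to run other work."""
    ticks = 0
    while not stop.is_set():
        await asyncio.sleep(0.01)
        ticks += 1
    return ticks


async def measure(offload: bool) -> int:
    stop = asyncio.Event()
    ticker = asyncio.create_task(count_ticks(stop))
    await asyncio.sleep(0)  # let the ticker start before the IMAP call
    if offload:
        # Roughly what happens to a plain `def` handler: it runs on a worker thread.
        await asyncio.to_thread(blocking_imap_call)
    else:
        # What an `async def` handler would do: block the event loop itself.
        blocking_imap_call()
    stop.set()
    return await ticker


if __name__ == "__main__":
    print("ticks while blocking the loop:", asyncio.run(measure(offload=False)))
    print("ticks with thread offload:    ", asyncio.run(measure(offload=True)))
```

With the call offloaded, the ticker keeps running throughout; with the call made on the loop, the ticker only resumes after the call has returned.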

2. Selective Memory-Safe IMAP Fetching

Fetching emails with large attachments can cause severe memory bloat and Out-Of-Memory (OOM) crashes if the entire RFC822 payload is downloaded into the Python process.

  • Decision: The /messages endpoint implements a two-pass selective fetch strategy.
  • Implementation:
    1. It fetches (BODY.PEEK[HEADER] BODYSTRUCTURE) to retrieve only the lightweight email headers (for To, From, Subject, Date) and the structural tree of the email.
    2. A custom S-Expression parser (imap_helpers.py) decodes the BODYSTRUCTURE to locate the exact part IDs (e.g., 1, 1.1, 2) corresponding to the text/plain and text/html bodies.
    3. It executes a secondary fetch (e.g., BODY.PEEK[1]) specifically for those text components, ignoring any large binary attachment payloads.
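  • Reasoning: Listing 100 emails that each carry a 25 MB attachment would pull roughly 2.5 GB through the Python process if full RFC822 payloads were fetched. With the selective fetch, memory use scales with the headers and text bodies only, while the BODYSTRUCTURE response still gives the client full attachment metadata.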
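
The part-ID lookup in step 2 can be sketched as follows. This is a minimal illustration, not the parser in imap_helpers.py: the function names are hypothetical, IMAP literals ({n} followed by raw bytes) and nested message/rfc822 parts are not handled, and a text part that is really an attachment is not distinguished from the body.

```python
import re
from typing import Dict, List, Union

Node = Union[str, int, None, List["Node"]]

# Parentheses, quoted strings, or bare atoms (numbers, NIL).
_TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()]+')


def parse_sexp(text: str) -> Node:
    """Parse a parenthesized IMAP list into nested Python lists."""
    tokens = _TOKEN.findall(text)
    pos = 0

    def read() -> Node:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items: List[Node] = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume ")"
            return items
        if tok.startswith('"'):
            return tok[1:-1]  # escapes inside the string are left as-is
        if tok.upper() == "NIL":
            return None
        return int(tok) if tok.isdigit() else tok

    return read()


def find_text_parts(node: list, prefix: str = "") -> Dict[str, str]:
    """Map 'text/plain' and 'text/html' to the first matching IMAP part ID."""
    found: Dict[str, str] = {}
    if isinstance(node[0], list):  # multipart: child parts first, then the subtype
        for index, child in enumerate(node, start=1):
            if not isinstance(child, list):
                break  # reached the subtype string, e.g. "MIXED"
            part_id = f"{prefix}.{index}" if prefix else str(index)
            for mime, pid in find_text_parts(child, part_id).items():
                found.setdefault(mime, pid)
        return found
    mime = f"{node[0]}/{node[1]}".lower()
    if mime in ("text/plain", "text/html"):
        found[mime] = prefix or "1"  # a non-multipart message is part 1
    return found


def second_pass_items(parts: Dict[str, str]) -> str:
    """Build the fetch item list for the second pass."""
    return "(" + " ".join(f"BODY.PEEK[{pid}]" for pid in parts.values()) + ")"


if __name__ == "__main__":
    structure = (
        '((("TEXT" "PLAIN" ("CHARSET" "UTF-8") NIL NIL "7BIT" 1152 23)'
        '("TEXT" "HTML" ("CHARSET" "UTF-8") NIL NIL "QUOTED-PRINTABLE" 4554 73)'
        ' "ALTERNATIVE")'
        '("APPLICATION" "PDF" ("NAME" "report.pdf") NIL NIL "BASE64" 26214400)'
        ' "MIXED")'
    )
    parts = find_text_parts(parse_sexp(structure))
    print(parts)                     # {'text/plain': '1.1', 'text/html': '1.2'}
    print(second_pass_items(parts))  # (BODY.PEEK[1.1] BODY.PEEK[1.2])
```

The 25 MB PDF in the sample is part 2 and never appears in the second-pass fetch items, which is the point of the strategy.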

3. Strict IMAP Connection Lifecycle Management

Many IMAP servers enforce per-user or per-IP connection limits. Orphaned connections count against those limits and can lead to IP bans or service degradation.

  • Decision: Every IMAP workflow is wrapped in a try...finally block.
  • Reasoning: The finally block guarantees that imap.logout() is called to formally terminate the session, even if message parsing fails, the socket times out (the timeout is set to 15 s), or an HTTP 404/400 exception is raised mid-flight.
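
One way to package this pattern is a small context manager, sketched below. The helper name imap_session, the connect callable, and the host are assumptions for illustration; the service may simply inline the try...finally in each handler.

```python
import imaplib
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def imap_session(connect: Callable[[], imaplib.IMAP4]) -> Iterator[imaplib.IMAP4]:
    """Yield an IMAP connection and always attempt LOGOUT afterwards."""
    imap = connect()
    try:
        yield imap
    finally:
        try:
            imap.logout()
        except (imaplib.IMAP4.error, OSError):
            pass  # the connection is already gone; nothing left to release


def connect_example() -> imaplib.IMAP4:
    # Hypothetical host. The timeout argument requires Python 3.9 or newer.
    return imaplib.IMAP4_SSL("imap.example.com", 993, timeout=15)
```

A handler would then use `with imap_session(connect_example) as imap:` and perform login, SELECT, and FETCH inside the block, so that a failed login or a raised HTTP exception still reaches logout().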

4. Input Validation & Injection Prevention

The mailbox parameter is directly interpolated into IMAP SELECT commands.

  • Decision: The mailbox path parameter is strictly validated using regex (^[^\r\n]+$) in the FastAPI route definition and via Pydantic validators.
  • Reasoning: Rejecting carriage returns (\r) and line feeds (\n) blocks CRLF-based IMAP command injection, where an attacker terminates the SELECT command early and appends arbitrary commands (such as DELETE) after it.
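
The same check can be written as a standalone helper, shown here as a sketch with a hypothetical function name. One detail is worth knowing when the pattern is evaluated by Python's re module: "$" also matches just before a trailing "\n", so re.match with ^[^\r\n]+$ accepts "INBOX\n". Using fullmatch (or \Z) closes that gap. Whether the route-level validation is affected depends on the regex engine behind the validator, which is not specified here.

```python
import re

# Same character class as the route regex, applied with fullmatch so that a
# trailing newline cannot slip past the end-of-string anchor.
_MAILBOX_RE = re.compile(r"[^\r\n]+")


def is_safe_mailbox(name: str) -> bool:
    """Return True if the mailbox name is non-empty and contains no CR or LF."""
    return _MAILBOX_RE.fullmatch(name) is not None


if __name__ == "__main__":
    print(is_safe_mailbox("INBOX"))                         # True
    print(is_safe_mailbox("INBOX\r\nA1 DELETE Archive"))    # False
    print(is_safe_mailbox("INBOX\n"))                       # False
    # The anchored pattern with re.match accepts the trailing newline:
    print(bool(re.match(r"^[^\r\n]+$", "INBOX\n")))         # True
```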

Bounce Cron Process Topology

The IMAP bounce processor (Phase 5) runs as a separate, synchronous, one-shot process — NOT a subsystem of the async outbound worker and NOT a long-running scheduler-in-process. See docs/adr/0002-phase5-bounce-processing.md for the full decision record.

5. Separate-process Bounce Cron

Phase 5 needs to poll IMAP inboxes for DSNs while the Phase 3 outbound worker runs in an asyncio loop (aiosmtplib, asyncpg). Three options were weighed: (a) separate sync process, (b) wrap imaplib in run_in_threadpool inside the async worker, (c) adopt aioimaplib and retire the sync-def rule in §1 above.

  • Decision: Option (a). A one-shot CLI command cli bounces:poll-once scheduled externally by Docker Swarm periodic task / system cron at 5-minute cadence. Sync imaplib helpers throughout; a sibling sync SQLAlchemy engine in src/app/db/sync_session.py against the same DATABASE_URL. No asyncio imports in the bounce cron process tree.
  • Reasoning:
    1. The sync-def-for-imaplib rule in §1 stays a hard invariant without carve-outs. Allowing run_in_threadpool inside the async worker would create a sanctioned exception, and later code would be one copy-paste away from calling imaplib on the event loop without it.
    2. Blast radius is scoped: a hung bounce poll cannot stall outbound delivery. A crashed bounce cron restarts on the next scheduler tick with fresh IMAP and DB connections — nothing leaks state across polls.
    3. The existing sync IMAP helpers (written for the inbound read API) are reused as-is. No aioimaplib dependency introduced; the rule in §1 does not have to move.
    4. One-shot commands are trivial to reason about: every tick is a fresh DB session, a fresh IMAP connection, a fresh process. The scheduling policy lives outside the repo (cron or Swarm), which is where operator-tunable cadence belongs.
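
The one-shot shape can be sketched in a few lines. Everything here is an assumption for illustration: poll_once and poll_mailbox are hypothetical names, and the choice to continue past a failing inbox and report it through the exit code is one possible policy, not something the decision record specifies.

```python
import sys
from typing import Callable, Iterable


def poll_once(mailboxes: Iterable[str], poll_mailbox: Callable[[str], int]) -> int:
    """Poll each bounce inbox once and return a process exit code."""
    failures = 0
    for name in mailboxes:
        try:
            # poll_mailbox is expected to open and close its own IMAP
            # connection and DB session, so nothing survives this call.
            handled = poll_mailbox(name)
            print(f"{name}: handled {handled} DSN(s)")
        except Exception as exc:  # one bad inbox should not hide the others
            failures += 1
            print(f"{name}: poll failed: {exc}", file=sys.stderr)
    return 1 if failures else 0
```

A CLI entry point would end with `sys.exit(poll_once(...))`, so the external scheduler sees a non-zero exit code when any inbox failed and simply starts a fresh process on the next tick.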

6. VERP Authenticity for Bounce Matching

Inbound “DSN-like” emails (From: MAILER-DAEMON, Subject: Undelivered…) are trivially forgeable — anyone who knows a Message-ID can craft one and flip a legitimate row to bounced. FEEDBACK.md §1.3 flagged this as the single CRITICAL finding on the original Phase 5 design.

  • Decision: Return-Path is rewritten per outbound message to bounce+{message_id}.{hmac16}@{mailbox.domain} (VERP). The HMAC reuses TRACKING_HMAC_SECRET with a bounce-verp: domain-separation prefix, and its hex digest is truncated to 16 characters (64 bits). The bounce cron matches ONLY on HMAC-verified VERP addresses extracted from Delivered-To: / Envelope-To: / Return-Path: of the IMAP-delivered message. Unsigned, tampered, or unknown-message_id DSNs are moved to Rejected-Bounces for operator review; they never reach a DB write.
  • Reasoning: Header-heuristic matching is an anti-pattern even as a fallback. A VERP address carries the message_id it maps to as part of the HMAC input, so an attacker who cannot produce a valid HMAC cannot flip a targeted row. The From / Sender / Reply-To sender-binding invariant (ADR 0001 §4) is unaffected: only Return-Path and the SMTP envelope sender are VERP-ified.
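
Signing and verification can be sketched as below. Several details are assumptions that the text above does not fix: SHA-256 as the hash, the exact HMAC input layout (prefix bytes followed by the message ID), lowercase UUID message IDs, and all function names.

```python
import hashlib
import hmac
import re
from typing import Optional

# bounce+{message_id}.{hmac16}@{domain}; the UUID shape of message_id is assumed.
_VERP_RE = re.compile(
    r"bounce\+(?P<message_id>[0-9a-f-]{36})\.(?P<tag>[0-9a-f]{16})@(?P<domain>[^@\s]+)"
)


def verp_tag(secret: bytes, message_id: str) -> str:
    """HMAC over the domain-separated message ID, truncated to 16 hex chars."""
    mac = hmac.new(secret, b"bounce-verp:" + message_id.encode("ascii"), hashlib.sha256)
    return mac.hexdigest()[:16]


def build_return_path(secret: bytes, message_id: str, domain: str) -> str:
    return f"bounce+{message_id}.{verp_tag(secret, message_id)}@{domain}"


def verify_verp(secret: bytes, address: str) -> Optional[str]:
    """Return the message ID if the address carries a valid tag, else None."""
    match = _VERP_RE.fullmatch(address.strip().lower())
    if match is None:
        return None
    expected = verp_tag(secret, match["message_id"])
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected, match["tag"]):
        return None
    return match["message_id"]
```

A DSN whose recipient address fails verify_verp would be routed to Rejected-Bounces; only a returned message ID is allowed to reach the database lookup.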

7. Per-mailbox Bounce Inbox (SPF Alignment)

A shared bounces@example.com or bounces.example.com would work for bounce routing but would not align SPF for tenants whose From: lives on their own domain (DMARC would have to rely on DKIM alignment alone).

  • Decision: Each tenant provisions its own bounces@{mailbox.domain} inbox with IMAP credentials captured in the six bounce_imap_* columns on mailboxes. A NULL bounce_imap_host is an explicit opt-out — the worker keeps the original Return-Path: <{mailbox.email}> and the bounce cron skips that mailbox entirely.
  • Reasoning: Per-mailbox is operationally heavier (one more credential pair per tenant) but preserves SPF alignment on Return-Path, isolates bounce-inbox problems per tenant, and keeps the sending-identity model symmetric with existing SMTP + IMAP credential provisioning.
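
The opt-out rule can be expressed as two small functions. This is a sketch under assumptions: the Mailbox dataclass is a trimmed stand-in for the real model, the domain is derived from the mailbox email address, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Mailbox:
    email: str
    bounce_imap_host: Optional[str]  # NULL in the database means opted out

    @property
    def domain(self) -> str:
        return self.email.rsplit("@", 1)[1]


def return_path_for(mailbox: Mailbox, verp_local_part: str) -> str:
    """Pick the envelope sender for one outbound message."""
    if mailbox.bounce_imap_host is None:
        return mailbox.email  # opt-out: keep the original Return-Path
    return f"{verp_local_part}@{mailbox.domain}"


def mailboxes_to_poll(mailboxes: Iterable[Mailbox]) -> List[Mailbox]:
    """The bounce cron skips mailboxes without a bounce inbox."""
    return [m for m in mailboxes if m.bounce_imap_host is not None]
```

Because the VERP address stays on the tenant's own domain, the Return-Path domain matches the From: domain, which is what SPF alignment under DMARC checks.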

Data Model

All persistent state lives in PostgreSQL. Seven tables across two bounded contexts: identity/auth (Mailbox + AgentToken) and outbound delivery (OutboundMessage + OutboundMessageEvent + Suppression + WebhookDelivery + IdempotencyKey).

erDiagram
    Mailbox {
        uuid id PK
        string email UK
        string server
        int port
        string username
        string encrypted_password
        string smtp_host
        int smtp_port
        string smtp_username
        string smtp_password_encrypted
        enum smtp_tls_mode
        text bounce_imap_host
        int bounce_imap_port
        text bounce_imap_username
        bytes bounce_imap_password_encrypted
        enum bounce_imap_tls_mode
        text bounce_imap_folder
        text bounce_verp_domain
        text webhook_url
        text webhook_secret_encrypted
    }

    AgentToken {
        uuid id PK
        uuid jti UK
        string name
        jsonb permissions
        bool is_revoked
        timestamptz revoked_at
        uuid mailbox_id FK
    }

    OutboundMessage {
        uuid id PK
        uuid mailbox_id FK
        string recipient_email
        string subject
        enum message_stream
        enum status
        text html_body
        text text_body
        int attempts
        timestamptz next_retry_at
        timestamptz sent_at
        timestamptz processing_started_at
        timestamptz opened_at
        timestamptz clicked_at
        text error_log
        enum bounce_type
        text bounce_diagnostic
        timestamptz bounced_at
        uuid token_jti
        timestamptz created_at
        timestamptz updated_at
    }

    OutboundMessageEvent {
        uuid id PK
        uuid outbound_message_id FK
        uuid mailbox_id FK
        enum event_type
        jsonb payload
        timestamptz occurred_at
        timestamptz created_at
    }

    Suppression {
        uuid id PK
        uuid mailbox_id FK
        citext recipient_email
        enum reason
        uuid source_message_id FK
        timestamptz created_at
        text notes
    }

    WebhookDelivery {
        uuid id PK
        uuid outbound_message_event_id FK
        uuid mailbox_id FK
        enum status
        int attempts
        timestamptz next_retry_at
        timestamptz processing_started_at
        timestamptz delivered_at
        text last_error
        timestamptz created_at
        timestamptz updated_at
    }

    IdempotencyKey {
        uuid id PK
        uuid token_jti
        string idempotency_key
        jsonb message_ids
        timestamptz created_at
    }

    Mailbox ||--o{ AgentToken : "issues"
    Mailbox ||--o{ OutboundMessage : "sends via"
    Mailbox ||--o{ Suppression : "owns"
    Mailbox ||--o{ OutboundMessageEvent : "scopes"
    Mailbox ||--o{ WebhookDelivery : "scopes"
    OutboundMessage ||--o{ OutboundMessageEvent : "logged as"
    OutboundMessage ||--o| Suppression : "source of"
    OutboundMessageEvent ||--|| WebhookDelivery : "delivered by"

Entity Roles

Entity                Role
--------------------  ------------------------------------------------------------
Mailbox               Tenant root; owns IMAP, SMTP, bounce-IMAP, and webhook credentials
AgentToken            Bearer credential scoped to one Mailbox; jti is the revocation handle
OutboundMessage       Core delivery unit; tracks the full lifecycle from queued to sent / bounced
OutboundMessageEvent  Append-only audit ledger; one row per state transition or engagement event
Suppression           Recipient-level block list; hard bounces are auto-inserted here
WebhookDelivery       Retry queue for delivering OutboundMessageEvent payloads to operator endpoints
IdempotencyKey        24-hour dedup window keyed on (token_jti, Idempotency-Key header)

Key Design Notes

  • IdempotencyKey.token_jti has no FK: tokens can be deleted without cascading into this table, and dedup still works within the TTL window.
  • OutboundMessage.token_jti is likewise intentionally FK-free; it exists for quota counting per token, not referential integrity.
  • Suppression.recipient_email uses the PostgreSQL CITEXT extension for case-insensitive uniqueness without LOWER() everywhere.
  • WebhookDelivery is 1-to-1 with OutboundMessageEvent (unique constraint on outbound_message_event_id), so each event has a single delivery row; retries update attempts and next_retry_at on that row instead of creating duplicates.
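
The IdempotencyKey dedup window can be illustrated with an in-memory stand-in for the table. This is a sketch under assumptions: the dict replaces the PostgreSQL table, the function name is hypothetical, and replaying the stored message_ids on a repeat request is an assumed behavior suggested by the message_ids column.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

TTL = timedelta(hours=24)

# (token_jti, idempotency_key) -> (created_at, message_ids)
Store = Dict[Tuple[str, str], Tuple[datetime, List[str]]]


def replay_or_record(
    store: Store,
    token_jti: str,
    idempotency_key: str,
    new_message_ids: List[str],
    now: datetime,
) -> List[str]:
    """Return the earlier message IDs for a repeat request inside the TTL window."""
    key = (token_jti, idempotency_key)
    hit = store.get(key)
    if hit is not None and now - hit[0] < TTL:
        return hit[1]  # duplicate request: replay the original result
    store[key] = (now, new_message_ids)  # first request, or the old entry expired
    return new_message_ids


if __name__ == "__main__":
    store: Store = {}
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(replay_or_record(store, "jti-1", "key-a", ["m1"], t0))                        # ['m1']
    print(replay_or_record(store, "jti-1", "key-a", ["m2"], t0 + timedelta(hours=23)))  # ['m1']
    print(replay_or_record(store, "jti-1", "key-a", ["m3"], t0 + timedelta(hours=25)))  # ['m3']
```

Keying on token_jti as a plain value rather than a foreign key is what lets the lookup keep working after the token row itself has been deleted.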

Sourced from docs/architecture.md in the repo. Edits go through the same review as code.