Architecture & Design Decisions
IMAP Integration Strategy
The me.squibble.email service is designed to act as a stateless, high-performance HTTP-to-IMAP bridge. To ensure stability and security, we employ the following architectural patterns:
1. Synchronous Execution in Thread Pools
Although FastAPI is an asynchronous framework, the underlying Python imaplib library relies on synchronous, blocking TCP sockets.
- Decision: API route handlers interacting with IMAP (def index and def show) are declared as standard synchronous functions (def instead of async def).
- Reasoning: FastAPI automatically executes synchronous route handlers in a background thread pool (via starlette.concurrency.run_in_threadpool). If they were declared as async def, the blocking IMAP operations would execute on the main asyncio event loop, causing the entire server to freeze under concurrent load.
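The effect can be reproduced without FastAPI. The following is a minimal sketch using only the standard library: blocking_imap_call is a hypothetical stand-in for an imaplib round trip, and asyncio.to_thread plays the role that the thread pool plays for synchronous route handlers. It illustrates the principle and is not the service's code.

```python
import asyncio
import time


def blocking_imap_call(seconds: float = 0.2) -> str:
    """Hypothetical stand-in for a blocking imaplib round trip."""
    time.sleep(seconds)
    return "OK"


async def count_ticks(run_call) -> int:
    """Count how often a 10 ms heartbeat fires while run_call() is in flight."""
    ticks = 0
    done = asyncio.Event()

    async def heartbeat() -> None:
        nonlocal ticks
        while not done.is_set():
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # yield once so the heartbeat starts
    await run_call()
    done.set()
    await task
    return ticks


async def on_event_loop() -> None:
    # Equivalent to calling imaplib inside an "async def" handler.
    blocking_imap_call()


async def in_thread_pool() -> None:
    # Equivalent to a plain "def" handler that the framework offloads.
    await asyncio.to_thread(blocking_imap_call)


def measure() -> tuple:
    blocked = asyncio.run(count_ticks(on_event_loop))
    offloaded = asyncio.run(count_ticks(in_thread_pool))
    return blocked, offloaded
```

Calling measure() should show the heartbeat barely advancing in the first case and ticking steadily in the second, which is the difference between a stalled and a responsive event loop.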
2. Selective Memory-Safe IMAP Fetching
Fetching emails with large attachments can cause severe memory bloat and Out-Of-Memory (OOM) crashes if the entire RFC822 payload is downloaded into the Python process.
- Decision: The /messages endpoint implements a two-pass selective fetch strategy.
- Implementation:
  - It fetches (BODY.PEEK[HEADER] BODYSTRUCTURE) to retrieve only the lightweight email headers (for To, From, Subject, Date) and the structural tree of the email.
  - A custom S-Expression parser (imap_helpers.py) decodes the BODYSTRUCTURE to locate the exact part IDs (e.g., 1, 1.1, 2) corresponding to the text/plain and text/html bodies.
  - It executes a secondary fetch (e.g., BODY.PEEK[1]) specifically for those text components, ignoring any large binary attachment payloads.
- Reasoning: This ensures that listing 100 emails with 25MB attachments requires kilobytes of RAM instead of gigabytes, while still returning full attachment metadata to the client.
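The part-ID lookup between the two passes can be sketched as follows. This is not the parser in imap_helpers.py: parse_sexp, find_text_parts, fetch_items, and SAMPLE are hypothetical names, and the sketch assumes the BODYSTRUCTURE value arrives as a single string with no IMAP literals and no nested message/rfc822 parts.

```python
def parse_sexp(text: str):
    """Parse an IMAP parenthesized list into nested Python lists."""
    pos = 0

    def parse():
        nonlocal pos
        items = []
        while pos < len(text):
            ch = text[pos]
            if ch == "(":
                pos += 1
                items.append(parse())
            elif ch == ")":
                pos += 1
                return items
            elif ch == '"':
                pos += 1
                buf = []
                while text[pos] != '"':
                    if text[pos] == "\\":  # backslash escapes the next char
                        pos += 1
                    buf.append(text[pos])
                    pos += 1
                pos += 1  # skip the closing quote
                items.append("".join(buf))
            elif ch.isspace():
                pos += 1
            else:
                start = pos
                while pos < len(text) and not text[pos].isspace() and text[pos] not in '()"':
                    pos += 1
                atom = text[start:pos]
                items.append(None if atom.upper() == "NIL" else atom)
        return items

    return parse()[0]


def find_text_parts(node, prefix: str = "") -> dict:
    """Map 'text/plain' and 'text/html' to the first matching part ID."""
    found = {}
    if isinstance(node[0], list):  # multipart: leading lists are child parts
        index = 0
        for child in node:
            if not isinstance(child, list):
                break  # reached the multipart subtype string
            index += 1
            part_id = f"{prefix}.{index}" if prefix else str(index)
            for mime, pid in find_text_parts(child, part_id).items():
                found.setdefault(mime, pid)
    else:
        mime = f"{node[0]}/{node[1]}".lower()
        if mime in ("text/plain", "text/html"):
            found[mime] = prefix or "1"  # a single-part message is part 1
    return found


def fetch_items(parts: dict) -> str:
    """Build the message-parts argument for the second fetch."""
    return "(" + " ".join(f"BODY.PEEK[{pid}]" for pid in parts.values()) + ")"


# multipart/mixed holding a multipart/alternative body and a 25 MB PDF.
SAMPLE = (
    '((("TEXT" "PLAIN" ("CHARSET" "UTF-8") NIL NIL "7BIT" 120 4)'
    '("TEXT" "HTML" ("CHARSET" "UTF-8") NIL NIL "QUOTED-PRINTABLE" 480 9)'
    ' "ALTERNATIVE")'
    '("APPLICATION" "PDF" ("NAME" "report.pdf") NIL NIL "BASE64" 26214400)'
    ' "MIXED")'
)
```

For SAMPLE, the lookup selects parts 1.1 and 1.2, so the second fetch never touches part 2, the PDF. A production parser would also need to inspect the disposition field so that text attachments are not mistaken for the body.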
3. Strict IMAP Connection Lifecycle Management
IMAP servers typically enforce strict per-account or per-IP connection limits, so orphaned connections can lead to IP bans or service degradation.
- Decision: Every IMAP workflow is wrapped in a robust try...finally block.
- Reasoning: The finally block guarantees that imap.logout() is called to formally terminate the TCP session, even if message parsing fails, the network times out (configured to 15s), or a 404/400 exception is raised mid-flight.
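One way to package this pattern is a context manager around the try...finally block. The sketch below is an assumption about shape, not the service's helper: imap_session and its factory parameter are hypothetical, the latter added only so the lifecycle can be exercised without a real server. The timeout keyword on imaplib.IMAP4_SSL requires Python 3.9 or later.

```python
import imaplib
from contextlib import contextmanager


@contextmanager
def imap_session(host, user, password, timeout=15.0, factory=imaplib.IMAP4_SSL):
    """Yield a logged-in IMAP connection and always log out afterwards."""
    # Connect outside the try: if this raises there is nothing to clean up.
    imap = factory(host, timeout=timeout)
    try:
        imap.login(user, password)
        yield imap
    finally:
        try:
            imap.logout()
        except (imaplib.IMAP4.error, OSError):
            # The socket may already be dead; do not mask the original error.
            pass
```

A route handler would then run its SELECT and FETCH calls inside "with imap_session(...) as imap:", and any exception raised in the body still passes through the logout step.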
4. Input Validation & Injection Prevention
The mailbox parameter is directly interpolated into IMAP SELECT commands.
- Decision: The mailbox path parameter is strictly validated using regex (^[^\r\n]+$) in the FastAPI route definition and via Pydantic validators.
- Reasoning: Preventing carriage returns (\r) and line feeds (\n) nullifies IMAP Command Injection attacks, where an attacker might append arbitrary commands (like DELETE) to the SELECT statement.
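A standalone version of the same check is sketched below; validate_mailbox is a hypothetical helper, not the route's validator. It uses re.fullmatch because, in Python's re module, $ also matches just before a single trailing newline, so re.match with ^[^\r\n]+$ would accept "INBOX\n". Whether the validator engine used by the route shares that behavior depends on its configuration, so an explicit full match is the conservative choice.

```python
import re

# No anchors needed: fullmatch requires the whole string to match.
MAILBOX_RE = re.compile(r"[^\r\n]+")


def validate_mailbox(name: str) -> str:
    """Return name unchanged, or raise if it is empty or contains CR/LF."""
    if not MAILBOX_RE.fullmatch(name):
        raise ValueError("mailbox must be non-empty and contain no CR or LF")
    return name
```

With this helper, "INBOX" and "Archive/2024" pass, while "INBOX\r\nA1 DELETE Trash" and "INBOX\n" are rejected before they reach the SELECT command.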
Bounce Cron Process Topology
The IMAP bounce processor (Phase 5) runs as a separate, synchronous, one-shot
process — NOT a subsystem of the async outbound worker and NOT a long-running
scheduler-in-process. See docs/adr/0002-phase5-bounce-processing.md for the
full decision record.
5. Separate-process Bounce Cron
Phase 5 needs to poll IMAP inboxes for DSNs while the Phase 3 outbound worker
runs in an asyncio loop (aiosmtplib, asyncpg). Three options were weighed:
(a) separate sync process, (b) wrap imaplib in run_in_threadpool inside the
async worker, (c) adopt aioimaplib and retire the sync-def rule in §1 above.
- Decision: Option (a). A one-shot CLI command cli bounces:poll-once scheduled externally by Docker Swarm periodic task / system cron at 5-minute cadence. Sync imaplib helpers throughout; a sibling sync SQLAlchemy engine in src/app/db/sync_session.py against the same DATABASE_URL. No asyncio imports in the bounce cron process tree.
- Reasoning:
  - The sync-def-for-imaplib rule in §1 stays a hard invariant without carve-outs. run_in_threadpool within the async worker would open a legal exception that later code is one copy-paste away from violating.
  - Blast radius is scoped: a hung bounce poll cannot stall outbound delivery. A crashed bounce cron restarts on the next scheduler tick with fresh IMAP and DB connections; nothing leaks state across polls.
  - The existing sync IMAP helpers (written for the inbound read API) are reused as-is. No aioimaplib dependency introduced; the rule in §1 does not have to move.
  - One-shot commands are trivial to reason about: every tick is a fresh DB session, a fresh IMAP connection, a fresh process. The scheduling policy lives outside the repo (cron or Swarm), which is where operator-tunable cadence belongs.
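The shape of such a one-shot command might look like the sketch below. poll_once and poll_mailbox are hypothetical names, and isolating failures per mailbox is an assumption of this sketch rather than something the decision record states. The function is fully synchronous, does one pass, and returns a process exit code.

```python
import sys
from typing import Callable, Iterable


def poll_once(mailboxes: Iterable[str], poll_mailbox: Callable[[str], None]) -> int:
    """Run a single synchronous pass over the bounce inboxes.

    Returns 0 if every mailbox was polled cleanly and 1 otherwise, so the
    external scheduler can observe failures through the exit status.
    """
    failures = 0
    for mailbox in mailboxes:
        try:
            poll_mailbox(mailbox)  # opens and closes its own IMAP and DB session
        except Exception as exc:  # assumption: one bad inbox must not block the rest
            failures += 1
            print(f"bounce poll failed for {mailbox}: {exc}", file=sys.stderr)
    return 1 if failures else 0
```

The CLI entry point would pass the result to sys.exit, and a system cron entry with a */5 minute field would invoke cli bounces:poll-once at the 5-minute cadence described above.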
6. VERP Authenticity for Bounce Matching
Inbound “DSN-like” emails (From: MAILER-DAEMON, Subject: Undelivered…) are
trivially forgeable — anyone who knows a Message-ID can craft one and flip a
legitimate row to bounced. FEEDBACK.md §1.3 flagged this as the single
CRITICAL finding on the original Phase 5 design.
- Decision: Return-Path is rewritten per outbound message to bounce+{message_id}.{hmac16}@{mailbox.domain} (VERP). The HMAC reuses TRACKING_HMAC_SECRET with a bounce-verp: domain-separation prefix (truncated to 16 hex chars / 64 bits). The bounce cron matches ONLY on HMAC-verified VERP addresses extracted from Delivered-To: / Envelope-To: / Return-Path: of the IMAP-delivered message. Unsigned / tampered / unknown-message_id DSNs are moved to Rejected-Bounces for operator review; they never reach a DB write.
- Reasoning: Header-heuristic matching is an anti-pattern even as a fallback. A VERP address carries the message_id it maps to as part of the HMAC input, so an attacker who cannot produce a valid HMAC cannot flip a targeted row. The From / Sender / Reply-To sender-binding invariant (ADR 0001 §4) is unaffected; only Return-Path and the SMTP envelope sender are VERP-ified.
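Signing and verifying such an address can be sketched as follows. Several details are assumptions the text does not pin down: HMAC-SHA256 as the digest, the prefix being concatenated directly in front of the message_id bytes, and message_id being a lowercase UUID string. The function names are hypothetical.

```python
import hashlib
import hmac
import re
from typing import Optional

VERP_PREFIX = b"bounce-verp:"  # domain-separation prefix from the decision

# Assumes message_id is a lowercase UUID (36 chars) and a 16-hex-char tag.
VERP_RE = re.compile(
    r"bounce\+(?P<message_id>[0-9a-f-]{36})\.(?P<tag>[0-9a-f]{16})@(?P<domain>[^@\s<>]+)"
)


def verp_tag(secret: bytes, message_id: str) -> str:
    """HMAC over prefix + message_id, truncated to 16 hex chars (64 bits)."""
    mac = hmac.new(secret, VERP_PREFIX + message_id.encode("ascii"), hashlib.sha256)
    return mac.hexdigest()[:16]


def build_return_path(secret: bytes, message_id: str, domain: str) -> str:
    return f"bounce+{message_id}.{verp_tag(secret, message_id)}@{domain}"


def verify_verp(secret: bytes, address: str) -> Optional[str]:
    """Return the message_id if the address carries a valid tag, else None."""
    match = VERP_RE.fullmatch(address.strip().strip("<>"))
    if match is None:
        return None
    expected = verp_tag(secret, match["message_id"])
    # Constant-time comparison avoids leaking how many characters matched.
    if hmac.compare_digest(expected, match["tag"]):
        return match["message_id"]
    return None
```

A None result corresponds to the unsigned or tampered case, where the DSN would be moved to Rejected-Bounces. A verified message_id still has to be looked up, since an unknown ID is rejected as well.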
7. Per-mailbox Bounce Inbox (SPF Alignment)
A shared bounces@example.com or bounces.example.com would work for bounce
routing but would not align SPF for tenants whose From: lives on their own
domain (DMARC would have to rely on DKIM alignment alone).
- Decision: Each tenant provisions its own bounces@{mailbox.domain} inbox with IMAP credentials captured in the six bounce_imap_* columns on mailboxes. A NULL bounce_imap_host is an explicit opt-out: the worker keeps the original Return-Path: <{mailbox.email}> and the bounce cron skips that mailbox entirely.
- Reasoning: Per-mailbox is operationally heavier (one more credential pair per tenant) but preserves SPF alignment on Return-Path, isolates bounce-inbox problems per tenant, and keeps the sending-identity model symmetric with existing SMTP + IMAP credential provisioning.
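The opt-out rule affects two places: the envelope sender chosen by the outbound worker and the set of inboxes the bounce cron visits. A minimal sketch, in which the Mailbox dataclass and both function names are hypothetical and only the fields needed for the branch are modeled:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mailbox:
    """Hypothetical slice of the mailboxes row relevant to bounce routing."""
    email: str
    bounce_imap_host: Optional[str] = None  # None models SQL NULL (opt-out)


def envelope_sender(mailbox: Mailbox, verp_address: str) -> str:
    """Use the VERP address only when the tenant has a bounce inbox."""
    if mailbox.bounce_imap_host is None:
        return mailbox.email  # opt-out: keep the original Return-Path
    return verp_address


def mailboxes_to_poll(mailboxes: List[Mailbox]) -> List[Mailbox]:
    """The bounce cron skips tenants that opted out."""
    return [m for m in mailboxes if m.bounce_imap_host is not None]
```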
Data Model
All persistent state lives in PostgreSQL. Seven tables across two bounded contexts: identity/auth (Mailbox + AgentToken) and outbound delivery (OutboundMessage + OutboundMessageEvent + Suppression + WebhookDelivery + IdempotencyKey).
erDiagram
Mailbox {
uuid id PK
string email UK
string server
int port
string username
string encrypted_password
string smtp_host
int smtp_port
string smtp_username
string smtp_password_encrypted
enum smtp_tls_mode
text bounce_imap_host
int bounce_imap_port
text bounce_imap_username
bytes bounce_imap_password_encrypted
enum bounce_imap_tls_mode
text bounce_imap_folder
text bounce_verp_domain
text webhook_url
text webhook_secret_encrypted
}
AgentToken {
uuid id PK
uuid jti UK
string name
jsonb permissions
bool is_revoked
timestamptz revoked_at
uuid mailbox_id FK
}
OutboundMessage {
uuid id PK
uuid mailbox_id FK
string recipient_email
string subject
enum message_stream
enum status
text html_body
text text_body
int attempts
timestamptz next_retry_at
timestamptz sent_at
timestamptz processing_started_at
timestamptz opened_at
timestamptz clicked_at
text error_log
enum bounce_type
text bounce_diagnostic
timestamptz bounced_at
uuid token_jti
timestamptz created_at
timestamptz updated_at
}
OutboundMessageEvent {
uuid id PK
uuid outbound_message_id FK
uuid mailbox_id FK
enum event_type
jsonb payload
timestamptz occurred_at
timestamptz created_at
}
Suppression {
uuid id PK
uuid mailbox_id FK
citext recipient_email
enum reason
uuid source_message_id FK
timestamptz created_at
text notes
}
WebhookDelivery {
uuid id PK
uuid outbound_message_event_id FK
uuid mailbox_id FK
enum status
int attempts
timestamptz next_retry_at
timestamptz processing_started_at
timestamptz delivered_at
text last_error
timestamptz created_at
timestamptz updated_at
}
IdempotencyKey {
uuid id PK
uuid token_jti
string idempotency_key
jsonb message_ids
timestamptz created_at
}
Mailbox ||--o{ AgentToken : "issues"
Mailbox ||--o{ OutboundMessage : "sends via"
Mailbox ||--o{ Suppression : "owns"
Mailbox ||--o{ OutboundMessageEvent : "scopes"
Mailbox ||--o{ WebhookDelivery : "scopes"
OutboundMessage ||--o{ OutboundMessageEvent : "logged as"
OutboundMessage ||--o| Suppression : "source of"
OutboundMessageEvent ||--|| WebhookDelivery : "delivered by"
Entity Roles
| Entity | Role |
|---|---|
| Mailbox | Tenant root: owns IMAP, SMTP, bounce-IMAP, and webhook credentials |
| AgentToken | Bearer credential scoped to one Mailbox; jti is the revocation handle |
| OutboundMessage | Core delivery unit: tracks the full lifecycle from queued → sent / bounced |
| OutboundMessageEvent | Append-only audit ledger; one row per state transition or engagement event |
| Suppression | Recipient-level block list; hard bounces are auto-inserted here |
| WebhookDelivery | Retry queue for delivering OutboundMessageEvent payloads to operator endpoints |
| IdempotencyKey | 24-hour dedup window keyed on (token_jti, Idempotency-Key header) |
Key Design Notes
- IdempotencyKey.token_jti has no FK: tokens can be deleted without orphan cascades, and dedup still works within the TTL window.
- OutboundMessage.token_jti is likewise intentionally FK-free; it exists for quota counting per token, not referential integrity.
- Suppression.recipient_email uses the PostgreSQL CITEXT extension for case-insensitive uniqueness without LOWER() everywhere.
- WebhookDelivery is 1-to-1 with OutboundMessageEvent (unique constraint on outbound_message_event_id), so each event gets exactly one delivery record no matter how many attempts that delivery takes.
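The 24-hour dedup window on IdempotencyKey can be illustrated with an in-memory stand-in for the table. IdempotencyStore and get_or_record are hypothetical, and replaying the stored message_ids on a repeat request is an assumption about how the column is used. The real lookup would be a query against PostgreSQL.

```python
from datetime import datetime, timedelta, timezone

TTL = timedelta(hours=24)


class IdempotencyStore:
    """In-memory stand-in for the IdempotencyKey table (illustration only)."""

    def __init__(self):
        # (token_jti, idempotency_key) -> (created_at, message_ids)
        self._rows = {}

    def get_or_record(self, token_jti, key, message_ids, now=None):
        """Return (message_ids, replayed) for this token and key."""
        now = now or datetime.now(timezone.utc)
        row = self._rows.get((token_jti, key))
        if row is not None and now - row[0] < TTL:
            return list(row[1]), True  # duplicate inside the window
        # First use, or the earlier row has expired: record the new request.
        self._rows[(token_jti, key)] = (now, list(message_ids))
        return list(message_ids), False
```

Because the key includes token_jti as a plain value rather than a foreign key, the lookup keeps working even after the token row itself has been deleted.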