# Architecture Hardening (review-driven)

This doc folds in two external architecture reviews. The core message — **make the AI a
powerful *client* of a stable platform, not the platform itself** — reshapes the boundaries
without changing the foundation (still Medusa + Next.js) or the goals (zero manual deploy,
run-from-Claude-chat).

**Three decisions locked with the owner:**
1. **Hosting:** stay **all-cPanel**, but **hardened** (keep-alive cron, Upstash Redis from day
   one, careful migrations) — see §14.
2. **AI edits:** **Draft → Preview → Publish** with version history + rollback — see §5.
3. **Framing:** **commerce-first, OS-ready core** — ship the store, but build the core so future
   modules (accounting, CRM, POS) slot in later — see §16.

---

## 1. Platform Core + Control API at the center (AI is a client)

The highest-risk element of the first draft was Claude sitting at the center. The revised layout inverts that:

```mermaid
flowchart TB
    subgraph Clients
      ADMIN["Admin UI<br/>(works with NO AI)"]
      CLAUDE["Claude (MCP adapter)"]
      N8N["n8n / automations"]
      FUTURE["Mobile / Voice (later)"]
    end
    CAPI["Control API (permanent core)<br/>domain verbs · auth · RBAC · validation"]
    subgraph CoreServices["Platform Core services"]
      COMMERCE["Commerce (Medusa today)"]
      CONTENT["Content + Presentation"]
      EVENTS["Event Bus"]
      FLOWS["Workflow Engine"]
      MEM["AI Memory"]
      AGENTS["Agent Registry"]
      ASSETS["Asset Service"]
      CHAN["Channel Manager"]
    end
    ADMIN --> CAPI
    CLAUDE --> CAPI
    N8N --> CAPI
    FUTURE --> CAPI
    CAPI --> CoreServices
```

**Why it matters:** if Claude is down, rate-limited, or makes a bad call, the owner still
runs the business from the **Admin UI** (refunds, labels, inventory, stop a sale). The Admin
UI is now a **first-class fallback**, not a "secondary surface." Everything — UI, AI, n8n —
goes through the **same Control API**, so business logic lives in one place.
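
As a rough illustration of "one entry point for every client", the sketch below routes a call through a permission check, input validation, and then the handler. All names here (`ControlApi`, `Caller`, `Verb`, `refundOrder`, the `order.refund` permission string) are hypothetical and chosen for the example; this is not the project's actual API.

```typescript
// Hypothetical sketch: every client (Admin UI, Claude, n8n) calls the same method.
type ClientKind = "admin-ui" | "claude" | "n8n";

interface Caller {
  client: ClientKind;
  permissions: ReadonlySet<string>; // e.g. "order.refund" (assumed naming)
}

type Result<T> = { ok: true; value: T } | { ok: false; error: string };

interface Verb<I, O> {
  permission: string;
  validate(input: unknown): input is I;
  handle(input: I): O;
}

class ControlApi {
  private verbs = new Map<string, Verb<any, any>>();

  register<I, O>(name: string, verb: Verb<I, O>): void {
    this.verbs.set(name, verb);
  }

  // Same path for all clients: look up the verb, check permission, validate, run.
  call(caller: Caller, name: string, input: unknown): Result<unknown> {
    const verb = this.verbs.get(name);
    if (!verb) return { ok: false, error: `unknown verb: ${name}` };
    if (!caller.permissions.has(verb.permission)) {
      return { ok: false, error: `forbidden: ${verb.permission}` };
    }
    if (!verb.validate(input)) return { ok: false, error: "invalid input" };
    return { ok: true, value: verb.handle(input) };
  }
}

const api = new ControlApi();
api.register<{ orderId: string }, { refunded: string }>("refundOrder", {
  permission: "order.refund",
  validate: (x): x is { orderId: string } =>
    typeof x === "object" &&
    x !== null &&
    typeof (x as { orderId?: unknown }).orderId === "string",
  handle: (input) => ({ refunded: input.orderId }),
});

const adminCaller: Caller = {
  client: "admin-ui",
  permissions: new Set<string>(["order.refund"]),
};
const claudeCaller: Caller = { client: "claude", permissions: new Set<string>() };

const adminResult = api.call(adminCaller, "refundOrder", { orderId: "o_1" });
const claudeResult = api.call(claudeCaller, "refundOrder", { orderId: "o_1" });
```

Because the handler is registered once, the Admin UI and Claude cannot drift apart in behavior; they differ only in the permissions attached to the caller.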

## 2. AI adapter layer + multi-model router (a seam, kept simple)

Both MCP and the answer to "which model is best" will keep changing, so **the Control API is
permanent and AI reaches it through a replaceable adapter.**

```mermaid
flowchart LR
    C["Claude"] --> R
    G["Gemini"] --> R
    O["OpenAI / future"] --> R
    R["Task Router<br/>(reasoning→Claude, batch→Gemini, vision/image→best model)"] --> CAPI["Control API"]
```

MVP keeps it simple (Claude for reasoning/tool-use, Gemini for batch agents); the **router
seam** means we can add GPT/vision/image models later without touching the platform.
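
A minimal sketch of what the router seam could look like: a lookup table from task kind to model, with a fallback. The type names, the `makeRouter` helper, and the choice of `"openai"` for vision in the second table are assumptions for illustration only; the text above only commits to Claude for reasoning/tool-use and Gemini for batch.

```typescript
// Hypothetical task kinds and model identifiers.
type TaskKind = "reasoning" | "tool-use" | "batch" | "vision";
type ModelId = "claude" | "gemini" | "openai";
type RouteTable = Partial<Record<TaskKind, ModelId>>;

// MVP table from the text: Claude for reasoning/tool-use, Gemini for batch.
const mvpRoutes: RouteTable = {
  reasoning: "claude",
  "tool-use": "claude",
  batch: "gemini",
};

// The router is only a table lookup with a fallback, so adding a model
// later is a data change rather than a platform change.
function makeRouter(routes: RouteTable, fallback: ModelId) {
  return (task: TaskKind): ModelId => routes[task] ?? fallback;
}

const route = makeRouter(mvpRoutes, "claude");

// Later: extend the table without touching callers (illustrative assignment).
const laterRoute = makeRouter({ ...mvpRoutes, vision: "openai" }, "claude");
```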

## 3. Commerce-engine interface (domain verbs, not Medusa internals)

The Control API speaks **domain language**; Medusa is an implementation detail behind it.

> Claude/clients say **`createProduct`**, never **`POST /admin/products`**.

A thin `CommerceEngine` interface (`createProduct`, `setInventory`, `createOrder`,
`refund`, …) is implemented by a Medusa adapter today. We **do not** build Vendure/Saleor
adapters now (premature) — but nothing above the interface knows Medusa exists, so a future
swap/migration is possible.
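
The interface idea can be sketched as follows. The input shape, the extra `getInventory` method, and the `InMemoryEngine` class are assumptions added so the example runs on its own; a real Medusa adapter would implement the same interface by calling Medusa, and would almost certainly be asynchronous.

```typescript
// Hypothetical domain types; field names are illustrative.
interface ProductInput {
  title: string;
  priceMinor: number; // price in minor currency units
}
interface Product extends ProductInput {
  id: string;
}

// Domain verbs only: nothing here mentions Medusa routes or tables.
interface CommerceEngine {
  createProduct(input: ProductInput): Product;
  setInventory(productId: string, quantity: number): void;
  getInventory(productId: string): number; // added for the demo
}

// Stand-in adapter used here instead of a real Medusa-backed one.
class InMemoryEngine implements CommerceEngine {
  private products = new Map<string, Product>();
  private stock = new Map<string, number>();
  private nextId = 1;

  createProduct(input: ProductInput): Product {
    const product: Product = { id: `prod_${this.nextId++}`, ...input };
    this.products.set(product.id, product);
    this.stock.set(product.id, 0);
    return product;
  }

  setInventory(productId: string, quantity: number): void {
    if (!this.products.has(productId)) {
      throw new Error(`unknown product: ${productId}`);
    }
    if (!Number.isInteger(quantity) || quantity < 0) {
      throw new Error("quantity must be a non-negative integer");
    }
    this.stock.set(productId, quantity);
  }

  getInventory(productId: string): number {
    const quantity = this.stock.get(productId);
    if (quantity === undefined) throw new Error(`unknown product: ${productId}`);
    return quantity;
  }
}

// Caller code depends on the interface, so the engine behind it can change.
function addAndStock(
  engine: CommerceEngine,
  input: ProductInput,
  quantity: number
): Product {
  const product = engine.createProduct(input);
  engine.setInventory(product.id, quantity);
  return product;
}

const engine = new InMemoryEngine();
const saree = addAndStock(engine, { title: "Block-print saree", priceMinor: 249900 }, 12);
```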

## 4. Content vs Presentation (split the theme model)

Per the reviews, separate **what** from **how it looks** so AI editing stays clean and the
same content can later feed mobile/AMP/email/PDF:

```mermaid
flowchart TB
    subgraph Content["CONTENT (what)"]
      P[Products]
      COL[Collections]
      BLOG[Blogs / Pages]
      BLK[Content blocks]
    end
    subgraph Pres["PRESENTATION (how)"]
      TOK[Tokens: colors, fonts, spacing]
      LAY[Layout / section order]
    end
    Content --> TE["Theme Engine (renderer-agnostic)"]
    Pres --> TE
    TE --> NEXT["Next.js renderer (MVP)"]
    TE -. later .-> OTHER["Mobile / AMP / email / PDF"]
```

MVP ships only the **Next.js renderer**; the Theme Engine seam makes other renderers additive.

## 5. Draft → Preview → Publish (versioned) + cache invalidation

The big safety fix, and it **keeps zero-deploy** (publish is instant, not a redeploy):

```mermaid
flowchart LR
    CL["Claude / Admin"] -->|write| DRAFT[("Draft version")]
    DRAFT -->|preview token<br/>bypasses cache| PV["Preview URL"]
    CL -->|run test order against preview| PV
    CL -->|commit_changes| PUB[("Published version")]
    PUB -->|webhook → revalidateTag| SF["Live storefront (seconds)"]
    PUB -.rollback.-> PREV["Any prior version"]
```

- **Every theme/content entity is versioned** (draft + published + history) → **rollback** and
  **preview** are built in; "one AI mistake can't destroy the site."
- **Cache invalidation (answering the reviewer):** on publish, a **Medusa subscriber fires a
  webhook to a Next.js revalidate endpoint** that invalidates **by tag** (`revalidateTag`), so
  customers see the change in seconds. **Drafts render via a preview token that bypasses the
  ISR cache** entirely.
- A `commit_changes` Control-API tool is the explicit publish step (the owner's "one tap").
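
The draft, publish, and rollback lifecycle described above could be modeled roughly like this. `VersionedEntity` and its method names are hypothetical, the store is in memory, and the `onPublish` callback merely stands in for the webhook that would trigger tag-based revalidation on the storefront.

```typescript
interface Version<T> {
  number: number;
  data: T;
}

// In-memory sketch of one versioned theme/content entity.
class VersionedEntity<T> {
  private history: Version<T>[] = [];
  private draft: T | undefined = undefined;
  private publishedNumber: number | undefined = undefined;
  private onPublish: (version: Version<T>) => void;

  constructor(onPublish: (version: Version<T>) => void) {
    this.onPublish = onPublish;
  }

  // Writes never touch the published version.
  writeDraft(data: T): void {
    this.draft = data;
  }

  // What a preview token would render: the draft if one exists.
  preview(): T | undefined {
    return this.draft !== undefined ? this.draft : this.published();
  }

  // What customers see.
  published(): T | undefined {
    return this.history.find((v) => v.number === this.publishedNumber)?.data;
  }

  // The explicit publish step: append to history, then signal invalidation.
  commitChanges(): Version<T> {
    if (this.draft === undefined) throw new Error("no draft to publish");
    const version: Version<T> = {
      number: this.history.length + 1,
      data: this.draft,
    };
    this.history.push(version);
    this.publishedNumber = version.number;
    this.draft = undefined;
    this.onPublish(version);
    return version;
  }

  // Rollback re-points "published" at any prior version; history is kept.
  rollback(toNumber: number): Version<T> {
    const version = this.history.find((v) => v.number === toNumber);
    if (!version) throw new Error(`no such version: ${toNumber}`);
    this.publishedNumber = version.number;
    this.onPublish(version);
    return version;
  }
}

const invalidated: string[] = [];
const theme = new VersionedEntity<{ primary: string }>((v) => {
  invalidated.push(`theme:v${v.number}`); // stand-in for the revalidate webhook
});
```

Rollback here only moves a pointer and never deletes history, which is what makes "any prior version" reachable after a bad publish.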

## 6. Schema validation — no hallucinated writes reach the DB

The Control API **never accepts arbitrary JSON** for theme/content. A strict **JSON-Schema
validator (e.g. Ajv)** sits in the custom module: if Claude emits a block missing required
tokens or with the wrong shape, the API **rejects it with a structured error** so Claude can
self-correct — instead of a malformed row crashing the storefront's dynamic render.
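
To make the "structured error" idea concrete, here is a dependency-free stand-in for what a JSON-Schema validator such as Ajv would do in the real module. The `BlockSchema` shape, the `heroBlockSchema` fields, and the error format are assumptions for illustration, not Ajv's API and not the project's actual schemas.

```typescript
type FieldType = "string" | "number";

// Hypothetical, simplified schema: required fields with primitive types,
// and no additional properties allowed.
interface BlockSchema {
  required: Record<string, FieldType>;
}

interface ValidationError {
  path: string;
  message: string;
}

type Validation =
  | { valid: true }
  | { valid: false; errors: ValidationError[] };

const heroBlockSchema: BlockSchema = {
  required: { heading: "string", backgroundToken: "string", order: "number" },
};

function validateBlock(schema: BlockSchema, input: unknown): Validation {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return { valid: false, errors: [{ path: "", message: "must be an object" }] };
  }
  const record = input as Record<string, unknown>;
  const errors: ValidationError[] = [];

  // Missing or wrongly typed required fields.
  for (const [key, expected] of Object.entries(schema.required)) {
    if (!(key in record)) {
      errors.push({ path: `/${key}`, message: "is required" });
    } else if (typeof record[key] !== expected) {
      errors.push({ path: `/${key}`, message: `must be ${expected}` });
    }
  }
  // Unknown fields are rejected rather than silently stored.
  for (const key of Object.keys(record)) {
    if (!(key in schema.required)) {
      errors.push({ path: `/${key}`, message: "is not allowed" });
    }
  }
  return errors.length === 0 ? { valid: true } : { valid: false, errors };
}

const goodBlock = validateBlock(heroBlockSchema, {
  heading: "Festive sale",
  backgroundToken: "color.primary",
  order: 1,
});
const badBlock = validateBlock(heroBlockSchema, {
  heading: "Festive sale",
  order: "1",
  extra: true,
});
```

Returning every error with a path, instead of failing on the first one, gives the model enough detail to correct the whole payload in a single retry.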

## 7. Event bus — the backbone

Everything emits **domain events** (`order.created`, `inventory.low`, `theme.published`…).
Analytics, marketing agents, CRM, warehouse, and workflows **subscribe** rather than being
wired point-to-point — so new integrations become "subscribe to an event," not surgery.
(Medusa v2 ships an event system we standardize on; Upstash backs it durably — see §14.)
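
The subscribe-instead-of-wire idea, reduced to an in-process sketch. The payload shapes and the `EventBus` class are assumptions made for the example; this is not Medusa's event API, and it has none of the durability a Redis-backed bus would provide.

```typescript
// Event names come from the text; payload fields are illustrative.
interface DomainEvents {
  "order.created": { orderId: string; totalMinor: number };
  "inventory.low": { sku: string; remaining: number };
  "theme.published": { version: number };
}

class EventBus {
  private handlers = new Map<
    keyof DomainEvents,
    Array<(payload: any) => void>
  >();

  // Subscribers are typed by event name, so payloads are checked at compile time.
  subscribe<K extends keyof DomainEvents>(
    name: K,
    handler: (payload: DomainEvents[K]) => void
  ): void {
    const list = this.handlers.get(name) ?? [];
    list.push(handler);
    this.handlers.set(name, list);
  }

  // Returns how many subscribers received the event.
  emit<K extends keyof DomainEvents>(name: K, payload: DomainEvents[K]): number {
    const list = this.handlers.get(name) ?? [];
    for (const handler of list) handler(payload);
    return list.length;
  }
}

const bus = new EventBus();
const seen: string[] = [];

// Two independent consumers; the emitter knows about neither.
bus.subscribe("order.created", (e) => {
  seen.push(`analytics:${e.orderId}`);
});
bus.subscribe("order.created", (e) => {
  seen.push(`crm:${e.orderId}`);
});

const delivered = bus.emit("order.created", { orderId: "o_1", totalMinor: 249900 });
```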

## 8. Workflow engine — "when X, then Y"

Beyond one-off AI actions, the platform runs owner-defined automations:

```text
order.placed → generate invoice → WhatsApp notify → Meta event → update CRM → email customer → trigger marketing agent
```

Built on Medusa v2 workflows + n8n (already in the stack). MVP ships a couple of built-in
flows (order confirmation, low-stock alert); a visual builder is later.

## 9. AI Memory — the business shouldn't forget

A **business memory store** the AI always loads as grounding: brand voice, policies, supplier
info, packaging/shipping rules, customer personas, past winning campaigns/ads, inventory
strategy. It is seeded from the existing brand block in `agents/manifest.json` and grows from
there, so chats no longer start from zero.

## 10. Permissions / RBAC (including the AI) + test isolation

- **Roles** (Owner, Manager, Support, Warehouse, Designer, SEO/Marketing, **AI**) each with
  scoped permissions, enforced in the Control API. The **AI role is least-privilege** and
  **confirm-before-write** on destructive ops.
- **`is_test` flag** on orders and `ai_agent_run`: Claude's test orders are tagged and
  **filtered out of all revenue/conversion reporting**, so financials stay clean.
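
Both rules above can be sketched in a few lines. The role subset, the action strings, and the `authorize` / `reportableRevenue` helpers are hypothetical; only the `is_test` flag, least privilege for the AI role, and confirm-before-write come from the text.

```typescript
type Role = "owner" | "support" | "ai"; // subset of the roles listed above
type Decision = "allow" | "needs-confirmation" | "deny";

interface Policy {
  allowed: ReadonlySet<string>;
  confirmBeforeWrite: ReadonlySet<string>;
}

// Illustrative policy table; action names are assumptions.
const policies: Record<Role, Policy> = {
  owner: {
    allowed: new Set<string>(["order.read", "order.refund", "product.delete"]),
    confirmBeforeWrite: new Set<string>(),
  },
  support: {
    allowed: new Set<string>(["order.read", "order.refund"]),
    confirmBeforeWrite: new Set<string>(),
  },
  ai: {
    // Least privilege: no product.delete at all, and refunds need confirmation.
    allowed: new Set<string>(["order.read", "order.refund"]),
    confirmBeforeWrite: new Set<string>(["order.refund"]),
  },
};

function authorize(role: Role, action: string, confirmed = false): Decision {
  const policy = policies[role];
  if (!policy.allowed.has(action)) return "deny";
  if (policy.confirmBeforeWrite.has(action) && !confirmed) {
    return "needs-confirmation";
  }
  return "allow";
}

interface Order {
  id: string;
  totalMinor: number;
  is_test: boolean;
}

// Reporting ignores anything tagged as a test order.
function reportableRevenue(orders: Order[]): number {
  return orders
    .filter((order) => !order.is_test)
    .reduce((sum, order) => sum + order.totalMinor, 0);
}

const orders: Order[] = [
  { id: "o_1", totalMinor: 249900, is_test: false },
  { id: "o_2", totalMinor: 99900, is_test: true }, // Claude's test order
  { id: "o_3", totalMinor: 50000, is_test: false },
];
```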

## 11. Asset service (off cPanel disk)

Thousands of textile images/videos/creatives/PDFs go to **object storage (S3 / Cloudflare R2)
behind a CDN**, not the cPanel filesystem. AI-generated creatives and invoices land here too.
Keeps the host light and delivery fast.

## 12. Channel Manager — one interface, many channels

Instead of one-off Amazon/Flipkart plugins, there is a single **Channel** interface (list, update price,
sync stock, fetch orders) implemented per channel: **website, Amazon, Flipkart, Instagram,
WhatsApp, future marketplaces**. `channel_listing` (data model §2h) maps catalog → channel.
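
A sketch of how one stock change could fan out through the shared interface, using a listing table to map catalog SKUs to channel-specific IDs. Only `syncStock` is modeled; the `ChannelListing` fields, the `RecordingChannel` test double, and the external IDs are assumptions, not the actual `channel_listing` schema or any marketplace API.

```typescript
// One verb from the Channel interface; the others would follow the same pattern.
interface Channel {
  readonly name: string;
  syncStock(externalId: string, quantity: number): void;
}

// Hypothetical shape for a catalog-to-channel mapping row.
interface ChannelListing {
  sku: string;
  channel: string;
  externalId: string;
}

// Test double that records what it was asked to sync.
class RecordingChannel implements Channel {
  readonly name: string;
  readonly stock = new Map<string, number>();

  constructor(name: string) {
    this.name = name;
  }

  syncStock(externalId: string, quantity: number): void {
    this.stock.set(externalId, quantity);
  }
}

// Push one SKU's quantity to every channel that lists it.
function syncStockEverywhere(
  channels: Channel[],
  listings: ChannelListing[],
  sku: string,
  quantity: number
): string[] {
  const synced: string[] = [];
  for (const listing of listings) {
    if (listing.sku !== sku) continue;
    const channel = channels.find((c) => c.name === listing.channel);
    if (!channel) continue; // listing points at a channel that is not configured
    channel.syncStock(listing.externalId, quantity);
    synced.push(channel.name);
  }
  return synced;
}

const website = new RecordingChannel("website");
const amazon = new RecordingChannel("amazon");
const listings: ChannelListing[] = [
  { sku: "SAREE-01", channel: "website", externalId: "prod_1" },
  { sku: "SAREE-01", channel: "amazon", externalId: "EXT-AMZ-1" },
  { sku: "SAREE-02", channel: "website", externalId: "prod_2" },
];

const syncedTo = syncStockEverywhere([website, amazon], listings, "SAREE-01", 7);
```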

## 13. Agent Registry + cost monitoring + observability

- **Agent Registry:** the 25 agents become rows with `tools`, `permissions`, `prompt`,
  `memory`, `schedule`, **`cost`** — extends the existing manifest/`ai_agent` table.
- **Cost monitoring:** track tokens + spend **per model** (Claude/Gemini/…); reuse
  `config/control.json`-style ceilings + global pause so AI spend stays bounded.
- **Observability:** log every AI action end-to-end — `prompt → tool calls → output →
  approval → execution → result` — for debugging and prompt improvement (pairs with §10 audit).
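
Per-model spend tracking with ceilings and a global pause might look like the sketch below. The `CostMonitor` class, the cent-based amounts, and the ceiling values are hypothetical; the real ceilings would come from whatever `config/control.json`-style file the project settles on.

```typescript
interface Usage {
  model: string;
  tokens: number;
  costCents: number; // integer cents to avoid floating-point drift
}

class CostMonitor {
  private totals = new Map<string, { tokens: number; costCents: number }>();
  private ceilingsCents: Record<string, number>;
  private paused = false;

  constructor(ceilingsCents: Record<string, number>) {
    this.ceilingsCents = ceilingsCents;
  }

  // Accumulate tokens and spend per model.
  record(usage: Usage): void {
    const current = this.totals.get(usage.model) ?? { tokens: 0, costCents: 0 };
    this.totals.set(usage.model, {
      tokens: current.tokens + usage.tokens,
      costCents: current.costCents + usage.costCents,
    });
  }

  spentCents(model: string): number {
    return this.totals.get(model)?.costCents ?? 0;
  }

  // Global pause: stops all AI runs regardless of remaining budget.
  setPaused(paused: boolean): void {
    this.paused = paused;
  }

  canRun(model: string): boolean {
    if (this.paused) return false;
    const ceiling = this.ceilingsCents[model];
    // Assumption: models without a configured ceiling are blocked.
    if (ceiling === undefined) return false;
    return this.spentCents(model) < ceiling;
  }
}

// Illustrative ceilings only.
const monitor = new CostMonitor({ claude: 500, gemini: 200 });
monitor.record({ model: "claude", tokens: 12000, costCents: 300 });
```

Checking `canRun` before each agent invocation keeps the ceiling a hard stop instead of an after-the-fact report.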

## 14. cPanel hardening (the chosen hosting path)

The owner stays on cPanel, so we design for its constraints (the risks raised in Review 2):

| Risk | Mitigation |
|---|---|
| Passenger sleeps idle apps → cold starts | **Keep-alive cron** pinging backend + storefront every ~5 min |
| Workflow/event durability on a fragile process | **Upstash Redis (serverless, TLS) from day 1** for event bus + queues — not the in-memory fallback |
| `medusa db:migrate` vs Passenger file/db locks | Migrations run in a **maintenance step**: stop/disable the app, migrate, restart; serialize deploys |
| ModSecurity/shared firewall kills streaming/MCP | Control API is **request/response HTTP** (not dependent on long-lived SSE); MCP gateway uses standard HTTPS; **hardened auth** (rotating tokens, rate-limit, scoped) |
| Big Meta Ads history → query timeouts | **ETL/cache ad metrics** into our DB on a schedule; never live-query the ad account from a customer/Claude request |
| Postgres full-text search is weak on descriptive queries ("blue block-print floral zari border") | Move to **Meilisearch (typo-tolerant)** early; semantic/vector search later |

> If the store outgrows cPanel, §6 of `05-deployment-cpanel.md` lists the no-architecture-change
> path to move the backend to a managed Node host — the Control-API core makes that a config
> change, not a rewrite.

## 15. MVP vs later (so we don't over-build)

| Build in MVP/early | Design the seam, build later |
|---|---|
| Control API core, Admin UI fallback, event bus | Multi-engine (Vendure/Saleor) adapters |
| Draft→Preview→Publish + versioning, schema validation | Multi-renderer (mobile/AMP/PDF/email) |
| Upstash + keep-alive, asset service, `is_test` | Visual workflow builder, multi-model router fan-out |
| Agent Registry (reuse 25 agents), cost/observability | Full staff RBAC UI, semantic search |

## 16. Commerce-first, OS-ready core (framing)

We **don't rename** the product — focus stays on getting the store live. But the core built
above (**Control API, event bus, workflow engine, AI memory, agent registry, permissions,
channel manager**) is deliberately **module-agnostic**, so later modules — accounting, CRM,
POS, procurement — can plug into the same core without re-architecting. Commerce is simply the
first module on an OS-ready foundation.
