The Blueprint
AI Chief of Staff Architecture

AI Chief of Staff — System Architecture

A pipeline-based architecture with vector search, bot memory, and multi-layer intelligence

Meeting Pipeline
Task Pipeline
Vector / Search
Memory & Learning
Intelligence
Presentation
System Overview: 6 Pipelines
EXTERNAL SOURCES: Meeting Transcription • Calendar / EventKit • Email / Messaging • Health & Wellness • Project Management
FETCH & MERGE: Multi-source sync • Dedup queue • Source priority merge
CLASSIFY & EXTRACT: LLM classification • Action item extraction • Pattern fallback engine
SAVE & INDEX: Markdown scribe files • Vector embeddings • Semantic indexing
TASK PIPELINE: Stage → Dedup → Validate → Promote (14-day staging • 5-signal dedup • enrichment)
STORAGE LAYER: Vector DB • Task Files • Meeting Files • State
INTELLIGENCE LAYER: Signal Detection • Thread Tracking • Retro Analysis (cross-pipeline correlations • early warnings • trend detection)
MEMORY & LEARNING: Bot Memory • Correction Journal • Auto-loaded Rules (session summaries • occurrence tracking • threshold promotion)
PRESENTATION LAYER: Dashboard • Daily Briefing • Mobile • Terminal (TUI • Canvas UI • Telegram gateway)
HOOK SYSTEM (event-driven automation layer): Session Start • Pre-Tool • Post-Tool • On Stop • Memory Inject

1. Meeting Pipeline (Week 1 — Foundation)

Why This Exists

You attend 15+ meetings a week across multiple business domains. Without automation, that's 30+ minutes per meeting writing notes, extracting action items, and filing them. At 60+ meetings/month, you'd lose 30+ hours to post-meeting busywork — or more likely, you'd just stop doing it and lose the context entirely.

In Practice

Monday 9am: leadership call ends. Within 60 seconds, the transcript is fetched, classified as "primary business," 4 action items extracted and staged, a structured scribe saved, and the full transcript indexed. Three weeks later you search "what did we decide about the Q3 timeline?" — instant answer with source and date.

What It Does

Fetches from multiple transcript services (supporting source priority when the same meeting appears in two places), classifies by business domain using your CLAUDE.md definitions, extracts action items with quality gates, generates structured markdown scribes, and indexes everything for semantic search.

Data Flow

Transcript APIs → Merge (source priority, 85% title dedup) → LLM domain classification → Action item extraction with rejection rules → Structured markdown scribe → Vector embedding → Semantic DB

Outputs

Structured meeting files per domain (markdown with metadata table, summary, topics, transcript). Vector embeddings for semantic search. Action items routed to task pipeline staging area.

Connects To

Task pipeline (extracted action items), vector store (searchable meetings), intelligence layer (trend analysis, thread tracking), presentation (meeting counts, freshness indicators).

Build This Yourself

1 Define your business domains in CLAUDE.md (e.g., "Marketing," "Engineering," "Sales") with routing rules — which people and keywords map to which domain.
2 Connect 1-2 transcript services via API. Fireflies and Granola both expose APIs. Fetch new transcripts on a schedule or on-demand.
3 Multi-source merge: when the same meeting exists in two services, keep the richer transcript. Dedup by title similarity (Jaccard > 0.85).
4 LLM classification: pass the transcript + your CLAUDE.md domain definitions to Claude. It returns: domain, participants, summary, and action items.
5 Scribe generator: save as structured markdown with a metadata table (date, source, domain, participants) + summary + topics + full transcript.
6 Vector indexing: chunk the transcript (~2K tokens, 200 overlap), embed with OpenAI text-embedding-3-large, upsert to local Qdrant. This enables the semantic search layer. A sketch follows this list.
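A minimal sketch of step 6, assuming the openai and qdrant-client packages are installed and OPENAI_API_KEY is set in the environment. Chunking is approximated by word count rather than exact tokens, and the collection and payload names are illustrative.

```python
# Sketch of step 6: chunk, embed, and upsert a transcript into local Qdrant.
# Assumes openai + qdrant-client installed and OPENAI_API_KEY set; chunking is
# word-based as an approximation of token counts; names are illustrative.
import uuid
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

openai_client = OpenAI()
qdrant = QdrantClient(path="./qdrant_data")  # embedded mode, no server

if not qdrant.collection_exists("meetings"):
    qdrant.create_collection(
        collection_name="meetings",
        vectors_config=VectorParams(size=3072, distance=Distance.COSINE),
    )

def chunk_text(text: str, size: int = 2000, overlap: int = 200) -> list[str]:
    """Overlapping chunks so context isn't lost at boundaries."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i : i + size]) for i in range(0, len(words), step)]

def index_meeting(transcript: str, metadata: dict) -> None:
    chunks = chunk_text(transcript)
    resp = openai_client.embeddings.create(
        model="text-embedding-3-large", input=chunks
    )
    points = [
        PointStruct(
            id=str(uuid.uuid4()),
            vector=item.embedding,
            payload={**metadata, "chunk_index": item.index, "text": chunks[item.index]},
        )
        for item in resp.data
    ]
    qdrant.upsert(collection_name="meetings", points=points)
```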
Transcript Service A + Transcript Service B → Multi-Source Sync → Dedup Queue (85% title overlap)
→ LLM Classification → Domain Routing (uses CLAUDE.md definitions)
→ Action Extraction → Task Pipeline (with quality gates)
→ Scribe Generator → Meeting Files (structured markdown)
→ Vector Indexer → Semantic DB (Qdrant, chunked + embedded)

2. Task Pipeline (Week 2 — Operational)

Why This Exists

Meetings generate 5-10 action items each. Without a pipeline, they live in your notes and get lost. With 60+ meetings/month, that's hundreds of potential tasks — most are noise ("I'll look into that"), some are duplicates across meetings, a few are critical. You need extraction that catches the real ones and filters the rest.

In Practice

Three separate meetings over two weeks mention "finalize the vendor contract with engineering." Without dedup, that's 3 tasks cluttering your list. The 5-signal cascade catches it: same owner + 78% token overlap = same task. You get one enriched task with all three source meetings noted and the latest deadline attached.

What It Does

LLM extracts action items from transcripts with explicit rejection rules (fragments, filler, conversational). Post-extraction validation filters noise. 5-signal dedup catches duplicates at increasing cost levels. Staging area holds tasks for 14 days before expiry. Human review promotes to committed per-domain task files.

Data Flow

LLM extraction → Noise filter (<25 chars, <4 words, bad owners) → Relevance classification (owner_action, tracking, delegated, noise) → 5-signal dedup cascade → Staging (14-day TTL) → Validation → Promotion to tasks.md

Outputs

Staged tasks (pending review with source, confidence, category). Committed tasks (per-domain markdown files). Enrichment candidates (existing tasks updated with new sources). Completion signals (detected from meeting context).

Key Insight

The extraction prompt is NOT a quality gate. LLMs will aggressively match "I'll [verb]" patterns. Three enforcement layers are required: (1) rejection examples in the prompt, (2) post-LLM validation (length, verb, garble), (3) relevance classification with real NOISE detection. Without all three, your task list fills with noise.

Build This Yourself

1 LLM extraction prompt: include explicit rejection examples. "Keep eye on..." = NOT a task. "We should probably..." = NOT a task. "I'll circle back" = NOT a task. Only extract: owner + verb + object + optional deadline.
2 Post-extraction filter: minimum 25 characters, 4+ words, must contain an action verb + concrete object, and a valid owner name (check against your people directory).
3 Staging area: a JSON file where extracted tasks land with metadata (source meeting, confidence, extraction date). Set a 14-day TTL — unpromoted tasks expire to an archive.
4 Basic dedup: Jaccard token similarity after stop-word removal. >0.65 = likely duplicate. Same owner lowers the threshold to 0.55. When a duplicate is found, merge (update context + add source) rather than drop. A sketch follows this list.
5 Review workflow: CLI command or dashboard view to approve/reject staged tasks. Approved tasks append to per-domain markdown task files (e.g., marketing/tasks.md, engineering/tasks.md).
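A minimal sketch of the basic dedup from step 4, using the thresholds from the text; the stop-word list and task field names are illustrative assumptions. Note the merge path: a detected duplicate enriches the existing task instead of being dropped.

```python
# Sketch of step 4: Jaccard token dedup with an owner-aware threshold.
# Stop-word list and task field names are illustrative, not from the source.
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "for", "on", "in", "with"}

def tokens(text: str) -> set[str]:
    return {w for w in text.lower().split() if w not in STOP_WORDS}

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if (a | b) else 0.0

def find_duplicate(new_task: dict, existing: list[dict]) -> dict | None:
    new_toks = tokens(new_task["task"])
    for task in existing:
        # Same owner lowers the bar from 0.65 to 0.55, per the text.
        threshold = 0.55 if task.get("owner") == new_task.get("owner") else 0.65
        if jaccard(new_toks, tokens(task["task"])) > threshold:
            return task
    return None

def merge(existing_task: dict, new_task: dict) -> None:
    """Enrich rather than drop: record the new source, keep the latest deadline."""
    existing_task.setdefault("sources", []).append(new_task.get("source"))
    deadlines = [d for d in (existing_task.get("deadline"), new_task.get("deadline")) if d]
    if deadlines:
        existing_task["deadline"] = max(deadlines)  # ISO dates compare as strings
```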
Meeting Transcript
→ LLM Extraction → action item objects (owner, task, deadline, source)
→ Noise Filter (<25 chars, <4 words, no verb, bad owner) → REJECTED
→ Relevance classification: owner_action | tracking | delegated | noise
→ 5-Signal Dedup: token similarity → owner + similarity → entity co-occurrence → phrase match → LLM fallback
  • Match found → ENRICH existing task (add source, update context, bump recency)
  • No match → Staging Area (14-day TTL, awaits review)
→ Validation Gates → Committed Tasks (domain/tasks.md)
  • 14 days unpromoted → EXPIRED → Archive

3. Vector / Semantic Layer (Week 2 — Operational)

Why This Exists

Keyword search fails when you can't remember the exact words. "What did we discuss about budget concerns?" should find meetings about "financial constraints," "cost reduction," and "resource allocation." Vector embeddings search by meaning, not string matching. This is the difference between "I think we talked about this..." and knowing exactly when and what was said.

In Practice

Preparing for a VP 1:1, you search "vendor negotiations with engineering." It returns 4 meetings from the past 6 weeks — including one where timeline concerns were raised that you'd forgotten. You walk in with full context instead of winging it. Another query, "what decisions did we make about the product roadmap?", surfaces 3 meetings, 2 emails, and a decision with rationale.

What It Does

Embeds and stores meetings, memories, and corrections for semantic search and contextual retrieval. Multiple collections serve different purposes: meetings for search, memories for context injection, corrections for dedup. Each collection has its own retrieval pattern.

Data Flow

Content → Chunk (2K tokens, 200 overlap) → Embedding model (text-embedding-3-large) → Vector DB upsert with metadata → Cosine similarity search (threshold: 0.35)

Dual Retrieval Pattern

Blanket load: Behavioral rules auto-loaded every session (corrections, preferences, gotchas). Contextual search: Memory and meetings queried per-prompt via pre-tool hook. Two patterns, one infrastructure, one vector DB.

Key Design Choice

Separate collections, not one big index. Meetings (~600 docs), Session Memory (~100 docs), Corrections (~40 docs). Each has different lifecycle, retrieval pattern, and pruning rules. Meetings are searched ad-hoc; corrections are blanket-loaded; memories are injected per-prompt.

Build This Yourself — Running Semantic Search Locally

1 Qdrant (local vector DB): pip install qdrant-client. Use embedded mode — no server needed. QdrantClient(path="./qdrant_data") gives you a local file-based vector DB. Create a collection with cosine distance metric and vector size matching your embedding model (3072 for text-embedding-3-large).
2 Embeddings via OpenAI: openai.embeddings.create(model="text-embedding-3-large", input=text). Chunk your text first — 2K tokens per chunk with 200-token overlap prevents losing context at boundaries. Cost: ~$0.13 per 1M tokens (~500 meetings for ~$2-5).
3 Local embedding alternative: Use sentence-transformers locally (free, no API calls). SentenceTransformer("all-MiniLM-L6-v2") — smaller vectors (384-dim) but zero cost and no network dependency. Trade-off: slightly lower quality for complete privacy.
4 Store with metadata: Each vector gets a payload: {date, domain, participants, source, summary}. This enables filtered search: "budget meetings in Q3" = semantic match + date filter + domain filter.
5 Search function: Embed the query, cosine similarity search, minimum score threshold (0.35 is a good starting point — too high misses relevant results, too low returns noise). Return top 10 with scores. A sketch follows this list.
6 Hook into Claude Code: Use a UserPromptSubmit hook to auto-search your vector DB on every prompt. The hook runs your search script, injects relevant results as context. Claude sees your past meetings/memories automatically — you don't have to search manually.
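A minimal sketch of the search function from step 5, run against the same local Qdrant store; the "domain" payload filter is an illustrative assumption.

```python
# Sketch of step 5: embed the query, then run a filtered cosine search in Qdrant.
# The "domain" payload filter is an illustrative assumption.
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

openai_client = OpenAI()
qdrant = QdrantClient(path="./qdrant_data")  # same local store used for indexing

def search_meetings(query: str, domain: str | None = None, top_k: int = 10):
    vector = openai_client.embeddings.create(
        model="text-embedding-3-large", input=query
    ).data[0].embedding
    query_filter = (
        Filter(must=[FieldCondition(key="domain", match=MatchValue(value=domain))])
        if domain else None
    )
    hits = qdrant.search(
        collection_name="meetings",
        query_vector=vector,
        query_filter=query_filter,
        limit=top_k,
        score_threshold=0.35,  # starting point from the text; tune per corpus
    )
    return [(hit.score, hit.payload) for hit in hits]
```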

4. Memory & Learning (Weeks 3-4 — Compounding)

Why This Exists

Every AI session starts from zero. Your AI doesn't remember that the CEO prefers bullet points, that a vendor name is always misspelled in transcripts, or that you made a strategic decision three months ago and why. Bot memory means corrections, decisions, preferences, and context persist. The system gets smarter every week you use it.

In Practice

Week 1: you correct Claude — "That person's name is always garbled in transcripts. Normalize it." Logged. Week 3: same correction. The system recognizes the pattern (2 occurrences, same category), auto-promotes it to an always-loaded behavioral rule. From session 4 onward, every session applies the fix automatically. Multiply this across 40+ corrections and the system becomes deeply personalized.

What It Does

Persists three types of session knowledge: summaries (what happened), corrections (mistakes and fixes), decisions (choices with rationale and alternatives). Tracks correction frequency. When a pattern recurs 2-3 times, auto-promotes to an always-loaded rules file that Claude reads every session.

Correction Lifecycle

User corrects AI → Pattern journal (occurrence tracking with category, date, context) → Semantic dedup check against existing corrections (prevents duplicate rules) → Threshold met (2-3 in same category) → Auto-promote to rules file → Loaded every future session.

Memory Types

Sessions: What happened, what was decided, what changed. Corrections: Mistake + fix + occurrence count + category. Decisions: Choice made + reasoning + alternatives considered + source context. Each stored as a vector embedding for semantic retrieval.

Pruning & Lifecycle

Memories older than 180 days with low access counts are pruned. Promoted corrections are exempt (they live in the rules file forever). Back up before bulk operations — session memories stored as embeddings are irreplaceable. The rules file is version-controlled, but vector memories have no rebuild path.

Build This Yourself — The Hooks That Make It Work

1 UserPromptSubmit hook: Runs before every user message reaches Claude. Searches your memory vector collection for relevant context. Injects matching memories (past corrections, decisions, summaries) directly into the conversation. Claude sees your history without you having to mention it.
2 Stop hook (session end): When a session ends, summarizes the conversation and stores it as a memory embedding. Extracts any corrections made during the session and logs them to the correction journal with occurrence tracking.
3 Correction journal (JSON): Track each correction with: pattern, category, occurrence count, dates, context. When the count hits the threshold (2-3 by category), auto-append to your rules file (e.g., gotchas.md or CLAUDE.md). This file is auto-loaded every session by Claude Code. A sketch follows this list.
4 Claude Code hooks location: Hook scripts live in .claude/hooks/ and are registered in .claude/settings.json — shell commands that fire on events. UserPromptSubmit for memory injection, Stop for session saving. The hook script can be Python that queries Qdrant and returns context.
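A minimal sketch of the correction journal from step 3; file names and field names are illustrative, and the promotion threshold of 2 sits at the low end of the 2-3 range the text gives.

```python
# Sketch of step 3: a JSON correction journal with threshold promotion.
# File names and field names are illustrative; threshold 2 is the low end of 2-3.
import json
from datetime import date
from pathlib import Path

JOURNAL = Path("correction_journal.json")
RULES = Path("gotchas.md")  # auto-loaded rules file
THRESHOLD = 2

def log_correction(pattern: str, category: str, context: str) -> None:
    journal = json.loads(JOURNAL.read_text()) if JOURNAL.exists() else {}
    entry = journal.setdefault(
        pattern, {"category": category, "count": 0, "dates": [], "contexts": []}
    )
    entry["count"] += 1
    entry["dates"].append(date.today().isoformat())
    entry["contexts"].append(context)
    # Promote a recurring pattern into the always-loaded rules file, once.
    if entry["count"] >= THRESHOLD and not entry.get("promoted"):
        with RULES.open("a") as f:
            f.write(f"- {pattern} (category: {category}, promoted {date.today()})\n")
        entry["promoted"] = True
    JOURNAL.write_text(json.dumps(journal, indent=2))
```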

5. Intelligence Layer (Weeks 3-4 — Compounding)

Why This Exists

Individual data points are noise. Intelligence is the pattern across them. "Haven't met with VP Engineering in 3 weeks" + "4 stalled tasks in that domain" + "board presentation in 2 weeks" = early warning. No human has time to manually cross-reference meeting frequency, task velocity, relationship recency, and calendar density. But a system that already has all this data can surface the pattern automatically.

In Practice

Signal detector fires: "Relationship staleness — VP Engineering, 21 days since last meeting, 3 open items in their domain." Thread tracker shows "infrastructure migration" thread velocity dropped to zero. The daily briefing correlates them: "Suggested action: schedule catch-up before next board review." You didn't connect those dots — the system did.

Three Intelligence Engines

Signal detection: algorithmic anomaly detection (no LLM needed). Frequency changes, velocity drops, staleness, domain drift. Thread tracking: auto-clusters topics across meetings using embeddings, tracks velocity and resolution. Retro analysis: weekly friction journal — completion rates, deferred tasks, domain allocation drift.

Signal Categories

Frequency anomalies (sudden changes in meeting cadence), task velocity drops (domain going stale), relationship staleness (no contact in N days), domain allocation drift (time spent vs. priority), topic escalation (recurring unresolved topics), system health (cache freshness, sync failures).
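Most of these checks are pure arithmetic over data the pipelines already store. A minimal sketch of one detector, relationship staleness, assuming meeting records carry an ISO date and a participant list; the 14-day threshold is illustrative.

```python
# Sketch of one algorithmic signal: relationship staleness (no LLM involved).
# Meeting record shape and the 14-day threshold are illustrative assumptions.
from datetime import date

def staleness_signals(meetings: list[dict], people: list[str], max_days: int = 14):
    """Flag anyone with no meeting contact in the last max_days."""
    signals = []
    for person in people:
        dates = [
            date.fromisoformat(m["date"])
            for m in meetings
            if person in m.get("participants", [])
        ]
        last_seen = max(dates, default=None)
        gap = (date.today() - last_seen).days if last_seen else None
        if gap is None or gap > max_days:
            signals.append({
                "type": "relationship_staleness",
                "person": person,
                "days_since_contact": gap,
            })
    return signals
```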

Thread Tracking

Auto-detected threads: topic clustering via embedding similarity across meetings. Each thread tracks: first/last mention, velocity (mentions per week), participating people, related tasks. Resolution detection via token matching against meeting transcripts. Convergence detection when two threads start overlapping.
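A minimal sketch of how auto-detection could work: greedily attach each meeting topic's embedding to the nearest thread centroid, opening a new thread when similarity falls below a floor. The 0.75 floor and the thread structure are illustrative assumptions.

```python
# Sketch of embedding-based thread clustering (greedy, threshold-based).
# The 0.75 similarity floor and thread structure are illustrative assumptions.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def assign_to_thread(topic_vec: np.ndarray, threads: list[dict], floor: float = 0.75) -> dict:
    """Attach a topic to its best-matching thread, or start a new one."""
    best = max(threads, key=lambda t: cosine(topic_vec, t["centroid"]), default=None)
    if best is not None and cosine(topic_vec, best["centroid"]) >= floor:
        best["mentions"] += 1  # velocity = mentions per week, derived elsewhere
        n = best["mentions"]
        best["centroid"] = (best["centroid"] * (n - 1) + topic_vec) / n  # running mean
        return best
    thread = {"centroid": topic_vec, "mentions": 1}
    threads.append(thread)
    return thread
```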

Cross-Pipeline Correlations

The real value: correlating across pipelines. Stalled threads + matching task signals = blocked project. Pending items + upcoming calendar = meeting prep priority. Health metrics + schedule density = burnout risk. Task completions + signal resolution = project momentum. This is the layer that makes the system proactive, not reactive.

6. Presentation Layer (Week 5+ — Autonomous)

Why This Exists

All the intelligence in the world is useless if you can't access it quickly. A 30-second dashboard check before meetings beats 10 minutes of manual prep. Mobile access means you're never caught without context — walking to a meeting, working from your phone, or on the road.

In Practice

Walking to a leadership meeting, you message the mobile bot: "prep [executive name]." Instant briefing: open tasks in their domain, last 3 meeting summaries, recent decisions, and suggested talking points. Post-meeting: "sync" triggers transcript fetch, classification, and indexing — all before you reach your desk. The dashboard shows freshness of every data source so you know what's current.

Multi-Interface Architecture

CLI Dashboard: Primary interface. Runs at session start via hook. Shows task counts, calendar context, staging queue, signal alerts, cache freshness. Canvas TUI: Rich visualizations — task boards, timeline views, health dashboards. Daily Briefing: Comprehensive JSON output with all cross-pipeline intelligence. Mobile Gateway: Telegram bot for on-the-go access.

Dashboard Sources

Task files (all domains) → urgent/overdue counts. Calendar (EventKit) → current/next meeting with prep context. Staging area → pending review queue. Signal cache → active alerts. Sync state → data freshness indicators. Bot memory → recent session context.

Hook-Driven Updates

Real-time via event system: Session start triggers dashboard + memory injection. Pre-tool updates TUI visibility. Post-tool syncs task changes to Canvas. Response stop saves transcript summary + session memory. The hooks are the nervous system — they make everything reactive without manual triggers.

Mobile Commands

Natural language via Telegram: "status" = dashboard summary, "search [topic]" = semantic search, "person [name]" = profile lookup, "prep [name]" = meeting briefing, "tasks [domain]" = domain tasks, "complete [id]" = mark done. Unrecognized text defaults to semantic search. <1 second response time via long-polling.
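A minimal sketch of the gateway's routing, assuming python-telegram-bot v20+ and long-polling; the token is a placeholder and the handlers are stubs standing in for the real pipelines.

```python
# Sketch of the Telegram command router (python-telegram-bot v20+, long-polling).
# Token is a placeholder; handlers are stubs standing in for the real pipelines.
from telegram import Update
from telegram.ext import Application, ContextTypes, MessageHandler, filters

def semantic_search(query: str) -> str:  # stub: would call the vector layer
    return f"(semantic results for: {query})"

def dashboard_summary(_: str) -> str:  # stub: would read task/signal caches
    return "(dashboard summary)"

COMMANDS = {"status": dashboard_summary, "search": semantic_search}
# "person", "prep", "tasks", and "complete" would route the same way.

async def route(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    text = (update.message.text or "").strip()
    cmd, _, arg = text.partition(" ")
    handler = COMMANDS.get(cmd.lower(), semantic_search)  # default: semantic search
    await update.message.reply_text(handler(arg or text))

app = Application.builder().token("YOUR_BOT_TOKEN").build()
app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, route))
app.run_polling()  # long-polling, no public webhook required
```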

Shared Components (Cross-Pipeline)

Vector Database

Why: You need to search 600+ meetings, 100+ session memories, and 40+ corrections by meaning — not just keywords. A local embedded vector DB (Qdrant) runs with no server, stores as files, and supports multiple collections with different retrieval patterns. Exclusive file locking prevents concurrent access corruption.

Used by: Meeting • Task • Memory • Intelligence

Embedding Service

Why: Semantic similarity requires converting text to high-dimensional vectors. OpenAI's text-embedding-3-large (3072 dimensions) provides the best quality; local alternatives like sentence-transformers (384-dim) trade quality for privacy and zero cost. Used everywhere: meeting indexing, memory storage, topic clustering, correction dedup, thread detection.

Used by: Meeting • Memory • Intelligence

People Directory

Why: The system needs to know who people are to route correctly. "Chris mentioned budget concerns" — but which Chris? A Python file with structured profiles (name, role, domain, engagement style) resolves ambiguity, validates task owners, enriches meeting classifications, and generates personalized briefings. The directory IS the context.

Used by: Meeting • Task • Intelligence • Presentation
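A minimal sketch of what a directory entry might look like; the fields mirror the text, the values are placeholders, and the alias resolver is the piece that answers "which Chris?".

```python
# Sketch of a people directory entry; fields mirror the text, values are placeholders.
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    name: str
    aliases: tuple[str, ...]  # common transcript misspellings resolve here
    role: str
    domain: str
    engagement_style: str

DIRECTORY = [
    Person("Chris Example", ("Kris", "Chris E."), "VP Engineering",
           "engineering", "prefers concise bullet points"),
]

def resolve(name: str) -> Person | None:
    """Disambiguate a transcript mention or validate a task owner."""
    needle = name.strip().lower()
    for person in DIRECTORY:
        if needle == person.name.lower() or needle in (a.lower() for a in person.aliases):
            return person
    return None
```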

LLM Processor

Why: Different tasks need different models. Meeting classification needs a capable model (Sonnet-class). Thread naming can use a fast model (Haiku-class). Strategic judgment needs the top model (Opus-class) in-session with full context. A centralized config maps task types to models — when a better model launches, one change upgrades everything.

Used by: Meeting • Task • Intelligence
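A minimal sketch of the centralized config; the model identifiers are illustrative placeholders for Haiku-, Sonnet-, and Opus-class models.

```python
# Sketch of task-type to model routing; identifiers are illustrative placeholders.
MODEL_ROUTING = {
    "meeting_classification": "sonnet-class-model",  # capable, mid-cost
    "thread_naming": "haiku-class-model",            # fast and cheap
    "strategic_review": "opus-class-model",          # in-session, full context
}

def model_for(task_type: str) -> str:
    """One config to update when a better model launches."""
    return MODEL_ROUTING.get(task_type, "sonnet-class-model")
```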

Hook System

Why: Hooks are the nervous system. Without them, you'd manually run the dashboard, manually search memory, manually save session context. Claude Code hooks fire on events: SessionStart (dashboard + context), UserPromptSubmit (memory injection), PreTool/PostTool (TUI sync), Stop (session summary). This makes the system reactive without manual triggers.

Used by: Memory • Presentation

Cache Health

Why: Stale data leads to wrong decisions. Every cache has a TTL (1-4 hours for ephemeral, version-controlled for accumulated, manual backup for irreplaceable). Staleness propagates through the dependency graph: if your meeting sync is 3 days stale, your signal detection and thread tracking are also stale. The dashboard shows freshness so you know what to trust.

Used by: Intelligence • Presentation

Design Principles — The Rules That Shape the System

01. Scripts gather, AI decides. Scripts extract, flag, and structure raw data. All judgment calls happen in-session with full context. No script-level AI should make decisions — only detection.
02. Skills over script API calls. When AI judgment is needed, build a skill that uses the top model in-session with parallel sub-agents. Scripts do mechanical pre-filtering; skills do high-fidelity review.
03. Cross-source intelligence. The value is connecting dots you don't have time to connect. New email → related meetings, open tasks. Meeting transcript → emails awaiting response, blocked tasks.
04. Staging over direct writes. All automatically extracted data flows through a staging area before committing. Staging enables review, prevents noise, and supports enrichment.
05. Dedup = merge, not drop. When a duplicate is detected, check for new information. Update the existing item rather than silently dropping it. New info enriches; it doesn't discard.
06. Bouncer mindset. Before approving any change, stress-test it against current output. Does it maintain or improve fidelity? Proactively block changes that would regress quality.
07. Decisions carry rationale. Every stored decision includes reasoning, alternatives considered, and source context. Future sessions can revisit decisions intelligently.
08. Every disconnected tool is a blind spot. The system's value comes from unified context. Each integration closes a specific gap that would otherwise require manual correlation.
09. No cache without a TTL. Every derived or cached data store has an explicit freshness policy. Stale data leads to wrong decisions. Staleness signals propagate through the dependency graph.
10. Separate stores for separate data types. Meetings, memories, and corrections share infrastructure but never cross-write. Each has its own lifecycle, retrieval pattern, and pruning rules.

Pattern Catalog — Reusable Solutions Across Pipelines

5-Signal Dedup Cascade

Multi-layer deduplication that catches duplicates at increasing cost. Each signal is mechanical except the final LLM fallback — keeping costs low while catching edge cases.

1. Token similarity: Jaccard with stop-word filtering
2. Owner + similarity: same person lowers the threshold
3. Entity co-occurrence: shared named entities + similarity
4. Key phrase overlap: 4+ word sequence matching
5. LLM semantic fallback: AI judgment on borderline cases

Quality Gate Stack

Four-layer validation prevents noise from entering the system. Each layer catches different failure modes — from garbled transcription to conversational filler. A sketch of layer 2 follows the list.

1. Extraction prompt: explicit rejection examples
2. Post-LLM validation: length, verb, garble, filler checks
3. Staging noise filter: relevance classification
4. Promotion validation: re-check dedup + domain routing
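A minimal sketch of layer 2, the post-LLM validation; the filler phrases, action-verb list, and garble check are illustrative assumptions consistent with the gates above.

```python
# Sketch of quality-gate layer 2 (post-LLM validation).
# Filler phrases, verb list, and the garble regex are illustrative assumptions.
import re

FILLER = ("circle back", "keep an eye", "touch base", "we should probably")
ACTION_VERB = re.compile(
    r"\b(send|review|draft|schedule|finalize|update|fix|prepare|write)\b", re.I
)
CLEAN_TEXT = re.compile(r"[\w\s.,'\-:/()]+")  # rejects garbled transcription chars

def passes_validation(item: dict) -> bool:
    text = item.get("task", "").strip()
    if len(text) < 25 or len(text.split()) < 4:
        return False  # too short to be a real task
    if any(phrase in text.lower() for phrase in FILLER):
        return False  # conversational filler
    if not ACTION_VERB.search(text):
        return False  # no concrete action verb
    if not CLEAN_TEXT.fullmatch(text):
        return False  # garbled transcription characters
    return bool(item.get("owner"))  # owner must exist (directory check elsewhere)
```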

Enrichment Over Discard

When new information arrives about an existing item, merge it in rather than dropping the duplicate. Preserves provenance and builds richer context over time.

New source detected → "Also discussed in [meeting]"
Deadline added → "Due: [date] from [source]"
Context expanded → "Update: [detail] from [source]"
Status changed → "Status update: [detail] [date]"

Correction Compounding

Mistakes become permanent learning. The system tracks correction frequency and automatically promotes recurring patterns to always-loaded behavioral rules.

1. Observe correction: user corrects AI behavior
2. Journal entry: track pattern + occurrences
3. Threshold check: 2-3 occurrences by category
4. Auto-promote: append to auto-loaded rules
Data Freshness & Cache Tiering (Design Rule)

Every cached or derived data store has an explicit freshness policy. Stale data leads to wrong decisions.

Tier 1: Ephemeral

API-fetched, cheap to rebuild. Calendar data, health metrics, email snapshots, daemon logs. Short TTLs (1-4 hours). Safe to delete — rebuilds automatically.

Tier 2: Accumulated

Built over time, costly to rebuild. Session logs, correction journals, meeting insights, completion history. Version controlled. Losing this data means losing institutional memory.

Tier 3: Irreplaceable

Vector data with no rebuild path. Session memories (corrections, decisions, context) stored as embeddings. Requires manual backup. Loss is permanent — months of learning gone.
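A minimal sketch of a tier-aware freshness check; the cache file names are illustrative, and the TTLs use the 1-4 hour range given for Tier 1.

```python
# Sketch of a tier-aware freshness check; paths are illustrative.
import time
from pathlib import Path

# Tier 1 (ephemeral) TTLs, within the 1-4 hour range from the text.
TTL_SECONDS = {
    Path("cache/calendar.json"): 1 * 3600,
    Path("cache/health.json"): 4 * 3600,
}

def stale_caches() -> list[str]:
    """Return Tier 1 caches past their TTL, for the dashboard freshness panel."""
    stale = []
    for path, ttl in TTL_SECONDS.items():
        age = time.time() - path.stat().st_mtime if path.exists() else float("inf")
        if age > ttl:
            stale.append(str(path))
    return stale
```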

Graduated Autonomy — Permission Tiers

Not all actions are created equal. The system operates across five permission tiers, from always-allowed reads to confirmation-required creates.

READ: search, query, check. Always allowed.
DRAFT: generate, prepare, stage. Always allowed.
NOTIFY: push alerts, surface signals. Automatic with audit trail.
SEND: email, message, post. Requires confirmation.
CREATE: calendar, commits, files. Requires confirmation.