Get the interactive blueprint — the same system architecture powering a real AI Chief of Staff.
A pipeline-based architecture with vector search, bot memory, and multi-layer intelligence
You attend 15+ meetings a week across multiple business domains. Without automation, that's 30+ minutes per meeting writing notes, extracting action items, and filing them. At 60+ meetings/month, you'd lose 30+ hours to post-meeting busywork — or more likely, you'd just stop doing it and lose the context entirely.
Monday 9am: leadership call ends. Within 60 seconds, the transcript is fetched, classified as "primary business," 4 action items extracted and staged, a structured scribe saved, and the full transcript indexed. Three weeks later you search "what did we decide about the Q3 timeline?" — instant answer with source and date.
Fetches from multiple transcript services (supporting source priority when the same meeting appears in two places), classifies by business domain using your CLAUDE.md definitions, extracts action items with quality gates, generates structured markdown scribes, and indexes everything for semantic search.
Transcript APIs → Merge (source priority, 85% title dedup) → LLM domain classification → Action item extraction with rejection rules → Structured markdown scribe → Vector embedding → Semantic DB
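A minimal sketch of the merge step in this flow, assuming each fetched record carries a title and a source name. The 85% title threshold comes from the flow above; the service names, field names, and similarity helper are illustrative.

```python
from difflib import SequenceMatcher

# Illustrative priority order: lower index wins when the same meeting
# shows up in more than one transcript service.
SOURCE_PRIORITY = ["service_a", "service_b"]

def title_similarity(a: str, b: str) -> float:
    """Rough title similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def source_rank(record: dict) -> int:
    src = record.get("source", "")
    return SOURCE_PRIORITY.index(src) if src in SOURCE_PRIORITY else len(SOURCE_PRIORITY)

def merge_transcripts(fetched: list[dict]) -> list[dict]:
    """Deduplicate fetched transcripts: titles >= 85% similar are treated as the
    same meeting, and the copy from the higher-priority source wins."""
    merged: list[dict] = []
    for record in sorted(fetched, key=source_rank):
        duplicate = any(
            title_similarity(kept["title"], record["title"]) >= 0.85 for kept in merged
        )
        if not duplicate:
            merged.append(record)  # first copy seen is the highest-priority one
    return merged
```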
Structured meeting files per domain (markdown with metadata table, summary, topics, transcript). Vector embeddings for semantic search. Action items routed to task pipeline staging area.
Task pipeline (extracted action items), vector store (searchable meetings), intelligence layer (trend analysis, thread tracking), presentation (meeting counts, freshness indicators).
Meetings generate 5-10 action items each. Without a pipeline, they live in your notes and get lost. With 60+ meetings/month, that's hundreds of potential tasks — most are noise ("I'll look into that"), some are duplicates across meetings, a few are critical. You need extraction that catches the real ones and filters the rest.
Three separate meetings over two weeks mention "finalize the vendor contract with engineering." Without dedup, that's 3 tasks cluttering your list. The 5-signal cascade catches it: same owner + 78% token overlap = same task. You get one enriched task with all three source meetings noted and the latest deadline attached.
LLM extracts action items from transcripts with explicit rejection rules (fragments, filler, conversational). Post-extraction validation filters noise. 5-signal dedup catches duplicates at increasing cost levels. Staging area holds tasks for 14 days before expiry. Human review promotes to committed per-domain task files.
LLM extraction → Noise filter (<25 chars, <4 words, bad owners) → Relevance classification (owner_action, tracking, delegated, noise) → 5-signal dedup cascade → Staging (14-day TTL) → Validation → Promotion to tasks.md
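A sketch of the noise-filter and relevance-classification stages, under a couple of assumptions: extracted items arrive as dicts with text and owner fields, and a follow_up_for_me flag stands in for whatever signal marks items you track rather than own. The thresholds and category names come from the flow above; the owner blacklist is illustrative.

```python
import re

BAD_OWNERS = {"someone", "we", "they", "the team", "unknown"}  # illustrative blacklist

def passes_noise_filter(item: dict) -> bool:
    """Mechanical post-LLM checks: length, word count, owner, garbled text."""
    text = item.get("text", "").strip()
    if len(text) < 25 or len(text.split()) < 4:
        return False                      # fragments and conversational filler
    if item.get("owner", "").lower() in BAD_OWNERS:
        return False                      # no accountable owner
    if not re.search(r"[A-Za-z]{3,}", text):
        return False                      # garbled transcription
    return True

def classify_relevance(item: dict, my_name: str) -> str:
    """Route each surviving item into one of the four buckets from the flow above."""
    owner = item.get("owner", "").lower()
    if owner == my_name.lower():
        return "owner_action"             # I committed to doing it
    if item.get("follow_up_for_me"):
        return "tracking"                 # someone else owes me an update
    if owner and owner not in BAD_OWNERS:
        return "delegated"                # clearly someone else's task
    return "noise"                        # everything else gets dropped
```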
Staged tasks (pending review with source, confidence, category). Committed tasks (per-domain markdown files). Enrichment candidates (existing tasks updated with new sources). Completion signals (detected from meeting context).
The extraction prompt is NOT a quality gate. LLMs will aggressively match "I'll [verb]" patterns. Three enforcement layers required: (1) prompt rejection examples, (2) post-LLM validation (length, verb, garble), (3) relevance classification with actual NOISE detection. Without all three, your task list fills with noise.
Keyword search fails when you can't remember the exact words. "What did we discuss about budget concerns?" should find meetings about "financial constraints," "cost reduction," and "resource allocation." Vector embeddings search by meaning, not string matching. This is the difference between "I think we talked about this..." and knowing exactly when and what was said.
Preparing for a VP 1:1, you search "vendor negotiations with engineering." It returns 4 meetings from the past 6 weeks — including one where timeline concerns were raised that you'd forgotten. You walk in with full context instead of winging it. Also used: "what decisions did we make about the product roadmap?" surfaces 3 meetings, 2 emails, and a decision with rationale.
Embeds and stores meetings, memories, and corrections for semantic search and contextual retrieval. Multiple collections serve different purposes: meetings for search, memories for context injection, corrections for dedup. Each collection has its own retrieval pattern.
Content → Chunk (2K tokens, 200 overlap) → Embedding model (text-embedding-3-large) → Vector DB upsert with metadata → Cosine similarity search (threshold: 0.35)
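A minimal sketch of this flow using the local QdrantClient and text-embedding-3-large calls referenced later in this section. The collection name, payload fields, and point-id scheme are illustrative, and chunking is approximated by word count rather than true tokens.

```python
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

oai = OpenAI()                              # needs OPENAI_API_KEY in the environment
db = QdrantClient(path="./qdrant_data")     # local, file-based, no server

COLLECTION = "meetings"                     # illustrative collection name
if not db.collection_exists(COLLECTION):
    db.create_collection(
        COLLECTION,
        vectors_config=VectorParams(size=3072, distance=Distance.COSINE),
    )

def chunk(text: str, size: int = 2000, overlap: int = 200) -> list[str]:
    """Word-based approximation of 2K-token chunks with 200-token overlap."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

def embed(text: str) -> list[float]:
    return oai.embeddings.create(model="text-embedding-3-large", input=text).data[0].embedding

def index_meeting(meeting_id: int, title: str, transcript: str) -> None:
    points = [
        PointStruct(
            id=meeting_id * 1000 + i,       # simple deterministic point id
            vector=embed(part),
            payload={"title": title, "chunk": i, "text": part},
        )
        for i, part in enumerate(chunk(transcript))
    ]
    db.upsert(COLLECTION, points=points)

def search(query: str, limit: int = 5):
    return db.search(
        COLLECTION,
        query_vector=embed(query),
        limit=limit,
        score_threshold=0.35,               # cosine similarity cutoff from the flow
    )
```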
Blanket load: Behavioral rules auto-loaded every session (corrections, preferences, gotchas). Contextual search: Memory and meetings queried per-prompt via pre-tool hook. Two patterns, one infrastructure, one vector DB.
Separate collections, not one big index. Meetings (~600 docs), Session Memory (~100 docs), Corrections (~40 docs). Each has different lifecycle, retrieval pattern, and pruning rules. Meetings are searched ad-hoc; corrections are blanket-loaded; memories are injected per-prompt.
QdrantClient(path="./qdrant_data") gives you a local file-based vector DB. Create a collection with cosine distance metric and vector size matching your embedding model (3072 for text-embedding-3-large).

openai.embeddings.create(model="text-embedding-3-large", input=text). Chunk your text first — 2K tokens per chunk with 200-token overlap prevents losing context at boundaries. Cost: ~$0.13 per 1M tokens (~500 meetings for ~$2-5).

SentenceTransformer("all-MiniLM-L6-v2") — smaller vectors (384-dim) but zero cost and no network dependency. Trade-off: slightly lower quality for complete privacy.

Every AI session starts from zero. Your AI doesn't remember that the CEO prefers bullet points, that a vendor name is always misspelled in transcripts, or that you made a strategic decision three months ago and why. Bot memory means corrections, decisions, preferences, and context persist. The system gets smarter every week you use it.
Week 1: you correct Claude — "That person's name is always garbled in transcripts. Normalize it." Logged. Week 3: same correction. The system recognizes the pattern (2 occurrences, same category), auto-promotes it to an always-loaded behavioral rule. From session 4 onward, every session applies the fix automatically. Multiply this across 40+ corrections and the system becomes deeply personalized.
Persists three types of session knowledge: summaries (what happened), corrections (mistakes and fixes), decisions (choices with rationale and alternatives). Tracks correction frequency. When a pattern recurs 2-3 times, auto-promotes to an always-loaded rules file that Claude reads every session.
User corrects AI → Pattern journal (occurrence tracking with category, date, context) → Semantic dedup check against existing corrections (prevents duplicate rules) → Threshold met (2-3 in same category) → Auto-promote to rules file → Loaded every future session.
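A sketch of the occurrence tracking and auto-promotion step, assuming a JSON journal and a markdown rules file at illustrative paths, with the semantic dedup check reduced to a per-category flag.

```python
import json
from datetime import date
from pathlib import Path

JOURNAL = Path("memory/correction_journal.json")   # illustrative paths
RULES = Path("CLAUDE_RULES.md")
PROMOTION_THRESHOLD = 2                             # 2-3 recurrences triggers promotion

def log_correction(category: str, mistake: str, fix: str) -> None:
    """Record a correction; auto-promote recurring categories to the rules file."""
    journal = json.loads(JOURNAL.read_text()) if JOURNAL.exists() else {}
    entry = journal.setdefault(category, {"count": 0, "fixes": [], "promoted": False})
    entry["count"] += 1
    entry["fixes"].append({"date": date.today().isoformat(), "mistake": mistake, "fix": fix})
    # The real flow runs a semantic dedup against existing corrections;
    # here it is reduced to "already promoted for this category".
    if entry["count"] >= PROMOTION_THRESHOLD and not entry["promoted"]:
        with RULES.open("a") as f:
            f.write(f"\n- [{category}] {fix}\n")    # becomes an always-loaded rule
        entry["promoted"] = True
    JOURNAL.parent.mkdir(parents=True, exist_ok=True)
    JOURNAL.write_text(json.dumps(journal, indent=2))
```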
Sessions: What happened, what was decided, what changed. Corrections: Mistake + fix + occurrence count + category. Decisions: Choice made + reasoning + alternatives considered + source context. Each stored as a vector embedding for semantic retrieval.
Memories older than 180 days with low access counts are pruned. Promoted corrections are exempt (they live in the rules file forever). Backup before bulk operations — session memories stored as embeddings are irreplaceable. The rules file is version-controlled, but vector memories have no rebuild path.
UserPromptSubmit for memory injection, Stop for session saving. The hook script can be Python that queries Qdrant and returns context.

Individual data points are noise. Intelligence is the pattern across them. "Haven't met with VP Engineering in 3 weeks" + "4 stalled tasks in that domain" + "board presentation in 2 weeks" = early warning. No human has time to manually cross-reference meeting frequency, task velocity, relationship recency, and calendar density. But a system that already has all this data can surface the pattern automatically.
Signal detector fires: "Relationship staleness — VP Engineering, 21 days since last meeting, 3 open items in their domain." Thread tracker shows "infrastructure migration" thread velocity dropped to zero. The daily briefing correlates them: "Suggested action: schedule catch-up before next board review." You didn't connect those dots — the system did.
Signal detection: algorithmic anomaly detection (no LLM needed). Frequency changes, velocity drops, staleness, domain drift. Thread tracking: auto-clusters topics across meetings using embeddings, tracks velocity and resolution. Retro analysis: weekly friction journal — completion rates, deferred tasks, domain allocation drift.
Frequency anomalies (sudden changes in meeting cadence), task velocity drops (domain going stale), relationship staleness (no contact in N days), domain allocation drift (time spent vs priority), topic escalation (recurring unresolved topics), system health (cache freshness, sync failures).
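These detectors are plain arithmetic over data the system already collects; no LLM call is involved. A sketch of two of them, relationship staleness and task-velocity drop, with illustrative thresholds and field names.

```python
from datetime import datetime, timedelta

def staleness_signals(last_contact: dict[str, datetime], max_days: int = 21) -> list[dict]:
    """Relationship staleness: no meeting with a key person in N days."""
    now = datetime.now()
    return [
        {"signal": "relationship_staleness", "person": person,
         "days_since": (now - seen).days}
        for person, seen in last_contact.items()
        if now - seen > timedelta(days=max_days)
    ]

def velocity_drop_signals(weekly_completions: dict[str, list[int]]) -> list[dict]:
    """Task velocity drop: this week's completions well below the recent average."""
    signals = []
    for domain, counts in weekly_completions.items():
        if len(counts) < 4:
            continue                                 # not enough history
        baseline = sum(counts[:-1]) / (len(counts) - 1)
        if baseline > 0 and counts[-1] < 0.5 * baseline:
            signals.append({"signal": "velocity_drop", "domain": domain,
                            "this_week": counts[-1], "baseline": round(baseline, 1)})
    return signals
```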
Auto-detected threads: topic clustering via embedding similarity across meetings. Each thread tracks: first/last mention, velocity (mentions per week), participating people, related tasks. Resolution detection via token matching against meeting transcripts. Convergence detection when two threads start overlapping.
The real value: correlating across pipelines. Stalled threads + matching task signals = blocked project. Pending items + upcoming calendar = meeting prep priority. Health metrics + schedule density = burnout risk. Task completions + signal resolution = project momentum. This is the layer that makes the system proactive, not reactive.
All the intelligence in the world is useless if you can't access it quickly. A 30-second dashboard check before meetings beats 10 minutes of manual prep. Mobile access means you're never caught without context — walking to a meeting, working from your phone, or on the road.
Walking to a leadership meeting, you message the mobile bot: "prep [executive name]." Instant briefing: open tasks in their domain, last 3 meeting summaries, recent decisions, and suggested talking points. Post-meeting: "sync" triggers transcript fetch, classification, and indexing — all before you reach your desk. The dashboard shows freshness of every data source so you know what's current.
CLI Dashboard: Primary interface. Runs at session start via hook. Shows task counts, calendar context, staging queue, signal alerts, cache freshness. Canvas TUI: Rich visualizations — task boards, timeline views, health dashboards. Daily Briefing: Comprehensive JSON output with all cross-pipeline intelligence. Mobile Gateway: Telegram bot for on-the-go access.
Task files (all domains) → urgent/overdue counts. Calendar (EventKit) → current/next meeting with prep context. Staging area → pending review queue. Signal cache → active alerts. Sync state → data freshness indicators. Bot memory → recent session context.
Real-time via event system: Session start triggers dashboard + memory injection. Pre-tool updates TUI visibility. Post-tool syncs task changes to Canvas. Response stop saves transcript summary + session memory. The hooks are the nervous system — they make everything reactive without manual triggers.
Natural language via Telegram: "status" = dashboard summary, "search [topic]" = semantic search, "person [name]" = profile lookup, "prep [name]" = meeting briefing, "tasks [domain]" = domain tasks, "complete [id]" = mark done. Unrecognized text defaults to semantic search. <1 second response time via long-polling.
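A sketch of the command routing only, leaving out the Telegram transport and long-polling loop. The handlers are placeholders for calls into the pipelines above; anything that doesn't match a known command falls through to semantic search, as described.

```python
def route(message: str) -> str:
    """Map a free-text bot message to a handler; unrecognized text becomes a search."""
    text = message.strip()
    cmd, _, arg = text.partition(" ")
    handlers = {
        "status": lambda _: dashboard_summary(),
        "search": semantic_search,
        "person": person_profile,
        "prep": meeting_briefing,
        "tasks": domain_tasks,
        "complete": mark_complete,
    }
    handler = handlers.get(cmd.lower())
    if handler is None:
        return semantic_search(text)     # default: treat the whole message as a query
    return handler(arg)

# Placeholder handlers; each would call into the pipelines described above.
def dashboard_summary() -> str: ...
def semantic_search(query: str) -> str: ...
def person_profile(name: str) -> str: ...
def meeting_briefing(name: str) -> str: ...
def domain_tasks(domain: str) -> str: ...
def mark_complete(task_id: str) -> str: ...
```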
Why: You need to search 600+ meetings, 100+ session memories, and 40+ corrections by meaning — not just keywords. A local embedded vector DB (Qdrant) runs with no server, stores as files, and supports multiple collections with different retrieval patterns. Exclusive file locking prevents concurrent access corruption.
Why: Semantic similarity requires converting text to high-dimensional vectors. OpenAI's text-embedding-3-large (3072 dimensions) provides the best quality; local alternatives like sentence-transformers (384-dim) trade quality for privacy and zero cost. Used everywhere: meeting indexing, memory storage, topic clustering, correction dedup, thread detection.
Why: The system needs to know who people are to route correctly. "Chris mentioned budget concerns" — but which Chris? A Python file with structured profiles (name, role, domain, engagement style) resolves ambiguity, validates task owners, enriches meeting classifications, and generates personalized briefings. The directory IS the context.
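A sketch of what such a directory can look like. The people and fields are invented; the lookup shows how an ambiguous first name resolves against the meeting's domain.

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    role: str
    domain: str
    engagement_style: str

# Illustrative entries; the real file holds everyone you work with regularly.
PEOPLE = [
    Person("Chris Alvarez", "VP Engineering", "primary business", "direct, wants options"),
    Person("Chris Webb", "Finance Partner", "operations", "detail-first, async updates"),
]

def resolve(first_name: str, domain_hint: str | None = None) -> Person | None:
    """Disambiguate 'Chris mentioned budget concerns' using the meeting's domain."""
    matches = [p for p in PEOPLE if p.name.lower().startswith(first_name.lower())]
    if domain_hint:
        matches = [p for p in matches if p.domain == domain_hint] or matches
    return matches[0] if matches else None
```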
Why: Different tasks need different models. Meeting classification needs a capable model (Sonnet-class). Thread naming can use a fast model (Haiku-class). Strategic judgment needs the top model (Opus-class) in-session with full context. A centralized config maps task types to models — when a better model launches, one change upgrades everything.
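A sketch of such a config. The task types mirror the examples above; the model identifiers are placeholders you would swap for whatever is current.

```python
# One place to change when a better model ships.
MODEL_TIERS = {
    "fast": "haiku-class-model",       # placeholder identifiers, not real model names
    "capable": "sonnet-class-model",
    "top": "opus-class-model",
}

TASK_MODELS = {
    "meeting_classification": "capable",
    "action_item_extraction": "capable",
    "thread_naming": "fast",
    "strategic_judgment": "top",       # runs in-session with full context
}

def model_for(task_type: str) -> str:
    return MODEL_TIERS[TASK_MODELS.get(task_type, "capable")]
```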
Why: Hooks are the nervous system. Without them, you'd manually run the dashboard, manually search memory, manually save session context. Claude Code hooks fire on events: SessionStart (dashboard + context), UserPromptSubmit (memory injection), PreTool/PostTool (TUI sync), Stop (session summary). This makes the system reactive without manual triggers.
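A sketch of a UserPromptSubmit hook script for memory injection, assuming the hook receives the prompt as JSON on stdin and that text printed to stdout is added to the session context. The collection name and similarity threshold are the ones used in the vector search section.

```python
#!/usr/bin/env python3
"""UserPromptSubmit hook: look up relevant memories and print them as context."""
import json
import sys

from openai import OpenAI
from qdrant_client import QdrantClient

payload = json.load(sys.stdin)              # hook input arrives as JSON on stdin
prompt = payload.get("prompt", "")

db = QdrantClient(path="./qdrant_data")
oai = OpenAI()
vector = oai.embeddings.create(model="text-embedding-3-large", input=prompt).data[0].embedding

hits = db.search("session_memory", query_vector=vector, limit=3, score_threshold=0.35)
if hits:
    print("Relevant memory from past sessions:")
    for hit in hits:
        print(f"- {hit.payload.get('summary', '')}")
# Exit 0: the printed text is injected into the session context.
```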
Why: Stale data leads to wrong decisions. Every cache has a TTL (1-4 hours for ephemeral, version-controlled for accumulated, manual backup for irreplaceable). Staleness propagates through the dependency graph: if your meeting sync is 3 days stale, your signal detection and thread tracking are also stale. The dashboard shows freshness so you know what to trust.
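A sketch of a freshness check against per-cache TTLs. The cache names and TTLs are illustrative members of the ephemeral tier; the dashboard would render this report as freshness indicators.

```python
from datetime import datetime, timedelta
from pathlib import Path

# Ephemeral caches and their TTLs (illustrative names).
CACHE_TTLS = {
    "calendar_cache.json": timedelta(hours=1),
    "email_snapshot.json": timedelta(hours=4),
    "meeting_sync_state.json": timedelta(hours=4),
}

def freshness_report(cache_dir: Path = Path("cache")) -> dict[str, str]:
    """Label each cache fresh/stale/missing so the dashboard shows what to trust."""
    report = {}
    now = datetime.now()
    for name, ttl in CACHE_TTLS.items():
        path = cache_dir / name
        if not path.exists():
            report[name] = "missing"
        elif now - datetime.fromtimestamp(path.stat().st_mtime) > ttl:
            report[name] = "stale"
        else:
            report[name] = "fresh"
    return report
```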
Multi-layer deduplication that catches duplicates at increasing cost. Each signal is mechanical except the final LLM fallback — keeping costs low while catching edge cases.
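A sketch of the cascade, cheapest checks first. Only two signals are spelled out in this section (same owner with roughly 78% token overlap, and an LLM fallback as the last resort), so the intermediate checks here are plausible stand-ins, not the system's actual rules.

```python
def token_overlap(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

def is_duplicate(new: dict, existing: dict, llm_check=None) -> bool:
    """Run the cheap mechanical checks first; only edge cases reach the LLM."""
    # 1. Exact text match
    if new["text"].strip().lower() == existing["text"].strip().lower():
        return True
    # 2. Same owner + high token overlap (the 78% rule)
    if new.get("owner") == existing.get("owner") and token_overlap(new["text"], existing["text"]) >= 0.78:
        return True
    # 3. Same source meeting + same owner + moderate overlap (illustrative)
    if (new.get("meeting_id") == existing.get("meeting_id")
            and new.get("owner") == existing.get("owner")
            and token_overlap(new["text"], existing["text"]) >= 0.5):
        return True
    # 4. Embedding similarity would sit here (mechanical but costs an API call), omitted.
    # 5. LLM fallback: ask a cheap model only when everything above is inconclusive.
    if llm_check is not None and token_overlap(new["text"], existing["text"]) >= 0.4:
        return llm_check(new["text"], existing["text"])
    return False
```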
Four-layer validation prevents noise from entering the system. Each layer catches different failure modes — from garbled transcription to conversational filler.
When new information arrives about an existing item, merge it in rather than dropping the duplicate. Preserves provenance and builds richer context over time.
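A sketch of the merge, following the vendor-contract example earlier: keep one task, append the new source meeting, and take the later deadline. Field names are illustrative.

```python
def enrich_task(existing: dict, duplicate: dict) -> dict:
    """Merge a newly detected duplicate into the existing task instead of dropping it."""
    existing.setdefault("sources", []).extend(
        s for s in duplicate.get("sources", []) if s not in existing["sources"]
    )
    # Keep the later deadline if the new mention pushes it out (ISO dates compare as strings).
    if duplicate.get("deadline") and duplicate["deadline"] > existing.get("deadline", ""):
        existing["deadline"] = duplicate["deadline"]
    # Preserve any extra context the new mention added.
    note = duplicate.get("context")
    if note and note not in existing.setdefault("notes", []):
        existing["notes"].append(note)
    return existing
```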
Mistakes become permanent learning. The system tracks correction frequency and automatically promotes recurring patterns to always-loaded behavioral rules.
Every cached or derived data store has an explicit freshness policy. Stale data leads to wrong decisions.
API-fetched, cheap to rebuild. Calendar data, health metrics, email snapshots, daemon logs. Short TTLs (1-4 hours). Safe to delete — rebuilds automatically.
Built over time, costly to rebuild. Session logs, correction journals, meeting insights, completion history. Version controlled. Losing this data means losing institutional memory.
Vector data with no rebuild path. Session memories (corrections, decisions, context) stored as embeddings. Requires manual backup. Loss is permanent — months of learning gone.
Not all actions are created equal. The system operates across five permission tiers, from always-allowed reads to confirmation-required creates.