Pharma Intelligence Copilot

How It Works

Two-corpus RAG with a live Box integration. Here's what's actually happening under the hood.

The pipeline

01. Scrape FDA enforcement corpus

More than 870 CDER/CBER warning letters (2019–present) are scraped from FDA.gov, categorized into 10 violation areas using Claude, chunked, and embedded into Pinecone.
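The chunking part of this step can be sketched as follows. This is a minimal illustration, not the production code: the `Chunk` type, the `chunk_letter` helper, the window sizes, and the category label in the usage note are all assumptions, and the embedding and Pinecone upsert calls are left out.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    id: str
    text: str
    metadata: dict


def chunk_letter(letter_id: str, text: str, category: str,
                 size: int = 200, overlap: int = 40) -> list:
    """Split a letter into overlapping word windows tagged with its category.

    size and overlap are counted in words; both defaults are assumptions.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for n, start in enumerate(range(0, len(words), step)):
        window = words[start:start + size]
        chunks.append(Chunk(
            id=f"{letter_id}-{n}",
            text=" ".join(window),
            # Metadata travels with the vector so results can be filtered later.
            metadata={"letter_id": letter_id, "category": category},
        ))
        if start + size >= len(words):
            break  # this window already reached the end of the letter
    return chunks
```

For example, `chunk_letter("letter-001", letter_text, "data-integrity")` (hypothetical ID and category) yields records that could then be embedded and written to the warning-letter namespace. The overlap keeps a sentence that straddles a window boundary retrievable from at least one chunk.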

02. Connect to Box via JWT

Internal quality documents live in a Box folder. The server-to-server JWT connector downloads files on demand — no migration, no export.
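As a rough sketch of what server-to-server JWT authentication involves, the snippet below builds the claims of the assertion a Box app would sign. The claim names and the token URL follow Box's published JWT flow as best understood here and should be checked against the current Box documentation. The `build_box_jwt_claims` helper and the 45-second lifetime are assumptions. Signing with the app's private key and exchanging the assertion for an access token are omitted.

```python
import secrets
import time
from typing import Optional

BOX_TOKEN_URL = "https://api.box.com/oauth2/token"


def build_box_jwt_claims(client_id: str, enterprise_id: str,
                         now: Optional[float] = None) -> dict:
    """Return the unsigned claims for a Box server-to-server JWT assertion."""
    issued = int(now if now is not None else time.time())
    return {
        "iss": client_id,                # the app's OAuth client ID
        "sub": enterprise_id,            # authenticate as the enterprise service account
        "box_sub_type": "enterprise",
        "aud": BOX_TOKEN_URL,
        "jti": secrets.token_hex(32),    # unique ID per assertion (64 hex characters)
        "exp": issued + 45,              # short-lived; assumed to sit under Box's 60-second cap
    }
```

Because the app authenticates as itself rather than as a logged-in user, the connector can fetch files from the shared folder whenever a scan needs them.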

03. Embed internal documents

Each document is chunked with section context preserved and embedded into a separate Pinecone namespace. Box webhooks trigger automatic re-ingestion when files change.
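One way to preserve section context is to split at headings and prefix every chunk with its document and section title, so the embedded text carries its own location. The sketch below assumes numbered headings such as "4.2 Deviation Handling"; the heading pattern, both helper names, the webhook trigger name, and the event shape are assumptions to verify against the actual documents and Box webhook configuration.

```python
import re
from typing import Optional

# Naive heading pattern: a section number followed by a capitalized title.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+([A-Z].*)$")

REINGEST_TRIGGERS = {"FILE.UPLOADED"}  # assumed trigger name


def chunk_by_section(doc_title: str, text: str) -> list:
    """Split a document at numbered headings, prefixing each chunk with its section."""
    chunks = []
    section = "Preamble"
    body = []

    def flush():
        content = " ".join(body).strip()
        if content:
            chunks.append({
                "section": section,
                "text": f"{doc_title} > {section}\n{content}",
            })

    for line in text.splitlines():
        line = line.strip()
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before starting a new one
            section = f"{match.group(1)} {match.group(2)}"
            body = []
        elif line:
            body.append(line)
    flush()
    return chunks


def file_to_reingest(event: dict) -> Optional[str]:
    """Return the Box file ID to re-ingest, or None if the event is irrelevant."""
    source = event.get("source", {})
    if event.get("trigger") in REINGEST_TRIGGERS and source.get("type") == "file":
        return source.get("id")
    return None
```

A body line that happens to start with a number and a capital letter would be misread as a heading, so a real implementation would likely lean on document structure (styles, outline levels) rather than a regular expression.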

04. Cross-corpus retrieval

For each violation category, semantic search runs against both corpora in parallel, retrieving the most relevant warning-letter passages and internal document sections.
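The parallel lookup can be illustrated with `asyncio.gather`. In this sketch an in-memory dictionary and a cosine-similarity ranking stand in for the vector store, so `search`, `cross_corpus`, and the shape of `index` are illustrative assumptions. Only the two namespace names come from the architecture described on this page.

```python
import asyncio
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


async def search(index: dict, namespace: str, query: list, top_k: int) -> list:
    """Stand-in for a vector-store query: rank one namespace by similarity."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    scored = [(cosine(query, vec), text) for text, vec in index[namespace]]
    return sorted(scored, reverse=True)[:top_k]


async def cross_corpus(index: dict, query: list, top_k: int = 2) -> dict:
    """Query both namespaces concurrently and return the two result sets."""
    letters, docs = await asyncio.gather(
        search(index, "fda-warning-letters", query, top_k),
        search(index, "internal-docs", query, top_k),
    )
    return {"enforcement": letters, "internal": docs}
```

Running the two searches concurrently means the latency of a category is roughly that of the slower query rather than the sum of both.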

05. Risk signal generation

Claude analyzes the enforcement patterns and document evidence to produce a structured signal: enforcement frequency, document coverage assessment, and a specific review prompt for the team.
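A structured signal of this kind might be represented and validated as below. The three fields mirror the ones named above; the `RiskSignal` type, the JSON keys, and the coverage labels are hypothetical, and the model call itself is not shown.

```python
import json
from dataclasses import dataclass

ALLOWED_COVERAGE = {"covered", "partial", "gap"}  # hypothetical labels


@dataclass
class RiskSignal:
    category: str
    enforcement_frequency: int  # number of letters citing this category
    coverage: str               # how well internal documents address it
    review_prompt: str          # a specific question for the quality team


def parse_signal(category: str, letters_citing: int, model_json: str) -> RiskSignal:
    """Validate a model response and combine it with the counted frequency."""
    data = json.loads(model_json)
    coverage = data.get("coverage")
    if coverage not in ALLOWED_COVERAGE:
        raise ValueError(f"unexpected coverage value: {coverage!r}")
    prompt = str(data.get("review_prompt", "")).strip()
    if not prompt:
        raise ValueError("model response has no review_prompt")
    return RiskSignal(category, letters_citing, coverage, prompt)
```

In this sketch the enforcement frequency is counted from retrieval metadata rather than taken from the model's output, which keeps the numeric part of the signal deterministic.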

06. Stream results in real time

Signals appear as they complete. The 10 categories are processed in parallel batches and streamed via Server-Sent Events (SSE), so users see results progressively instead of waiting for the full scan.
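The batching and streaming behaviour can be sketched with an async generator that emits one SSE frame per finished category. The batch size, the event names, and the `analyze` callback are assumptions; a web framework would write each yielded string to a `text/event-stream` response.

```python
import asyncio
import json


def sse_event(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame; the blank line ends the event."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def stream_signals(categories: list, analyze, batch_size: int = 5):
    """Yield an SSE frame for each category as soon as its analysis finishes.

    analyze is an async callable taking a category and returning a dict.
    """
    for start in range(0, len(categories), batch_size):
        batch = categories[start:start + batch_size]
        tasks = [asyncio.create_task(analyze(c)) for c in batch]
        # as_completed hands back results in finish order, not submit order.
        for finished in asyncio.as_completed(tasks):
            yield sse_event("signal", await finished)
    yield sse_event("done", {"count": len(categories)})
```

Batching caps how many model calls are in flight at once, while `as_completed` lets a fast category reach the browser without waiting for the slowest one in its batch.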

Architecture

Corpus 1: FDA Warning Letters
870+ letters · CDER/CBER · 2019–2025
Pinecone namespace: fda-warning-letters

Corpus 2: Internal Quality Docs
Box JWT · 8 SOPs & policies
Pinecone namespace: internal-docs

Engine: Cross-Corpus Analysis
Claude Sonnet · 10 categories
Parallel batches · SSE streaming

Who uses it and why

Role: VP Quality
Need: Know what FDA is currently focused on before an inspection
Value: Trend Q&A over real enforcement data, not analyst summaries

Role: QA Director
Need: Identify procedure gaps relative to enforcement patterns
Value: Cross-corpus scan maps your SOPs to active citation areas

Role: Regulatory Affairs
Need: Understand the enforcement context for a filing decision
Value: Ask specific questions: 'What FDA language appears around stability trend analysis?'

Same engine. Different use cases.

The two-corpus RAG architecture adapts to any domain where external reference data needs to be cross-referenced against internal documents.

| | Pharma Intelligence | Compliance Copilot | Rules Expert |
|---|---|---|---|
| Document corpus | FDA warning letters + quality SOPs | 21 CFR Part 11 + policy documents | 2023 Rules of Golf |
| Retrieval | Two-namespace cross-corpus | Single-namespace requirement matching | Hybrid vector + BM25 |
| Output | Risk signals with coverage assessment | Gap analysis with requirement status | Cited rule answers |
| External integration | Box JWT connector | Static document upload | None |
| Streaming | SSE (signal-by-signal) | SSE (requirement-by-requirement) | UI message stream |

Ready to explore?

Start with enforcement trends or run a full risk scan against Meridian's documents.