Fastpaca Context Store

Context budgeting and compaction for LLM apps. Keep long conversations fast and affordable.

  • Set token budgets. Conversations stay within bounds.
  • You control the accuracy/cost tradeoff.
                  ╔═ fastpaca ═══════════════════╗
╔══════════╗      ║  ┏━━━━━━━━━┓    ┏━━━━━━━━━┓  ║      ╔═ optional ═╗
║  client  ║─API─▶║  ┃ Message ┃───▶┃ Context ┃  ║─────▶║  postgres  ║
╚══════════╝      ║  ┃ History ┃    ┃ Policy  ┃  ║      ╚════════════╝
                  ║  ┗━━━━━━━━━┛    ┗━━━━━━━━━┛  ║
                  ╚══════════════════════════════╝

Enforces a per-conversation token budget before requests hit your LLM.

Long conversations get expensive and slow

  • More messages = more tokens = higher cost
  • Larger context = slower responses
  • Eventually you hit the model's limit

What Fastpaca Context Store does

Enforces per-conversation token budgets with deterministic compaction.

  • Keep full history for users
  • Compact context for the model
  • Choose your policy: last_n, skip_parts, or manual (see the sketch below)
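
A minimal sketch of selecting a policy at creation time. Only the policy names come from the list above; the shape of the policy option shown here is an assumption for illustration, not the definitive API.

// Sketch: assumes `policy` is accepted as a context option; the exact shape may differ.
// (import createClient as in the Quick Start below)
const fastpaca = createClient({ baseUrl: 'http://localhost:4000/v1' });
const support = await fastpaca.context('support-42', {
  budget: 200_000,                   // hard cap on tokens the model ever sees
  policy: { type: 'last_n', n: 50 }, // keep only the 50 most recent messages
});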

Quick Start

import { createClient } from 'fastpaca'; // package name assumed; match your install

const fastpaca = createClient({ baseUrl: 'http://localhost:4000/v1' });
const ctx = await fastpaca.context('demo', { budget: 1_000_000 });
await ctx.append({ role: 'user', parts: [{ type: 'text', text: 'Hi' }] });
const { messages } = await ctx.context(); // compacted to fit the budget
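
The returned messages array is what you send to your model. A sketch of the full round trip, with callModel standing in as a placeholder for your actual LLM client (not part of Fastpaca):

// `callModel` is a placeholder for whatever LLM client you use (OpenAI, Anthropic, ...).
declare function callModel(messages: unknown): Promise<string>;

async function reply(prompt: string) {
  await ctx.append({ role: 'user', parts: [{ type: 'text', text: prompt }] });
  const { messages } = await ctx.context();  // already compacted to the budget
  const answer = await callModel(messages);  // the model never sees the full history
  await ctx.append({ role: 'assistant', parts: [{ type: 'text', text: answer }] });
  return answer;
}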

Background

We kept rebuilding the same Redis + Postgres + pub/sub stack to manage conversation state and compaction. It was messy, hard to scale, and expensive to tune. Fastpaca Context Store turns that pattern into a single service you can drop in.