Technology Stack

This page documents every technology in the Layers appview stack, including version pins, roles, and selection rationale. The stack follows Chive's production architecture closely, extending it where Layers' 26 record types, discriminated annotation model, and dense cross-referencing require additional infrastructure.

Runtime and Language

Technology | Version | Role
Node.js | 22+ (LTS) | Application runtime
TypeScript | 5.9+ | Primary language
pnpm | 10+ | Package manager and monorepo workspace management

Node.js 22+ is the current LTS line. It provides native ESM support without transpilation flags, a stable fetch implementation, and performance improvements in the V8 engine (Maglev compiler, resizable ArrayBuffers) that benefit high-throughput firehose processing.

TypeScript 5.9+ is configured in strict mode with experimentalDecorators enabled for dependency injection via tsyringe. While TC39 stage 3 decorators are now available natively in TypeScript 5.9+, tsyringe still requires the legacy experimentalDecorators flag. This is the same trade-off Chive makes. Strict mode catches null/undefined errors at compile time across all 26 record type handlers. Decorator support enables constructor-based DI without manual wiring:

@injectable()
export class ExpressionIndexer {
  constructor(
    @inject('PostgresClient') private pg: PostgresClient,
    @inject('ElasticClient') private es: ElasticClient,
    @inject('Neo4jClient') private neo4j: Neo4jClient,
  ) {}
}

pnpm 10+ manages the monorepo workspace. Its content-addressable store deduplicates shared dependencies across packages (api, workers, shared, cli), and pnpm-workspace.yaml defines package boundaries. Strict dependency resolution prevents phantom dependencies that could cause runtime failures in production.
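The workspace definition itself is small; a minimal sketch, assuming the four packages live under a packages/ directory (the actual layout may differ):

```yaml
# pnpm-workspace.yaml — hypothetical layout
packages:
  - 'packages/api'
  - 'packages/workers'
  - 'packages/shared'
  - 'packages/cli'
```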

API Framework

Technology | Version | Role
Hono | 4+ | HTTP framework
Zod | 4+ | Runtime validation and TypeScript type inference
@hono/zod-openapi | latest | OpenAPI 3.1 generation from Zod schemas

Hono 4+ serves as the HTTP framework for both XRPC and REST endpoints. It was selected for its benchmark performance (consistently fastest in Node.js HTTP framework comparisons), minimal footprint, and native middleware composition. The middleware stack follows Chive's 7-layer ordering:

const app = new Hono()
  .use('*', secureHeaders())    // 1. Security headers (HSTS, X-Frame, CSP)
  .use('*', cors(corsConfig))   // 2. CORS handling
  .use('*', serviceInjection()) // 3. Inject services into context (tsyringe)
  .use('*', requestContext())   // 4. Request ID, timing, child logger
  .use('*', authenticate())     // 5. OAuth token / service auth JWT verification
  .use('*', rateLimiter())      // 6. Tiered rate limiting (Redis sorted sets)
  .onError(errorHandler)        // 7. Structured error responses

The appview exposes a dual XRPC + REST API surface:

  • XRPC endpoints implement the ATProto-native query interface. Clients using @atproto/api interact with these directly. All 38+ query endpoints defined in the API Design page are served here.
  • REST endpoints provide search, composite queries, and convenience routes for web clients and third-party integrations that do not use the ATProto SDK.

Zod 4+ provides runtime validation with automatic TypeScript type inference. Zod 4 delivers 2-3x performance improvement over Zod 3 in benchmarks, which matters for high-throughput firehose validation. Every request body, query parameter, and response payload is defined as a Zod schema. Invalid requests are rejected before reaching handler logic:

const SearchParams = z.object({
  q: z.string().min(1).max(500),
  kind: z.enum(['token', 'span', 'relation', 'sentence']).optional(),
  limit: z.number().int().min(1).max(100).default(25),
  cursor: z.string().optional(),
})

type SearchParams = z.infer<typeof SearchParams>

@hono/zod-openapi generates an OpenAPI 3.1 specification directly from Zod schemas registered on Hono routes. The generated spec powers interactive API documentation and client SDK generation without maintaining a separate schema file.

ATProto Integration

Technology | Version | Role
@atproto/api | latest | Protocol SDK for ATProto operations
@atproto/identity | latest | DID resolution (did:plc, did:web)
@atproto/lexicon | latest | Schema parsing and validation for pub.layers.* lexicons
@atproto/xrpc-server | latest | XRPC server implementation
@atproto/oauth-client-node | latest | OAuth 2.0 + PKCE authentication flow

These packages are maintained by Bluesky PBC and provide the canonical implementation of ATProto primitives. The appview uses them as follows:

  • @atproto/api provides the Agent class for making authenticated requests to PDSes and relays, the AtUri class for parsing and constructing AT-URIs, and typed interfaces for standard ATProto operations.
  • @atproto/identity resolves DIDs to DID documents, extracting PDS endpoints and signing keys. The appview resolves DIDs during firehose ingestion (to validate record authorship) and during API requests (to link records to user identities).
  • @atproto/lexicon parses the 26 pub.layers.* lexicon JSON files and validates incoming records against their schemas during firehose ingestion. Records that fail validation are routed to the dead letter queue. See Firehose Ingestion for the validation pipeline.
  • @atproto/xrpc-server provides the XRPC route registration, method handler signature, and error response formatting that the ATProto ecosystem expects.
  • @atproto/oauth-client-node implements the OAuth 2.0 + PKCE flow for user authentication. See Authentication for the full auth flow.

Error Handling

Technology | Version | Role
Custom Result<T, E> | — | Typed error handling monad
Custom LayersError hierarchy | — | Structured error classification

Following Chive's src/types/result.ts pattern, all fallible operations return a Result<T, E> monad instead of throwing exceptions. This provides explicit, type-safe error handling at every call site:

// src/types/result.ts
type Result<T, E = LayersError> =
  | { ok: true; value: T }
  | { ok: false; error: E }

function Ok<T>(value: T): Result<T, never> { return { ok: true, value } }
function Err<E>(error: E): Result<never, E> { return { ok: false, error } }

// Combinators: isOk(), isErr(), unwrap(), unwrapOr(), map(), mapErr(), andThen()

The LayersError hierarchy (in src/types/errors.ts) mirrors Chive's ChiveError:

Error Class | HTTP Status | Use Case
ComplianceError | — | ATProto violations (write to PDS, blob storage, non-rebuildable state)
NotFoundError | 404 | Record not found
ValidationError | 400 | Invalid input (Zod failures, business rules)
AuthenticationError | 401 | Missing or invalid credentials
AuthorizationError | 403 | Insufficient permissions
RateLimitError | 429 | Rate limit exceeded (includes retryAfter)
DatabaseError | 500 | Storage backend failures
ServiceUnavailableError | 503 | Downstream service unreachable
PluginError | 500 | Plugin execution failure
SandboxViolationError | 500 | Plugin exceeded resource limits or attempted forbidden operation

// Usage in handlers — no throws, explicit error paths
async function getExpression(uri: string): Promise<Result<Expression>> {
  const record = await pgPolicy.execute(() => pg.query(sql, [uri]))
  if (!record.rows[0]) return Err(new NotFoundError(`Expression not found: ${uri}`))
  return Ok(record.rows[0])
}

Databases

The appview uses four databases, each serving a distinct query pattern. PostgreSQL is the source of truth for all record types. Elasticsearch, Neo4j, and Redis are derived indexes that can be rebuilt from PostgreSQL or the firehose at any time. See Database Design for schemas, mappings, and data models.

PostgreSQL 16+

Attribute | Detail
Role | Source of truth for all 26 record types
Key features | AT-URI foreign keys, JSONB for flexible fields, GIN indexes, partial indexes

PostgreSQL stores every pub.layers.* record as a normalized row with AT-URI foreign keys for cross-references. JSONB columns store flexible fields (annotation feature maps, experiment parameters, knowledge references) that vary by record type without requiring schema migrations. GIN indexes on JSONB columns enable efficient containment queries (@> operator) for filtering by nested properties.

The schema uses AT-URIs as the primary foreign key type, following Chive's pattern:

CREATE TABLE expression (
  uri TEXT PRIMARY KEY,  -- at://did:plc:xxx/pub.layers.expression.expression/rkey
  did TEXT NOT NULL,
  rkey TEXT NOT NULL,
  text TEXT,
  granularity TEXT NOT NULL,
  parent_uri TEXT REFERENCES expression(uri),
  eprint_ref TEXT,
  metadata JSONB,
  indexed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  UNIQUE (did, rkey)
);

CREATE INDEX idx_expression_granularity ON expression(granularity);
CREATE INDEX idx_expression_parent ON expression(parent_uri) WHERE parent_uri IS NOT NULL;
CREATE INDEX idx_expression_metadata ON expression USING GIN (metadata);
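A containment query served by the GIN index above might look like the following (the metadata key is illustrative, not a documented field):

```sql
-- Hypothetical example: find expressions whose JSONB metadata
-- contains a given nested property, using the @> containment operator.
SELECT uri, granularity
FROM expression
WHERE metadata @> '{"language": "de"}'
ORDER BY indexed_at DESC
LIMIT 25;
```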

Elasticsearch 8+

Attribute | Detail
Role | Full-text search, faceted queries, completion suggesters
Key features | Custom linguistic analyzers, nested objects, faceted aggregations

Elasticsearch indexes a subset of record types that require full-text search or faceted filtering (see the coverage matrix on the Overview page). Custom analyzers handle linguistic data: language-specific stemmers, ICU tokenization for CJK text, phonetic analysis for lexical resources, and n-gram tokenization for partial matching.

Annotation layers are indexed as nested objects in Elasticsearch, enabling faceted search across the three-dimensional kind/subkind/formalism space:

{
  "mappings": {
    "properties": {
      "annotations": {
        "type": "nested",
        "properties": {
          "kind": { "type": "keyword" },
          "subkind": { "type": "keyword" },
          "formalism": { "type": "keyword" },
          "label": { "type": "text", "fields": { "raw": { "type": "keyword" } } }
        }
      }
    }
  }
}
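A faceted search over this mapping combines a nested query with a nested terms aggregation; the sketch below uses illustrative filter values ("span", "UD"), not documented ones:

```json
{
  "query": {
    "nested": {
      "path": "annotations",
      "query": {
        "bool": {
          "filter": [
            { "term": { "annotations.kind": "span" } },
            { "term": { "annotations.formalism": "UD" } }
          ]
        }
      }
    }
  },
  "aggs": {
    "by_subkind": {
      "nested": { "path": "annotations" },
      "aggs": { "subkinds": { "terms": { "field": "annotations.subkind" } } }
    }
  }
}
```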

Neo4j 5+

Attribute | Detail
Role | Knowledge graph, cross-reference traversal, path queries
Key features | Native graph storage, Cypher query language, APOC library

Neo4j stores the knowledge graph built from graph.graphNode and graph.graphEdge records, corpus membership edges, type hierarchy edges, alignment links, and cross-reference relationships. It answers queries that require multi-hop traversal: "find all annotations that reference an entity grounded in this Wikidata entry" or "find all corpora that contain expressions linked to this eprint."

Neo4j was chosen over PostgreSQL recursive CTEs because graph traversal performance degrades significantly with depth in relational databases (each hop adds a self-join), while Neo4j's index-free adjacency provides constant-time per-hop traversal regardless of graph size.

// Find all annotations linked to expressions in a corpus
MATCH (c:Corpus {uri: $corpusUri})-[:CONTAINS]->(e:Expression)
<-[:ANNOTATES]-(a:AnnotationLayer)
RETURN a.uri, a.kind, a.subkind
ORDER BY a.indexed_at DESC
LIMIT 100

Redis 7+

Attribute | Detail
Role | Session cache, rate limiting, BullMQ job queue backend, pub/sub
Key features | In-memory performance, TTL-based expiry, Streams, pub/sub

Redis is accessed via ioredis, matching Chive's client choice for its cluster support, pipelining, and Lua scripting. Redis serves five distinct functions:

  1. Session cache: Stores authenticated user sessions with TTL-based expiry, avoiding per-request database lookups.
  2. Rate limiting: Implements sliding-window rate limiting using Redis sorted sets (ZREMRANGEBYSCORE + ZCARD + ZADD + EXPIRE pipeline), with tiered limits per user role (Anonymous 60/min, Authenticated 300/min, Premium 1000/min, Admin 5000/min) and a configurable fail-open/fail-closed mode.
  3. Job queue backend: BullMQ uses Redis Streams for persistent, ordered job queues with at-least-once delivery.
  4. Pub/sub: Real-time event notifications for connected clients (e.g., "new annotation on this expression").
  5. Authorization cache: Role assignments cached in sorted sets for fast RBAC lookups.
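The sorted-set algorithm in item 2 can be illustrated with an in-memory stand-in for the sorted set — in production the steps marked in comments run as a single ioredis pipeline; the SlidingWindow class and its names are illustrative:

```typescript
// Runnable sketch of sorted-set sliding-window rate limiting.
// Tiers and limits are taken from the text above.
type Tier = 'anonymous' | 'authenticated' | 'premium' | 'admin'
const LIMITS: Record<Tier, number> = {
  anonymous: 60, authenticated: 300, premium: 1000, admin: 5000,
}

class SlidingWindow {
  // key -> request timestamps (ms); stands in for a Redis sorted set per key
  private hits = new Map<string, number[]>()

  check(key: string, tier: Tier, now: number, windowMs = 60_000): boolean {
    // ZREMRANGEBYSCORE: drop entries older than the window
    const entries = (this.hits.get(key) ?? []).filter((t) => t > now - windowMs)
    // ZCARD: count what remains; deny if at the tier limit
    if (entries.length >= LIMITS[tier]) {
      this.hits.set(key, entries)
      return false
    }
    // ZADD: record this request (EXPIRE on the key is implicit here)
    entries.push(now)
    this.hits.set(key, entries)
    return true
  }
}
```

The fail-open/fail-closed mode mentioned above would wrap this check: on a Redis error, fail-open allows the request and fail-closed denies it.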

Job Queue

Technology | Version | Role
BullMQ | 5+ | Job queue framework on Redis

BullMQ 5+ manages all asynchronous processing: firehose event ingestion, Elasticsearch/Neo4j indexing, format import pipelines, enrichment workers, and maintenance tasks. It uses Redis Streams as the backing store.

The appview organizes queues in a per-namespace topology. Each pub.layers.* namespace gets its own queue with independent concurrency, priority, and retry settings. This prevents a slow namespace (e.g., large annotation layer imports) from blocking fast namespaces (e.g., persona records):

const queues = {
  'expression': new Queue('expression', { connection: redis }),
  'annotation': new Queue('annotation', { connection: redis }),
  'corpus': new Queue('corpus', { connection: redis }),
  'graph': new Queue('graph', { connection: redis }),
  'enrichment': new Queue('enrichment', { connection: redis }),
  'maintenance': new Queue('maintenance', { connection: redis }),
}

Key features:

  • Priority queues: Firehose events are processed at higher priority than enrichment or maintenance jobs.
  • Dead letter queue: Jobs that exceed the retry limit (default: 5 retries with exponential backoff) are moved to a dead letter queue for manual inspection. See Firehose Ingestion for DLQ handling.
  • Backpressure: Workers pause consumption when downstream databases are unhealthy (detected via health checks), preventing queue buildup during outages.
  • Dashboard: BullMQ's Bull Board provides a web UI for inspecting queue state, retrying failed jobs, and monitoring throughput.

See Background Jobs for the full worker architecture.

Authentication and Authorization

Technology | Version | Role
@atproto/oauth-client-node | latest | ATProto OAuth 2.0 + PKCE flow
jose | 6+ | JWT signing and verification
Casbin | 5+ | RBAC policy engine
@simplewebauthn/server | latest | WebAuthn/FIDO2 passkey support
@otplib | latest | TOTP-based MFA

ATProto OAuth 2.0 + PKCE is the primary authentication mechanism. Users authenticate with their ATProto identity (DID) through the standard OAuth flow. The @atproto/oauth-client-node package handles authorization URL generation, callback verification, token exchange, and token refresh.

jose 6+ signs and verifies JWT session tokens issued after OAuth completion. JWTs encode the user's DID, session ID, and permission claims. The library was chosen for its standards compliance (RFC 7515-7519), zero-dependency footprint, and Web Crypto API compatibility.

Casbin 5+ enforces role-based access control for annotation workflows. Layers requires more granular authorization than Chive's publish/read model: annotators, adjudicators, and corpus managers have different permissions over annotation layers, corpora, and experiment data. Casbin evaluates policies defined in a PERM model:

[request_definition]
r = sub, obj, act

[policy_definition]
p = sub, obj, act

[role_definition]
g = _, _

[policy_effect]
e = some(where (p.eft == allow))

[matchers]
m = g(r.sub, p.sub) && r.obj == p.obj && r.act == p.act
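Concrete policy (p) and role-grouping (g) lines for this model might look like the following — the roles, objects, and subjects are illustrative, not the shipped policy:

```csv
p, annotator, annotationLayer, write
p, adjudicator, annotationLayer, adjudicate
p, corpus_manager, corpus, manage
g, did:plc:alice, annotator
g, did:plc:bob, adjudicator
```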

@simplewebauthn/server provides WebAuthn/FIDO2 support for passkey-based authentication as a second factor or passwordless alternative. @otplib provides TOTP-based MFA for users who prefer authenticator apps. See Authentication for the full auth architecture.

Observability

Technology | Version | Role
Pino | 10+ | Structured JSON logging with PII redaction
OpenTelemetry | 1.x | Distributed tracing, metrics, and logs (stable SDK)
prom-client | 15+ | Prometheus metrics
Grafana | latest | Dashboards and alerting
Grafana Alloy | latest | Next-gen telemetry collector (replaces Grafana Agent)

Pino 10+ produces structured JSON logs with automatic redaction of sensitive fields. It is the fastest Node.js logging library by benchmark, which matters for high-throughput firehose processing where logging overhead is measurable. Following Chive's pattern, the logger automatically injects OpenTelemetry trace context (traceId, spanId) into every log entry via a Pino mixin() function, and redacts a comprehensive set of sensitive fields:

const logger = pino({
  level: 'info',
  redact: [
    'req.headers.authorization', 'req.headers.cookie',
    '*.password', '*.token', '*.apiKey', '*.secret',
    '*.credential', '*.accessToken', '*.refreshToken', '*.privateKey',
  ],
  mixin() {
    const span = trace.getActiveSpan()
    const ctx = span?.spanContext()
    return ctx ? { traceId: ctx.traceId, spanId: ctx.spanId } : {}
  },
  transport: {
    target: 'pino-pretty',
    options: { colorize: process.env.NODE_ENV !== 'production' },
  },
})

OpenTelemetry 1.x provides distributed tracing across the appview's async processing pipeline; the stable 1.x SDK carries API guarantees that the 0.x releases did not. Traces follow a firehose event from ingestion through queue processing, database writes, and index updates. The OTLP exporter sends traces to a collector (Grafana Tempo in production, Jaeger in development), and the OTel Logs bridge can route Pino logs through the OTel Collector for unified observability.

prom-client 15+ exposes Prometheus-format metrics at /metrics. Key metrics include:

  • firehose_events_total (counter, by NSID)
  • queue_depth (gauge, by queue name)
  • db_query_duration_seconds (histogram, by database and operation)
  • http_request_duration_seconds (histogram, by route and status)

Grafana provides dashboards for all metrics and traces, with alerting rules for queue depth, error rates, and database latency. See Observability for dashboard definitions and alerting policies.

Infrastructure

Technology | Version | Role
Docker | latest | Container builds (multi-stage, distroless runtime)
Kubernetes | 1.28+ | Container orchestration
Kustomize | latest | Environment-specific configuration overlays
ArgoCD / Flux | latest | GitOps continuous delivery
External Secrets Operator | latest | Production secrets management
cert-manager | latest | TLS certificate automation
Sigstore cosign | latest | Container image signing and verification

Docker builds use multi-stage Dockerfiles with distroless runtime images for minimal attack surface. The build stage compiles TypeScript with tsc on Alpine, then copies only production artifacts to a distroless image that contains no shell, package manager, or unnecessary binaries. Separate compose files match Chive's pattern: docker-compose.yml (dev), docker-compose.prod.yml, docker-compose.ci.yml, docker-compose.observability.yml:

FROM node:22-alpine AS deps
WORKDIR /app
COPY pnpm-lock.yaml pnpm-workspace.yaml ./
RUN corepack enable && pnpm install --frozen-lockfile

FROM node:22-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN pnpm build && pnpm prune --prod

FROM gcr.io/distroless/nodejs22-debian12
WORKDIR /app
USER 1001
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["dist/index.js"]

Kubernetes manages deployment with Horizontal Pod Autoscaler (HPA) for scaling API and worker pods based on CPU/memory utilization and custom metrics (queue depth), Pod Disruption Budgets (PDB) for safe rollouts, and liveness/readiness probes that verify database connectivity. An optional Helm chart in k8s/helm/ provides templated deployment alongside the Kustomize approach.
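An HPA for the worker deployment scaling on CPU plus queue depth could be sketched as follows — resource names are illustrative, and the external queue_depth metric assumes an adapter such as prometheus-adapter exposes it to the metrics API:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: layers-worker          # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: layers-worker
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: External
      external:
        metric:
          name: queue_depth    # assumed to be exposed via prometheus-adapter
        target:
          type: AverageValue
          averageValue: "100"
```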

Kustomize provides base manifests with overlays for dev, staging, and production environments. Each overlay adjusts resource limits, replica counts, database connection strings, and feature flags without duplicating manifests. Additional directories: k8s/monitoring/ (ServiceMonitors, Prometheus rules, Grafana dashboards), k8s/disaster-recovery/ (backup CronJobs), k8s/secrets/ (ExternalSecret definitions).

ArgoCD / Flux provides GitOps-based continuous delivery, replacing manual kubectl apply. Infrastructure state is declared in Git; ArgoCD Application or Flux Kustomization manifests in k8s/ drive automated reconciliation.

External Secrets Operator syncs secrets from a cloud provider's secret manager (AWS Secrets Manager, GCP Secret Manager, or HashiCorp Vault) into Kubernetes secrets. Database credentials, API keys, and signing keys never appear in manifests or environment variables.

cert-manager automates TLS certificate issuance and renewal via Let's Encrypt, eliminating manual certificate management.

Sigstore cosign signs container images in CI for supply chain verification. Every image pushed to the registry is signed with a keyless signature, and Kubernetes admission controllers verify signatures before allowing deployment.

See Deployment for the full deployment architecture, CI/CD pipeline, and backup strategy.

Testing

Technology | Version | Role
Vitest | 4+ | Unit, integration, compliance, and pre-deployment tests
Testcontainers | latest | Ephemeral database containers for integration tests
Playwright | 1.57+ | End-to-end browser tests
k6 | latest | Load and performance testing

Vitest 4+ is the test runner for all non-E2E tests. It provides native ESM support, TypeScript execution without a separate compilation step, and a Jest-compatible API. Tests are organized into four tiers:

  1. Unit tests: Test individual record type handlers, validation logic, and utility functions in isolation with mocked dependencies.
  2. Integration tests: Test complete ingestion and query pipelines against real databases using Testcontainers.
  3. Compliance tests: Verify that every pub.layers.* lexicon is correctly parsed, validated, and indexed. These tests generate sample records from lexicon schemas and verify round-trip correctness.
  4. Pre-deployment tests: Run against a staging environment after deployment to verify API health, database connectivity, and firehose subscription.
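The compliance-tier idea — generate a sample record from a schema, then verify it round-trips unchanged — can be sketched with a toy schema type; the real tests derive samples from the @atproto/lexicon schemas, not this ToySchema:

```typescript
// Toy stand-in for a lexicon schema: field name -> primitive kind.
type ToySchema = Record<string, 'string' | 'integer'>

// Generate a deterministic sample record from the schema.
function sampleFrom(schema: ToySchema): Record<string, string | number> {
  const out: Record<string, string | number> = {}
  for (const [key, kind] of Object.entries(schema)) {
    out[key] = kind === 'string' ? `sample-${key}` : 0
  }
  return out
}

const schema: ToySchema = { text: 'string', granularity: 'string', rev: 'integer' }
const record = sampleFrom(schema)
// Round-trip through serialization, as an index/read cycle would.
const roundTripped = JSON.parse(JSON.stringify(record))
```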

Testcontainers spins up ephemeral PostgreSQL, Elasticsearch, Neo4j, and Redis containers for integration tests. Each test suite gets a fresh set of containers, ensuring test isolation:

const pg = await new PostgreSqlContainer('postgres:16-alpine').start()
const es = await new ElasticsearchContainer('elasticsearch:8.17.0').start()
const neo4j = await new Neo4jContainer('neo4j:5-community').start()
const redis = await new GenericContainer('redis:7-alpine').withExposedPorts(6379).start()

Playwright 1.57+ tests the Bull Board dashboard, OpenAPI documentation UI, and any web-facing admin interfaces end-to-end.

k6 runs load tests against staging to validate throughput targets: firehose ingestion rate, query latency percentiles, and concurrent connection limits.

See Testing Strategy for the full testing architecture.

Plugins and Extensibility

Technology | Version | Role
isolated-vm | 6+ | V8 isolate sandbox per plugin
EventEmitter2 | latest | Async event bus for plugin hooks
AJV | latest | JSON Schema validation for plugin manifests

isolated-vm 6+ provides a secure execution environment for third-party plugins. Each plugin runs in its own V8 isolate with no access to the host Node.js process, filesystem, or network. The appview injects a controlled API surface into each isolate.

The plugin system supports three extension points:

  1. Format importers: Convert external annotation formats (CoNLL, BRAT, ELAN, TEI, etc.) into pub.layers.* records for ingestion.
  2. Harvesters: Fetch records from external sources (institutional repositories, data archives) and submit them for indexing.
  3. Enrichment processors: Augment indexed records with derived data (e.g., automatic language detection, entity linking).

Permission model: Plugins declare required capabilities in a manifest. The appview grants only declared capabilities:

{
  "name": "conll-importer",
  "version": "1.0.0",
  "capabilities": ["read:expression", "write:annotation", "write:segmentation"],
  "limits": {
    "maxMemoryMB": 128,
    "maxCpuMs": 5000,
    "maxWallTimeMs": 30000
  }
}
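Capability enforcement reduces to a set-membership check in front of every host API call; a runnable sketch (CapabilityGate is illustrative — the real surface injected into the isolate is richer):

```typescript
// Minimal sketch of capability gating for plugin host calls.
// Capability strings follow the "action:namespace" shape of the manifest above.
interface Manifest {
  name: string
  capabilities: string[]
}

class CapabilityGate {
  private granted: Set<string>

  constructor(manifest: Manifest) {
    this.granted = new Set(manifest.capabilities)
  }

  // Throws (surfacing as a SandboxViolationError in the appview) when a
  // plugin invokes a host API it never declared.
  assert(capability: string): void {
    if (!this.granted.has(capability)) {
      throw new Error(`capability not declared: ${capability}`)
    }
  }
}
```

The host would call `gate.assert('write:annotation')` before executing the corresponding operation on the plugin's behalf.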

Resource governor: The runtime enforces a per-isolate memory limit (a heap cap set when the isolate is created), a CPU-time limit on each execution, and a wall-clock timeout. (Note that V8's --max-old-space-size is a process-level heap flag, not a CPU control.) Plugins that exceed limits are terminated and their jobs are retried or routed to the dead letter queue.

isolated-vm was chosen over vm2 (deprecated due to security vulnerabilities) and Deno subprocesses (heavier resource footprint, more complex IPC).

See Plugin System for the full plugin architecture.

Build and Development

Technology | Version | Role
Turbo | 2+ | Monorepo build orchestration
@atproto/lex-cli | latest | TypeScript codegen from lexicon JSON
node-pg-migrate | latest | PostgreSQL schema migrations
ESLint | 9+ | Linting (flat config)
Prettier | 3+ | Code formatting
Husky | 9+ | Git hook management

Turbo 2+ orchestrates builds across the monorepo. It understands package dependency graphs and caches build outputs, so incremental builds after a change to a single package only rebuild affected downstream packages. CI pipelines use turbo run build test lint with remote caching for fast feedback.

@atproto/lex-cli generates TypeScript types and validation functions from the 26 pub.layers.* lexicon JSON files. Generated types are used throughout the codebase for type-safe record handling:

lex-cli gen-ts ./lexicons --out ./packages/shared/src/lexicon-types

node-pg-migrate manages PostgreSQL schema migrations as TypeScript files in src/storage/postgresql/migrations/, following Chive's migration pattern.

ESLint 9+ uses the flat config format (eslint.config.js) with strict TypeScript rules. Prettier 3+ handles formatting. Husky 9+ runs lint and format checks on pre-commit hooks to prevent CI failures.

Resilience

Technology | Version | Role
cockatiel | 3+ | Circuit breaker, retry, bulkhead, timeout

cockatiel 3+ is the sole resilience library, following Chive's pattern of consolidating all resilience patterns into a single composable policy chain. Every external service call (database queries, DID resolution, PDS requests, Elasticsearch queries, Neo4j operations) is wrapped in a cockatiel policy:

  • Circuit breaker: Opens after a configurable failure threshold (default: 5 consecutive failures), preventing cascading failures. Half-open state tests recovery after a timeout (default: 30s) before closing.
  • Retry: Exponential backoff with jitter for transient failures (default: 3 attempts, 1s base delay, 10s max delay).
  • Bulkhead: Limits concurrent requests to each external service, preventing one slow service from exhausting the connection pool.
  • Timeout: Enforces per-request timeouts for database queries and HTTP calls.

import {
  wrap, retry, handleAll, ExponentialBackoff,
  timeout, TimeoutStrategy, circuitBreaker, ConsecutiveBreaker, bulkhead,
} from 'cockatiel'

// Note: cockatiel 3 replaced the old Policy.* builder API with standalone
// policy functions. `name` and `logger` would hook breaker state-change
// events for observability (elided here).
function createResiliencePolicy(name: string, logger: ILogger) {
  return wrap(
    retry(handleAll, {
      maxAttempts: 3,
      backoff: new ExponentialBackoff({ initialDelay: 1000, maxDelay: 10000 }),
    }),
    timeout(5000, TimeoutStrategy.Aggressive),
    circuitBreaker(handleAll, { halfOpenAfter: 30_000, breaker: new ConsecutiveBreaker(5) }),
    bulkhead(20),
  )
}

// Applied to all storage adapters
const pgPolicy = createResiliencePolicy('postgresql', logger)
const esPolicy = createResiliencePolicy('elasticsearch', logger)
const neo4jPolicy = createResiliencePolicy('neo4j', logger)

const result = await pgPolicy.execute(() => pg.query(sql))

Policies are created per-service and shared across all callers, matching Chive's src/services/common/resilience.ts pattern.

Decision Log

Key architectural decisions, recorded in ADR (Architecture Decision Record) style:

Decision | Choice | Rationale | Alternatives Considered
API framework | Hono | Fastest benchmarks, native middleware composition, Zod integration | Fastify (heavier), Express (legacy API)
Primary database | PostgreSQL | AT-URI foreign keys, JSONB flexibility, mature ecosystem | CockroachDB (overkill for single-region)
Search engine | Elasticsearch | Faceted search, custom analyzers for linguistic data, nested objects | Meilisearch (lacks nested), Typesense (lacks custom analyzers)
Graph database | Neo4j | Native graph storage, Cypher query language, APOC library | PostgreSQL recursive CTEs (poor performance at depth), Dgraph (less mature)
Job queue | BullMQ | Redis-backed, per-queue concurrency, priority, DLQ, dashboard | Temporal (complex setup), pg-boss (single-database bottleneck)
Plugin sandbox | isolated-vm | V8 isolate per plugin, memory/CPU limits, no host access | vm2 (deprecated, security issues), Deno subprocesses (heavier), WASI (not yet mature for Node.js plugins)
Validation | Zod | TypeScript type inference, composable, OpenAPI generation | Joi (no type inference), AJV (JSON Schema only)
Logging | Pino | Fastest Node.js logger, structured JSON, redaction | Winston (slower), Bunyan (unmaintained)
DI framework | tsyringe | Decorator-based, lightweight, TypeScript-native | InversifyJS (heavier), manual DI (tedious at scale)
Error handling | Custom Result<T, E> | Zero deps, Chive compatibility, explicit error paths | Effect-TS (heavy, steep learning curve), neverthrow (less composable), thrown exceptions (implicit, untyped)
DB migrations | node-pg-migrate | TypeScript migration files, Chive precedent, fine-grained control | Drizzle ORM (AT-URI schemas don't map well to ORMs), Kysely (good but unnecessary abstraction layer)
Resilience | cockatiel only | Single composable policy chain, Chive precedent | p-queue + p-retry (multiple libraries for same concern), Polly.js (less maintained)
Container runtime | Distroless | No shell, no package manager, minimal CVE surface | Alpine (has shell and apk, larger attack surface), scratch (too minimal, missing libc)
Deployment model | GitOps (ArgoCD/Flux) | Declarative, auditable, self-healing | Manual kubectl apply (error-prone, no drift detection), Helm-only (no GitOps reconciliation)
Container signing | Sigstore cosign | Keyless signing, industry standard, K8s admission controller support | Notary v2 (less ecosystem support), GPG (manual key management)
TS decorators | experimentalDecorators | Required by tsyringe; TC39 stage 3 decorators not yet supported | TC39 decorators (tsyringe incompatible), no decorators (manual DI wiring)

See Also

  • Overview for the architecture diagram and record type coverage matrix
  • Database Design for PostgreSQL schema, Elasticsearch mappings, Neo4j graph model, and Redis data model
  • Firehose Ingestion for the subscription, filtering, and queue topology
  • API Design for XRPC and REST endpoint definitions
  • Authentication for the full OAuth 2.0, JWT, and RBAC architecture
  • Deployment for Docker, Kubernetes, and CI/CD configuration
  • Testing Strategy for the four-tier testing approach
  • Plugin System for the sandboxed plugin architecture