Deployment and Infrastructure
Container Strategy
Docker Multi-Stage Build
The appview uses a multi-stage Dockerfile with a distroless runtime image for minimal attack surface:
# Stage 1: Dependencies
FROM node:22-alpine AS deps
WORKDIR /app
RUN corepack enable pnpm
COPY pnpm-lock.yaml pnpm-workspace.yaml package.json ./
RUN pnpm install --frozen-lockfile
# Stage 2: Build
FROM node:22-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN pnpm build
RUN pnpm prune --prod
# Stage 3: Runtime (distroless — no shell, no package manager)
FROM gcr.io/distroless/nodejs22-debian12
WORKDIR /app
USER 1001
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./
EXPOSE 3000
CMD ["dist/index.js"]
The final image runs as a non-root user (UID 1001) with no build tools, dev dependencies, or source code. Images are signed with Sigstore cosign in CI for supply chain verification.
Docker Compose Files
Multiple compose files match Chive's docker/ directory pattern:
| File | Purpose |
|---|---|
docker/docker-compose.yml | Development: PG, Redis, ES, Neo4j |
docker/docker-compose.prod.yml | Production: full stack with resource limits |
docker/docker-compose.ci.yml | CI testing: lightweight containers |
docker/docker-compose.observability.yml | Grafana, Tempo, Prometheus, OTEL Collector |
Supporting configs: docker/otel-collector-config.yaml, docker/prometheus.yml.
Image Variants
| Image | Entry Point | Purpose |
|---|---|---|
layers-appview:api | dist/index.js | API server (XRPC + REST) |
layers-appview:indexer | dist/indexer.js | Firehose consumer + job queue workers |
Both images share the same base build; only the CMD differs. This keeps the container registry simple while allowing independent scaling.
Kubernetes Architecture
Deployment Topology
API Deployment: Multiple replicas behind an ingress controller. Stateless; scales horizontally via HPA.
Indexer Deployment: Single replica. The firehose consumer must be single-instance to maintain cursor ordering. If the pod crashes, Kubernetes restarts it and it resumes from the persisted cursor. Workers within the indexer can run with higher concurrency but must coordinate through BullMQ's locking.
Horizontal Pod Autoscaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: layers-api-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: layers-api
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
Resource Requests and Limits
| Deployment | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| API | 200m | 1000m | 256Mi | 1Gi |
| Indexer | 500m | 2000m | 512Mi | 2Gi |
The indexer has higher resource allocation because it handles firehose parsing, record validation, and multi-database writes concurrently.
Pod Disruption Budget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: layers-api-pdb
spec:
minAvailable: 1
selector:
matchLabels:
app: layers-api
Health Probes
livenessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 10
periodSeconds: 15
readinessProbe:
httpGet:
path: /ready
port: 3000
initialDelaySeconds: 5
periodSeconds: 10
Kustomize Overlays
The Kubernetes manifests are organized with Kustomize, with additional directories matching Chive's k8s/ structure:
k8s/
├── base/
│ ├── appview/ # API deployment, service, HPA, PDB
│ ├── indexer/ # Indexer deployment
│ ├── rbac/ # ServiceAccounts, Roles, RoleBindings
│ ├── ingress/ # Ingress with cert-manager annotations
│ └── kustomization.yaml
├── overlays/
│ ├── dev/ # Lower resources, single replica
│ ├── staging/ # Production-like, staging secrets
│ └── prod/ # Full resources, external secrets
├── helm/ # Optional Helm chart for templated deployment
│ └── layers-appview/
├── monitoring/ # ServiceMonitors, Prometheus rules, Grafana dashboards
├── disaster-recovery/ # Backup CronJobs (PG, ES, Neo4j)
├── secrets/ # ExternalSecret definitions
└── gitops/ # ArgoCD Application or Flux Kustomization manifests
Each overlay patches resource limits, replica counts, environment variables, and secret references for its target environment.
GitOps Deployment
Production deployments use ArgoCD or Flux for GitOps-based continuous delivery, replacing manual kubectl apply:
# k8s/gitops/argocd-application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: layers-appview
spec:
source:
repoURL: https://github.com/layers-pub/layers
path: k8s/overlays/prod
destination:
server: https://kubernetes.default.svc
namespace: layers
syncPolicy:
automated:
prune: true
selfHeal: true
Database Deployment
| Database | Recommended Approach | Notes |
|---|---|---|
| PostgreSQL 16+ | Managed service (e.g., AWS RDS, GCP Cloud SQL) or operator (CloudNativePG) | Enable WAL archiving for point-in-time recovery |
| Elasticsearch 8+ | Managed service (Elastic Cloud) or operator (ECK) | Minimum 3-node cluster for production |
| Neo4j 5+ | Managed service (Neo4j Aura) or Helm chart | Single instance sufficient for moderate workloads |
| Redis 7+ | Managed service (ElastiCache, Memorystore) or Helm chart (Bitnami) | Sentinel or Cluster mode for HA |
For development, all four databases run as containers via Docker Compose.
Secrets Management
Production deployments use the External Secrets Operator to synchronize secrets from a vault into Kubernetes secrets:
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: layers-secrets
spec:
refreshInterval: 1h
secretStoreRef:
name: vault-backend
kind: ClusterSecretStore
target:
name: layers-secrets
data:
- secretKey: JWT_SECRET
remoteRef:
key: layers/production/jwt-secret
- secretKey: DATABASE_URL
remoteRef:
key: layers/production/database-url
TLS
TLS termination is handled at the ingress controller (nginx-ingress) with certificates provisioned by cert-manager and Let's Encrypt.
CI/CD Pipeline
Build Pipeline
- Lint: ESLint + Prettier check
- Type check:
tsc --noEmit - Unit tests: Vitest (
vitest.unit.config.ts) - Build: TypeScript compilation + Docker image build
- Push: Push image to container registry with git SHA tag
- Sign: Sign image with Sigstore cosign (keyless) for supply chain verification
Test Pipeline
- Integration tests: Vitest with Testcontainers (PG, ES, Neo4j, Redis) (
vitest.config.ts) - Compliance tests: Lexicon schema validation for all 26 record types (
vitest.compliance.config.ts) - E2E tests: Playwright against a staging environment
- Performance tests: k6 load scenarios (release-only)
Deploy Pipeline
- Staging: Automatic deploy via GitOps (ArgoCD/Flux) on merge to
main - Pre-deployment tests: Health checks against staging (
vitest.pre-deployment.config.ts) - Production: Manual promotion from staging (approval gate)
- Rollback: Redeploy previous image tag via Kubernetes rollout or GitOps revert
Backup and Recovery
PostgreSQL
- Continuous WAL archiving to object storage (S3, GCS)
- Daily base backups via
pg_basebackup - Point-in-time recovery using WAL replay
- Retention: 30 days of backups
Elasticsearch
- Snapshot lifecycle management to object storage
- Daily snapshots with 14-day retention
- Can be fully rebuilt from PostgreSQL if snapshots are lost
Neo4j
- Online backups via
neo4j-admin dump - Daily with 14-day retention
- Can be fully rebuilt from PostgreSQL if backups are lost
Disaster Recovery
Since all appview data is derived from the ATProto firehose, the ultimate disaster recovery strategy is a full re-index from cursor 0. This is slower than restoring from backups but guarantees complete data integrity.
| Recovery Method | RTO | RPO |
|---|---|---|
| PG point-in-time recovery | Minutes | Seconds (WAL lag) |
| Snapshot restore (ES, Neo4j) | 30 min | 24 hours (daily snapshots) |
| Full re-index from firehose | Hours | Zero (complete rebuild) |
See Also
- Technology Stack for Docker, Kubernetes, and infrastructure tool versions
- Observability for health check endpoints and monitoring
- Testing Strategy for CI/CD test configuration