Query and Discovery Patterns

The appview answers two categories of questions: retrieval (get a specific record by its AT-URI) and discovery (find records matching some criteria). Retrieval hits PostgreSQL directly. Discovery queries fan out across PostgreSQL, Elasticsearch, and Neo4j depending on the query shape.

Service Layer

Query logic is encapsulated in service classes in src/services/, matching Chive's pattern:

Service	File	Responsibility
`SearchService`	`src/services/search/search-service.ts`	Full-text search, faceted filtering via Elasticsearch
`RankingService`	`src/services/search/ranking-service.ts`	Result scoring by confidence, recency, persona reputation
`AutocompleteService`	`src/services/search/autocomplete-service.ts`	Expression text, ontology names, label value completion
`QueryCache`	`src/services/search/query-cache.ts`	Redis-backed TTL cache for ES query results
`DiscoveryService`	`src/services/discovery/discovery-service.ts`	Recommendations: "similar annotations", "related corpora"

All service methods return `Result<T, LayersError>`, and the services themselves are injected via tsyringe.
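The `Result<T, LayersError>` convention can be sketched as a discriminated union that forces callers to check `ok` before reading a value. The `LayersError` fields, the `parseRecordUri` helper, and its restriction to DID authorities are hypothetical illustrations. The real services are registered with tsyringe, which this sketch leaves out so it runs without decorator metadata.

```typescript
// Hypothetical error shape; the real LayersError may carry different fields.
interface LayersError {
  code: "INVALID_URI" | "NOT_FOUND";
  message: string;
}

// Discriminated union: callers narrow on `ok` before reading value or error.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

interface AtUriParts {
  did: string;
  collection: string;
  rkey: string;
}

// Hypothetical helper that splits at://<did>/<collection>/<rkey>. It accepts
// only DID authorities (not handles) and skips full NSID/record-key validation.
function parseRecordUri(uri: string): Result<AtUriParts, LayersError> {
  const m = /^at:\/\/(did:[a-z]+:[^/]+)\/([^/]+)\/([^/]+)$/.exec(uri);
  if (!m) {
    return { ok: false, error: { code: "INVALID_URI", message: `not a record AT-URI: ${uri}` } };
  }
  return { ok: true, value: { did: m[1], collection: m[2], rkey: m[3] } };
}
```

Because failure is a value rather than a thrown exception, a discovery method that fans out to several backends can collect and report each backend's error without try/catch at every call site.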

Discovery Use Cases

Use Case	Primary Backend	Query Shape
Find all annotations on a given expression	PG	`WHERE expression_ref = $1` on `annotation_layers`
Find all expressions in a given corpus	PG + Neo4j	`corpus_memberships` join or Neo4j `MEMBER_OF` traversal
Find all annotation layers using a given ontology	PG	`WHERE ontology_ref = $1` on `annotation_layers`
Find all entities grounded to a Wikidata QID	Neo4j	`KNOWLEDGE_REF` edge traversal from a knowledge node
Find all annotations in Universal Dependencies formalism	ES	Faceted filter on `formalism = "universal-dependencies"`
Find all experiments measuring acceptability	ES	Faceted filter on `measureType = "acceptability"`
Find all corpora in a given language	ES	Keyword filter on `language`
Find all data linked to a given eprint	PG + Neo4j	`cross_references WHERE to_uri = $1` or `LINKS_EPRINT` traversal
Find all annotations by a specific persona	PG	`WHERE persona_ref = $1` on `annotation_layers`
Find the graph neighborhood of a node	Neo4j	Cypher variable-length path query
Find all changes to a given record	PG + ES	`changelogs WHERE subject_uri = $1` or ES filter on `subject`
Find recent changes across a collection type	ES	Faceted filter on `subjectCollection`, sorted by `createdAt`

Query Implementation Patterns

Single-Record Retrieval

Every `get*` XRPC endpoint resolves to a PostgreSQL primary-key lookup:

```sql
SELECT record FROM expressions WHERE uri = $1;
```

Expected latency: under 5 ms for indexed lookups.

Paginated Collection Listing

Every `list*` XRPC endpoint paginates with a cursor over a user's records:

```sql
SELECT uri, record
FROM expressions
WHERE did = $1
  AND uri > $2  -- cursor: last uri of the previous page
ORDER BY uri ASC
LIMIT $3;
```
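The cursor loop can be sketched in TypeScript against a minimal `Queryable` interface modeled on node-postgres's `Pool#query(text, values)`. Three parts of this sketch are illustrative assumptions rather than the appview's confirmed implementation:

- the `listExpressions` helper,
- the `Page` shape,
- fetching `limit + 1` rows to detect whether another page exists.

```typescript
// Minimal query interface modeled on node-postgres's Pool#query(text, values).
interface Queryable {
  query<R>(text: string, values: unknown[]): Promise<{ rows: R[] }>;
}

interface Page<T> {
  items: T[];
  cursor?: string; // pass back as `cursor` to fetch the next page
}

// Hypothetical helper: keyset pagination over a user's expressions.
// Fetching limit + 1 rows reveals whether a further page exists without a COUNT.
async function listExpressions(
  db: Queryable,
  did: string,
  limit: number,
  cursor?: string,
): Promise<Page<{ uri: string; record: unknown }>> {
  const { rows } = await db.query<{ uri: string; record: unknown }>(
    `SELECT uri, record
       FROM expressions
      WHERE did = $1
        AND uri > $2
      ORDER BY uri ASC
      LIMIT $3`,
    // An empty-string cursor sorts before every URI, so page one needs no special case.
    [did, cursor ?? "", limit + 1],
  );
  const items = rows.slice(0, limit);
  const hasMore = rows.length > limit;
  return { items, cursor: hasMore ? items[items.length - 1].uri : undefined };
}
```

Because `uri` is unique, `uri > $2` never repeats a row across pages. Unlike `OFFSET`, its cost does not grow with page depth.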

Full-Text Search

Elasticsearch powers the `/api/v1/search` endpoint:

```json
{
  "query": {
    "bool": {
      "must": [
        { "multi_match": {
            "query": "syntactic ambiguity",
            "fields": ["text^3", "text.stemmed"]
        }}
      ],
      "filter": [
        { "term": { "lang": "en" } }
      ]
    }
  }
}
```

The `text` field uses a custom `layers_text` analyzer with ICU tokenization and Unicode normalization. The `text.stemmed` sub-field applies language-specific stemming.
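Below is a hypothetical TypeScript builder for this request body. It does three things:

- boosts matches on the unstemmed `text` field 3x over `text.stemmed`;
- keeps the language restriction in `filter`, since filter clauses narrow results without affecting relevance scores;
- adds `from`/`size` paging.

`buildTextSearch` and its default page size of 20 are assumptions.

```typescript
// Shape of the request body sent to Elasticsearch's _search endpoint.
interface SearchBody {
  query: { bool: { must: object[]; filter: object[] } };
  from: number;
  size: number;
}

// Hypothetical builder for /api/v1/search requests.
function buildTextSearch(
  q: string,
  opts: { lang?: string; page?: number; pageSize?: number } = {},
): SearchBody {
  const pageSize = opts.pageSize ?? 20; // assumed default
  const filter: object[] = [];
  // Filter context: restricts matches but does not contribute to _score.
  if (opts.lang) filter.push({ term: { lang: opts.lang } });
  return {
    query: {
      bool: {
        // Unstemmed `text` hits outrank stemmed-only hits by a factor of 3.
        must: [{ multi_match: { query: q, fields: ["text^3", "text.stemmed"] } }],
        filter,
      },
    },
    from: (opts.page ?? 0) * pageSize,
    size: pageSize,
  };
}
```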

Faceted Annotation Search

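The three-dimensional annotation search (kind, subkind, formalism) combines keyword filters with ES term aggregations. The example below narrows results to OntoNotes NER span layers. Because `annotations` is a nested field, the `terms` aggregation on `annotations.label` must be wrapped in a `nested` aggregation:

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "kind": "span" } },
        { "term": { "subkind": "ner" } },
        { "term": { "labelSet": "ontonotes-ner" } }
      ]
    }
  },
  "aggs": {
    "annotations": {
      "nested": { "path": "annotations" },
      "aggs": {
        "by_label": {
          "terms": { "field": "annotations.label", "size": 50 }
        }
      }
    }
  }
}
```

This returns matching annotation layers and a label distribution histogram in a single request. Bucket counts inside a `nested` aggregation count individual annotations, not layers.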
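Here is a hypothetical TypeScript builder for faceted requests, paired with a helper that flattens the returned buckets into a label-to-count map. It emits one `term` filter per supplied facet. It also wraps the label aggregation in a `nested` aggregation, because `annotations` is mapped as a nested field. `buildFacetedSearch`, `labelHistogram`, and the aggregation name `annotations` are assumptions.

```typescript
interface Facets {
  kind?: string;
  subkind?: string;
  formalism?: string;
  labelSet?: string;
}

// Only the part of the ES response this sketch reads.
interface LabelAggResponse {
  aggregations: {
    annotations: { by_label: { buckets: { key: string; doc_count: number }[] } };
  };
}

// Hypothetical builder: a term filter for each facet that was supplied.
function buildFacetedSearch(facets: Facets, maxLabels = 50): object {
  const filter = Object.entries(facets)
    .filter(([, value]) => value !== undefined)
    .map(([field, value]) => ({ term: { [field]: value } }));
  return {
    query: { bool: { filter } },
    aggs: {
      // The nested wrapper lets the terms aggregation see inner annotation docs.
      annotations: {
        nested: { path: "annotations" },
        aggs: { by_label: { terms: { field: "annotations.label", size: maxLabels } } },
      },
    },
  };
}

// Flattens the label buckets into label -> number of annotations.
function labelHistogram(res: LabelAggResponse): Map<string, number> {
  return new Map(
    res.aggregations.annotations.by_label.buckets.map(
      (b) => [b.key, b.doc_count] as [string, number],
    ),
  );
}
```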

Graph Traversal

Neo4j handles multi-hop queries that in PostgreSQL would need long join chains or recursive CTEs:

```cypher
// Find annotations grounded to a Wikidata entity, with their layers and expressions
MATCH (kb:KnowledgeNode {externalId: "Q76"})
      <-[:KNOWLEDGE_REF]-(ann:Annotation)
      -[:PART_OF]->(layer:AnnotationLayer)
      -[:ANNOTATES]->(expr:Expression)
RETURN layer.uri, expr.uri, ann.label
LIMIT 100
```
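Application code would normally send this query with the QID as a parameter rather than a literal, so that one cached plan serves every entity. The sketch below targets a minimal session interface modeled on neo4j-driver's `Session#run`, which resolves to a result whose `records` expose `get(key)`. `CypherSession`, `findGroundedAnnotations`, and the column aliases are hypothetical.

```typescript
// Minimal interfaces modeled on neo4j-driver's Session#run result shape.
interface CypherRecord {
  get(key: string): unknown;
}
interface CypherSession {
  run(query: string, params: Record<string, unknown>): Promise<{ records: CypherRecord[] }>;
}

interface GroundedAnnotation {
  layerUri: string;
  expressionUri: string;
  label: string;
}

// Same traversal, with the Wikidata QID passed as $qid instead of inlined.
const GROUNDED_ANNOTATIONS = `
  MATCH (kb:KnowledgeNode {externalId: $qid})
        <-[:KNOWLEDGE_REF]-(ann:Annotation)
        -[:PART_OF]->(layer:AnnotationLayer)
        -[:ANNOTATES]->(expr:Expression)
  RETURN layer.uri AS layerUri, expr.uri AS expressionUri, ann.label AS label
  LIMIT 100`;

// Hypothetical helper: runs the query and maps rows to plain objects.
async function findGroundedAnnotations(
  session: CypherSession,
  qid: string,
): Promise<GroundedAnnotation[]> {
  const result = await session.run(GROUNDED_ANNOTATIONS, { qid });
  return result.records.map((r) => ({
    layerUri: String(r.get("layerUri")),
    expressionUri: String(r.get("expressionUri")),
    label: String(r.get("label")),
  }));
}
```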

Cross-Reference Traversal

Forward References ("What does this record point to?")

```sql
SELECT to_uri, ref_type
FROM cross_references
WHERE from_uri = $1;
```

Reverse References ("What points to this record?")

```sql
SELECT from_uri, ref_type
FROM cross_references
WHERE to_uri = $1;
```
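When a caller needs both directions at once, the two lookups can run concurrently and be grouped by `ref_type`. The sketch below uses a minimal `Queryable` interface modeled on node-postgres's `Pool#query`. `referenceSummary`, `groupByType`, and the `RefSummary` shape are hypothetical.

```typescript
// Minimal query interface modeled on node-postgres's Pool#query(text, values).
interface Queryable {
  query<R>(text: string, values: unknown[]): Promise<{ rows: R[] }>;
}

interface RefSummary {
  outgoing: Map<string, string[]>; // ref_type -> target URIs
  incoming: Map<string, string[]>; // ref_type -> source URIs
}

// Groups (uri, ref_type) rows into ref_type -> [uri, ...].
function groupByType(rows: { uri: string; ref_type: string }[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const { uri, ref_type } of rows) {
    const list = groups.get(ref_type) ?? [];
    list.push(uri);
    groups.set(ref_type, list);
  }
  return groups;
}

// Hypothetical helper: forward and reverse lookups issued together.
async function referenceSummary(db: Queryable, uri: string): Promise<RefSummary> {
  const [fwd, rev] = await Promise.all([
    db.query<{ uri: string; ref_type: string }>(
      "SELECT to_uri AS uri, ref_type FROM cross_references WHERE from_uri = $1",
      [uri],
    ),
    db.query<{ uri: string; ref_type: string }>(
      "SELECT from_uri AS uri, ref_type FROM cross_references WHERE to_uri = $1",
      [uri],
    ),
  ]);
  return { outgoing: groupByType(fwd.rows), incoming: groupByType(rev.rows) };
}
```

With node-postgres, `Promise.all` only overlaps the two queries when `db` is a `Pool`. A single `Client` queues them and runs them one at a time.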

Transitive Closure ("All descendants of this expression")

Expression hierarchy traversal uses Neo4j's variable-length path syntax:

```cypher
MATCH path = (root:Expression {uri: $1})-[:PARENT_OF*1..]->(desc:Expression)
RETURN desc.uri, length(path) AS depth
ORDER BY depth
```

This is faster than PostgreSQL recursive CTEs for deep hierarchies (documents with hundreds of nested paragraphs, sentences, and words).
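To make the semantics of `[:PARENT_OF*1..]` concrete, here is the same result computed as a breadth-first walk over an in-memory adjacency map. The map is a hypothetical stand-in for the graph, not how the appview evaluates the query. Each descendant is reported once, at its shortest depth, which matches the Cypher rows when the hierarchy is a tree. `Hierarchy` and `descendants` are illustrative names.

```typescript
// childrenOf maps an expression URI to the URIs it is PARENT_OF.
type Hierarchy = Map<string, string[]>;

interface Descendant {
  uri: string;
  depth: number;
}

// Breadth-first traversal visits nodes in nondecreasing depth, so the output
// is already in the order `ORDER BY depth` gives the Cypher rows.
function descendants(childrenOf: Hierarchy, root: string): Descendant[] {
  const out: Descendant[] = [];
  const seen = new Set<string>([root]);
  let frontier = [root];
  for (let depth = 1; frontier.length > 0; depth++) {
    const nextFrontier: string[] = [];
    for (const parent of frontier) {
      for (const child of childrenOf.get(parent) ?? []) {
        if (seen.has(child)) continue; // guards against cycles and shared children
        seen.add(child);
        out.push({ uri: child, depth });
        nextFrontier.push(child);
      }
    }
    frontier = nextFrontier;
  }
  return out;
}
```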

Annotation-Specific Queries

By Kind/Subkind/Formalism

All three fields are keyword-indexed in Elasticsearch, enabling combinatorial filtering:

Query	ES Filter
All POS layers	`kind = "token-tag"` AND `subkind = "pos"`
All NER layers in OntoNotes	`subkind = "ner"` AND `labelSet = "ontonotes-ner"`
All dependency parses	`kind = "relation"` AND `subkind = "dependency"`
All UD layers	`formalism = "universal-dependencies"`

By Label/Value

Individual annotation labels within layers are indexed as nested objects in ES:

```json
{
  "query": {
    "nested": {
      "path": "annotations",
      "query": {
        "term": { "annotations.label": "PERSON" }
      }
    }
  }
}
```

By Confidence Threshold

```json
{
  "query": {
    "nested": {
      "path": "annotations",
      "query": {
        "range": { "annotations.confidence": { "gte": 800 } }
      }
    }
  }
}
```

By Anchor Type

```json
{
  "query": {
    "nested": {
      "path": "annotations",
      "query": {
        "term": { "annotations.anchor_type": "temporalSpan" }
      }
    }
  }
}
```

This finds annotations anchored to temporal regions (audio/video), as opposed to text spans or token references.
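Each of the nested queries above tests a single condition. When several conditions go inside one `nested` clause, ES requires them all to hold on the same annotation object. Separate `nested` clauses would instead let one annotation satisfy the label test while a different annotation in the same layer satisfies the confidence test. The hypothetical builder below keeps the conditions together; `AnnotationCriteria` and `buildAnnotationQuery` are illustrative names.

```typescript
interface AnnotationCriteria {
  label?: string;
  minConfidence?: number;
  anchorType?: string;
}

// Hypothetical builder: every clause sits in one nested query, so a layer
// matches only if a single annotation satisfies all supplied criteria.
function buildAnnotationQuery(c: AnnotationCriteria): object {
  const clauses: object[] = [];
  if (c.label !== undefined) clauses.push({ term: { "annotations.label": c.label } });
  if (c.minConfidence !== undefined) {
    clauses.push({ range: { "annotations.confidence": { gte: c.minConfidence } } });
  }
  if (c.anchorType !== undefined) {
    clauses.push({ term: { "annotations.anchor_type": c.anchorType } });
  }
  return {
    query: {
      nested: {
        path: "annotations",
        // Filter context: these are yes/no conditions, so skip scoring.
        query: { bool: { filter: clauses } },
      },
    },
  };
}
```

For example, `buildAnnotationQuery({ label: "PERSON", minConfidence: 800 })` finds layers containing at least one `PERSON` annotation whose own confidence is 800 or higher.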

Graph Queries

Neighborhood Expansion

```cypher
MATCH (n {uri: $1})-[r]-(neighbor)
RETURN type(r) AS edgeType, r.edgeType AS semanticType,
       neighbor.uri AS neighborUri, labels(neighbor) AS nodeLabels
LIMIT 50
```

Typed Traversal

Follow only edges of a specific type (e.g., only `denotes` edges):

```cypher
MATCH (n {uri: $1})-[r:GRAPH_EDGE {edgeType: "denotes"}]->(target)
RETURN target.uri, target.name
```

Shortest Path

```cypher
MATCH path = shortestPath(
  (a {uri: $1})-[*..10]-(b {uri: $2})
)
RETURN [n IN nodes(path) | n.uri] AS nodeUris,
       [r IN relationships(path) | type(r)] AS edgeTypes
```
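`shortestPath` over an undirected `[*..10]` pattern returns one path with the fewest hops, at most ten. Its in-memory equivalent is a breadth-first search that ignores edge direction and stops after ten hops. The sketch below also reconstructs the `nodeUris` and `edgeTypes` columns. It only illustrates the semantics, since Neo4j evaluates the real query; `Edge`, `PathResult`, and `shortestPath` are hypothetical names.

```typescript
interface Edge {
  from: string;
  to: string;
  type: string;
}

interface PathResult {
  nodeUris: string[];
  edgeTypes: string[];
}

// Undirected BFS bounded at maxHops; returns null when no path fits the bound.
function shortestPath(edges: Edge[], start: string, goal: string, maxHops = 10): PathResult | null {
  // Index each edge under both endpoints, since the pattern has no direction.
  const adj = new Map<string, { to: string; type: string }[]>();
  const link = (a: string, b: string, type: string) => {
    if (!adj.has(a)) adj.set(a, []);
    adj.get(a)!.push({ to: b, type });
  };
  for (const e of edges) {
    link(e.from, e.to, e.type);
    link(e.to, e.from, e.type);
  }
  // prev records how each node was first reached, for path reconstruction.
  const prev = new Map<string, { node: string; type: string }>();
  const seen = new Set([start]);
  let frontier = [start];
  for (let hops = 0; hops < maxHops && frontier.length > 0 && !seen.has(goal); hops++) {
    const nextFrontier: string[] = [];
    for (const node of frontier) {
      for (const { to, type } of adj.get(node) ?? []) {
        if (seen.has(to)) continue;
        seen.add(to);
        prev.set(to, { node, type });
        nextFrontier.push(to);
      }
    }
    frontier = nextFrontier;
  }
  if (!seen.has(goal)) return null;
  // Walk back from the goal, prepending nodes and edge types.
  const nodeUris = [goal];
  const edgeTypes: string[] = [];
  for (let cur = goal; cur !== start; ) {
    const step = prev.get(cur)!;
    edgeTypes.unshift(step.type);
    nodeUris.unshift(step.node);
    cur = step.node;
  }
  return { nodeUris, edgeTypes };
}
```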

Aggregation Queries

Label Distribution per Corpus

```sql
SELECT a.label, COUNT(*) AS count
FROM annotations a
JOIN annotation_layers al ON a.layer_uri = al.uri
JOIN cross_references cr ON cr.from_uri = al.expression_ref
JOIN corpus_memberships cm ON cm.expression_ref = cr.to_uri
WHERE cm.corpus_ref = $1
GROUP BY a.label
ORDER BY count DESC;
```

Annotation Coverage per Expression

```sql
SELECT al.kind, al.subkind, COUNT(*) AS layer_count
FROM annotation_layers al
WHERE al.expression_ref = $1
GROUP BY al.kind, al.subkind
ORDER BY layer_count DESC;
```

Caching Strategy

Redis caches frequently accessed data to reduce database load:

Cache Key Pattern	TTL	Content
`record:{uri}`	5 min	Full record JSONB
`refs:{uri}`	5 min	Cross-reference list for a record
`search:{hash}`	1 min	ES search result page
`corpus_stats:{uri}`	15 min	Materialized corpus statistics

Cache invalidation: when the firehose reports that a record was updated or deleted, its `record:{uri}` entry and related entries (such as `refs:{uri}`) are evicted immediately.
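The caching table and invalidation rule can be sketched as a cache-aside helper. The following are all hypothetical stand-ins:

- the `KvStore` interface and the in-memory `MemoryKv`;
- `cachedRecord` and `invalidate`;
- the FNV-1a `searchKey` hash (the document does not say how `QueryCache` hashes queries).

Treating `refs:{uri}` as the related key to evict is also an assumption. The sketch is synchronous for brevity, whereas a real Redis client is asynchronous.

```typescript
// Minimal key-value interface; a Redis client could back it with GET,
// SET ... EX <seconds>, and DEL.
interface KvStore {
  get(key: string): string | undefined;
  set(key: string, value: string, ttlSeconds: number): void;
  del(...keys: string[]): void;
}

// In-memory stand-in with an injectable clock so expiry can be exercised.
class MemoryKv implements KvStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();
  constructor(private readonly now: () => number = Date.now) {}

  get(key: string): string | undefined {
    const entry = this.entries.get(key);
    return entry && entry.expiresAt > this.now() ? entry.value : undefined;
  }
  set(key: string, value: string, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }
  del(...keys: string[]): void {
    for (const key of keys) this.entries.delete(key);
  }
}

// TTLs from the cache table, in seconds.
const TTL_SECONDS = { record: 300, refs: 300, search: 60, corpusStats: 900 } as const;

// Cache-aside read for `record:{uri}`. Misses (null) are not cached, so a
// record that arrives moments later is not hidden for five minutes.
function cachedRecord(kv: KvStore, uri: string, load: (uri: string) => object | null): object | null {
  const key = `record:${uri}`;
  const hit = kv.get(key);
  if (hit !== undefined) return JSON.parse(hit) as object;
  const value = load(uri);
  if (value !== null) kv.set(key, JSON.stringify(value), TTL_SECONDS.record);
  return value;
}

// Firehose update/delete handler: drop the record and its reference list.
function invalidate(kv: KvStore, uri: string): void {
  kv.del(`record:${uri}`, `refs:${uri}`);
}

// `search:{hash}` key: 32-bit FNV-1a over the serialized query body (UTF-16
// code units). Equal queries must serialize with the same key order to hit.
function searchKey(body: object): string {
  const text = JSON.stringify(body);
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return `search:${hash.toString(16)}`;
}
```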

Future Considerations

Semantic search: ES dense_vector fields could enable vector-based semantic search over annotation label embeddings, complementing keyword-based faceting with similarity-based retrieval.
Learning-to-rank: A RelevanceLogger (analogous to Chive's) could collect click-through data on search results to train a learning-to-rank model for improved result ordering.
