Lexicon Overview
Layers consists of 13 lexicons organized into core pipeline layers, parallel tracks, and integration layers.
Lexicon Directory
| Lexicon | ID | Purpose |
|---|---|---|
| Definitions | pub.layers.defs | Core primitives: objectRef, anchor, constraint, agentRef, annotationMetadata, knowledgeRef, featureMap, alignmentLink |
| Expression | pub.layers.expression | Any linguistic unit (document, paragraph, sentence, word, morpheme) with recursive nesting |
| Segmentation | pub.layers.segmentation | Tokenization, chunking, segmentation boundaries, span indices |
| Annotation | pub.layers.annotation | Linguistic labels and categories (POS, NER, SRL, discourse, etc.) |
| Ontology | pub.layers.ontology | Label definitions, linguistic categories, theory/framework records |
| Corpus | pub.layers.corpus | Corpus metadata, membership, and statistics |
| Resource | pub.layers.resource | Lexical entries, collections, stimulus templates, fillings, and template compositions |
| Judgment | pub.layers.judgment | Human judgments, model predictions, confidence, disagreement, experiment design |
| Alignment | pub.layers.alignment | Cross-record linking, token correspondence, equivalence |
| Graph | pub.layers.graph | Generic typed property graph for knowledge representation and cross-referencing |
| Persona | pub.layers.persona | Agent personas, theoretical frameworks, backgrounds |
| Media | pub.layers.media | Audio, video, image, and paged document references |
| Eprint | pub.layers.eprint | Eprint linkage, data provenance, scholarly metadata, reproducibility |
Lexicon Organization
Core Pipeline
The pipeline layers build incrementally. Each layer depends on the primitives in pub.layers.defs and on the layer before it:
pub.layers.defs (Primitives)
↓
pub.layers.expression (Linguistic Units)
↓
pub.layers.segmentation (Tokenization)
↓
pub.layers.annotation (Labels)
Expressions are recursive: a document contains paragraphs, which contain sentences, which contain words. Each expression can reference its parent via parentRef, and segmentation records define the ordered decomposition of a parent expression into child expressions. Structure types (section, sentence, word, morpheme, etc.) are expression kind values, not a separate lexicon.
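The parent-pointer nesting described above can be sketched in a few lines of Python. This is only an illustration, not the schema itself: the class and the field names `id`, `text`, and `parent_ref` are assumptions, and only `parentRef` and `kind` are named in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Expression:
    # Hypothetical field names; only parentRef and kind are named in the text.
    id: str
    kind: str                         # e.g. "document", "sentence", "word"
    text: str
    parent_ref: Optional[str] = None  # id of the parent expression, if any


def ancestor_kinds(expr_id: str, store: Dict[str, Expression]) -> List[str]:
    """Walk parent_ref links upward; return kinds from the expression to the root."""
    kinds: List[str] = []
    current: Optional[str] = expr_id
    while current is not None:
        expr = store[current]
        kinds.append(expr.kind)
        current = expr.parent_ref
    return kinds


store = {
    e.id: e
    for e in [
        Expression("d1", "document", "The cat sat. It purred."),
        Expression("s1", "sentence", "The cat sat.", parent_ref="d1"),
        Expression("w1", "word", "cat", parent_ref="s1"),
    ]
}

print(ancestor_kinds("w1", store))  # ['word', 'sentence', 'document']
```

Because structure types are just `kind` values, the same record shape serves every level of the hierarchy.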
Parallel Tracks
These lexicons support the pipeline but have no strict ordering among themselves:
- pub.layers.ontology: Authority records for labels and categories (supports annotation)
- pub.layers.corpus: Corpus metadata, membership, and statistics (organizes expressions)
- pub.layers.resource: Lexical entries, collections, stimulus templates, and fillings (supports experiments)
- pub.layers.judgment: Human and model judgments, experiment design (supports annotation and alignment)
- pub.layers.alignment: Cross-record correspondence (integrates across layers)
Integration Layers
These lexicons connect Layers to the ATProto ecosystem:
- pub.layers.graph: Generic typed property graph with nodes, edges, and edge sets for representing knowledge, cross-document relations, and arbitrary typed relationships
- pub.layers.persona: Agent personas and theoretical frameworks for annotator/model attribution
- pub.layers.media: Audio, video, image, and paged document references
- pub.layers.eprint: Eprint linkage, data provenance, reproducibility information, and scholarly metadata
Dependency Graph
Pipeline Layers
pub.layers.defs (Primitives)
Foundation for all other lexicons. Defines abstract, composable primitives:
- objectRef: cross-referencing mechanism
- anchor: polymorphic attachment points
- constraint: DSL-agnostic expressions
- agentRef: agent identity
- annotationMetadata: provenance tracking (including dependencies for provenance chains)
- knowledgeRef: external KB references (ATProto-native, external, or user-specific)
- featureMap: open-ended extensibility
- alignmentLink: sequence correspondence
Depends on: Nothing (foundational)
Used by: All other lexicons
File: schemas/defs.json
pub.layers.expression (Linguistic Units)
Any linguistic unit, from a single morpheme to a full document, with recursive nesting:
- Source data (text, audio, video, image, paged documents)
- Recursive nesting via parentRef (document → paragraph → sentence → word → morpheme)
- Multimodal anchoring to parent expressions
- Language identification and multilingual support
- Links to eprints and knowledge bases
Depends on: pub.layers.defs
Used by: All downstream layers
File: schemas/expression.json
pub.layers.segmentation (Tokenization & Chunking)
Defines how a parent expression is broken into child expressions:
- Ordered decomposition of expressions into sub-expressions
- Tokenization strategies (whitespace, BPE, morphological, etc.)
- Index-based access for efficient token retrieval
Depends on: pub.layers.defs, pub.layers.expression
Used by: pub.layers.annotation
File: schemas/segmentation.json
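As a rough illustration of ordered decomposition with index-based access, the sketch below tokenizes on whitespace and stores each token as a character span into the parent text. The function names and the `(start, end)` span layout are assumptions for this example, not the format defined in schemas/segmentation.json.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def whitespace_spans(text: str) -> List[Span]:
    """Return the ordered character spans of whitespace-delimited tokens."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def token_at(text: str, spans: List[Span], index: int) -> str:
    """Index-based access: slice the parent text with the stored span."""
    start, end = spans[index]
    return text[start:end]


sentence = "Layers stores  tokens as spans"
spans = whitespace_spans(sentence)
print(spans[0])                      # (0, 6)
print(token_at(sentence, spans, 2))  # tokens
```

Storing spans rather than copied strings keeps the parent text authoritative and lets a different tokenization strategy produce a second span list over the same expression.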
pub.layers.annotation (Linguistic Annotations)
Labels, categories, semantic roles, discourse relations:
- Abstract annotation record type (discriminated by kind/subkind)
- Anchoring (to tokens, text spans, temporal regions, etc.)
- Flexible tagging: POS, NER, SRL, sentiment, discourse, relations, coreference, document-level tags
- K-best alternatives via rank and alternativesRef
- Cluster sets for coreference and cross-document clustering
- Metadata: confidence, provenance, source
Depends on: pub.layers.defs, pub.layers.expression, pub.layers.segmentation, pub.layers.ontology, pub.layers.judgment, pub.layers.persona
Used by: pub.layers.alignment, integration layers
File: schemas/annotation.json
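One way to read the K-best mechanism is that competing analyses share a reference and are ordered by rank. The sketch below assumes exactly that: the `Annotation` class is hypothetical, rank 1 is assumed to be the best alternative, and `alternatives_ref` is assumed to be a shared grouping key. The actual semantics are defined in schemas/annotation.json.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Annotation:
    # Hypothetical shape; the text names only rank and alternativesRef.
    label: str
    rank: int              # assumed: 1 is the top-ranked alternative
    alternatives_ref: str  # assumed: shared key grouping competing analyses


def k_best(annotations: List[Annotation]) -> Dict[str, List[str]]:
    """Group alternatives by alternatives_ref and order each group by rank."""
    groups: Dict[str, List[Annotation]] = defaultdict(list)
    for ann in annotations:
        groups[ann.alternatives_ref].append(ann)
    return {
        key: [a.label for a in sorted(group, key=lambda a: a.rank)]
        for key, group in groups.items()
    }


candidates = [
    Annotation("VERB", rank=2, alternatives_ref="tok-3"),
    Annotation("NOUN", rank=1, alternatives_ref="tok-3"),
    Annotation("DET", rank=1, alternatives_ref="tok-2"),
]
print(k_best(candidates)["tok-3"])  # ['NOUN', 'VERB']
```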
Parallel Support Layers
pub.layers.ontology (Label Definitions)
Authority records for linguistic categories, tag sets, frameworks:
- Label definitions (POS tagsets, NER schemes, SRL frames)
- Theoretical framework definitions (generative syntax, construction grammar, etc.)
- Relation type properties (symmetric, transitive, reflexive, inverse, domain, range)
- Linked data to external KBs (Wikidata, WordNet, FrameNet, etc.)
- Authority records with versioning
Depends on: pub.layers.defs
Used by: pub.layers.annotation, pub.layers.graph
File: schemas/ontology.json
pub.layers.corpus (Corpus Metadata)
Corpus records, membership, and statistics:
- Corpus definitions (name, description, license, citation)
- Expression membership (which expressions belong to a corpus)
- Corpus-level statistics and metadata
Depends on: pub.layers.defs, pub.layers.expression
Used by: pub.layers.annotation (corpus context), pub.layers.eprint
File: schemas/corpus.json
pub.layers.resource (Lexical Resources & Templates)
Lexical entries, collections, stimulus templates, fillings, and compositions:
- Lexical resource entries (lemma, form, language, MWE components)
- Named collections (lexicons, frame inventories, gazetteers, paradigm tables)
- Parameterized text templates with named variable slots
- Template compositions (sequences and trees of templates)
- Filled templates with rendered text
Depends on: pub.layers.defs, pub.layers.expression
Used by: pub.layers.judgment (stimulus generation)
File: schemas/resource.json
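The relationship between a parameterized template, its fillings, and the rendered text can be sketched with Python's standard `string.Template`. The `$name` slot syntax belongs to Python, and the example sentence and slot names are invented for illustration; the slot syntax used by pub.layers.resource may differ.

```python
from string import Template

# Hypothetical template with two named slots; the $name syntax is Python's,
# not necessarily the slot syntax used by pub.layers.resource.
template = Template("The $agent $verb the ball.")

# Each filling maps every slot name to a value.
fillings = [
    {"agent": "girl", "verb": "kicked"},
    {"agent": "dog", "verb": "chased"},
]

# Each filling renders one stimulus; substitute() raises KeyError if a slot is missing.
rendered = [template.substitute(f) for f in fillings]
print(rendered[0])  # The girl kicked the ball.
```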
pub.layers.judgment (Judgments & Experiments)
Human judgments, model predictions, experiment design:
- Experiment definitions with task types, guidelines, and design specifications
- List constraints (Latin square, balancing, adjacency rules)
- Judgment sets with behavioral analytics
- Inter-annotator agreement reports
Depends on: pub.layers.defs
Used by: pub.layers.annotation, applications for filtering/ranking
File: schemas/judgment.json
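To make the Latin square list constraint concrete, the sketch below uses the standard cyclic construction: each presentation list shows every item once, and each item appears in every condition exactly once across lists. The function name and the integer encoding of items and conditions are assumptions for this example, not part of the judgment schema.

```python
from typing import List


def latin_square_lists(n_items: int, n_conditions: int) -> List[List[int]]:
    """Build n_conditions lists; list l shows item i in condition (i + l) % n_conditions."""
    return [
        [(item + lst) % n_conditions for item in range(n_items)]
        for lst in range(n_conditions)
    ]


lists = latin_square_lists(n_items=4, n_conditions=2)
print(lists)  # [[0, 1, 0, 1], [1, 0, 1, 0]]
```

When the number of items is a multiple of the number of conditions, every list is also balanced, containing each condition equally often.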
pub.layers.alignment (Cross-Record Linking)
Linking annotations across records and layers:
- Token alignment (across language pairs, different tokenizations)
- Span correspondence (across different segmentations or annotations)
- Annotation equivalence (same meaning, different labels)
- Alignment confidence and metadata
Depends on: pub.layers.defs
Used by: All layers (enables multi-layer composition)
File: schemas/alignment.json
Integration Layers
pub.layers.graph (Typed Property Graph)
Generic typed property graph for knowledge representation and cross-referencing:
- Standalone graph nodes (entities, concepts, events, states, claims, propositions)
- Typed directed edges between any Layers objects (expressions, annotations, graph nodes, external KB nodes)
- Batch edge sets for efficient bulk operations
- Supports multidigraphs and cycles
- Subsumes cross-document relations, knowledge grounding, expression graphs, and intertextual linking
Depends on: pub.layers.defs, pub.layers.ontology
Used by: pub.layers.annotation (via knowledgeRef), all layers (via objectRef-based cross-referencing)
File: schemas/graph.json
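The multidigraph property (several typed edges between the same pair of nodes, with cycles allowed) can be illustrated with a small in-memory structure. The `PropertyGraph` class, its methods, and the edge type names below are hypothetical and are not taken from schemas/graph.json.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Edge = Tuple[str, str, str]  # (source, edge_type, target)


class PropertyGraph:
    """Minimal typed multidigraph: parallel edges and cycles are both allowed."""

    def __init__(self) -> None:
        self.out_edges: DefaultDict[str, List[Edge]] = defaultdict(list)

    def add_edges(self, edges: List[Edge]) -> None:
        # Batch insertion, loosely analogous to an edge set.
        for edge in edges:
            self.out_edges[edge[0]].append(edge)

    def targets(self, source: str, edge_type: str) -> List[str]:
        """Return targets reachable from source over edges of the given type."""
        return [t for (_, kind, t) in self.out_edges[source] if kind == edge_type]


g = PropertyGraph()
g.add_edges([
    ("a", "causes", "b"),
    ("a", "precedes", "b"),  # second edge between the same pair
    ("b", "causes", "a"),    # closes a cycle
])
print(g.targets("a", "causes"))  # ['b']
```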
pub.layers.persona (Agent Personas)
Agent personas, theoretical frameworks, methodological backgrounds:
- Persona records (annotator background, theoretical affiliation, expertise)
- Framework records (generative syntax, dependency grammar, UD conventions, etc.)
- Reproducibility metadata
Depends on: pub.layers.defs
Used by: pub.layers.annotation (via agentRef.personaRef)
File: schemas/persona.json
pub.layers.media (Multimodal References)
Audio, video, image, and paged document references:
- Audio/video file references (codec, sample rate, duration)
- Image file references (resolution, format, color space)
- Paged document references (page count, OCR metadata)
- Media-specific timestamps and spatial coordinates
Depends on: pub.layers.defs
Used by: pub.layers.expression (source documents), pub.layers.segmentation (temporal/spatial anchors)
File: schemas/media.json
pub.layers.eprint (Eprint & Data Provenance)
Eprint linkage, data provenance, reproducibility, and scholarly metadata:
- Eprint links (DOI, arXiv, ACL Anthology, any platform)
- Data links connecting publications to the data they produced (corpora, annotations, model outputs)
- Reproducibility information (code, commit hash, environment, random seed)
- Citation metadata
Depends on: pub.layers.defs, pub.layers.corpus
Used by: pub.layers.expression (eprintRef), discovery/search
File: schemas/eprint.json
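The "Depends on" entries in the sections above form a directed acyclic graph, so a valid load or validation order can be derived mechanically. The sketch below transcribes those entries and sorts them with Python's standard `graphlib`; the variable names are illustrative, and the `pub.layers.` prefix is omitted for brevity.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# "Depends on" entries transcribed from the sections above (pub.layers. prefix omitted).
depends_on = {
    "defs": [],
    "expression": ["defs"],
    "segmentation": ["defs", "expression"],
    "annotation": ["defs", "expression", "segmentation",
                   "ontology", "judgment", "persona"],
    "ontology": ["defs"],
    "corpus": ["defs", "expression"],
    "resource": ["defs", "expression"],
    "judgment": ["defs"],
    "alignment": ["defs"],
    "graph": ["defs", "ontology"],
    "persona": ["defs"],
    "media": ["defs"],
    "eprint": ["defs", "corpus"],
}

# static_order() yields each lexicon after all of its dependencies
# and raises CycleError if the declared dependencies are circular.
load_order = list(TopologicalSorter(depends_on).static_order())
print(load_order[0])  # defs
```

Only the first position is fixed, since defs is the sole lexicon with no dependencies. The relative order of the rest can vary as long as each lexicon follows everything it depends on.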
See Also
- Introduction for architectural overview
- Foundations for design principles, primitives, and the flexible enum pattern
- Temporal Representation for temporal primitives and standards mapping
- Spatial Representation for spatial primitives and standards mapping
- Multimodal Annotation for annotation across text, audio, video, and image
- Knowledge Grounding for linking annotations to external knowledge bases