Lexicon Overview

Layers consists of 13 lexicons organized into three groups: four core pipeline layers, five parallel tracks, and four integration layers.

Lexicon Directory

Lexicon	ID	Purpose
Definitions	`pub.layers.defs`	Core primitives: objectRef, anchor, constraint, agentRef, annotationMetadata, knowledgeRef, featureMap, alignmentLink
Expression	`pub.layers.expression`	Any linguistic unit (document, paragraph, sentence, word, morpheme) with recursive nesting
Segmentation	`pub.layers.segmentation`	Tokenization, chunking, segmentation boundaries, span indices
Annotation	`pub.layers.annotation`	Linguistic labels and categories (POS, NER, SRL, discourse, etc.)
Ontology	`pub.layers.ontology`	Label definitions, linguistic categories, theory/framework records
Corpus	`pub.layers.corpus`	Corpus metadata, membership, and statistics
Resource	`pub.layers.resource`	Lexical entries, collections, stimulus templates, fillings, and template compositions
Judgment	`pub.layers.judgment`	Human judgments, model predictions, confidence, disagreement, experiment design
Alignment	`pub.layers.alignment`	Cross-record linking, token correspondence, equivalence
Graph	`pub.layers.graph`	Generic typed property graph for knowledge representation and cross-referencing
Persona	`pub.layers.persona`	Agent personas, theoretical frameworks, backgrounds
Media	`pub.layers.media`	Audio, video, image, and paged document references
Eprint	`pub.layers.eprint`	Eprint linkage, data provenance, scholarly metadata, reproducibility

Lexicon Organization

Core Pipeline

The pipeline layers build incrementally, each depending on the primitives in pub.layers.defs and on the layer before it:

pub.layers.defs (Primitives)
    ↓
pub.layers.expression (Linguistic Units)
    ↓
pub.layers.segmentation (Tokenization)
    ↓
pub.layers.annotation (Labels)

Expressions are recursive: a document contains paragraphs, which contain sentences, which contain words. Each expression can reference its parent via parentRef, and segmentation records define the ordered decomposition of a parent expression into child expressions. Structure types (section, sentence, word, morpheme, etc.) are expression kind values, not a separate lexicon.
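The recursive nesting described above can be sketched as follows. This is a minimal illustration, not the schema itself: the `Expression` class, its `id`, `kind`, and `text` fields, and the `ancestor_kinds` helper are hypothetical names, and `parent_ref` only stands in for parentRef.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Expression:
    id: str                            # hypothetical identifier standing in for a record URI
    kind: str                          # e.g. "document", "sentence", "word"
    text: str
    parent_ref: Optional[str] = None   # stands in for parentRef; None marks a root expression


def ancestor_kinds(expr_id: str, store: Dict[str, Expression]) -> List[str]:
    """Follow parent_ref links from an expression up to its root."""
    kinds: List[str] = []
    current: Optional[Expression] = store[expr_id]
    while current is not None:
        kinds.append(current.kind)
        current = store.get(current.parent_ref) if current.parent_ref else None
    return kinds


store = {
    e.id: e
    for e in [
        Expression("doc1", "document", "The cat sat. It purred."),
        Expression("s1", "sentence", "The cat sat.", parent_ref="doc1"),
        Expression("w1", "word", "cat", parent_ref="s1"),
    ]
}

print(ancestor_kinds("w1", store))  # ['word', 'sentence', 'document']
```

Because the structure type is just the `kind` value, the same class covers every level of the hierarchy.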

Parallel Tracks

These lexicons support the pipeline but do not depend on each other in strict order:

pub.layers.ontology: Authority records for labels and categories (supports annotation)
pub.layers.corpus: Corpus metadata, membership, and statistics (organizes expressions)
pub.layers.resource: Lexical entries, collections, stimulus templates, fillings, and template compositions (supports experiments)
pub.layers.judgment: Human and model judgments, experiment design (supports annotation and alignment)
pub.layers.alignment: Cross-record correspondence (integrates across layers)

Integration Layers

These lexicons connect Layers to the ATProto ecosystem:

pub.layers.graph: Generic typed property graph with nodes, edges, and edge sets for representing knowledge, cross-document relations, and arbitrary typed relationships
pub.layers.persona: Agent personas and theoretical frameworks for annotator/model attribution
pub.layers.media: Audio, video, image, and paged document references
pub.layers.eprint: Eprint linkage, data provenance, reproducibility information, and scholarly metadata

Dependency Graph

Pipeline Layers

pub.layers.defs (Primitives)

Foundation for all other lexicons. Defines abstract, composable primitives:

objectRef: cross-referencing mechanism
anchor: polymorphic attachment points
constraint: DSL-agnostic expressions
agentRef: agent identity
annotationMetadata: provenance tracking (including dependencies for provenance chains)
knowledgeRef: external KB references (ATProto-native, external, or user-specific)
featureMap: open-ended extensibility
alignmentLink: sequence correspondence

Depends on: Nothing (foundational)

Used by: All other lexicons

File: schemas/defs.json

pub.layers.expression (Linguistic Units)

Any linguistic unit, from a single morpheme to a full document, with recursive nesting:

Source data (text, audio, video, image, paged documents)
Recursive nesting via parentRef (document → paragraph → sentence → word → morpheme)
Multimodal anchoring to parent expressions
Language identification and multilingual support
Links to eprints and knowledge bases

Depends on: pub.layers.defs

Used by: All downstream layers

File: schemas/expression.json

pub.layers.segmentation (Tokenization & Chunking)

Defines how a parent expression is broken into child expressions:

Ordered decomposition of expressions into sub-expressions
Tokenization strategies (whitespace, BPE, morphological, etc.)
Index-based access for efficient token retrieval

Depends on: pub.layers.defs, pub.layers.expression

Used by: pub.layers.annotation

File: schemas/segmentation.json
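
As a rough illustration of ordered decomposition with index-based access, the sketch below applies a whitespace strategy and stores character spans. The function names and the `(start, end)` span representation are assumptions made for this example; the actual record layout is defined in schemas/segmentation.json.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def whitespace_spans(text: str) -> List[Span]:
    """Return the ordered spans of the non-whitespace runs in text."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def token_at(text: str, spans: List[Span], index: int) -> str:
    """Index-based access: recover the surface form of the token at index."""
    start, end = spans[index]
    return text[start:end]


sentence = "The cat sat"
spans = whitespace_spans(sentence)
print(spans)                         # [(0, 3), (4, 7), (8, 11)]
print(token_at(sentence, spans, 1))  # cat
```

Keeping spans rather than copied strings means a different strategy (BPE, morphological) can segment the same parent text without duplicating it.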

pub.layers.annotation (Linguistic Annotations)

Labels, categories, semantic roles, discourse relations:

Abstract annotation record type (discriminated by kind/subkind)
Anchoring (to tokens, text spans, temporal regions, etc.)
Flexible tagging: POS, NER, SRL, sentiment, discourse, relations, coreference, document-level tags
K-best alternatives via rank and alternativesRef
Cluster sets for coreference and cross-document clustering
Metadata: confidence, provenance, source

Depends on: pub.layers.defs, pub.layers.expression, pub.layers.segmentation, pub.layers.ontology, pub.layers.judgment, pub.layers.persona

Used by: pub.layers.alignment, integration layers

File: schemas/annotation.json
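
One way to picture kind/subkind discrimination together with K-best alternatives is the sketch below. The `Annotation` class, the `anchor` and `label` fields, the example kind and subkind values, and the convention that rank 1 is the preferred analysis are all assumptions for illustration; only the kind, subkind, and rank names come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Annotation:
    anchor: str    # hypothetical id of the token the label attaches to
    kind: str      # coarse discriminator (example value, not from the schema)
    subkind: str   # finer discriminator (example value, not from the schema)
    label: str
    rank: int      # assumption: 1 is the preferred analysis


def best_by_anchor(
    annotations: List[Annotation], kind: str, subkind: str
) -> Dict[str, Annotation]:
    """Keep the lowest-ranked annotation per anchor for one kind/subkind pair."""
    best: Dict[str, Annotation] = {}
    for ann in annotations:
        if (ann.kind, ann.subkind) != (kind, subkind):
            continue
        if ann.anchor not in best or ann.rank < best[ann.anchor].rank:
            best[ann.anchor] = ann
    return best


annotations = [
    Annotation("t1", "token-tag", "pos", "NOUN", rank=2),
    Annotation("t1", "token-tag", "pos", "VERB", rank=1),
    Annotation("t1", "token-tag", "ner", "O", rank=1),
    Annotation("t2", "token-tag", "pos", "DET", rank=1),
]

best = best_by_anchor(annotations, "token-tag", "pos")
print({anchor: ann.label for anchor, ann in best.items()})  # {'t1': 'VERB', 't2': 'DET'}
```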

Parallel Support Layers

pub.layers.ontology (Label Definitions)

Authority records for linguistic categories, tag sets, frameworks:

Label definitions (POS tagsets, NER schemes, SRL frames)
Theoretical framework definitions (generative syntax, construction grammar, etc.)
Relation type properties (symmetric, transitive, reflexive, inverse, domain, range)
Linked data to external KBs (Wikidata, WordNet, FrameNet, etc.)
Authority records with versioning

Depends on: pub.layers.defs

Used by: pub.layers.annotation, pub.layers.graph

File: schemas/ontology.json

pub.layers.corpus (Corpus Metadata)

Corpus records, membership, and statistics:

Corpus definitions (name, description, license, citation)
Expression membership (which expressions belong to a corpus)
Corpus-level statistics and metadata

Depends on: pub.layers.defs, pub.layers.expression

Used by: pub.layers.annotation (corpus context), pub.layers.eprint

File: schemas/corpus.json

pub.layers.resource (Lexical Resources & Templates)

Lexical entries, collections, stimulus templates, fillings, and compositions:

Lexical resource entries (lemma, form, language, MWE components)
Named collections (lexicons, frame inventories, gazetteers, paradigm tables)
Parameterized text templates with named variable slots
Template compositions (sequences and trees of templates)
Filled templates with rendered text

Depends on: pub.layers.defs, pub.layers.expression

Used by: pub.layers.judgment (stimulus generation)

File: schemas/resource.json
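
The relationship between a parameterized template and a filled template can be sketched with Python's standard `string.Template`. The `Filling` class, its fields, and the `$slot` syntax are assumptions for this example and do not reflect the slot notation used by the schema.

```python
from dataclasses import dataclass
from string import Template
from typing import Dict


@dataclass(frozen=True)
class Filling:
    template_text: str      # the parameterized text with named slots
    slots: Dict[str, str]   # slot name -> filler
    rendered: str           # the text after substitution


def fill(template_text: str, slots: Dict[str, str]) -> Filling:
    """Render a template; substitute() raises KeyError if a named slot has no filler."""
    rendered = Template(template_text).substitute(slots)
    return Filling(template_text, dict(slots), rendered)


filling = fill(
    "The $agent $verb the $patient.",
    {"agent": "dog", "verb": "chased", "patient": "cat"},
)
print(filling.rendered)  # The dog chased the cat.
```

Storing the rendered text alongside the slot values keeps each stimulus traceable to the template and fillers that produced it.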

pub.layers.judgment (Judgments & Experiments)

Human judgments, model predictions, experiment design:

Experiment definitions with task types, guidelines, and design specifications
List constraints (Latin square, balancing, adjacency rules)
Judgment sets with behavioral analytics
Inter-annotator agreement reports

Depends on: pub.layers.defs

Used by: pub.layers.annotation, applications for filtering/ranking

File: schemas/judgment.json
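
For readers unfamiliar with Latin square list constraints, the sketch below shows the standard rotation scheme: with n conditions there are n lists, and list j presents item i in condition (i + j) mod n. The function name and return shape are assumptions for this example, not part of the judgment lexicon.

```python
from typing import Dict, List


def latin_square_lists(items: List[str], conditions: List[str]) -> List[Dict[str, str]]:
    """Build len(conditions) lists; list j shows item i in condition (i + j) mod n."""
    n = len(conditions)
    return [
        {item: conditions[(i + j) % n] for i, item in enumerate(items)}
        for j in range(n)
    ]


lists = latin_square_lists(["item1", "item2", "item3", "item4"], ["A", "B"])
for number, assignment in enumerate(lists, start=1):
    print(number, assignment)
# 1 {'item1': 'A', 'item2': 'B', 'item3': 'A', 'item4': 'B'}
# 2 {'item1': 'B', 'item2': 'A', 'item3': 'B', 'item4': 'A'}
```

Each participant list sees every item exactly once, and across lists every item appears in every condition exactly once.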

pub.layers.alignment (Cross-Record Linking)

Linking annotations across records and layers:

Token alignment (across language pairs, different tokenizations)
Span correspondence (across different segmentations or annotations)
Annotation equivalence (same meaning, different labels)
Alignment confidence and metadata

Depends on: pub.layers.defs

Used by: All layers (enables multi-layer composition)

File: schemas/alignment.json
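
Token alignment across two tokenizations of the same text can be approximated by linking tokens whose character spans overlap. This is only one possible strategy, shown under the assumption that both tokenizations carry `(start, end)` offsets into the same string; the function name and the index-pair output are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def align_by_offsets(source: List[Span], target: List[Span]) -> List[Tuple[int, int]]:
    """Link source token i to target token j whenever their spans overlap."""
    links: List[Tuple[int, int]] = []
    for i, (s_start, s_end) in enumerate(source):
        for j, (t_start, t_end) in enumerate(target):
            if s_start < t_end and t_start < s_end:
                links.append((i, j))
    return links


# Two tokenizations of the text "don't stop"
coarse = [(0, 5), (6, 10)]          # "don't", "stop"
fine = [(0, 2), (2, 5), (6, 10)]    # "do", "n't", "stop"

print(align_by_offsets(coarse, fine))  # [(0, 0), (0, 1), (1, 2)]
```

Cross-lingual token alignment has no shared offsets to rely on, so it would need links supplied by an aligner or annotator instead.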

Integration Layers

pub.layers.graph (Typed Property Graph)

Generic typed property graph for knowledge representation and cross-referencing:

Standalone graph nodes (entities, concepts, events, states, claims, propositions)
Typed directed edges between any Layers objects (expressions, annotations, graph nodes, external KB nodes)
Batch edge sets for efficient bulk operations
Supports multidigraphs and cycles
Subsumes cross-document relations, knowledge grounding, expression graphs, and intertextual linking

Depends on: pub.layers.defs, pub.layers.ontology

Used by: pub.layers.annotation (via knowledgeRef), all layers (via objectRef-based cross-referencing)

File: schemas/graph.json
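
The sketch below illustrates what supporting multidigraphs and cycles means in practice: several typed edges may connect the same pair of nodes, and edges may loop back. The `Edge` and `PropertyGraph` classes, the node identifiers, and the edge type names are hypothetical and chosen only for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Iterable, List


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    edge_type: str


class PropertyGraph:
    def __init__(self) -> None:
        self._out: DefaultDict[str, List[Edge]] = defaultdict(list)

    def add_edges(self, edges: Iterable[Edge]) -> None:
        """Batch insert, loosely analogous to submitting an edge set."""
        for edge in edges:
            self._out[edge.source].append(edge)

    def targets(self, source: str, edge_type: str) -> List[str]:
        """Targets reachable from source over edges of one type."""
        return [e.target for e in self._out.get(source, []) if e.edge_type == edge_type]


graph = PropertyGraph()
graph.add_edges([
    Edge("event:storm", "event:flood", "causes"),
    Edge("event:storm", "event:flood", "precedes"),  # parallel edge with a different type
    Edge("event:flood", "event:storm", "mentions"),  # closes a cycle
])

print(graph.targets("event:storm", "causes"))  # ['event:flood']
```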

pub.layers.persona (Agent Personas)

Agent personas, theoretical frameworks, methodological backgrounds:

Persona records (annotator background, theoretical affiliation, expertise)
Framework records (generative syntax, dependency grammar, UD conventions, etc.)
Reproducibility metadata

Depends on: pub.layers.defs

Used by: pub.layers.annotation (via agentRef.personaRef)

File: schemas/persona.json

pub.layers.media (Multimodal References)

Audio, video, image, and paged document references:

Audio/video file references (codec, sample rate, duration)
Image file references (resolution, format, color space)
Paged document references (page count, OCR metadata)
Media-specific timestamps and spatial coordinates

Depends on: pub.layers.defs

Used by: pub.layers.expression (source documents), pub.layers.segmentation (temporal/spatial anchors)

File: schemas/media.json

pub.layers.eprint (Eprint & Data Provenance)

Eprint linkage, data provenance, reproducibility, and scholarly metadata:

Eprint links (DOI, arXiv, ACL Anthology, any platform)
Data links connecting publications to the data they produced (corpora, annotations, model outputs)
Reproducibility information (code, commit hash, environment, random seed)
Citation metadata

Depends on: pub.layers.defs, pub.layers.corpus

Used by: pub.layers.expression (eprintRef), discovery/search

File: schemas/eprint.json
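
The "Depends on" lines above can be collected into a single mapping and checked for a valid load order. The sketch below transcribes those lines (with the `pub.layers.` prefix dropped for brevity) and uses the standard-library `graphlib` module, which requires Python 3.9 or later; the `DEPENDS_ON` name is an assumption for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Transcribed from the "Depends on" lines in this section.
DEPENDS_ON = {
    "defs": [],
    "expression": ["defs"],
    "segmentation": ["defs", "expression"],
    "annotation": ["defs", "expression", "segmentation", "ontology", "judgment", "persona"],
    "ontology": ["defs"],
    "corpus": ["defs", "expression"],
    "resource": ["defs", "expression"],
    "judgment": ["defs"],
    "alignment": ["defs"],
    "graph": ["defs", "ontology"],
    "persona": ["defs"],
    "media": ["defs"],
    "eprint": ["defs", "corpus"],
}

# static_order() yields each lexicon after everything it depends on,
# and raises graphlib.CycleError if the declared dependencies form a cycle.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order[0])  # defs
```

Only the "Depends on" lines are used here; the "Used by" lines describe looser usage relationships and are not treated as dependencies.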
