pub.layers.corpus

Corpus records. A corpus is a named, versioned collection of expressions with shared metadata, annotation guidelines, and ontologies.

Types

corpus

NSID: pub.layers.corpus.corpus Type: Record

A corpus: a curated collection of expressions.

| Field | Type | Description |
| --- | --- | --- |
| name | string | Corpus name. |
| description | string | Detailed description of the corpus. |
| version | string | Version string for the corpus release. |
| language | string | Primary BCP-47 language tag. |
| languages | array | All languages represented. Array of strings. |
| domainUri | at-uri | AT-URI of the domain definition node. Community-expandable via knowledge graph. |
| domain | string | Domain slug (fallback when domainUri unavailable). Known values: news, biomedical, legal, social-media, dialogue, literary, scientific, web, spoken, custom |
| license | string | License identifier (e.g., 'CC-BY-4.0', 'LDC-User-Agreement'). |
| ontologyRefs | array | Ontologies used in this corpus. Array of at-uri. |
| eprintRefs | array | Eprint links for this corpus. Array of at-uri. |
| expressionCount | integer | Number of expressions in the corpus. |
| annotationDesign | ref | Annotation project design: annotator assignment, adjudication, and quality criteria. Ref: pub.layers.corpus.defs#annotationDesign |
| features | ref | Ref: pub.layers.defs#featureMap |
| createdAt | datetime | Record creation timestamp. |
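
As an illustration, a corpus record could be built as a plain Python dictionary like the one below. This is a minimal sketch, not a canonical example: every value is an invented placeholder, the field list above does not say which fields are required, and `is_known_domain` is a hypothetical helper. The `$type` key follows the usual AT Protocol convention of tagging a record with its NSID.

```python
# Known values for the `domain` slug, copied from the field list above.
KNOWN_DOMAINS = {
    "news", "biomedical", "legal", "social-media", "dialogue",
    "literary", "scientific", "web", "spoken", "custom",
}


def is_known_domain(slug: str) -> bool:
    """Hypothetical helper: True if the slug is one of the listed known values."""
    return slug in KNOWN_DOMAINS


# All values below are illustrative placeholders, not real data.
corpus_record = {
    "$type": "pub.layers.corpus.corpus",
    "name": "Example News Corpus",
    "description": "Illustrative corpus of short news expressions.",
    "version": "1.0.0",
    "language": "en",             # primary BCP-47 tag
    "languages": ["en", "de"],    # all languages represented
    "domain": "news",             # slug fallback; domainUri omitted here
    "license": "CC-BY-4.0",
    "expressionCount": 1200,
    "createdAt": "2024-01-01T00:00:00Z",
}
```

Since `domainUri` is described as community-expandable, a slug outside the known set may still be legitimate; a check like `is_known_domain` is probably best treated as a warning rather than a hard validation failure.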

annotationDesign

NSID: pub.layers.corpus.defs#annotationDesign Type: Object

Annotation project design parameters: annotator assignment, adjudication, and quality criteria.

| Field | Type | Description |
| --- | --- | --- |
| redundancy | ref | How annotators are assigned to items. Ref: pub.layers.corpus.defs#redundancySpec |
| adjudication | ref | How disagreements are resolved. Ref: pub.layers.corpus.defs#adjudicationSpec |
| qualityCriteria | array | Acceptance criteria for annotation quality. Array of ref: pub.layers.corpus.defs#qualityCriterion |
| guidelinesRef | at-uri | AT-URI of the annotation guidelines document (e.g., a pub.layers.persona.persona or external resource). |
| guidelinesVersion | string | Version identifier for the annotation guidelines. |
| annotationRounds | integer | Number of annotation passes in the project workflow. |
| features | ref | Ref: pub.layers.defs#featureMap |
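
To show how the referenced objects nest, here is a sketch of an annotationDesign value with its redundancy, adjudication, and qualityCriteria parts filled in. The values, the `did:plc:example` identifier, the record key, and the `summarize` helper are all assumptions made for illustration; only the field names come from the definitions in this document.

```python
# Illustrative annotationDesign object; every value is a placeholder.
annotation_design = {
    "redundancy": {                    # pub.layers.corpus.defs#redundancySpec
        "count": 2,
        "assignmentStrategy": "random",
        "annotatorPool": 6,
    },
    "adjudication": {                  # pub.layers.corpus.defs#adjudicationSpec
        "method": "expert",
        "dedicatedAdjudicator": True,
        "agreementThreshold": 800,
    },
    "qualityCriteria": [               # array of #qualityCriterion
        {"metric": "cohens-kappa", "threshold": 700, "scope": "layer"},
    ],
    # Hypothetical AT-URI; the DID and record key are made up.
    "guidelinesRef": "at://did:plc:example/pub.layers.persona.persona/guidelines",
    "guidelinesVersion": "2.1",
    "annotationRounds": 2,
}


def summarize(design: dict) -> str:
    """Hypothetical helper: one-line summary of a design object."""
    redundancy = design["redundancy"]
    adjudication = design["adjudication"]
    criteria = design.get("qualityCriteria", [])
    return (
        f"{redundancy['count']} of {redundancy['annotatorPool']} annotators per item; "
        f"adjudication: {adjudication['method']}; "
        f"quality criteria: {len(criteria)}"
    )
```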

redundancySpec

NSID: pub.layers.corpus.defs#redundancySpec Type: Object

How many annotators work on each item and how they are assigned.

| Field | Type | Description |
| --- | --- | --- |
| count | integer | Number of independent annotators per item. |
| assignmentStrategyUri | at-uri | AT-URI of the assignment strategy definition node. Community-expandable via knowledge graph. |
| assignmentStrategy | string | How annotators are assigned to items (fallback when assignmentStrategyUri unavailable). Known values: random, round-robin, stratified, expertise-based, custom |
| annotatorPool | integer | Total number of annotators in the project. |
| features | ref | Ref: pub.layers.defs#featureMap |
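
The lexicon only names the assignment strategies; it does not define how they behave. As one plausible reading of `round-robin`, the sketch below walks the annotator pool in order and gives each item the next `count` annotators, wrapping around at the end of the pool. The function name, annotator identifiers, and item identifiers are hypothetical.

```python
from typing import Dict, List


def round_robin_assign(
    item_ids: List[str], annotators: List[str], count: int
) -> Dict[str, List[str]]:
    """Assign `count` distinct annotators to each item, cycling through the pool.

    This is an assumed interpretation of the `round-robin` strategy.
    """
    if not 1 <= count <= len(annotators):
        raise ValueError("count must be between 1 and the size of the annotator pool")
    pool_size = len(annotators)
    assignments: Dict[str, List[str]] = {}
    start = 0
    for item_id in item_ids:
        # Take the next `count` annotators, wrapping around the pool.
        assignments[item_id] = [
            annotators[(start + offset) % pool_size] for offset in range(count)
        ]
        start = (start + count) % pool_size
    return assignments


# Placeholder spec and pool for illustration.
spec = {"count": 2, "assignmentStrategy": "round-robin", "annotatorPool": 4}
annotators = ["ann-a", "ann-b", "ann-c", "ann-d"]
plan = round_robin_assign(["e1", "e2", "e3"], annotators, spec["count"])
```

Because `count` never exceeds the pool size, each item receives distinct annotators, and the workload stays roughly even across the pool.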

adjudicationSpec

NSID: pub.layers.corpus.defs#adjudicationSpec Type: Object

How disagreements between annotators are resolved into a final annotation.

| Field | Type | Description |
| --- | --- | --- |
| methodUri | at-uri | AT-URI of the adjudication method definition node. Community-expandable via knowledge graph. |
| method | string | Adjudication method (fallback when methodUri unavailable). Known values: expert, majority-vote, unanimous, discussion, dawid-skene, automatic-merge, intersection, union, none, custom |
| dedicatedAdjudicator | boolean | Whether a separate adjudicator (not one of the annotators) resolves disagreements. |
| agreementThreshold | integer | Agreement level (0-1000) above which adjudication is skipped. |
| features | ref | Ref: pub.layers.defs#featureMap |
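
The following sketch shows how a tool might apply agreementThreshold and the `majority-vote` method to the labels collected for one item. The lexicon does not say how the agreement level is computed, so this example assumes it is the share of annotators who chose the most common label, scaled to the 0-1000 range. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def agreement_level(labels: List[str]) -> int:
    """Assumed definition: share of the most common label, scaled to 0-1000."""
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count * 1000 // len(labels)


def needs_adjudication(labels: List[str], spec: dict) -> bool:
    """Adjudication is skipped only when agreement is above the threshold."""
    return agreement_level(labels) <= spec["agreementThreshold"]


def majority_vote(labels: List[str]) -> Optional[str]:
    """Return the most common label, or None when the top two labels tie."""
    ranked = Counter(labels).most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Placeholder spec for illustration.
adjudication_spec = {"method": "majority-vote", "agreementThreshold": 800}
```

With three annotators and labels `["A", "A", "B"]`, the assumed agreement level is 666, which is not above 800, so the item goes to adjudication and `majority_vote` picks `"A"`.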

qualityCriterion

NSID: pub.layers.corpus.defs#qualityCriterion Type: Object

An acceptance criterion for annotation quality.

| Field | Type | Description |
| --- | --- | --- |
| metricUri | at-uri | AT-URI of the metric definition node. Community-expandable via knowledge graph. |
| metric | string | Agreement or quality metric (fallback when metricUri unavailable). Known values: cohens-kappa, fleiss-kappa, krippendorff-alpha, percent-agreement, f1, smatch, uas, las, correlation, custom |
| threshold | integer | Minimum acceptable metric value (0-1000). |
| scopeUri | at-uri | AT-URI of the evaluation scope definition node. Community-expandable via knowledge graph. |
| scope | string | Evaluation scope (fallback when scopeUri unavailable). Known values: item, layer, document, corpus, custom |
| features | ref | Ref: pub.layers.defs#featureMap |
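
As a worked example, the sketch below computes Cohen's kappa for two annotators and checks it against a qualityCriterion. It assumes the 0-1000 threshold is a fixed-point encoding of a 0.0-1.0 metric value (multiplied by 1000), which the lexicon does not state explicitly. The function names and sample labels are hypothetical.

```python
from collections import Counter
from typing import List


def cohens_kappa(labels_a: List[str], labels_b: List[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        counts_a[label] * counts_b[label] for label in set(labels_a) | set(labels_b)
    ) / (n * n)
    if expected == 1.0:
        # Kappa is undefined here; returning 1.0 is a convention assumed for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


def meets_criterion(value: float, criterion: dict) -> bool:
    """Assumes `threshold` stores the minimum metric value multiplied by 1000."""
    return round(value * 1000) >= criterion["threshold"]


# Placeholder labels and criterion for illustration.
kappa = cohens_kappa(["x", "x", "y", "y"], ["x", "x", "y", "x"])
criterion = {"metric": "cohens-kappa", "threshold": 400, "scope": "layer"}
passed = meets_criterion(kappa, criterion)
```

Here observed agreement is 0.75 and expected chance agreement is 0.5, so kappa is 0.5 (500 on the assumed scale), which clears a threshold of 400.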

membership

NSID: pub.layers.corpus.membership Type: Record

A record indicating that an expression belongs to a corpus, with optional split assignment.

| Field | Type | Description |
| --- | --- | --- |
| corpusRef | at-uri | AT-URI of the corpus. |
| expressionRef | at-uri | AT-URI of the expression. |
| splitUri | at-uri | AT-URI of the split definition node. Community-expandable via knowledge graph. |
| split | string | Split slug (fallback when splitUri unavailable). Known values: train, dev, test, unlabeled |
| ordinal | integer | Ordering index within the corpus. |
| metadata | ref | Provenance: who assigned this expression to this corpus, when, with what tool. Ref: pub.layers.defs#annotationMetadata |
| features | ref | Open-ended features for this membership (e.g., source file, import batch, quality flags). Ref: pub.layers.defs#featureMap |
| createdAt | datetime | Record creation timestamp. |
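
Because membership lives in its own record, a consumer typically has to collect the membership records for a corpus and reassemble the splits. The sketch below builds a few illustrative membership records and groups expression AT-URIs by split in ordinal order. The DID, record keys, the expression collection name in `EXPRESSION_PREFIX`, and both helper functions are assumptions, not part of the lexicon.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical AT-URIs; the DID, record keys, and the expression
# collection name are placeholders chosen for this example.
CORPUS_URI = "at://did:plc:example/pub.layers.corpus.corpus/news-v1"
EXPRESSION_PREFIX = "at://did:plc:example/pub.layers.expression.expression/"


def make_membership(expression_key: str, split: str, ordinal: int) -> dict:
    """Build an illustrative membership record."""
    return {
        "$type": "pub.layers.corpus.membership",
        "corpusRef": CORPUS_URI,
        "expressionRef": EXPRESSION_PREFIX + expression_key,
        "split": split,
        "ordinal": ordinal,
        "createdAt": "2024-01-01T00:00:00Z",
    }


def group_by_split(memberships: List[dict]) -> Dict[str, List[str]]:
    """Group expression AT-URIs by split slug, ordered by `ordinal`."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for record in sorted(memberships, key=lambda r: r["ordinal"]):
        groups[record["split"]].append(record["expressionRef"])
    return dict(groups)


memberships = [
    make_membership("e2", "test", 1),
    make_membership("e1", "train", 0),
    make_membership("e3", "train", 2),
]
splits = group_by_split(memberships)
```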

XRPC Queries

getCorpus

NSID: pub.layers.corpus.getCorpus

Retrieve a single corpus record by AT-URI.

| Parameter | Type | Description |
| --- | --- | --- |
| uri | at-uri (required) | The AT-URI of the corpus record. |

Output: The corpus record object.
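
XRPC queries in the AT Protocol are HTTP GET requests to `/xrpc/<NSID>` with the parameters in the query string. The sketch below only builds such a URL for getCorpus and does not perform the request. The service host `https://layers.example.com`, the example AT-URI, and the `xrpc_query_url` helper are hypothetical, since this page does not say which service answers these queries.

```python
from urllib.parse import urlencode


def xrpc_query_url(service: str, nsid: str, params: dict) -> str:
    """Build the GET URL for an XRPC query, skipping unset parameters."""
    query = urlencode({key: value for key, value in params.items() if value is not None})
    return f"{service.rstrip('/')}/xrpc/{nsid}?{query}"


# Hypothetical host and record URI, for illustration only.
url = xrpc_query_url(
    "https://layers.example.com",
    "pub.layers.corpus.getCorpus",
    {"uri": "at://did:plc:example/pub.layers.corpus.corpus/news-v1"},
)
```

The same helper would apply to getMembership by swapping the NSID and passing the AT-URI of a membership record.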

listCorpora

NSID: pub.layers.corpus.listCorpora

List corpus records in a repository with pagination.

| Parameter | Type | Description |
| --- | --- | --- |
| repo | did (required) | The DID of the repository. |
| limit | integer | Maximum number of records to return (1-100, default 50). |
| cursor | string | Pagination cursor from previous response. |

Output: { records: corpus[], cursor?: string }
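
Cursor pagination can be sketched as a loop that keeps requesting pages until the response no longer carries a cursor. To keep the example runnable without a network, the page fetcher is injected as a function; `iter_records` and `fake_fetch_page` are hypothetical names, and the stopping rule (no cursor means no more pages) is an assumption based on the optional `cursor` in the output shape.

```python
from typing import Callable, Iterator, Optional


def iter_records(
    fetch_page: Callable[..., dict], repo: str, limit: int = 50
) -> Iterator[dict]:
    """Yield every record by following `cursor` until it is absent."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    cursor: Optional[str] = None
    while True:
        page = fetch_page(repo=repo, limit=limit, cursor=cursor)
        yield from page["records"]
        cursor = page.get("cursor")
        if not cursor:
            break


def fake_fetch_page(repo: str, limit: int, cursor: Optional[str]) -> dict:
    """Stand-in for a real listCorpora call; serves five fake records."""
    data = [{"name": f"corpus-{i}"} for i in range(5)]
    start = int(cursor) if cursor else 0
    end = start + limit
    page = {"records": data[start:end]}
    if end < len(data):
        page["cursor"] = str(end)  # opaque to the caller
    return page


names = [r["name"] for r in iter_records(fake_fetch_page, "did:plc:example", limit=2)]
```

listMemberships takes the same `repo`, `limit`, and `cursor` parameters, so the same loop should work there with a different fetcher.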

getMembership

NSID: pub.layers.corpus.getMembership

Retrieve a single membership record by AT-URI.

| Parameter | Type | Description |
| --- | --- | --- |
| uri | at-uri (required) | The AT-URI of the membership record. |

Output: The membership record object.

listMemberships

NSID: pub.layers.corpus.listMemberships

List membership records in a repository with pagination.

| Parameter | Type | Description |
| --- | --- | --- |
| repo | did (required) | The DID of the repository. |
| limit | integer | Maximum number of records to return (1-100, default 50). |
| cursor | string | Pagination cursor from previous response. |

Output: { records: membership[], cursor?: string }