
pub.layers.segmentation

A segmentation record that binds a hierarchical document structure (sections containing sentences containing tokenizations) to an expression. Multiple segmentations can coexist for the same expression.

Types

main

Type: Record

A complete segmentation of an expression into sections, sentences, and tokenizations.

| Field | Type | Description |
| --- | --- | --- |
| expression | at-uri | Reference to the expression this segmentation applies to. |
| sections | array | The ordered sections of the expression. Array of ref: #sectionWithSentences |
| metadata | ref | Ref: pub.layers.defs#annotationMetadata |
| knowledgeRefs | array | Knowledge graph references (e.g., segmentation algorithm, sentence splitting model). Array of ref: pub.layers.defs#knowledgeRef |
| features | ref | Open-ended features (e.g., segmenter version, parameters, language model used). Ref: pub.layers.defs#featureMap |
| createdAt | datetime | Record creation timestamp. |
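As a rough illustration of the record shape, the sketch below assembles a dict with the fields from the table above. The helper name `make_segmentation`, the example at-uri (its DID, collection, and record key), and the decision to leave out `metadata`, `knowledgeRefs`, and `features` are all assumptions made for this example. This page does not say which fields are required, so treat the result as a starting point rather than a validated record.

```python
from datetime import datetime, timezone


def make_segmentation(expression_uri, sections):
    """Assemble a dict shaped like a pub.layers.segmentation record.

    Hypothetical helper: only the fields listed in the table are used,
    and optional-looking fields are left out.
    """
    if not expression_uri.startswith("at://"):
        raise ValueError("expression must be an at-uri (at://...)")
    # RFC 3339 timestamp in UTC, written with a trailing "Z".
    created_at = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
    return {
        "$type": "pub.layers.segmentation",
        "expression": expression_uri,
        "sections": list(sections),  # items shaped like #sectionWithSentences
        "createdAt": created_at,
    }


# Made-up at-uri for illustration; the DID, collection, and record key
# do not refer to a real record.
record = make_segmentation(
    "at://did:example:alice/pub.layers.expression/3kexample",
    sections=[],
)
print(record["expression"])
```

Because several segmentations can coexist for one expression, a second call with the same `expression` value and different `sections` would describe an alternative segmentation rather than replace the first.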

sectionWithSentences

Type: Object

A section paired with its constituent sentences and their tokenizations.

| Field | Type | Description |
| --- | --- | --- |
| section | ref | Ref: pub.layers.expression#section. Section, sentence, token, and tokenization are kind values on Expression records. |
| sentences | array | Sentences within this section. Array of ref: #sentenceWithTokenizations |
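The nesting of sections and sentences can be walked with ordinary list iteration. The sketch below counts sentences per section, in section order. The function name `sentence_counts` is hypothetical, and the inner `{"id": ...}` dicts are placeholders: the real shapes of `section`, `sentence`, and tokenization values are defined in pub.layers.expression and are not shown on this page.

```python
def sentence_counts(record):
    """Return the number of sentences in each section, in section order."""
    return [
        len(entry.get("sentences", []))
        for entry in record.get("sections", [])
    ]


# Placeholder content: the {"id": ...} dicts stand in for values whose
# real shape is defined by pub.layers.expression.
record = {
    "sections": [
        {
            "section": {"id": "sec-1"},
            "sentences": [
                {"sentence": {"id": "sent-1"}, "tokenizations": [{"id": "tok-a"}]},
                {"sentence": {"id": "sent-2"}, "tokenizations": [{"id": "tok-b"}]},
            ],
        },
        {
            "section": {"id": "sec-2"},
            "sentences": [
                {"sentence": {"id": "sent-3"}, "tokenizations": [{"id": "tok-c"}]},
            ],
        },
    ]
}

print(sentence_counts(record))  # [2, 1]
```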

sentenceWithTokenizations

Type: Object

A sentence paired with one or more tokenizations. Multiple tokenizations support interlinear glossing (word-level + morpheme-level), alternative segmentation strategies, or multi-granularity analysis. Use pub.layers.alignment to map between tokenizations.

| Field | Type | Description |
| --- | --- | --- |
| sentence | ref | Ref: pub.layers.expression#sentence |
| tokenizations | array | One or more tokenizations of this sentence. The first is conventionally the primary/word-level tokenization. Array of ref: pub.layers.expression#tokenization |
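Since the first tokenization is conventionally the primary one, a consumer might separate it from the alternatives as sketched below. The function name `split_tokenizations` and the `label` key on the placeholder tokenizations are assumptions made for this example. The "first is primary" rule is a convention described above, not something the code can verify.

```python
def split_tokenizations(sentence_entry):
    """Split a #sentenceWithTokenizations-shaped dict into (primary, alternatives).

    Relies on the convention that the first tokenization is the
    primary/word-level one.
    """
    tokenizations = sentence_entry.get("tokenizations", [])
    if not tokenizations:
        raise ValueError("a sentence needs at least one tokenization")
    return tokenizations[0], tokenizations[1:]


# Placeholder entry: "label" is a made-up key used only to tell the two
# tokenizations apart in this example.
entry = {
    "sentence": {"id": "sent-1"},
    "tokenizations": [
        {"label": "word"},
        {"label": "morpheme"},
    ],
}

primary, alternatives = split_tokenizations(entry)
print(primary["label"])  # word
print(len(alternatives))  # 1
```

Mapping tokens of the primary tokenization onto tokens of an alternative is outside this object's scope. That mapping is what pub.layers.alignment is for.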