
TEI (Text Encoding Initiative)

Overview

TEI is an XML-based standard for encoding literary and linguistic texts. It models document structure in fine detail (divisions, paragraphs, lines, speakers, stage directions) and also provides a critical apparatus for manuscript traditions, metadata headers, and both inline and stand-off annotation. TEI is the dominant format in the digital humanities and historical linguistics.

Type-by-Type Mapping

Document Structure

| TEI Element | Layers Equivalent | Notes |
|---|---|---|
| `<TEI>` | `pub.layers.expression` (record) | Root document container. TEI's `@xml:id` maps to the record's AT-URI. |
| `<teiHeader>` | `pub.layers.expression` metadata fields + features | TEI header metadata (`fileDesc`, `encodingDesc`, `profileDesc`, `revisionDesc`) maps to expression-level fields and features. |
| `<text>` | `pub.layers.expression.text` | Primary textual content. |
| `<body>` | Implicit in `pub.layers.segmentation` | The body structure is captured by the segmentation hierarchy. |
| `<div>` (division) | `pub.layers.expression` (`kind: section`) | Nested divisions map to sections, with `kind` discriminating division types (chapter, part, act, scene, etc.). TEI's nested `<div>` structure is flattened into sections, and parent-child relationships are tracked via features or section ordering. |
| `<p>` (paragraph) | `pub.layers.expression` (`kind: section`) with `subkind="paragraph"` | Paragraph-level sections. |
| `<s>` (sentence) | `pub.layers.expression` (`kind: sentence`) | Sentence segmentation. |
| `<w>` (word) | `pub.layers.expression` (`kind: token`) | Word-level tokenization. TEI's `@lemma`, `@pos`, and `@msd` attributes map to separate annotation layers. |
| `<c>` (character) | `pub.layers.expression` (`kind: token`) in a `tokenization(kind="character")` | Character-level tokenization. |
| `<pc>` (punctuation) | `pub.layers.expression` (`kind: token`) with feature `isPunctuation=true` | Punctuation characters as tokens. |
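The flattening of nested `<div>`s and the `<w>`/`<pc>` tokenization described above can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal illustration under assumptions: the dict keys (`kind`, `subkind`, `parent`, `order`, `features`) are hypothetical stand-ins for Layers record fields, not the actual lexicon schema, and the division type is read from TEI's `@type`. Parent links are kept as explicit ids plus a document-order index, which is one way of "tracking parent-child relationships via features or section ordering".

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id


def flatten_divs(body):
    """Flatten nested <div>s depth-first into section dicts.

    Each dict keeps its parent's id and a document-order index, so the
    hierarchy survives flattening. Keys are illustrative only (hypothetical).
    """
    sections = []

    def walk(elem, parent_id):
        for div in elem.findall("tei:div", NS):  # direct <div> children only
            sec_id = div.get(XML_ID) or f"div{len(sections)}"  # fallback id
            sections.append({
                "kind": "section",
                "subkind": div.get("type"),  # e.g. "act", "chapter", "scene"
                "id": sec_id,
                "parent": parent_id,
                "order": len(sections),
            })
            walk(div, sec_id)  # recurse into nested divisions

    walk(body, None)
    return sections


def tokens(elem):
    """Collect <w> and <pc> descendants, in document order, as token dicts."""
    out = []
    for el in elem.iter():
        if el.tag == f"{{{TEI_NS}}}w":
            out.append({"kind": "token", "text": el.text or ""})
        elif el.tag == f"{{{TEI_NS}}}pc":
            out.append({"kind": "token", "text": el.text or "",
                        "features": {"isPunctuation": True}})
    return out


sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <div type="act" xml:id="a1">
    <div type="scene" xml:id="a1s1">
      <p><w>Who</w> <w>is</w> <w>there</w><pc>?</pc></p>
    </div>
  </div>
</body></text></TEI>"""

body = ET.fromstring(sample).find("tei:text/tei:body", NS)
secs = flatten_divs(body)
toks = tokens(body)
print([(s["id"], s["parent"]) for s in secs])  # [('a1', None), ('a1s1', 'a1')]
```

Keeping both an explicit `parent` and an `order` index means the original tree can be rebuilt even if a consumer only reads one of them.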

Speaker and Dialogue

| TEI Element | Layers Equivalent | Notes |
|---|---|---|
| `<sp>` (speech) | `pub.layers.expression` (`kind: section`) with `subkind="turn"` + speaker | Dialogue turns with speaker identification. |
| `<speaker>` | `expression.speaker` | Speaker identifier. |
| `<stage>` (stage direction) | `pub.layers.expression` (`kind: section`) with subkind set via `subkindUri` (e.g., "stage-direction") | Stage directions as community-defined section kinds. |
| `<lg>` (line group) | `pub.layers.expression` (`kind: section`) with subkind via `subkindUri` (e.g., "stanza") | Poetic stanzas. |
| `<l>` (verse line) | `pub.layers.expression` (`kind: section`) with subkind via `subkindUri` (e.g., "verse-line") | Individual verse lines. |
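A minimal sketch of the speaker and dialogue mapping, again using only `xml.etree.ElementTree`. Assumptions: the output dicts are hypothetical (in real records the community subkinds would be referenced through `subkindUri` rather than a plain `subkind` string), and the speaker is taken from the `<sp>` element's `@who` pointer when present, falling back to the visible `<speaker>` label.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def drama_sections(elem):
    """Map <sp>, <stage>, <lg>, and <l> to section dicts in document order."""
    # Hypothetical community subkinds, as in the table above.
    subkinds = {"stage": "stage-direction", "lg": "stanza", "l": "verse-line"}
    out = []
    for el in elem.iter():
        if not el.tag.startswith(TEI):
            continue
        local = el.tag[len(TEI):]
        if local == "sp":
            speaker = el.find(TEI + "speaker")
            # Prefer the @who pointer; fall back to the <speaker> label.
            who = el.get("who")
            if who is None and speaker is not None:
                who = "".join(speaker.itertext()).strip()
            out.append({"kind": "section", "subkind": "turn", "speaker": who})
        elif local in subkinds:
            # Collapse internal whitespace in the element's text content.
            text = " ".join("".join(el.itertext()).split())
            out.append({"kind": "section", "subkind": subkinds[local], "text": text})
    return out


scene = ET.fromstring("""<div xmlns="http://www.tei-c.org/ns/1.0">
  <stage>Enter BERNARDO and FRANCISCO</stage>
  <sp who="#bernardo"><speaker>Bernardo</speaker><l>Who's there?</l></sp>
</div>""")

secs = drama_sections(scene)
for s in secs:
    print(s["subkind"], s.get("speaker") or s.get("text"))
# stage-direction Enter BERNARDO and FRANCISCO
# turn #bernardo
# verse-line Who's there?
```

Because `elem.iter()` walks in document order, a turn section is emitted before the verse lines it contains; a fuller converter would also record that containment.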

Linguistic Annotation

| TEI Element | Layers Equivalent | Notes |
|---|---|---|
| `<w @lemma>` | `annotationLayer(kind="token-tag", subkind="lemma")` | Lemmatization. |
| `<w @pos>` | `annotationLayer(kind="token-tag", subkind="pos")` | POS tagging. |
| `<w @msd>` | `annotationLayer(kind="token-tag", subkind="morph")` | Morphosyntactic description. |
| `<phr>` (phrase) | `annotationLayer(kind="span")` with an appropriate subkind | Phrase-level annotation. |
| `<cl>` (clause) | `annotationLayer(kind="span", subkind="discourse-unit")` | Clause annotation. |
| `<name>` / `<persName>` / `<placeName>` / `<orgName>` | `annotationLayer(kind="span", subkind="entity-mention")` | Named entity spans. TEI's entity type (`persName` vs. `placeName`) maps to `annotation.label`. |
| `<rs>` (referring string) | `annotationLayer(kind="span", subkind="entity-mention")` | Referring expressions; `@type` → `annotation.label`. |
| `<date>` / `<time>` | `annotationLayer(kind="span", subkind="temporal-expression")` | Temporal expressions; `@when` → `annotation.value` (normalized form). |
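The key idea in this table is that TEI's inline attributes become separate stand-off layers. The sketch below shows that split for `@lemma`, `@pos`, and `@msd`, plus entity-mention spans for name elements. Assumptions: layer and annotation dicts are hypothetical shapes, tokens are referenced by index, and entity spans use a half-open `(start, end)` token range. Only `<w>` elements are tokenized here, as a simplification.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
ATTR_SUBKIND = {"lemma": "lemma", "pos": "pos", "msd": "morph"}  # @attr -> layer subkind
NAME_TAGS = {"name", "persName", "placeName", "orgName", "rs"}


def annotate(elem):
    """Split <w> attributes into token-tag layers and name elements into entity spans."""
    tokens = []
    tag_layers = {sub: {"kind": "token-tag", "subkind": sub, "annotations": []}
                  for sub in ATTR_SUBKIND.values()}
    entities = {"kind": "span", "subkind": "entity-mention", "annotations": []}

    def walk(el):
        local = el.tag[len(TEI):] if el.tag.startswith(TEI) else el.tag
        if local == "w":
            idx = len(tokens)
            tokens.append(el.text or "")
            for attr, sub in ATTR_SUBKIND.items():
                if el.get(attr) is not None:
                    tag_layers[sub]["annotations"].append({"token": idx, "value": el.get(attr)})
            return
        start = len(tokens)
        for child in el:
            walk(child)
        if local in NAME_TAGS and len(tokens) > start:
            # <rs>/<name> carry their category in @type; the others are typed by element name.
            label = (el.get("type") or local) if local in ("rs", "name") else local
            entities["annotations"].append({"tokens": (start, len(tokens)), "label": label})

    walk(elem)
    return tokens, tag_layers, entities


s = ET.fromstring("""<s xmlns="http://www.tei-c.org/ns/1.0">
  <persName><w lemma="Hamlet" pos="NNP">Hamlet</w></persName>
  <w lemma="see" pos="VBD" msd="Tense=Past">saw</w>
  <w lemma="the" pos="DT">the</w> <w lemma="ghost" pos="NN">ghost</w>
</s>""")

tokens, tag_layers, entities = annotate(s)
print(tag_layers["morph"]["annotations"])  # [{'token': 1, 'value': 'Tense=Past'}]
print(entities["annotations"])  # [{'tokens': (0, 1), 'label': 'persName'}]
```

Only tokens that actually carry an attribute get an annotation in that layer, so a sparse `@msd` produces a sparse morph layer rather than empty placeholders.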

Critical Apparatus and Manuscript Traditions

| TEI Element | Layers Equivalent | Notes |
|---|---|---|
| `<app>` (apparatus entry) | `annotationLayer(kind="span")` with subkind via `subkindUri` (e.g., "apparatus-entry") | Variant readings. Each `<rdg>` (reading) becomes an annotation with the variant text in `value` and the witness in `features`. |
| `<rdg>` (reading) | `pub.layers.annotation#annotation` | Individual manuscript readings; `@wit` (witness sigla) → `features`. |
| `<lem>` (lemma/preferred reading) | Annotation with feature `isLemma=true` | The preferred reading among the variants. |
| `<note>` (editorial note) | `annotationLayer(kind="span", subkind="comment")` | Notes and commentary. |
| `<gap>` / `<unclear>` / `<supplied>` | `annotationLayer(kind="span")` with a custom subkind | Transcription uncertainty markers. |
| `<choice>` / `<sic>` / `<corr>` | `annotationLayer(kind="span", subkind="error")` + `annotationLayer(kind="span", subkind="correction")` with `parentLayerRef` | Error/correction pairs, linked by `parentLayerRef`. |
| `<abbr>` / `<expan>` | Analogous to the `<sic>` / `<corr>` error/correction pattern | Abbreviation/expansion pairs. |
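The apparatus mapping can be sketched as follows: one `<app>` becomes one apparatus-entry layer, each `<lem>` or `<rdg>` becomes an annotation, `@wit` pointers become a witness list in `features`, and `<lem>` is flagged with `isLemma`. This is an illustration under assumptions: the dict shapes are hypothetical, the witness sigla in the sample are invented, and nested `<rdgGrp>`, `<wit>`, and `<note>` children of `<app>` are simply skipped.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def apparatus_entry(app):
    """Turn one <app> into an apparatus-entry layer with one annotation per reading."""
    readings = []
    for child in app:
        local = child.tag[len(TEI):] if child.tag.startswith(TEI) else child.tag
        if local not in ("lem", "rdg"):
            continue  # this sketch skips <note>, <wit>, <rdgGrp>, etc.
        readings.append({
            "value": "".join(child.itertext()).strip(),
            "features": {
                # @wit is a space-separated list of pointers such as "#B #C"
                "witnesses": [w.lstrip("#") for w in (child.get("wit") or "").split()],
                "isLemma": local == "lem",
            },
        })
    return {"kind": "span", "subkind": "apparatus-entry", "annotations": readings}


app = ET.fromstring("""<app xmlns="http://www.tei-c.org/ns/1.0">
  <lem wit="#A">colour</lem>
  <rdg wit="#B #C">color</rdg>
</app>""")

entry = apparatus_entry(app)
for ann in entry["annotations"]:
    print(ann["value"], ann["features"])
# colour {'witnesses': ['A'], 'isLemma': True}
# color {'witnesses': ['B', 'C'], 'isLemma': False}
```

Storing witnesses as a list rather than the raw `@wit` string keeps each siglum individually queryable, which matters when one reading is attested in several manuscripts.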

Metadata and Bibliography

| TEI Element | Layers Equivalent | Notes |
|---|---|---|
| `<fileDesc>` | `pub.layers.expression` fields + features | File description metadata. |
| `<sourceDesc>` | `pub.layers.expression.sourceUrl` + `sourceRef` | Source document references. |
| `<bibl>` / `<biblStruct>` | `pub.layers.eprint` | Bibliographic references link to eprint records. |
| `<respStmt>` | `pub.layers.defs#annotationMetadata` | Responsibility statements map to annotation metadata. |
| `<encodingDesc>` | `pub.layers.ontology` + annotation layer metadata | The encoding description (tagset declarations, etc.) maps to ontology definitions. |
| `<taxonomy>` / `<category>` | `pub.layers.ontology#typeDef` hierarchy | Classification taxonomies. |
| `<particDesc>` / `<person>` | `pub.layers.persona` or features | Participant descriptions in spoken-text corpora. |
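A minimal sketch of pulling a few header values out of `<teiHeader>` so they can be attached as expression-level fields and features. Assumptions: the output keys (`title`, `sources`, `responsibility`) are hypothetical and do not correspond to named Layers fields; the sample header is invented. It relies only on `Element.find`, `findtext`, `findall`, and `iterfind` with a namespace map.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def header_fields(tei):
    """Pull a few <teiHeader> values into an illustrative expression-level dict."""
    fd = tei.find("tei:teiHeader/tei:fileDesc", NS)
    return {
        "title": fd.findtext("tei:titleStmt/tei:title", default="", namespaces=NS).strip(),
        "features": {
            # <sourceDesc>/<bibl> entries, as plain strings
            "sources": ["".join(b.itertext()).strip()
                        for b in fd.findall("tei:sourceDesc/tei:bibl", NS)],
            # every <respStmt> anywhere in <fileDesc>
            "responsibility": [
                {"resp": r.findtext("tei:resp", default="", namespaces=NS).strip(),
                 "name": r.findtext("tei:name", default="", namespaces=NS).strip()}
                for r in fd.iterfind(".//tei:respStmt", NS)
            ],
        },
    }


tei = ET.fromstring("""<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc>
    <titleStmt>
      <title>Sample Play</title>
      <respStmt><resp>Encoded by</resp><name>A. Encoder</name></respStmt>
    </titleStmt>
    <publicationStmt><p>Unpublished draft</p></publicationStmt>
    <sourceDesc><bibl>Sample Play, printed edition</bibl></sourceDesc>
  </fileDesc></teiHeader>
  <text><body><p>...</p></body></text>
</TEI>""")

fields = header_fields(tei)
print(fields["title"])  # Sample Play
```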

Stand-Off Annotation (TEI)

TEI supports stand-off annotation via `@xml:id` references and `<spanGrp>`/`<span>` elements, which map directly onto the stand-off architecture of Layers:

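| TEI Stand-Off Element | Layers Equivalent | Notes |
|---|---|---|
| `<spanGrp>` | `pub.layers.annotation#annotationLayer` | Group of stand-off spans. |
| `<span @from @to>` | `pub.layers.annotation#annotation` with `anchor.textSpan` | Stand-off span. `@from` and `@to` point to the first and last elements covered (e.g., `<w>` elements by `xml:id`); these pointers are resolved to character offsets for the text span. |
| `<link>` / `<linkGrp>` | `pub.layers.graph#graphEdge` | Typed links between elements. |
| `<interp>` / `<interpGrp>` | `pub.layers.annotation#annotationLayer` | Interpretive annotation groups. |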
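Because TEI's `<span>` points at elements while a text-span anchor uses character offsets, a converter has to resolve one into the other. The sketch below shows one way to do that. Assumptions: the base text is rebuilt by joining `<w>` tokens with single spaces (real TEI whitespace handling is more involved), a `<span>` without `@to` covers only its `@from` element, and the output dicts are hypothetical rather than the Layers record format.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def token_offsets(elem):
    """Rebuild text from <w> tokens joined by single spaces; map xml:id -> (start, end)."""
    text, offsets = "", {}
    for w in elem.iter(f"{{{TEI_NS}}}w"):
        if text:
            text += " "  # simplifying assumption: one space between tokens
        start = len(text)
        text += w.text or ""
        offsets[w.get(XML_ID)] = (start, len(text))
    return text, offsets


def span_layer(span_grp, offsets):
    """Map a <spanGrp> to a layer and each <span from to> to character offsets."""
    annotations = []
    for span in span_grp.iter(f"{{{TEI_NS}}}span"):
        first = span.get("from").lstrip("#")
        last = (span.get("to") or span.get("from")).lstrip("#")  # no @to: just @from
        annotations.append({
            "start": offsets[first][0],  # start of the first pointed-to token
            "end": offsets[last][1],     # end of the last pointed-to token
            "value": "".join(span.itertext()).strip(),
        })
    return {"subkind": span_grp.get("type"), "annotations": annotations}


text_el = ET.fromstring("""<p xmlns="http://www.tei-c.org/ns/1.0">
  <w xml:id="w1">To</w> <w xml:id="w2">be</w> <w xml:id="w3">or</w> <w xml:id="w4">not</w>
</p>""")
grp = ET.fromstring("""<spanGrp xmlns="http://www.tei-c.org/ns/1.0" type="comment">
  <span from="#w3" to="#w4">alternative</span>
</spanGrp>""")

text, offsets = token_offsets(text_el)
layer = span_layer(grp, offsets)
a = layer["annotations"][0]
print(text[a["start"]:a["end"]])  # or not
```

Resolving pointers once, at import time, lets the resulting annotations stay valid even if the original `xml:id` values are not carried over into the Layers records.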