FoLiA

Model: Format for Linguistic Annotation (FoLiA)
Origin: Radboud University
Specification: FoLiA Documentation v2.5+
Key Reference: van Gompel & Reynaert 2013

Overview

FoLiA is a rich XML-based format for linguistic annotation developed at Radboud University. It supports both inline and stand-off annotation, alternative/correction layers, provenance tracking, and a wide range of linguistic annotation types. FoLiA aims to be a single format that handles all levels of linguistic annotation in a consistent way.

Type-by-Type Mapping

Document Structure

FoLiA Element	Layers Equivalent	Notes
`<FoLiA>` (document)	`pub.layers.expression.expression`	Root document. FoLiA's `@xml:id` → AT-URI.
`<metadata>`	Expression fields + `features`	Document-level metadata.
`<text>`	`pub.layers.expression.text`	Primary text content.
`<div>` (division)	`pub.layers.expression.expression` (kind: `section`)	Document divisions.
`<p>` (paragraph)	`pub.layers.expression.expression` (kind: `section`) with `subkind="paragraph"`	Paragraph sections.
`<s>` (sentence)	`pub.layers.expression.expression` (kind: `sentence`)	Sentence boundaries.
`<w>` (word)	`pub.layers.expression.expression` (kind: `token`)	Word tokens.
`<morpheme>`	`pub.layers.expression.expression` (kind: `token`) in a `tokenization(kind="morphological")`	Morpheme-level tokens within a word, linked by `pub.layers.alignment.alignment`.
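
A minimal sketch of the structural mapping follows, using only Python's standard `xml.etree.ElementTree`. It assumes the FoLiA v2 namespace `http://ilk.uvt.nl/folia`. The function `structure_expressions` and its dict keys (`id`, `kind`, `subkind`, `parent`, `text`) are hypothetical placeholders, not the real `pub.layers.expression.expression` record schema. `<metadata>` and `<text>` are skipped here for brevity.

```python
import xml.etree.ElementTree as ET

# FoLiA v2 namespace and the reserved xml:id attribute, as ElementTree spells them.
FOLIA = "{http://ilk.uvt.nl/folia}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Structural element -> (kind, subkind), following the mapping table.
STRUCTURE_KINDS = {
    "div": ("section", None),
    "p": ("section", "paragraph"),
    "s": ("sentence", None),
    "w": ("token", None),
}


def structure_expressions(xml_text):
    """Flatten FoLiA structure elements into illustrative expression dicts.

    Hypothetical helper: keys are placeholders, not the Layers record schema.
    """
    root = ET.fromstring(xml_text)
    found = []

    def visit(elem, parent_id):
        local = elem.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
        current = parent_id
        if local in STRUCTURE_KINDS:
            kind, subkind = STRUCTURE_KINDS[local]
            current = elem.get(XML_ID)  # FoLiA xml:id, later turned into an AT-URI
            found.append({
                "id": current,
                "kind": kind,
                "subkind": subkind,
                "parent": parent_id,
                # Only words carry their own <t> text in this sketch.
                "text": elem.findtext(f"{FOLIA}t") if local == "w" else None,
            })
        for child in elem:
            visit(child, current)

    visit(root, root.get(XML_ID))
    return found


doc = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="ex" version="2.5">
  <metadata type="native"/>
  <text xml:id="ex.text">
    <p xml:id="ex.p.1">
      <s xml:id="ex.p.1.s.1">
        <w xml:id="ex.p.1.s.1.w.1"><t>Hello</t></w>
        <w xml:id="ex.p.1.s.1.w.2"><t>world</t></w>
      </s>
    </p>
  </text>
</FoLiA>"""

for expr in structure_expressions(doc):
    print(expr["kind"], expr["subkind"], expr["id"], expr["text"])
```

The traversal keeps the nesting as a `parent` pointer, so the `<p>` → `<s>` → `<w>` hierarchy survives flattening into separate expressions.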

Annotation Layers

FoLiA Annotation	Layers Equivalent	Notes
`<pos>`	`annotationLayer(kind="token-tag", subkind="pos")`	POS tagging. FoLiA's `@class` → `annotation.label`; `@set` → `annotationLayer.labelSet`.
`<lemma>`	`annotationLayer(kind="token-tag", subkind="lemma")`	Lemmatization.
`<sense>`	`annotationLayer(kind="token-tag", subkind="sense")`	Word sense disambiguation. `@class` → `annotation.label`; `@synset` → `annotation.knowledgeRefs` (WordNet).
`<domain>`	`annotationLayer(kind="token-tag")` with custom `subkind`	Domain/register classification.
`<lang>`	`annotationLayer(kind="token-tag", subkind="language-id")`	Per-token language identification.
`<phonology>` / `<phon>`	`annotationLayer(kind="token-tag", subkind="phonetic")`	Phonetic transcription.
`<entity>`	`annotationLayer(kind="span", subkind="entity-mention")`	Named entity spans. FoLiA's `@class` → `annotation.label` (entity type).
`<chunking>` / `<chunk>`	`annotationLayer(kind="token-tag", subkind="chunk")` or `annotationLayer(kind="span")`	Chunking, encoded either as IOB token tags or as span annotations.
`<dependency>`	`annotationLayer(kind="graph", subkind="dependency")`	Dependency relations. `<dep>` → dependent token; `<hd>` → head token; `@class` → relation label.
`<syntax>`	`annotationLayer(kind="tree", subkind="constituency")`	Constituency parse. `<su>` (syntactic unit) → annotation with `parentId`/`childIds`.
`<semroles>`	`annotationLayer(kind="span", subkind="frame")`	Semantic role labeling. `<semrole @class>` → `argumentRef.role`.
`<coreferences>`	`pub.layers.annotation.clusterSet` with `kind="coreference"`	Coreference chains.
`<sentiment>`	`annotationLayer(kind="span", subkind="sentiment")`	Sentiment annotation.
`<statement>`	`annotationLayer(kind="span")` with custom `subkind` (e.g., `"attribution"`)	Attribution/statement annotation.
`<observation>`	`annotationLayer(kind="span")` with custom `subkind`	Observation annotations.
`<timesegment>`	`annotationLayer(kind="tier")` with `anchor.temporalSpan`	Time-aligned segments for speech.
`<rawcontent>`	`pub.layers.expression.text` or `features`	Raw/original content before normalization.
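
The token-tag rows can be illustrated with a short sketch that groups inline `<pos>`, `<lemma>` and `<lang>` annotations into one layer per subkind. It maps `@class` to `label` and `@set` to `labelSet`. Tokens are addressed by their position among the sentence's `<w>` elements. The function name, dict keys and the set value `upos` are hypothetical. A full converter would also fall back to the set declared in `<metadata>` when an annotation omits `@set`.

```python
import xml.etree.ElementTree as ET

FOLIA = "{http://ilk.uvt.nl/folia}"

# Inline FoLiA annotation element -> Layers token-tag subkind (from the table).
TOKEN_TAG_SUBKINDS = {"pos": "pos", "lemma": "lemma", "lang": "language-id"}


def token_tag_layers(sentence_xml):
    """Collect inline annotations of a <s> into one token-tag layer per subkind.

    Hypothetical helper: layers are plain dicts with placeholder keys.
    """
    sentence = ET.fromstring(sentence_xml)
    layers = {}
    for index, word in enumerate(sentence.iter(f"{FOLIA}w")):
        for tag, subkind in TOKEN_TAG_SUBKINDS.items():
            ann = word.find(f"{FOLIA}{tag}")
            if ann is None:
                continue
            # The first annotation seen for a subkind fixes the layer's labelSet.
            layer = layers.setdefault(subkind, {
                "kind": "token-tag",
                "subkind": subkind,
                "labelSet": ann.get("set"),  # FoLiA @set -> labelSet
                "annotations": [],
            })
            # FoLiA @class -> annotation.label
            layer["annotations"].append({"tokenIndex": index, "label": ann.get("class")})
    return layers


sentence = """<s xmlns="http://ilk.uvt.nl/folia" xml:id="s.1">
  <w xml:id="s.1.w.1"><t>Cats</t><pos class="NOUN" set="upos"/><lemma class="cat"/></w>
  <w xml:id="s.1.w.2"><t>sleep</t><pos class="VERB" set="upos"/><lemma class="sleep"/></w>
</s>"""

layers = token_tag_layers(sentence)
print(layers["pos"]["annotations"])
```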

Corrections and Alternatives

FoLiA has a sophisticated system for representing corrections, alternatives, and suggestions:

FoLiA Feature	Layers Equivalent	Notes
`<correction>`	`annotationLayer(kind="span", subkind="correction")` with `parentLayerRef` → error layer	The original form sits in an error layer. The corrected form sits in a correction layer whose `parentLayerRef` points back to that error layer.
`<original>` (within correction)	The annotation in the error-subkind layer	The original (incorrect) form.
`<new>` (within correction)	The annotation in the correction-subkind layer	The corrected form.
`<current>`	Active annotation (default layer)	The current/accepted form.
`<suggestion>`	Additional annotation with lower confidence	Suggested alternatives stored as annotations with confidence scores.
`<alternative>`	Separate `annotationLayer` with same `kind`/`subkind`	Alternative analyses (e.g., two possible POS tags). Each alternative becomes its own annotation layer. Multiple layers with the same `kind`/`subkind` on the same expression are read as alternatives.
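
The correction pattern can be sketched as follows. A word-level `<correction>` is split into an error layer holding the `<original>` text and a correction layer holding the `<new>` text. The correction layer's `parentLayerRef` points at the error layer. The function name, the layer ids, the `error` subkind name and the dict keys are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

FOLIA = "{http://ilk.uvt.nl/folia}"


def split_correction(word_xml, token_index):
    """Split a word-level FoLiA <correction> into error and correction layers.

    Hypothetical helper: ids, the "error" subkind and keys are placeholders.
    """
    word = ET.fromstring(word_xml)
    corr = word.find(f"{FOLIA}correction")
    error_layer = {
        "id": "layer-error",  # hypothetical identifier
        "kind": "span",
        "subkind": "error",   # assumed subkind name for the error layer
        "annotations": [{
            "tokenIndex": token_index,
            "value": corr.findtext(f"{FOLIA}original/{FOLIA}t"),  # text of <original>
        }],
    }
    correction_layer = {
        "id": "layer-correction",  # hypothetical identifier
        "kind": "span",
        "subkind": "correction",
        "parentLayerRef": error_layer["id"],  # links the fix back to the original
        "annotations": [{
            "tokenIndex": token_index,
            "value": corr.findtext(f"{FOLIA}new/{FOLIA}t"),  # text of <new>
            "label": corr.get("class"),  # correction class, e.g. a spelling fix
        }],
    }
    return error_layer, correction_layer


word = """<w xmlns="http://ilk.uvt.nl/folia" xml:id="s.1.w.3">
  <correction xml:id="s.1.w.3.c.1" class="spelling">
    <new><t>their</t></new>
    <original><t>thier</t></original>
  </correction>
</w>"""

error_layer, correction_layer = split_correction(word, token_index=2)
print(error_layer["annotations"], correction_layer["annotations"])
```

Keeping both forms as separate layers means the accepted reading and the original error stay queryable side by side, rather than one overwriting the other.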

Provenance and Sets

FoLiA Feature	Layers Equivalent	Notes
`<processor>`	`pub.layers.defs#annotationMetadata.tool`	Tool/annotator identification.
`@processor` (on annotations)	`annotationLayer.metadata.tool`	Per-layer tool attribution.
`@set` (annotation set)	`annotationLayer.labelSet` or `ontologyRef`	Tagset/label-set identifier. FoLiA sets define valid label values. Layers uses `ontologyRef` for formal type systems and `labelSet` for simpler tag sets.
`@class` (annotation value)	`annotation.label`	The annotation value from the set.
`@confidence`	`annotation.confidence`	Confidence score on a 0-1000 scale (FoLiA's 0.0-1.0 value multiplied by 1000).
`@datetime`	`annotationMetadata.timestamp`	Creation timestamp.
`@n` (ordinal)	`annotation.features.ordinal` or array index	Ordering information.
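
The provenance rows can be sketched in a few lines. The code resolves an annotation's `@processor` reference against `<processor>` declarations, normalizes `@datetime`, and rescales `@confidence` to 0-1000 with rounding to the nearest integer. The metadata markup follows FoLiA v2's `<provenance>` block as we understand it. The helper names and the merged `fields` dict are hypothetical. In Layers, `tool` and `timestamp` belong to layer metadata while `confidence` belongs to the individual annotation. The processor name `frog` is only an example value.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

FOLIA = "{http://ilk.uvt.nl/folia}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def processors_by_id(metadata_xml):
    """Index <processor> declarations by xml:id so @processor references resolve."""
    meta = ET.fromstring(metadata_xml)
    return {p.get(XML_ID): p.get("name") for p in meta.iter(f"{FOLIA}processor")}


def scale_confidence(value):
    """Map FoLiA's 0.0-1.0 @confidence onto the 0-1000 integer scale."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"FoLiA confidence must lie in [0, 1], got {value}")
    return round(value * 1000)


def provenance_fields(ann_attrs, processors):
    """Translate FoLiA provenance attributes into illustrative metadata fields."""
    fields = {}
    if "processor" in ann_attrs:
        fields["tool"] = processors[ann_attrs["processor"]]  # -> metadata.tool
    if "datetime" in ann_attrs:
        # FoLiA @datetime is an xsd:dateTime; fromisoformat covers the basic form.
        fields["timestamp"] = datetime.fromisoformat(ann_attrs["datetime"]).isoformat()
    if "confidence" in ann_attrs:
        fields["confidence"] = scale_confidence(float(ann_attrs["confidence"]))
    return fields


metadata = """<metadata xmlns="http://ilk.uvt.nl/folia" type="native">
  <provenance>
    <processor xml:id="p1" name="frog" type="auto"/>
  </provenance>
</metadata>"""

pos = ET.fromstring('<pos xmlns="http://ilk.uvt.nl/folia" class="NOUN" '
                    'processor="p1" confidence="0.87" datetime="2024-05-01T12:30:00"/>')
fields = provenance_fields(pos.attrib, processors_by_id(metadata))
print(fields)
```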

Span Annotation and Grouping

FoLiA Feature	Layers Equivalent	Notes
`<wref>` (word reference)	`anchor.tokenRefSequence`	References to words in span annotations. FoLiA's `@id`-based references → Layers token index references.
Discontinuous spans	`anchor.tokenRefSequence` with non-contiguous `tokenIndexes`	FoLiA supports discontinuous spans via multiple `<wref>` elements. Layers's `tokenRefSequence` supports the same pattern.
`<relation>`	`pub.layers.graph.graphEdge` or `annotation.arguments`	Typed relations between annotation elements.
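
A minimal sketch of `<wref>` resolution follows. Each `<w>` gets an index by document order. Each `<wref id="...">` inside a span element is then replaced by that index. A gap between consecutive indexes marks a discontinuous span, as in the classic "Bill ... Clinton" case. The function name and output keys are assumptions, apart from `tokenRefSequence` and `tokenIndexes`, which follow the table. Indexes are sorted so the anchor always lists tokens in reading order.

```python
import xml.etree.ElementTree as ET

FOLIA = "{http://ilk.uvt.nl/folia}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def span_anchors(sentence_xml, span_tag="entity"):
    """Resolve the <wref> id references of span annotations to token indexes."""
    sentence = ET.fromstring(sentence_xml)
    # Document order of <w> elements defines the token index.
    index_of = {w.get(XML_ID): i for i, w in enumerate(sentence.iter(f"{FOLIA}w"))}
    spans = []
    for span in sentence.iter(f"{FOLIA}{span_tag}"):
        indexes = sorted(index_of[ref.get("id")] for ref in span.iter(f"{FOLIA}wref"))
        spans.append({
            "label": span.get("class"),  # FoLiA @class -> annotation.label
            "anchor": {"tokenRefSequence": {"tokenIndexes": indexes}},
            # A gap between consecutive indexes marks a discontinuous span.
            "discontinuous": any(b - a > 1 for a, b in zip(indexes, indexes[1:])),
        })
    return spans


sentence = """<s xmlns="http://ilk.uvt.nl/folia" xml:id="s.1">
  <w xml:id="s.1.w.1"><t>Bill</t></w>
  <w xml:id="s.1.w.2"><t>and</t></w>
  <w xml:id="s.1.w.3"><t>Hillary</t></w>
  <w xml:id="s.1.w.4"><t>Clinton</t></w>
  <entities>
    <entity class="per"><wref id="s.1.w.1" t="Bill"/><wref id="s.1.w.4" t="Clinton"/></entity>
    <entity class="per"><wref id="s.1.w.3" t="Hillary"/><wref id="s.1.w.4" t="Clinton"/></entity>
  </entities>
</s>"""

for span in span_anchors(sentence):
    print(span["label"], span["anchor"]["tokenRefSequence"]["tokenIndexes"], span["discontinuous"])
```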
