FoLiA (Format for Linguistic Annotation)

Overview

FoLiA is a rich XML-based format for linguistic annotation developed at Radboud University. It supports both inline and stand-off annotation, alternative/correction layers, provenance tracking, and a wide range of linguistic annotation types. FoLiA aims to be a single format that handles all levels of linguistic annotation in a consistent way.

Type-by-Type Mapping

Document Structure

| FoLiA Element | Layers Equivalent | Notes |
| --- | --- | --- |
| <FoLiA> (document) | pub.layers.expression | Root document. FoLiA's @xml:id → AT-URI. |
| <metadata> | Expression fields + features | Document-level metadata. |
| <text> | pub.layers.expression.text | Primary text content. |
| <div> (division) | pub.layers.expression (kind: section) | Document divisions. |
| <p> (paragraph) | pub.layers.expression (kind: section) with subkind="paragraph" | Paragraph sections. |
| <s> (sentence) | pub.layers.expression (kind: sentence) | Sentence boundaries. |
| <w> (word) | pub.layers.expression (kind: token) | Word tokens. |
| <morpheme> | pub.layers.expression (kind: token) in a tokenization(kind="morphological") | Morpheme-level tokens within a word, linked by pub.layers.alignment. |
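
As a rough illustration of the structural mapping above, the following sketch walks a small FoLiA fragment with Python's standard `xml.etree.ElementTree` and emits one record per structural element. The record shape (`id`, `kind`, `subkind`) and the function name `structure_to_expressions` are assumptions made for this example, not the actual pub.layers.expression schema, and the AT-URI conversion of `@xml:id` is left out.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# FoLiA structure element -> (kind, subkind), following the table above.
KIND_BY_TAG = {
    "div": ("section", None),
    "p": ("section", "paragraph"),
    "s": ("sentence", None),
    "w": ("token", None),
}

SAMPLE = """<text xmlns="http://ilk.uvt.nl/folia" xml:id="doc.text">
  <p xml:id="doc.p.1">
    <s xml:id="doc.p.1.s.1">
      <w xml:id="doc.p.1.s.1.w.1"><t>Hello</t></w>
      <w xml:id="doc.p.1.s.1.w.2"><t>world</t></w>
    </s>
  </p>
</text>"""


def structure_to_expressions(root):
    """Return one hypothetical expression record per structural element."""
    records = []
    for element in root.iter():  # document order, root included
        local_name = element.tag.split("}", 1)[-1]  # strip the namespace
        if local_name in KIND_BY_TAG:
            kind, subkind = KIND_BY_TAG[local_name]
            records.append(
                {"id": element.get(XML_ID), "kind": kind, "subkind": subkind}
            )
    return records


expressions = structure_to_expressions(ET.fromstring(SAMPLE))
for record in expressions:
    print(record)
```

The lookup table keeps the FoLiA-to-Layers decisions in one place, so adding `<morpheme>` or other structure elements would only require a new entry plus whatever tokenization handling the target schema needs.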

Annotation Layers

| FoLiA Annotation | Layers Equivalent | Notes |
| --- | --- | --- |
| <pos> | annotationLayer(kind="token-tag", subkind="pos") | POS tagging. FoLiA's @class → annotation.label; @set → annotationLayer.labelSet. |
| <lemma> | annotationLayer(kind="token-tag", subkind="lemma") | Lemmatization. |
| <sense> | annotationLayer(kind="token-tag", subkind="sense") | Word sense disambiguation. @class → annotation.label; @synset → annotation.knowledgeRefs (WordNet). |
| <domain> | annotationLayer(kind="token-tag") with custom subkind | Domain/register classification. |
| <lang> | annotationLayer(kind="token-tag", subkind="language-id") | Per-token language identification. |
| <phonology> / <phon> | annotationLayer(kind="token-tag", subkind="phonetic") | Phonetic transcription. |
| <entity> | annotationLayer(kind="span", subkind="entity-mention") | Named entity spans. FoLiA's @class → annotation.label (entity type). |
| <chunking> / <chunk> | annotationLayer(kind="token-tag", subkind="chunk") or annotationLayer(kind="span") | Chunking. IOB token tags or span annotations. |
| <dependency> | annotationLayer(kind="graph", subkind="dependency") | Dependency relations. <dep> → dependent token; <hd> → head token; @class → relation label. |
| <syntax> | annotationLayer(kind="tree", subkind="constituency") | Constituency parse. <su> (syntactic unit) → annotation with parentId/childIds. |
| <semroles> | annotationLayer(kind="span", subkind="frame") | Semantic role labeling. <semrole @class> → argumentRef.role. |
| <coreferences> | pub.layers.annotation#clusterSet with kind="coreference" | Coreference chains. |
| <sentiment> | annotationLayer(kind="span", subkind="sentiment") | Sentiment annotation. |
| <statement> | annotationLayer(kind="span") with custom subkind (e.g., "attribution") | Attribution/statement annotation. |
| <observation> | annotationLayer(kind="span") with custom subkind | Observation annotations. |
| <timesegment> | annotationLayer(kind="tier") with anchor.temporalSpan | Time-aligned segments for speech. |
| <rawcontent> | pub.layers.expression.text or features | Raw/original content before normalization. |
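
To make the token-tag rows above more concrete, here is a minimal sketch that collects inline `<pos>` and `<lemma>` annotations from the words of one sentence into layer-shaped dictionaries. The dictionary layout, the `tokenIndex` and `annotations` keys, the zero-based indexing, and the set name `hypothetical-pos-set` are all assumptions for illustration; only `kind`, `subkind`, `label`, and `labelSet` come from the mapping table.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"

SAMPLE = """<s xmlns="http://ilk.uvt.nl/folia" xml:id="s.1">
  <w xml:id="s.1.w.1"><t>Dogs</t>
    <pos class="N" set="hypothetical-pos-set"/><lemma class="dog"/></w>
  <w xml:id="s.1.w.2"><t>bark</t>
    <pos class="V" set="hypothetical-pos-set"/><lemma class="bark"/></w>
</s>"""


def token_tag_layer(sentence, tag, subkind):
    """Collect one inline annotation type into a hypothetical token-tag layer."""
    layer = {
        "kind": "token-tag",
        "subkind": subkind,
        "labelSet": None,  # filled from the first @set seen, if any
        "annotations": [],
    }
    words = sentence.findall(f"{{{FOLIA_NS}}}w")
    for index, word in enumerate(words):
        element = word.find(f"{{{FOLIA_NS}}}{tag}")
        if element is None:
            continue  # this word carries no annotation of this type
        if layer["labelSet"] is None:
            layer["labelSet"] = element.get("set")  # @set -> labelSet
        layer["annotations"].append(
            {"tokenIndex": index, "label": element.get("class")}  # @class -> label
        )
    return layer


sentence = ET.fromstring(SAMPLE)
pos_layer = token_tag_layer(sentence, "pos", "pos")
lemma_layer = token_tag_layer(sentence, "lemma", "lemma")
print(pos_layer)
print(lemma_layer)
```

In real FoLiA documents the set is often declared once in the document header rather than repeated on each annotation, so a fuller converter would also need to consult the annotation declarations.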

Corrections and Alternatives

FoLiA has a sophisticated system for representing corrections, alternatives, and suggestions:

| FoLiA Feature | Layers Equivalent | Notes |
| --- | --- | --- |
| <correction> | annotationLayer(kind="span", subkind="correction") with parentLayerRef → error layer | Corrections linked to the original annotation via parentLayerRef. The original is in an error layer; the correction in a correction layer. |
| <original> (within correction) | The annotation in the error-subkind layer | The original (incorrect) form. |
| <new> (within correction) | The annotation in the correction-subkind layer | The corrected form. |
| <current> | Active annotation (default layer) | The current/accepted form. |
| <suggestion> | Additional annotation with lower confidence | Suggested alternatives stored as annotations with confidence scores. |
| <alternative> | Separate annotationLayer with same kind/subkind | Alternative analyses (e.g., two possible POS tags). Each alternative is a separate annotation layer. Multiple layers with the same kind/subkind on the same expression represent alternatives. |
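
The correction mapping can be sketched as follows: a FoLiA `<correction>` on a word is split into an error layer holding the `<original>` form and a correction layer holding the `<new>` form, with the correction layer pointing back through `parentLayerRef`. The layer ids, the `value` and `tokenIndex` keys, and the function name are hypothetical; only the subkinds and `parentLayerRef` follow the table above.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"

SAMPLE = """<w xmlns="http://ilk.uvt.nl/folia" xml:id="s.1.w.1">
  <correction class="spelling" confidence="0.9">
    <new><t>receive</t></new>
    <original auth="no"><t>recieve</t></original>
  </correction>
</w>"""


def _text_of(parent):
    """Return the text of the <t> child of parent, or None if absent."""
    if parent is None:
        return None
    text_element = parent.find(f"{{{FOLIA_NS}}}t")
    return text_element.text if text_element is not None else None


def correction_to_layers(word, token_index):
    """Split one <correction> into hypothetical error and correction layers."""
    correction = word.find(f"{{{FOLIA_NS}}}correction")
    original = correction.find(f"{{{FOLIA_NS}}}original")
    new = correction.find(f"{{{FOLIA_NS}}}new")

    error_layer = {
        "id": "layer-error",  # hypothetical identifier
        "kind": "span",
        "subkind": "error",
        "annotations": [{"tokenIndex": token_index, "value": _text_of(original)}],
    }
    correction_layer = {
        "id": "layer-correction",  # hypothetical identifier
        "kind": "span",
        "subkind": "correction",
        "parentLayerRef": error_layer["id"],  # link back to the error layer
        "annotations": [
            {
                "tokenIndex": token_index,
                "value": _text_of(new),
                "label": correction.get("class"),  # @class -> label
            }
        ],
    }
    return error_layer, correction_layer


error_layer, correction_layer = correction_to_layers(ET.fromstring(SAMPLE), 0)
print(error_layer)
print(correction_layer)
```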

Provenance and Sets

| FoLiA Feature | Layers Equivalent | Notes |
| --- | --- | --- |
| <processor> | pub.layers.defs#annotationMetadata.tool | Tool/annotator identification. |
| @processor (on annotations) | annotationLayer.metadata.tool | Per-layer tool attribution. |
| @set (annotation set) | annotationLayer.labelSet or ontologyRef | Tagset/label-set identifier. FoLiA sets define valid label values. Layers uses ontologyRef for formal type systems and labelSet for simpler tag sets. |
| @class (annotation value) | annotation.label | The annotation value from the set. |
| @confidence | annotation.confidence | Confidence score (0–10000, scaled from FoLiA's 0.0–1.0). |
| @datetime | annotationMetadata.timestamp | Creation timestamp. |
| @n (ordinal) | annotation.features.ordinal or array index | Ordering information. |
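
The confidence rescaling in the table amounts to multiplying by 10000. A minimal sketch, assuming the integer result is obtained by rounding to the nearest value (the source does not say whether rounding or truncation is intended) and using a hypothetical function name:

```python
def folia_confidence_to_layers(value: str) -> int:
    """Scale a FoLiA @confidence string (0.0-1.0) to an integer in 0-10000."""
    confidence = float(value)
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"FoLiA confidence out of range: {value!r}")
    # Rounding (rather than truncating) is an assumption of this sketch.
    return round(confidence * 10000)


print(folia_confidence_to_layers("0.87"))
print(folia_confidence_to_layers("1.0"))
```

Storing the score as an integer avoids floating-point values in the target record; the range check guards against malformed input before it reaches the converted data.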

Span Annotation and Grouping

| FoLiA Feature | Layers Equivalent | Notes |
| --- | --- | --- |
| <wref> (word reference) | anchor.tokenRefSequence | References to words in span annotations. FoLiA's @id-based references → Layers token index references. |
| Discontinuous spans | anchor.tokenRefSequence with non-contiguous tokenIndexes | FoLiA supports discontinuous spans via multiple <wref> elements. Layers's tokenRefSequence supports the same pattern. |
| <relation> | pub.layers.graph#graphEdge or annotation.arguments | Typed relations between annotation elements. |
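
The id-to-index conversion for `<wref>` can be sketched as below, including a discontinuous span. The example resolves each `<wref id>` against the sentence's `<w>` elements and emits zero-based positions; the zero-based convention, the nesting of the output dictionary, and the `phrasal-verb` class are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """<s xmlns="http://ilk.uvt.nl/folia" xml:id="s.1">
  <w xml:id="s.1.w.1"><t>She</t></w>
  <w xml:id="s.1.w.2"><t>gave</t></w>
  <w xml:id="s.1.w.3"><t>it</t></w>
  <w xml:id="s.1.w.4"><t>up</t></w>
  <entities>
    <entity class="phrasal-verb">
      <wref id="s.1.w.2"/>
      <wref id="s.1.w.4"/>
    </entity>
  </entities>
</s>"""


def span_token_indexes(sentence):
    """Resolve <wref> ids in each <entity> to token positions in the sentence."""
    # Map every word's xml:id to its position in document order.
    index_by_id = {
        word.get(XML_ID): position
        for position, word in enumerate(sentence.iter(f"{{{FOLIA_NS}}}w"))
    }
    spans = []
    for entity in sentence.iter(f"{{{FOLIA_NS}}}entity"):
        indexes = [
            index_by_id[ref.get("id")]
            for ref in entity.findall(f"{{{FOLIA_NS}}}wref")
        ]
        spans.append(
            {
                "label": entity.get("class"),
                "anchor": {"tokenRefSequence": {"tokenIndexes": indexes}},
            }
        )
    return spans


spans = span_token_indexes(ET.fromstring(SAMPLE))
print(spans)
```

Because the indexes are listed individually rather than as a start/end pair, a gap between them (here, the skipped token "it") needs no special handling.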