NIF

Model: NLP Interchange Format (NIF)
Origin: AKSW, University of Leipzig
Specification: NIF 2.0 Core Ontology
Key Reference: Hellmann et al. 2013

Overview

NIF is an RDF/Linked Data-based format for representing NLP annotations as web resources. Every string in a document gets a URI based on its character offsets, making annotations dereferenceable web resources. NIF uses OWL/RDF ontologies for type definitions and integrates with the Linked Data ecosystem (DBpedia, Wikidata, etc.).

Type-by-Type Mapping

Core NIF Ontology

NIF Class/Property	Layers Equivalent	Notes
`nif:Context`	`pub.layers.expression.expression`	The document/text being annotated. NIF's `nif:isString` → `expression.text`.
`nif:String` (base class)	`pub.layers.annotation.defs#annotation`	Any annotated substring. NIF identifies strings by URI (offset-based); Layers uses UUID + character span anchoring.
`nif:RFC5147String` / `nif:OffsetBasedString`	`pub.layers.defs#span`	Offset-based string identification. NIF's `nif:beginIndex`/`nif:endIndex` → `span.byteStart`/`span.byteEnd`. The import pipeline converts character offsets to UTF-8 byte offsets at import time.
`nif:Word`	`pub.layers.expression.expression` (kind: `token`)	Word-level string.
`nif:Sentence`	`pub.layers.expression.expression` (kind: `sentence`)	Sentence-level string.
`nif:Phrase`	`pub.layers.annotation.defs#annotation` with `kind="span"`	Phrase/constituent annotation.
`nif:Title` / `nif:Paragraph`	`pub.layers.expression.expression` (kind: `section`)	Document structure elements.

NIF Annotation Properties

NIF Property	Layers Equivalent	Notes
`nif:anchorOf`	`token.text` or derived from `expression.text` + offsets	The surface string.
`nif:beginIndex` / `nif:endIndex`	`anchor.textSpan.byteStart` / `anchor.textSpan.byteEnd`	UTF-8 byte offsets (0-based, exclusive end). The import pipeline converts character offsets to byte offsets at import time.
`nif:referenceContext`	Annotation layer's `expression` reference	The document context.
`nif:posTag`	`annotationLayer(kind="token-tag", subkind="pos")` → `annotation.label`	POS tag.
`nif:lemma`	`annotationLayer(kind="token-tag", subkind="lemma")` → `annotation.value`	Lemma.
`nif:stem`	`annotationLayer(kind="token-tag")` → `annotation.value` with custom subkind	Stem form.
`nif:oliaLink`	`annotation.ontologyTypeRef` or `annotation.knowledgeRefs`	Link to OLiA (Ontologies of Linguistic Annotation) category.
`nif:dependency`	`annotationLayer(kind="graph", subkind="dependency")`	Dependency relation.
`nif:sentimentValue`	`annotationLayer(kind="span", subkind="sentiment")` → `annotation.features.sentimentValue`	Sentiment score.
`nif:topic`	`annotation.label` or `features.topic`	Topic classification.

NIF Linked Data Integration

NIF Feature	Layers Equivalent	Notes
`itsrdf:taIdentRef` (entity linking)	`annotation.knowledgeRefs` with `source="wikidata"` or `source="dbpedia"`	Entity linking to Linked Data resources. NIF's DBpedia/Wikidata URIs → Layers `knowledgeRef.uri` + `knowledgeRef.identifier`.
`itsrdf:taClassRef`	`annotation.ontologyTypeRef` + `knowledgeRefs`	Entity type from an ontology.
`itsrdf:taConfidence`	`annotation.confidence`	Entity linking confidence (scaled 0-1000).
OLiA ontology references	`annotationLayer.ontologyRef` or `annotation.knowledgeRefs`	Links to the Ontologies of Linguistic Annotation for tagset normalization.
`nif:sourceUrl`	`pub.layers.expression.sourceUrl`	Source document URL.

NIF Provenance

NIF Feature	Layers Equivalent	Notes
`nif:broaderContext`	`annotationLayer.parentLayerRef` or expression hierarchy	Context nesting.
`prov:wasGeneratedBy` (PROV-O)	`pub.layers.defs#annotationMetadata.tool`	Provenance tracking.
`prov:wasAttributedTo`	`annotationMetadata.personaRef` or `annotationMetadata.tool`	Attribution.
`dcterms:created`	`annotationMetadata.timestamp`	Creation time.

URI Scheme Comparison

NIF's key innovation is URI-based identification of text strings:

# NIF URI for characters 0-5 of document:
http://example.org/doc1#char=0,5

# Layers equivalent:
{
  "anchor": {
    "textSpan": { "byteStart": 0, "byteEnd": 5 }
  }
}

NIF uses HTTP URIs, making every annotation a dereferenceable web resource. Layers uses AT-URIs (at://did:plc:xxx/pub.layers.annotation.annotationLayer/tid), which serve the same purpose in the ATProto ecosystem: every annotation is a dereferenceable resource via its AT-URI.

Overview​

Type-by-Type Mapping​

Core NIF Ontology​

NIF Annotation Properties​

NIF Linked Data Integration​

NIF Provenance​

URI Scheme Comparison​