
NIF (NLP Interchange Format)

Overview

NIF is an RDF/Linked Data-based format for representing NLP annotations. Every annotated string in a document gets a URI based on its character offsets, which makes each annotation a dereferenceable web resource. NIF uses OWL/RDF ontologies for type definitions and integrates with the Linked Data ecosystem (DBpedia, Wikidata, etc.).

Type-by-Type Mapping

Core NIF Ontology

| NIF Class/Property | Layers Equivalent | Notes |
| --- | --- | --- |
| nif:Context | pub.layers.expression | The document/text being annotated. NIF's nif:isString → expression.text. |
| nif:String (base class) | pub.layers.annotation#annotation | Any annotated substring. NIF identifies strings by URI (offset-based); Layers uses UUID + character span anchoring. |
| nif:RFC5147String / nif:OffsetBasedString | pub.layers.defs#span | Character offset-based string identification. NIF's nif:beginIndex/nif:endIndex → span.start/span.ending. |
| nif:Word | pub.layers.expression (kind: token) | Word-level string. |
| nif:Sentence | pub.layers.expression (kind: sentence) | Sentence-level string. |
| nif:Phrase | pub.layers.annotation#annotation with kind="span" | Phrase/constituent annotation. |
| nif:Title / nif:Paragraph | pub.layers.expression (kind: section) | Document structure elements. |
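
The class mapping above can be read as a simple lookup. The following is a minimal sketch, not part of either specification: the dictionary name `NIF_CLASS_TO_LAYERS` and the helper `layers_target` are hypothetical, and the fallback to a generic annotation assumes that unlisted classes are treated like the `nif:String` base class.

```python
# Hypothetical lookup table transcribed from the mapping table above.
# Each value is (Layers record type, kind); kind is None where the table names none.
NIF_CLASS_TO_LAYERS = {
    "nif:Context": ("pub.layers.expression", None),
    "nif:String": ("pub.layers.annotation#annotation", None),
    "nif:Word": ("pub.layers.expression", "token"),
    "nif:Sentence": ("pub.layers.expression", "sentence"),
    "nif:Phrase": ("pub.layers.annotation#annotation", "span"),
    "nif:Title": ("pub.layers.expression", "section"),
    "nif:Paragraph": ("pub.layers.expression", "section"),
}


def layers_target(nif_class):
    """Return (record type, kind) for a NIF class.

    Assumption: classes not listed in the table fall back to the
    generic annotation used for the nif:String base class.
    """
    return NIF_CLASS_TO_LAYERS.get(nif_class, NIF_CLASS_TO_LAYERS["nif:String"])


record_type, kind = layers_target("nif:Sentence")
```

Keeping the mapping as data rather than branching logic makes it easy to extend when a converter meets further NIF subclasses.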

NIF Annotation Properties

| NIF Property | Layers Equivalent | Notes |
| --- | --- | --- |
| nif:anchorOf | token.text or derived from expression.text + offsets | The surface string. |
| nif:beginIndex / nif:endIndex | anchor.textSpan.start / anchor.textSpan.ending | Character offsets (0-based, exclusive end). |
| nif:referenceContext | Annotation layer's expression reference | The document context. |
| nif:posTag | annotationLayer(kind="token-tag", subkind="pos") → annotation.label | POS tag. |
| nif:lemma | annotationLayer(kind="token-tag", subkind="lemma") → annotation.value | Lemma. |
| nif:stem | annotationLayer(kind="token-tag") → annotation.value with custom subkind | Stem form. |
| nif:oliaLink | annotation.ontologyTypeRef or annotation.knowledgeRefs | Link to OLiA (Ontologies of Linguistic Annotation) category. |
| nif:dependency | annotationLayer(kind="graph", subkind="dependency") | Dependency relation. |
| nif:sentimentValue | annotationLayer(kind="span", subkind="sentiment") → annotation.features.sentimentValue | Sentiment score. |
| nif:topic | annotation.label or features.topic | Topic classification. |
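
To make the property mapping concrete, here is a rough sketch of converting one NIF word into Layers-style structures. The function name `nif_word_to_layers` and the exact dictionary layout are assumptions made for illustration; only the field names (`kind`, `subkind`, `anchor.textSpan.start`, `anchor.textSpan.ending`, `label`, `value`) come from the table. It also assumes offsets count Unicode code points, which is what Python string slicing uses.

```python
def nif_word_to_layers(context_text, begin_index, end_index, pos_tag=None, lemma=None):
    """Convert one NIF word into (surface string, list of Layers-style layers).

    The layer/annotation dict shape is a hypothetical illustration.
    """
    if not 0 <= begin_index <= end_index <= len(context_text):
        raise ValueError("offsets fall outside the context string")

    # nif:anchorOf can be derived from the context text plus the offsets.
    surface = context_text[begin_index:end_index]

    def make_anchor():
        # nif:beginIndex / nif:endIndex -> anchor.textSpan (0-based, exclusive end).
        return {"textSpan": {"start": begin_index, "ending": end_index}}

    layers = []
    if pos_tag is not None:
        # nif:posTag -> token-tag layer with subkind "pos", tag stored in label.
        layers.append({
            "kind": "token-tag",
            "subkind": "pos",
            "annotations": [{"anchor": make_anchor(), "label": pos_tag}],
        })
    if lemma is not None:
        # nif:lemma -> token-tag layer with subkind "lemma", lemma stored in value.
        layers.append({
            "kind": "token-tag",
            "subkind": "lemma",
            "annotations": [{"anchor": make_anchor(), "value": lemma}],
        })
    return surface, layers


surface, layers = nif_word_to_layers("Berlin is big", 0, 6, pos_tag="NNP", lemma="Berlin")
# surface == "Berlin"
```

A real converter would group all words of a document into one layer per tag type rather than emitting a layer per word; the per-word form is used here only to keep the example short.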

NIF Linked Data Integration

| NIF Feature | Layers Equivalent | Notes |
| --- | --- | --- |
| itsrdf:taIdentRef (entity linking) | annotation.knowledgeRefs with source="wikidata" or source="dbpedia" | Entity linking to Linked Data resources. NIF's DBpedia/Wikidata URIs → Layers knowledgeRef.uri + knowledgeRef.identifier. |
| itsrdf:taClassRef | annotation.ontologyTypeRef + knowledgeRefs | Entity type from an ontology. |
| itsrdf:taConfidence | annotation.confidence | Entity linking confidence (scaled 0-10000). |
| OLiA ontology references | annotationLayer.ontologyRef or annotation.knowledgeRefs | Links to the Ontologies of Linguistic Annotation for tagset normalization. |
| nif:sourceUrl | pub.layers.expression.sourceUrl | Source document URL. |
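
The entity-linking rows can be sketched as two small helpers. Both function names are hypothetical. The sketch assumes that the identifier is the last path segment of the entity URI, that `itsrdf:taConfidence` arrives as a number between 0 and 1 (as ITS 2.0 defines it), and that any host other than Wikidata or DBpedia gets a placeholder source of "other", which the mapping table does not define.

```python
from urllib.parse import urlparse


def knowledge_ref(ta_ident_ref):
    """Build a Layers-style knowledgeRef dict from an itsrdf:taIdentRef URI."""
    host = urlparse(ta_ident_ref).netloc.lower()
    if host.endswith("wikidata.org"):
        source = "wikidata"
    elif host.endswith("dbpedia.org"):
        source = "dbpedia"
    else:
        source = "other"  # placeholder, not defined by the mapping table
    # Assumption: the identifier is the final path segment of the URI.
    identifier = ta_ident_ref.rstrip("/").rsplit("/", 1)[-1]
    return {"source": source, "uri": ta_ident_ref, "identifier": identifier}


def scale_confidence(ta_confidence):
    """Scale a 0-1 confidence to the 0-10000 integer range."""
    if not 0.0 <= ta_confidence <= 1.0:
        raise ValueError("taConfidence is expected to lie between 0 and 1")
    return round(ta_confidence * 10000)


ref = knowledge_ref("http://www.wikidata.org/entity/Q64")
confidence = scale_confidence(0.85)
```

Storing the confidence as a scaled integer avoids floating-point values in the record, at the cost of rounding to four decimal places.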

NIF Provenance

| NIF Feature | Layers Equivalent | Notes |
| --- | --- | --- |
| nif:broaderContext | annotationLayer.parentLayerRef or expression hierarchy | Context nesting. |
| prov:wasGeneratedBy (PROV-O) | pub.layers.defs#annotationMetadata.tool | Provenance tracking. |
| prov:wasAttributedTo | annotationMetadata.personaRef or annotationMetadata.tool | Attribution. |
| dcterms:created | annotationMetadata.timestamp | Creation time. |

URI Scheme Comparison

NIF's key innovation is URI-based identification of text strings:

```
# NIF URI for characters 0-5 of document:
http://example.org/doc1#char=0,5

# Layers equivalent:
{
  "anchor": {
    "textSpan": { "start": 0, "ending": 5 }
  }
}
```

NIF uses HTTP URIs, making every annotation a dereferenceable web resource. Layers uses AT-URIs (at://did:plc:xxx/pub.layers.annotation/tid), which serve the same purpose in the ATProto ecosystem: every annotation can be resolved through its AT-URI.
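
The correspondence between the two addressing schemes can be sketched as a pair of conversion functions. The names `nif_uri_to_anchor` and `anchor_to_nif_uri` are hypothetical, and the sketch handles only the simple `#char=begin,end` fragment form shown above, not the optional RFC 5147 integrity parameters such as `length` or `md5`.

```python
import re
from urllib.parse import urldefrag

# Matches only the plain "char=<begin>,<end>" fragment form.
_CHAR_RANGE = re.compile(r"^char=(\d+),(\d+)$")


def nif_uri_to_anchor(uri):
    """Split an offset-based NIF URI into (document URI, Layers-style anchor)."""
    document_uri, fragment = urldefrag(uri)
    match = _CHAR_RANGE.match(fragment)
    if match is None:
        raise ValueError(f"not a char=begin,end fragment: {fragment!r}")
    start, ending = int(match.group(1)), int(match.group(2))
    if ending < start:
        raise ValueError("end offset precedes begin offset")
    return document_uri, {"textSpan": {"start": start, "ending": ending}}


def anchor_to_nif_uri(document_uri, anchor):
    """Rebuild the offset-based NIF URI from a document URI and an anchor."""
    span = anchor["textSpan"]
    return f"{document_uri}#char={span['start']},{span['ending']}"


doc, anchor = nif_uri_to_anchor("http://example.org/doc1#char=0,5")
round_trip = anchor_to_nif_uri(doc, anchor)
```

Note that this only converts the span. The AT-URI of the resulting Layers record is assigned when the record is created and cannot be derived from the NIF URI.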