
Judgment Data

Layers covers the full lifecycle of linguistic judgment data, from experiment definition through stimulus construction, data collection, and agreement analysis. The pub.layers.judgment and pub.layers.resource lexicons work together to support all major judgment paradigms used in psycholinguistics and computational linguistics.

Four Orthogonal Dimensions

Following bead's design, every experiment definition can specify four independent fields:

measureType: What property or behavior is being measured.

taskType: The behavioral instrument used to collect explicit responses. Omit for passive paradigms.

presentation: How stimuli are displayed (RSVP, self-paced, whole-sentence, auditory, etc.), including chunking and timing parameters.

recordingMethods: What instruments capture data (keyboard, button-box, EEG, eye-tracking, etc.). An experiment can use multiple simultaneous recording methods.

These are fully independent. An acceptability experiment can use an ordinal scale, a binary yes/no, or a magnitude estimate. The same RSVP presentation can be used with EEG, MEG, or behavioral-only recording. A preference experiment can use forced choice (pairwise comparison), multi-select (best-worst scaling), or ordinal scale (direct ranking). The measure type tells consumers what the data represents; the task type tells them how explicit responses were collected; the presentation tells them how stimuli were shown; the recording methods tell them what instruments captured data.

| Paradigm | measureType | taskType | presentation.method | recordingMethods |
| --- | --- | --- | --- | --- |
| Likert-scale acceptability rating | acceptability | ordinal-scale | whole-sentence | keyboard |
| Binary grammaticality judgment | acceptability | binary | whole-sentence | button-box |
| EEG + RSVP, passive reading | acceptability | | rsvp | eeg |
| EEG + RSVP + button press | acceptability | binary | rsvp | eeg, button-box |
| Self-paced reading | reading-time | | self-paced | keyboard |
| Eye-tracking natural reading | reading-time | | whole-sentence | eye-tracking |
| EEG + eye-tracking co-registration | reading-time | | whole-sentence | eeg, eye-tracking |
| fMRI + auditory narrative | comprehension | | naturalistic | fmri |
| Visual world paradigm | comprehension | | visual-world | eye-tracking |
| Masked priming + lexical decision + EEG | similarity | binary | masked-priming | eeg, button-box |
| Maze task | reading-time | forced-choice | maze | keyboard |
| NLI classification | inference | categorical | whole-sentence | keyboard |
| Pairwise translation comparison | preference | forced-choice | whole-sentence | mouse-click |
| Best-worst scaling for sentiment | preference | multi-select | whole-sentence | mouse-click |
| Semantic similarity rating | similarity | ordinal-scale | whole-sentence | mouse-click |
| Magnitude estimation | acceptability | magnitude | whole-sentence | keyboard |
| Cloze probability | comprehension | cloze | whole-sentence | keyboard |
| Named entity span annotation | extraction | span-labeling | whole-sentence | mouse-click |
| Sentence completion | production | free-text | whole-sentence | keyboard |
| MEG + auditory + passive listening | comprehension | | auditory | meg |

A blank taskType cell indicates a passive paradigm with no explicit response task.

Measure Types

The measureType field identifies what property or behavior is being measured.

| Value | Description |
| --- | --- |
| acceptability | Linguistic acceptability, naturalness, or grammaticality |
| inference | Semantic relationship (entailment, contradiction, neutral) |
| similarity | Semantic similarity, distance, or relatedness |
| plausibility | Likelihood or plausibility of events or statements |
| comprehension | Understanding or recall of content |
| preference | Subjective preference between alternatives |
| extraction | Extracting structured information (labeled spans) from text |
| reading-time | Processing time per word or region (self-paced reading, eye tracking) |
| production | Language production (sentence completion, word generation) |

All values are community-expandable via measureTypeUri.
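As a sketch of that extension point, an experiment measuring a community-defined property might declare it like this (the AT-URI and record shape below are illustrative, not registered values; see the Flexible Enums guide for the exact convention):

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Irony detection study",
  "measureTypeUri": "at://did:plc:researcher/pub.layers.ontology.ontology/irony",
  "taskType": "binary",
  "labels": ["ironic", "literal"]
}
```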

Task Types (Response Instruments)

The taskType field identifies how responses are collected, independent of what is being measured.

Ordinal Scale

Participants rate on a bounded discrete scale (Likert, slider). Responses go in scalarValue.

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Naturalness rating study",
  "measureType": "acceptability",
  "taskType": "ordinal-scale",
  "scaleMin": 1,
  "scaleMax": 7,
  "guidelines": "Rate how natural the sentence sounds on a scale from 1 (very unnatural) to 7 (perfectly natural)..."
}
```

```json
{
  "item": { "recordRef": "at://did:plc:researcher/pub.layers.expression.expression/item-42" },
  "scalarValue": 5,
  "responseTimeMs": 1560
}
```

Categorical

Participants choose a label from a fixed set of unordered categories. Responses go in categoricalValue.

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Semantic relation classification",
  "measureType": "inference",
  "taskType": "categorical",
  "labels": ["cause", "effect", "precondition", "none"],
  "guidelines": "Select the relation that best describes how the two events are connected..."
}
```

```json
{
  "item": { "recordRef": "at://did:plc:researcher/pub.layers.expression.expression/pair-15" },
  "categoricalValue": "cause",
  "responseTimeMs": 2340,
  "confidence": 800
}
```

Forced Choice

Participants pick exactly one option from a small set (2AFC, NAFC). Responses go in categoricalValue. This is the instrument behind pairwise comparison experiments.

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Translation quality comparison",
  "measureType": "preference",
  "taskType": "forced-choice",
  "labels": ["a", "b", "tie"],
  "guidelines": "Which translation better conveys the meaning of the source?"
}
```

```json
{
  "item": { "recordRef": "at://did:plc:researcher/pub.layers.expression.expression/pair-15" },
  "categoricalValue": "a",
  "responseTimeMs": 4200,
  "confidence": 750
}
```

Multi-Select

Participants pick one or more options from a set. Responses go in behavioralData features. This is the instrument behind best-worst scaling experiments.

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Sentiment intensity BWS",
  "measureType": "preference",
  "taskType": "multi-select",
  "guidelines": "Select the sentence that conveys the MOST positive sentiment and the LEAST positive sentiment..."
}
```

```json
{
  "item": { "recordRef": "at://did:plc:researcher/pub.layers.expression.expression/tuple-8" },
  "behavioralData": {
    "features": [
      { "key": "best", "value": "item-a" },
      { "key": "worst", "value": "item-c" }
    ]
  },
  "responseTimeMs": 5100
}
```

Binary

Participants give a yes/no response. Responses go in categoricalValue.

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Grammaticality judgment",
  "measureType": "acceptability",
  "taskType": "binary",
  "labels": ["yes", "no"],
  "guidelines": "Is the following sentence grammatically acceptable?"
}
```

```json
{
  "item": { "recordRef": "at://did:plc:researcher/pub.layers.expression.expression/item-42" },
  "categoricalValue": "yes",
  "responseTimeMs": 980
}
```

Free Text

Participants provide open-ended text responses. Responses go in freeText.

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Paraphrase generation",
  "measureType": "similarity",
  "taskType": "free-text",
  "guidelines": "Write a paraphrase of the following sentence that preserves its meaning..."
}
```

```json
{
  "item": { "recordRef": "at://did:plc:researcher/pub.layers.expression.expression/item-42" },
  "freeText": "The mouse was chased by the cat through the garden",
  "responseTimeMs": 8450
}
```

Span Labeling

Participants select and optionally label text regions. Responses go in textSpan.

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Argument span identification",
  "measureType": "extraction",
  "taskType": "span-labeling",
  "guidelines": "Highlight the text span that answers: who performed the action?"
}
```

```json
{
  "item": { "recordRef": "at://did:plc:researcher/pub.layers.expression.expression/item-42" },
  "textSpan": { "byteStart": 0, "byteEnd": 7 },
  "responseTimeMs": 3200
}
```

Magnitude

Participants provide an unbounded numeric value (magnitude estimation). Responses go in scalarValue. Unlike ordinal-scale, there are no scaleMin/scaleMax bounds.

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Magnitude estimation of naturalness",
  "measureType": "acceptability",
  "taskType": "magnitude",
  "guidelines": "Assign a number reflecting how natural the sentence sounds. Use any positive number; the reference sentence has a value of 100..."
}
```

```json
{
  "item": { "recordRef": "at://did:plc:researcher/pub.layers.expression.expression/item-42" },
  "scalarValue": 230,
  "responseTimeMs": 2100
}
```

Cloze

Participants fill in blanks in a stimulus. Responses go in freeText.

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Cloze probability estimation",
  "measureType": "comprehension",
  "taskType": "cloze",
  "guidelines": "Complete the sentence by filling in the blank with the first word that comes to mind..."
}
```

```json
{
  "item": { "recordRef": "at://did:plc:researcher/pub.layers.expression.expression/item-42" },
  "freeText": "dog",
  "responseTimeMs": 1200
}
```

Stimulus Presentation

The presentation field on experimentDef specifies how individual stimuli are displayed to participants. This is independent of what is being measured, what response is collected, and what instruments record data.

Presentation Methods

| Value | Description |
| --- | --- |
| rsvp | Rapid Serial Visual Presentation: chunks shown one at a time at a fixed rate |
| self-paced | Participant controls advancement (button press to reveal next chunk) |
| whole-sentence | Entire stimulus displayed at once |
| auditory | Spoken stimulus (natural speech, synthesized, or rate-controlled) |
| visual-world | Visual scene displayed alongside auditory stimulus |
| masked-priming | Forward mask, brief prime, then target |
| cross-modal | Stimulus in one modality, probe in another (e.g., auditory sentence + visual word) |
| naturalistic | Extended narrative (audiobook, story) for continuous processing |
| gating | Incremental portions of a spoken word revealed in successive gates |
| maze | Two-alternative forced choice at each word position |
| boundary | Invisible boundary triggers parafoveal preview change during saccade |
| moving-window | Only a window around fixation is visible; rest is masked |

All values are community-expandable via methodUri.

Chunking and Timing

For incremental presentation methods (RSVP, self-paced), additional fields control segmentation and timing:

```json
{
  "presentation": {
    "method": "rsvp",
    "chunkingUnit": "word",
    "timingMs": 300,
    "isiMs": 200,
    "cumulative": false
  }
}
```

```json
{
  "presentation": {
    "method": "self-paced",
    "chunkingUnit": "word",
    "cumulative": false,
    "maskChar": "-"
  }
}
```
| Field | Description |
| --- | --- |
| chunkingUnit | How text is segmented: word, character, morpheme, phrase, sentence, region, custom |
| timingMs | Per-chunk display duration in milliseconds (for timed presentations like RSVP) |
| isiMs | Inter-stimulus interval in milliseconds |
| cumulative | Whether previous chunks remain visible (true for cumulative self-paced reading, false for non-cumulative) |
| maskChar | Masking character replacing hidden text (e.g., - for dashes, # for hashes) |
| features | Additional method-specific parameters (e.g., prime duration for masked priming, gate size for gating) |
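For instance, a masked-priming presentation might carry its prime and mask durations in features (the key names below are illustrative, not standardized values):

```json
{
  "presentation": {
    "method": "masked-priming",
    "features": [
      { "key": "primeDurationMs", "value": 50 },
      { "key": "maskDurationMs", "value": 500 }
    ]
  }
}
```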

Recording Methods

The recordingMethods array on experimentDef declares what instruments capture data. An experiment can use multiple simultaneous recording methods (e.g., EEG + eye-tracking co-registration).

Behavioral Input Devices

| Value | Description |
| --- | --- |
| button-box | Dedicated response box (e.g., Cedrus, PST) with hardware-level timing |
| keyboard | Standard keyboard keypress |
| mouse-click | Mouse button click |
| touchscreen | Touchscreen tap |
| voice | Voice key or microphone onset detection |

Physiological Instruments

| Value | Description |
| --- | --- |
| eeg | Electroencephalography (scalp electrodes, ERPs) |
| meg | Magnetoencephalography |
| fmri | Functional magnetic resonance imaging |
| fnirs | Functional near-infrared spectroscopy |
| eye-tracking | Eye tracker (fixation-based reading measures, visual world) |
| pupillometry | Pupil diameter recording |
| mouse-tracking | Continuous mouse cursor trajectory |
| emg | Electromyography |
| skin-conductance | Galvanic skin response |
| ecog | Intracranial EEG / electrocorticography |

All values are community-expandable via methodUri. Detailed acquisition parameters (sample rate, channel count, montage) belong on pub.layers.media.media records, not the experiment definition. See the Psycholinguistic Data guide for media record examples.

Examples

EEG study with RSVP and a behavioral task:

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "N400 semantic violation study",
  "measureType": "acceptability",
  "taskType": "binary",
  "presentation": {
    "method": "rsvp",
    "chunkingUnit": "word",
    "timingMs": 300,
    "isiMs": 200,
    "cumulative": false
  },
  "recordingMethods": [
    { "method": "eeg" },
    { "method": "button-box" }
  ],
  "labels": ["acceptable", "unacceptable"],
  "guidelines": "After each sentence, press the left button if acceptable or the right button if unacceptable..."
}
```

Eye-tracking natural reading with no explicit task:

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Garden-path reading study",
  "measureType": "reading-time",
  "presentation": {
    "method": "whole-sentence"
  },
  "recordingMethods": [
    { "method": "eye-tracking" }
  ],
  "guidelines": "Read each sentence silently at your own pace..."
}
```

Self-paced reading:

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Garden-path SPR study",
  "measureType": "reading-time",
  "presentation": {
    "method": "self-paced",
    "chunkingUnit": "word",
    "cumulative": false,
    "maskChar": "-"
  },
  "recordingMethods": [
    { "method": "keyboard" }
  ],
  "guidelines": "Press the spacebar to reveal each word..."
}
```

fMRI with auditory narrative (passive):

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Narrative comprehension fMRI",
  "measureType": "comprehension",
  "presentation": {
    "method": "naturalistic"
  },
  "recordingMethods": [
    { "method": "fmri" }
  ],
  "guidelines": "Listen to the story and try to understand what is happening..."
}
```

The Stimulus Pipeline

Layers provides a full pipeline from parameterized templates to materialized stimuli.

Template Definition

A template contains text with {slotName} placeholders, plus slot definitions that constrain what can fill each position:

```json
{
  "$type": "pub.layers.resource.template",
  "name": "Transitive sentence template",
  "text": "{subject} {verb} {object} {adjunct}",
  "language": "en",
  "slots": [
    {
      "name": "subject",
      "required": true,
      "collectionRef": "at://did:plc:researcher/pub.layers.resource.collection/animate-nouns",
      "constraints": [
        { "expressionFormat": "python-expr", "expression": "self.features.animacy == 'animate'", "scope": "slot" }
      ]
    },
    {
      "name": "verb",
      "required": true,
      "collectionRef": "at://did:plc:researcher/pub.layers.resource.collection/transitive-verbs"
    },
    {
      "name": "object",
      "required": true,
      "collectionRef": "at://did:plc:researcher/pub.layers.resource.collection/nouns"
    },
    {
      "name": "adjunct",
      "required": false,
      "defaultValue": ""
    }
  ],
  "constraints": [
    {
      "expressionFormat": "python-expr",
      "expression": "subject.features.number == verb.features.number",
      "scope": "template"
    }
  ],
  "experimentRef": "at://did:plc:researcher/pub.layers.judgment.experimentDef/spr-study"
}
```

Slot Filling

A filling maps each slot to a specific filler and records the rendered text:

```json
{
  "$type": "pub.layers.resource.filling",
  "templateRef": "at://did:plc:researcher/pub.layers.resource.template/transitive-template",
  "slotFillings": [
    {
      "slotName": "subject",
      "entryRef": "at://did:plc:researcher/pub.layers.resource.entry/the-cat",
      "renderedForm": "The cat"
    },
    {
      "slotName": "verb",
      "entryRef": "at://did:plc:researcher/pub.layers.resource.entry/chased",
      "renderedForm": "chased"
    },
    {
      "slotName": "object",
      "entryRef": "at://did:plc:researcher/pub.layers.resource.entry/the-mouse",
      "renderedForm": "the mouse"
    },
    {
      "slotName": "adjunct",
      "literalValue": "across the garden",
      "renderedForm": "across the garden"
    }
  ],
  "renderedText": "The cat chased the mouse across the garden",
  "strategy": "exhaustive",
  "expressionRef": "at://did:plc:researcher/pub.layers.expression.expression/item-42"
}
```
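A consumer can recompute renderedText from the template text and its slot fillings. The sketch below assumes the record shapes shown above and does simple placeholder substitution; a real renderer would also evaluate slot and template constraints:

```python
import re

def render_filling(template_text, slot_fillings):
    """Substitute each renderedForm into its {slotName} placeholder.

    Unfilled optional slots become empty strings; surplus whitespace
    is collapsed so omitted slots leave no stray gaps.
    """
    forms = {f["slotName"]: f["renderedForm"] for f in slot_fillings}
    rendered = re.sub(r"\{(\w+)\}", lambda m: forms.get(m.group(1), ""), template_text)
    return re.sub(r"\s+", " ", rendered).strip()

fillings = [
    {"slotName": "subject", "renderedForm": "The cat"},
    {"slotName": "verb", "renderedForm": "chased"},
    {"slotName": "object", "renderedForm": "the mouse"},
    {"slotName": "adjunct", "renderedForm": "across the garden"},
]
text = render_filling("{subject} {verb} {object} {adjunct}", fillings)
# text == "The cat chased the mouse across the garden"
```

Dropping the optional adjunct filling yields "The cat chased the mouse", matching the defaultValue of "" declared in the template above.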

Template Composition

Multi-part stimuli (context + target + question) use template compositions:

```json
{
  "$type": "pub.layers.resource.templateComposition",
  "name": "Reading comprehension item",
  "compositionType": "sequence",
  "members": [
    {
      "templateRef": "at://did:plc:researcher/pub.layers.resource.template/context-paragraph",
      "label": "context",
      "ordinal": 0,
      "required": true
    },
    {
      "templateRef": "at://did:plc:researcher/pub.layers.resource.template/target-sentence",
      "label": "target",
      "ordinal": 1,
      "required": true
    },
    {
      "templateRef": "at://did:plc:researcher/pub.layers.resource.template/comprehension-question",
      "label": "question",
      "ordinal": 2,
      "required": true
    }
  ],
  "experimentRef": "at://did:plc:researcher/pub.layers.judgment.experimentDef/spr-study"
}
```

Materialization

The filling's renderedText becomes an expression's text. The expressionRef on the filling links them bidirectionally. Annotations on the materialized expression can mark labeled spans (e.g., the critical region in a self-paced reading study).
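Span annotations locate regions by byte offsets into the UTF-8 encoding of the expression text. A minimal sketch for computing such offsets (the helper name is ours, not part of the lexicon):

```python
def byte_span(text, substring):
    """Return UTF-8 byte offsets of the first occurrence of substring in text."""
    idx = text.find(substring)
    if idx < 0:
        raise ValueError("substring not found")
    start = len(text[:idx].encode("utf-8"))
    return {"byteStart": start, "byteEnd": start + len(substring.encode("utf-8"))}

# Mark the critical region of the materialized expression from the filling above:
span = byte_span("The cat chased the mouse across the garden", "chased the mouse")
# span == {"byteStart": 8, "byteEnd": 24}
```

Counting bytes rather than characters matters for non-ASCII text, where character and byte indices diverge.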

Experiment Design

The experimentDef record specifies the full experimental design, including list construction constraints and item ordering.

Full Example

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Relative clause attachment ambiguity",
  "measureType": "acceptability",
  "taskType": "ordinal-scale",
  "presentation": {
    "method": "whole-sentence"
  },
  "recordingMethods": [
    { "method": "keyboard" }
  ],
  "scaleMin": 1,
  "scaleMax": 7,
  "guidelines": "Read each sentence and rate how natural it sounds...",
  "ontologyRef": "at://did:plc:researcher/pub.layers.ontology.ontology/rc-attachment-types",
  "templateRefs": [
    "at://did:plc:researcher/pub.layers.resource.template/rc-high-attach",
    "at://did:plc:researcher/pub.layers.resource.template/rc-low-attach"
  ],
  "collectionRefs": [
    "at://did:plc:researcher/pub.layers.resource.collection/fillers"
  ],
  "design": {
    "distributionStrategy": "latin-square",
    "itemOrder": "random-order",
    "timingMs": 30000,
    "listConstraints": [
      {
        "kind": "latin-square",
        "targetProperty": "condition",
        "parameters": {
          "features": [
            { "key": "numConditions", "value": 4 },
            { "key": "numLists", "value": 4 }
          ]
        }
      },
      {
        "kind": "no-adjacent-same-condition",
        "targetProperty": "condition",
        "parameters": {
          "features": [{ "key": "minDistance", "value": 2 }]
        }
      },
      {
        "kind": "balanced-frequency",
        "targetProperty": "itemType",
        "parameters": {
          "features": [
            { "key": "experimental", "value": 24 },
            { "key": "filler", "value": 48 }
          ]
        }
      }
    ]
  }
}
```

List Constraints

| Constraint Kind | Purpose | Parameters |
| --- | --- | --- |
| latin-square | Each participant sees one condition per item; all conditions equally represented | numConditions, numLists |
| no-adjacent-same-condition | Prevent consecutive items from the same condition | minDistance |
| balanced-frequency | Control the ratio of experimental to filler items | Per-type counts |
| min-distance | Minimum distance between items of the same type | minDistance, targetProperty |

Distribution and Item Order

| Field | Known Values | Description |
| --- | --- | --- |
| distributionStrategy | latin-square, random, blocked, stratified, custom | How items are assigned to participant lists |
| itemOrder | random-order, fixed-order, blocked, adaptive, custom | How items are ordered within a list |
| timingMs | integer | Maximum time per item in milliseconds |
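The classic latin-square rotation can be sketched as follows: list k sees item i in condition (i + k) mod numConditions, so every list contains each condition equally often and every item cycles through all conditions across lists. This is an illustrative sketch, not an algorithm prescribed by the lexicon:

```python
def latin_square(num_items, num_conditions):
    """Condition assignment per (list, item) under a standard Latin-square rotation."""
    return [
        [(item + list_idx) % num_conditions for item in range(num_items)]
        for list_idx in range(num_conditions)
    ]

lists = latin_square(num_items=8, num_conditions=4)
# 4 lists; within each list every condition appears 8/4 = 2 times,
# and across lists each item appears once in every condition.
```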

Behavioral Data

Every judgment can carry responseTimeMs and a behavioralData feature map for rich behavioral signals.

Response Times

The responseTimeMs field captures reaction time in milliseconds. For multi-region tasks (self-paced reading), per-region times go in behavioralData:

```json
{
  "item": { "recordRef": "at://..." },
  "scalarValue": 5,
  "responseTimeMs": 1842,
  "behavioralData": {
    "features": [
      { "key": "region.0.rt", "value": 312 },
      { "key": "region.1.rt", "value": 287 },
      { "key": "region.2.rt", "value": 445 },
      { "key": "region.3.rt", "value": 398 }
    ]
  }
}
```

Eye-Tracking During Judgment

Eye-tracking data collected during a judgment task can be stored alongside the judgment:

```json
{
  "behavioralData": {
    "features": [
      { "key": "eyetracking.totalFixationTime", "value": 2340 },
      { "key": "eyetracking.numFixations", "value": 8 },
      { "key": "eyetracking.numRegressions", "value": 2 },
      { "key": "eyetracking.firstFixationDuration", "value": 245 },
      { "key": "eyetracking.gazeDataRef", "value": "at://did:plc:researcher/pub.layers.media.media/et-session-017" }
    ]
  }
}
```

For detailed fixation-level data, see the Psycholinguistic Data guide.

Mouse and Keystroke Tracking

Interaction data captured during web-based experiments:

```json
{
  "behavioralData": {
    "features": [
      { "key": "mouse.numClicks", "value": 1 },
      { "key": "mouse.trajectory", "value": "[[0,400],[50,380],[120,350],[200,320]]" },
      { "key": "mouse.maxDeviation", "value": 45 },
      { "key": "keystroke.numBackspaces", "value": 3 },
      { "key": "keystroke.typingSpeed", "value": 42 }
    ]
  }
}
```
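Derived measures such as mouse.maxDeviation can be recomputed from the stored trajectory string. A sketch, assuming the trajectory is a JSON array of [x, y] points as above and defining deviation as perpendicular distance from the straight start-to-end line (one common convention among several):

```python
import json
import math

def max_deviation(trajectory_json):
    """Maximum perpendicular distance of the cursor path from the start-to-end line."""
    pts = json.loads(trajectory_json)
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    dx, dy = x1 - x0, y1 - y0
    norm = math.hypot(dx, dy)
    # |cross product| / |line vector| gives point-to-line distance
    return max(abs(dx * (y - y0) - dy * (x - x0)) / norm for x, y in pts)

# A path that bows 5 px away from a straight horizontal line:
dev = max_deviation("[[0,0],[5,5],[10,0]]")
# dev == 5.0
```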

Agreement Analysis

The agreementReport record summarizes inter-annotator agreement across judgment sets.

Example: Cohen's Kappa

```json
{
  "$type": "pub.layers.judgment.agreementReport",
  "experimentRef": "at://did:plc:researcher/pub.layers.judgment.experimentDef/ner-study",
  "judgmentSetRefs": [
    "at://did:plc:annotator1/pub.layers.judgment.judgmentSet/ner-batch1",
    "at://did:plc:annotator2/pub.layers.judgment.judgmentSet/ner-batch1"
  ],
  "metric": "cohens-kappa",
  "value": 8200,
  "numAnnotators": 2,
  "numItems": 500
}
```

Example: Krippendorff's Alpha

```json
{
  "$type": "pub.layers.judgment.agreementReport",
  "experimentRef": "at://did:plc:researcher/pub.layers.judgment.experimentDef/naturalness-study",
  "judgmentSetRefs": [
    "at://did:plc:annotator1/pub.layers.judgment.judgmentSet/nat-batch1",
    "at://did:plc:annotator2/pub.layers.judgment.judgmentSet/nat-batch1",
    "at://did:plc:annotator3/pub.layers.judgment.judgmentSet/nat-batch1"
  ],
  "metric": "krippendorff-alpha",
  "value": 7100,
  "numAnnotators": 3,
  "numItems": 200
}
```

Metrics

| Metric | Use Case | Scale |
| --- | --- | --- |
| cohens-kappa | Two annotators, categorical data | 0-1000 (maps to 0.0-1.0) |
| fleiss-kappa | Multiple annotators, categorical data | 0-1000 |
| krippendorff-alpha | Any number of annotators, any scale type | 0-1000 |
| percent-agreement | Simple agreement percentage | 0-1000 |
| correlation | Ordinal-scale judgments | 0-1000 (maps to 0.0-1.0) |
| f1 | Span labeling overlap | 0-1000 |

All metric values use the 0-1000 integer scale for consistent representation without floating-point issues. The metricUri field allows community-defined metrics beyond these known values.
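As a sketch of how a report value is produced, here is Cohen's kappa for two annotators' categorical labels, rounded onto the 0-1000 scale. This mirrors the scaling convention; it is not a prescribed implementation:

```python
from collections import Counter

def cohens_kappa_scaled(a, b):
    """Cohen's kappa for two label sequences, on the 0-1000 integer scale."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n      # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[l] * cb[l] for l in ca) / (n * n)   # chance agreement
    kappa = (po - pe) / (1 - pe)
    return round(kappa * 1000)

value = cohens_kappa_scaled(["y", "y", "y", "n"], ["y", "y", "n", "n"])
# value == 500  (kappa = 0.5)
```

Perfect agreement yields 1000; agreement at chance level yields 0.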

See Also

  • Primitives: objectRef, featureMap, agentRef, constraint definitions
  • Psycholinguistic Data: neural and physiological data integration with experiments
  • Judgment: full lexicon reference for experimentDef, judgment, judgmentSet, agreementReport
  • Resource: template, slot, filling, collection, and entry definitions
  • Expression: materialized stimulus expressions
  • Annotation: annotation layers for derived measures on stimuli
  • Flexible Enums: extending measureType, taskType, presentation method, recording method, and other values
  • bead Integration: mapping from the bead framework
  • Decomp Integration: mapping from UDS scalar judgments