
Judgment Data

Layers covers the full lifecycle of linguistic judgment data, from experiment definition through stimulus construction, data collection, and agreement analysis. The pub.layers.judgment and pub.layers.resource lexicons work together to support all major judgment paradigms used in psycholinguistics and computational linguistics.

Four Orthogonal Dimensions

Following bead's design, every experiment definition can specify four independent fields:

measureType: What property or behavior is being measured.

taskType: The behavioral instrument used to collect explicit responses. Omit for passive paradigms.

presentation: How stimuli are displayed (RSVP, self-paced, whole-sentence, auditory, etc.), including chunking and timing parameters.

recordingMethods: What instruments capture data (keyboard, button-box, EEG, eye-tracking, etc.). An experiment can use multiple simultaneous recording methods.

These are fully independent. An acceptability experiment can use an ordinal scale, a binary yes/no, or a magnitude estimate. The same RSVP presentation can be used with EEG, MEG, or behavioral-only recording. A preference experiment can use forced choice (pairwise comparison), multi-select (best-worst scaling), or ordinal scale (direct ranking). The measure type tells consumers what the data represents; the task type tells them how explicit responses were collected; the presentation tells them how stimuli were shown; the recording methods tell them what instruments captured data.

| Paradigm | measureType | taskType | presentation.method | recordingMethods |
| --- | --- | --- | --- | --- |
| Likert-scale acceptability rating | acceptability | ordinal-scale | whole-sentence | keyboard |
| Binary grammaticality judgment | acceptability | binary | whole-sentence | button-box |
| EEG + RSVP, passive reading | acceptability | | rsvp | eeg |
| EEG + RSVP + button press | acceptability | binary | rsvp | eeg, button-box |
| Self-paced reading | reading-time | | self-paced | keyboard |
| Eye-tracking natural reading | reading-time | | whole-sentence | eye-tracking |
| EEG + eye-tracking co-registration | reading-time | | whole-sentence | eeg, eye-tracking |
| fMRI + auditory narrative | comprehension | | naturalistic | fmri |
| Visual world paradigm | comprehension | | visual-world | eye-tracking |
| Masked priming + lexical decision + EEG | similarity | binary | masked-priming | eeg, button-box |
| Maze task | reading-time | forced-choice | maze | keyboard |
| NLI classification | inference | categorical | whole-sentence | keyboard |
| Pairwise translation comparison | preference | forced-choice | whole-sentence | mouse-click |
| Best-worst scaling for sentiment | preference | multi-select | whole-sentence | mouse-click |
| Semantic similarity rating | similarity | ordinal-scale | whole-sentence | mouse-click |
| Magnitude estimation | acceptability | magnitude | whole-sentence | keyboard |
| Cloze probability | comprehension | cloze | whole-sentence | keyboard |
| Named entity span annotation | extraction | span-labeling | whole-sentence | mouse-click |
| Sentence completion | production | free-text | whole-sentence | keyboard |
| MEG + auditory + passive listening | comprehension | | auditory | meg |

A blank taskType cell indicates a passive paradigm with no explicit response task.

Measure Types

The measureType field identifies what property or behavior is being measured.

| Value | Description |
| --- | --- |
| acceptability | Linguistic acceptability, naturalness, or grammaticality |
| inference | Semantic relationship (entailment, contradiction, neutral) |
| similarity | Semantic similarity, distance, or relatedness |
| plausibility | Likelihood or plausibility of events or statements |
| comprehension | Understanding or recall of content |
| preference | Subjective preference between alternatives |
| extraction | Extracting structured information (labeled spans) from text |
| reading-time | Processing time per word or region (self-paced reading, eye tracking) |
| production | Language production (sentence completion, word generation) |

All values are community-expandable via measureTypeUri.
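As a sketch of that extension point, an experiment measuring a community-defined property might declare it like this (the AT-URI and record shape below are illustrative, not registered values; see the Flexible Enums guide for the exact convention):

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Irony detection study",
  "measureTypeUri": "at://did:plc:researcher/pub.layers.ontology.ontology/irony",
  "taskType": "binary",
  "labels": ["ironic", "literal"]
}
```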

Task Types (Response Instruments)

The taskType field identifies how responses are collected, independent of what is being measured.

Ordinal Scale

Participants rate on a bounded discrete scale (Likert, slider). Responses go in scalarValue.

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Naturalness rating study",
  "measureType": "acceptability",
  "taskType": "ordinal-scale",
  "scaleMin": 1,
  "scaleMax": 7,
  "guidelines": "Rate how natural the sentence sounds on a scale from 1 (very unnatural) to 7 (perfectly natural)..."
}
```

```json
{
  "item": { "recordRef": "at://did:plc:researcher/pub.layers.expression.expression/item-42" },
  "scalarValue": 5,
  "responseTimeMs": 1560
}
```

Categorical

Participants choose a label from a fixed set of unordered categories. Responses go in categoricalValue.

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Semantic relation classification",
  "measureType": "inference",
  "taskType": "categorical",
  "labels": ["cause", "effect", "precondition", "none"],
  "guidelines": "Select the relation that best describes how the two events are connected..."
}
```

```json
{
  "item": { "recordRef": "at://did:plc:researcher/pub.layers.expression.expression/pair-15" },
  "categoricalValue": "cause",
  "responseTimeMs": 2340,
  "confidence": 800
}
```

Forced Choice

Participants pick exactly one option from a small set (2AFC, NAFC). Responses go in categoricalValue. This is the instrument behind pairwise comparison experiments.

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Translation quality comparison",
  "measureType": "preference",
  "taskType": "forced-choice",
  "labels": ["a", "b", "tie"],
  "guidelines": "Which translation better conveys the meaning of the source?"
}
```

```json
{
  "item": { "recordRef": "at://did:plc:researcher/pub.layers.expression.expression/pair-15" },
  "categoricalValue": "a",
  "responseTimeMs": 4200,
  "confidence": 750
}
```

Multi-Select

Participants pick one or more options from a set. Responses go in behavioralData features. This is the instrument behind best-worst scaling experiments.

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Sentiment intensity BWS",
  "measureType": "preference",
  "taskType": "multi-select",
  "guidelines": "Select the sentence that conveys the MOST positive sentiment and the LEAST positive sentiment..."
}
```

```json
{
  "item": { "recordRef": "at://did:plc:researcher/pub.layers.expression.expression/tuple-8" },
  "behavioralData": {
    "features": [
      { "key": "best", "value": "item-a" },
      { "key": "worst", "value": "item-c" }
    ]
  },
  "responseTimeMs": 5100
}
```

Binary

Participants give a yes/no response. Responses go in categoricalValue.

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Grammaticality judgment",
  "measureType": "acceptability",
  "taskType": "binary",
  "labels": ["yes", "no"],
  "guidelines": "Is the following sentence grammatically acceptable?"
}
```

```json
{
  "item": { "recordRef": "at://did:plc:researcher/pub.layers.expression.expression/item-42" },
  "categoricalValue": "yes",
  "responseTimeMs": 980
}
```

Free Text

Participants provide open-ended text responses. Responses go in freeText.

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Paraphrase generation",
  "measureType": "similarity",
  "taskType": "free-text",
  "guidelines": "Write a paraphrase of the following sentence that preserves its meaning..."
}
```

```json
{
  "item": { "recordRef": "at://did:plc:researcher/pub.layers.expression.expression/item-42" },
  "freeText": "The mouse was chased by the cat through the garden",
  "responseTimeMs": 8450
}
```

Span Labeling

Participants select and optionally label text regions. Responses go in textSpan.

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Argument span identification",
  "measureType": "extraction",
  "taskType": "span-labeling",
  "guidelines": "Highlight the text span that answers: who performed the action?"
}
```

```json
{
  "item": { "recordRef": "at://did:plc:researcher/pub.layers.expression.expression/item-42" },
  "textSpan": { "byteStart": 0, "byteEnd": 7 },
  "responseTimeMs": 3200
}
```

Magnitude

Participants provide an unbounded numeric value (magnitude estimation). Responses go in scalarValue. Unlike ordinal-scale, there are no scaleMin/scaleMax bounds.

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Magnitude estimation of naturalness",
  "measureType": "acceptability",
  "taskType": "magnitude",
  "guidelines": "Assign a number reflecting how natural the sentence sounds. Use any positive number; the reference sentence has a value of 100..."
}
```

```json
{
  "item": { "recordRef": "at://did:plc:researcher/pub.layers.expression.expression/item-42" },
  "scalarValue": 230,
  "responseTimeMs": 2100
}
```

Cloze

Participants fill in blanks in a stimulus. Responses go in freeText.

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Cloze probability estimation",
  "measureType": "comprehension",
  "taskType": "cloze",
  "guidelines": "Complete the sentence by filling in the blank with the first word that comes to mind..."
}
```

```json
{
  "item": { "recordRef": "at://did:plc:researcher/pub.layers.expression.expression/item-42" },
  "freeText": "dog",
  "responseTimeMs": 1200
}
```

Stimulus Presentation

The presentation field on experimentDef specifies how individual stimuli are displayed to participants. This is independent of what is being measured, what response is collected, and what instruments record data.

Presentation Methods

| Value | Description |
| --- | --- |
| rsvp | Rapid Serial Visual Presentation: chunks shown one at a time at a fixed rate |
| self-paced | Participant controls advancement (button press to reveal next chunk) |
| whole-sentence | Entire stimulus displayed at once |
| auditory | Spoken stimulus (natural speech, synthesized, or rate-controlled) |
| visual-world | Visual scene displayed alongside auditory stimulus |
| masked-priming | Forward mask, brief prime, then target |
| cross-modal | Stimulus in one modality, probe in another (e.g., auditory sentence + visual word) |
| naturalistic | Extended narrative (audiobook, story) for continuous processing |
| gating | Incremental portions of a spoken word revealed in successive gates |
| maze | Two-alternative forced choice at each word position |
| boundary | Invisible boundary triggers parafoveal preview change during saccade |
| moving-window | Only a window around fixation is visible; rest is masked |

All values are community-expandable via methodUri.

Chunking and Timing

For incremental presentation methods (RSVP, self-paced), additional fields control segmentation and timing:

```json
{
  "presentation": {
    "method": "rsvp",
    "chunkingUnit": "word",
    "timingMs": 300,
    "isiMs": 200,
    "cumulative": false
  }
}
```

```json
{
  "presentation": {
    "method": "self-paced",
    "chunkingUnit": "word",
    "cumulative": false,
    "maskChar": "-"
  }
}
```
| Field | Description |
| --- | --- |
| chunkingUnit | How text is segmented: word, character, morpheme, phrase, sentence, region, custom |
| timingMs | Per-chunk display duration in milliseconds (for timed presentations like RSVP) |
| isiMs | Inter-stimulus interval in milliseconds |
| cumulative | Whether previous chunks remain visible (true for cumulative self-paced reading, false for non-cumulative) |
| maskChar | Masking character replacing hidden text (e.g., - for dashes, # for hashes) |
| features | Additional method-specific parameters (e.g., prime duration for masked priming, gate size for gating) |
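For instance, a masked-priming presentation might carry its prime and mask durations in features (the key names below are illustrative, not standardized values):

```json
{
  "presentation": {
    "method": "masked-priming",
    "features": [
      { "key": "primeDurationMs", "value": 50 },
      { "key": "maskDurationMs", "value": 500 }
    ]
  }
}
```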

Recording Methods

The recordingMethods array on experimentDef declares what instruments capture data. An experiment can use multiple simultaneous recording methods (e.g., EEG + eye-tracking co-registration).

Behavioral Input Devices

| Value | Description |
| --- | --- |
| button-box | Dedicated response box (e.g., Cedrus, PST) with hardware-level timing |
| keyboard | Standard keyboard keypress |
| mouse-click | Mouse button click |
| touchscreen | Touchscreen tap |
| voice | Voice key or microphone onset detection |

Physiological Instruments

| Value | Description |
| --- | --- |
| eeg | Electroencephalography (scalp electrodes, ERPs) |
| meg | Magnetoencephalography |
| fmri | Functional magnetic resonance imaging |
| fnirs | Functional near-infrared spectroscopy |
| eye-tracking | Eye tracker (fixation-based reading measures, visual world) |
| pupillometry | Pupil diameter recording |
| mouse-tracking | Continuous mouse cursor trajectory |
| emg | Electromyography |
| skin-conductance | Galvanic skin response |
| ecog | Intracranial EEG / electrocorticography |

All values are community-expandable via methodUri. Detailed acquisition parameters (sample rate, channel count, montage) belong on pub.layers.media.media records, not the experiment definition. See the Psycholinguistic Data guide for media record examples.

Examples

EEG study with RSVP and a behavioral task:

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "N400 semantic violation study",
  "measureType": "acceptability",
  "taskType": "binary",
  "presentation": {
    "method": "rsvp",
    "chunkingUnit": "word",
    "timingMs": 300,
    "isiMs": 200,
    "cumulative": false
  },
  "recordingMethods": [
    { "method": "eeg" },
    { "method": "button-box" }
  ],
  "labels": ["acceptable", "unacceptable"],
  "guidelines": "After each sentence, press the left button if acceptable or the right button if unacceptable..."
}
```

Eye-tracking natural reading with no explicit task:

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Garden-path reading study",
  "measureType": "reading-time",
  "presentation": {
    "method": "whole-sentence"
  },
  "recordingMethods": [
    { "method": "eye-tracking" }
  ],
  "guidelines": "Read each sentence silently at your own pace..."
}
```

Self-paced reading:

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Garden-path SPR study",
  "measureType": "reading-time",
  "presentation": {
    "method": "self-paced",
    "chunkingUnit": "word",
    "cumulative": false,
    "maskChar": "-"
  },
  "recordingMethods": [
    { "method": "keyboard" }
  ],
  "guidelines": "Press the spacebar to reveal each word..."
}
```

fMRI with auditory narrative (passive):

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Narrative comprehension fMRI",
  "measureType": "comprehension",
  "presentation": {
    "method": "naturalistic"
  },
  "recordingMethods": [
    { "method": "fmri" }
  ],
  "guidelines": "Listen to the story and try to understand what is happening..."
}
```

The Stimulus Pipeline

Layers provides a full pipeline from parameterized templates to materialized stimuli.

Template Definition

A template contains text with {slotName} placeholders, plus slot definitions that constrain what can fill each position:

```json
{
  "$type": "pub.layers.resource.template",
  "name": "Transitive sentence template",
  "text": "{subject} {verb} {object} {adjunct}",
  "language": "en",
  "slots": [
    {
      "name": "subject",
      "required": true,
      "collectionRef": "at://did:plc:researcher/pub.layers.resource.collection/animate-nouns",
      "constraints": [
        { "expressionFormat": "python-expr", "expression": "self.features.animacy == 'animate'", "scope": "slot" }
      ]
    },
    {
      "name": "verb",
      "required": true,
      "collectionRef": "at://did:plc:researcher/pub.layers.resource.collection/transitive-verbs"
    },
    {
      "name": "object",
      "required": true,
      "collectionRef": "at://did:plc:researcher/pub.layers.resource.collection/nouns"
    },
    {
      "name": "adjunct",
      "required": false,
      "defaultValue": ""
    }
  ],
  "constraints": [
    {
      "expressionFormat": "python-expr",
      "expression": "subject.features.number == verb.features.number",
      "scope": "template"
    }
  ],
  "experimentRef": "at://did:plc:researcher/pub.layers.judgment.experimentDef/spr-study"
}
```

Slot Filling

A filling maps each slot to a specific filler and records the rendered text:

```json
{
  "$type": "pub.layers.resource.filling",
  "templateRef": "at://did:plc:researcher/pub.layers.resource.template/transitive-template",
  "slotFillings": [
    {
      "slotName": "subject",
      "entryRef": "at://did:plc:researcher/pub.layers.resource.entry/the-cat",
      "renderedForm": "The cat"
    },
    {
      "slotName": "verb",
      "entryRef": "at://did:plc:researcher/pub.layers.resource.entry/chased",
      "renderedForm": "chased"
    },
    {
      "slotName": "object",
      "entryRef": "at://did:plc:researcher/pub.layers.resource.entry/the-mouse",
      "renderedForm": "the mouse"
    },
    {
      "slotName": "adjunct",
      "literalValue": "across the garden",
      "renderedForm": "across the garden"
    }
  ],
  "renderedText": "The cat chased the mouse across the garden",
  "strategy": "exhaustive",
  "expressionRef": "at://did:plc:researcher/pub.layers.expression.expression/item-42"
}
```
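A consumer can recompute renderedText from the template text and its slot fillings. The sketch below assumes the record shapes shown above and does simple placeholder substitution; a real renderer would also evaluate slot and template constraints:

```python
import re

def render_filling(template_text, slot_fillings):
    """Substitute each renderedForm into its {slotName} placeholder.

    Unfilled optional slots become empty strings; surplus whitespace
    is collapsed so omitted slots leave no stray gaps.
    """
    forms = {f["slotName"]: f["renderedForm"] for f in slot_fillings}
    rendered = re.sub(r"\{(\w+)\}", lambda m: forms.get(m.group(1), ""), template_text)
    return re.sub(r"\s+", " ", rendered).strip()

fillings = [
    {"slotName": "subject", "renderedForm": "The cat"},
    {"slotName": "verb", "renderedForm": "chased"},
    {"slotName": "object", "renderedForm": "the mouse"},
    {"slotName": "adjunct", "renderedForm": "across the garden"},
]
text = render_filling("{subject} {verb} {object} {adjunct}", fillings)
# text == "The cat chased the mouse across the garden"
```

Dropping the optional adjunct filling yields "The cat chased the mouse", matching the defaultValue of "" declared in the template above.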

Template Composition

Multi-part stimuli (context + target + question) use template compositions:

```json
{
  "$type": "pub.layers.resource.templateComposition",
  "name": "Reading comprehension item",
  "compositionType": "sequence",
  "members": [
    {
      "templateRef": "at://did:plc:researcher/pub.layers.resource.template/context-paragraph",
      "label": "context",
      "ordinal": 0,
      "required": true
    },
    {
      "templateRef": "at://did:plc:researcher/pub.layers.resource.template/target-sentence",
      "label": "target",
      "ordinal": 1,
      "required": true
    },
    {
      "templateRef": "at://did:plc:researcher/pub.layers.resource.template/comprehension-question",
      "label": "question",
      "ordinal": 2,
      "required": true
    }
  ],
  "experimentRef": "at://did:plc:researcher/pub.layers.judgment.experimentDef/spr-study"
}
```

Materialization

The filling's renderedText becomes an expression's text. The expressionRef on the filling links them bidirectionally. Annotations on the materialized expression can mark labeled spans (e.g., the critical region in a self-paced reading study).
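Span annotations locate regions by byte offsets into the UTF-8 encoding of the expression text. A minimal sketch for computing such offsets (the helper name is ours, not part of the lexicon):

```python
def byte_span(text, substring):
    """Return UTF-8 byte offsets of the first occurrence of substring in text."""
    idx = text.find(substring)
    if idx < 0:
        raise ValueError("substring not found")
    start = len(text[:idx].encode("utf-8"))
    return {"byteStart": start, "byteEnd": start + len(substring.encode("utf-8"))}

# Mark the critical region of the materialized expression from the filling above:
span = byte_span("The cat chased the mouse across the garden", "chased the mouse")
# span == {"byteStart": 8, "byteEnd": 24}
```

Counting bytes rather than characters matters for non-ASCII text, where character and byte indices diverge.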

Experiment Design

The experimentDef record specifies the full experimental design, including list construction constraints and item ordering.

Full Example

```json
{
  "$type": "pub.layers.judgment.experimentDef",
  "name": "Relative clause attachment ambiguity",
  "measureType": "acceptability",
  "taskType": "ordinal-scale",
  "presentation": {
    "method": "whole-sentence"
  },
  "recordingMethods": [
    { "method": "keyboard" }
  ],
  "scaleMin": 1,
  "scaleMax": 7,
  "guidelines": "Read each sentence and rate how natural it sounds...",
  "ontologyRef": "at://did:plc:researcher/pub.layers.ontology.ontology/rc-attachment-types",
  "templateRefs": [
    "at://did:plc:researcher/pub.layers.resource.template/rc-high-attach",
    "at://did:plc:researcher/pub.layers.resource.template/rc-low-attach"
  ],
  "collectionRefs": [
    "at://did:plc:researcher/pub.layers.resource.collection/fillers"
  ],
  "design": {
    "distributionStrategy": "latin-square",
    "itemOrder": "random-order",
    "timingMs": 30000,
    "listConstraints": [
      {
        "kind": "latin-square",
        "targetProperty": "condition",
        "parameters": {
          "features": [
            { "key": "numConditions", "value": 4 },
            { "key": "numLists", "value": 4 }
          ]
        }
      },
      {
        "kind": "no-adjacent-same-condition",
        "targetProperty": "condition",
        "parameters": {
          "features": [{ "key": "minDistance", "value": 2 }]
        }
      },
      {
        "kind": "balanced-frequency",
        "targetProperty": "itemType",
        "parameters": {
          "features": [
            { "key": "experimental", "value": 24 },
            { "key": "filler", "value": 48 }
          ]
        }
      }
    ]
  }
}
```

List Constraints

| Constraint Kind | Purpose | Parameters |
| --- | --- | --- |
| latin-square | Each participant sees one condition per item; all conditions equally represented | numConditions, numLists |
| no-adjacent-same-condition | Prevent consecutive items from the same condition | minDistance |
| balanced-frequency | Control the ratio of experimental to filler items | Per-type counts |
| min-distance | Minimum distance between items of the same type | minDistance, targetProperty |

Distribution and Item Order

| Field | Known Values | Description |
| --- | --- | --- |
| distributionStrategy | latin-square, random, blocked, stratified, custom | How items are assigned to participant lists |
| itemOrder | random-order, fixed-order, blocked, adaptive, custom | How items are ordered within a list |
| timingMs | integer | Maximum time per item in milliseconds |
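The classic latin-square rotation can be sketched as follows: list k sees item i in condition (i + k) mod numConditions, so every list contains each condition equally often and every item cycles through all conditions across lists. This is an illustrative sketch, not an algorithm prescribed by the lexicon:

```python
def latin_square(num_items, num_conditions):
    """Condition assignment per (list, item) under a standard Latin-square rotation."""
    return [
        [(item + list_idx) % num_conditions for item in range(num_items)]
        for list_idx in range(num_conditions)
    ]

lists = latin_square(num_items=8, num_conditions=4)
# 4 lists; within each list every condition appears 8/4 = 2 times,
# and across lists each item appears once in every condition.
```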

Behavioral Data

Every judgment can carry responseTimeMs and a behavioralData feature map for rich behavioral signals.

Response Times

The responseTimeMs field captures reaction time in milliseconds. For multi-region tasks (self-paced reading), per-region times go in behavioralData:

```json
{
  "item": { "recordRef": "at://..." },
  "scalarValue": 5,
  "responseTimeMs": 1842,
  "behavioralData": {
    "features": [
      { "key": "region.0.rt", "value": 312 },
      { "key": "region.1.rt", "value": 287 },
      { "key": "region.2.rt", "value": 445 },
      { "key": "region.3.rt", "value": 398 }
    ]
  }
}
```

Eye-Tracking During Judgment

Eye-tracking data collected during a judgment task can be stored alongside the judgment:

```json
{
  "behavioralData": {
    "features": [
      { "key": "eyetracking.totalFixationTime", "value": 2340 },
      { "key": "eyetracking.numFixations", "value": 8 },
      { "key": "eyetracking.numRegressions", "value": 2 },
      { "key": "eyetracking.firstFixationDuration", "value": 245 },
      { "key": "eyetracking.gazeDataRef", "value": "at://did:plc:researcher/pub.layers.media.media/et-session-017" }
    ]
  }
}
```

For detailed fixation-level data, see the Psycholinguistic Data guide.

Mouse and Keystroke Tracking

Interaction data captured during web-based experiments:

```json
{
  "behavioralData": {
    "features": [
      { "key": "mouse.numClicks", "value": 1 },
      { "key": "mouse.trajectory", "value": "[[0,400],[50,380],[120,350],[200,320]]" },
      { "key": "mouse.maxDeviation", "value": 45 },
      { "key": "keystroke.numBackspaces", "value": 3 },
      { "key": "keystroke.typingSpeed", "value": 42 }
    ]
  }
}
```
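Derived measures such as mouse.maxDeviation can be recomputed from the stored trajectory string. A sketch, assuming the trajectory is a JSON array of [x, y] points as above and defining deviation as perpendicular distance from the straight start-to-end line (one common convention among several):

```python
import json
import math

def max_deviation(trajectory_json):
    """Maximum perpendicular distance of the cursor path from the start-to-end line."""
    pts = json.loads(trajectory_json)
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    dx, dy = x1 - x0, y1 - y0
    norm = math.hypot(dx, dy)
    # |cross product| / |line vector| gives point-to-line distance
    return max(abs(dx * (y - y0) - dy * (x - x0)) / norm for x, y in pts)

# A path that bows 5 px away from a straight horizontal line:
dev = max_deviation("[[0,0],[5,5],[10,0]]")
# dev == 5.0
```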

Agreement Analysis

The agreementReport record summarizes inter-annotator agreement across judgment sets.

Example: Cohen's Kappa

```json
{
  "$type": "pub.layers.judgment.agreementReport",
  "experimentRef": "at://did:plc:researcher/pub.layers.judgment.experimentDef/ner-study",
  "judgmentSetRefs": [
    "at://did:plc:annotator1/pub.layers.judgment.judgmentSet/ner-batch1",
    "at://did:plc:annotator2/pub.layers.judgment.judgmentSet/ner-batch1"
  ],
  "metric": "cohens-kappa",
  "value": 8200,
  "numAnnotators": 2,
  "numItems": 500
}
```

Example: Krippendorff's Alpha

```json
{
  "$type": "pub.layers.judgment.agreementReport",
  "experimentRef": "at://did:plc:researcher/pub.layers.judgment.experimentDef/naturalness-study",
  "judgmentSetRefs": [
    "at://did:plc:annotator1/pub.layers.judgment.judgmentSet/nat-batch1",
    "at://did:plc:annotator2/pub.layers.judgment.judgmentSet/nat-batch1",
    "at://did:plc:annotator3/pub.layers.judgment.judgmentSet/nat-batch1"
  ],
  "metric": "krippendorff-alpha",
  "value": 7100,
  "numAnnotators": 3,
  "numItems": 200
}
```

Metrics

| Metric | Use Case | Scale |
| --- | --- | --- |
| cohens-kappa | Two annotators, categorical data | 0-1000 (maps to 0.0-1.0) |
| fleiss-kappa | Multiple annotators, categorical data | 0-1000 |
| krippendorff-alpha | Any number of annotators, any scale type | 0-1000 |
| percent-agreement | Simple agreement percentage | 0-1000 |
| correlation | Ordinal-scale judgments | 0-1000 (maps to 0.0-1.0) |
| f1 | Span labeling overlap | 0-1000 |

All metric values use the 0-1000 integer scale for consistent representation without floating-point issues. The metricUri field allows community-defined metrics beyond these known values.
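As a sketch of how a report value is produced, here is Cohen's kappa for two annotators' categorical labels, rounded onto the 0-1000 scale. This mirrors the scaling convention; it is not a prescribed implementation:

```python
from collections import Counter

def cohens_kappa_scaled(a, b):
    """Cohen's kappa for two label sequences, on the 0-1000 integer scale."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n      # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[l] * cb[l] for l in ca) / (n * n)   # chance agreement
    kappa = (po - pe) / (1 - pe)
    return round(kappa * 1000)

value = cohens_kappa_scaled(["y", "y", "y", "n"], ["y", "y", "n", "n"])
# value == 500  (kappa = 0.5)
```

Perfect agreement yields 1000; agreement at chance level yields 0.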

See Also

  • Primitives: objectRef, featureMap, agentRef, constraint definitions
  • Psycholinguistic Data: neural and physiological data integration with experiments
  • Judgment: full lexicon reference for experimentDef, judgment, judgmentSet, agreementReport
  • Resource: template, slot, filling, collection, and entry definitions
  • Expression: materialized stimulus expressions
  • Annotation: annotation layers for derived measures on stimuli
  • Flexible Enums: extending measureType, taskType, presentation method, recording method, and other values
  • bead Integration: mapping from the bead framework
  • Decomp Integration: mapping from UDS scalar judgments