Version: v0.11.2

Content Signals

The @peac/mappings-content-signals package parses and records content use policy signals from four sources. It follows an observation-only model: signals are recorded as evidence, never enforced.

Observation-only

Content signal observation records what a publisher declared at the time of access. PEAC never enforces these signals; enforcement is the responsibility of the consuming application.


Install

Terminal
pnpm add @peac/mappings-content-signals

Signal sources

| Source | Specification | Syntax |
| --- | --- | --- |
| robots.txt | RFC 9309 (Robots Exclusion Protocol) | User-agent / Disallow directives |
| Content-Usage header | IETF AIPREF draft (draft-ietf-aipref-attach-00), Structured Fields per RFC 9651 | HTTP Structured Fields |
| Content-Signal header | contentsignals.org specification | HTTP header |
| tdmrep.json | W3C Community Group Final Report (TDM Reservation Protocol), EU DSM Directive 2019/790 Article 4 | JSON file |

Source precedence

When multiple sources provide conflicting signals, the following precedence applies (highest to lowest):

  1. tdmrep.json (most specific, publisher-authored)
  2. Content-Signal header (per contentsignals.org)
  3. Content-Usage header (IETF AIPREF)
  4. robots.txt (least specific, broadest scope)
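
The precedence list above can be sketched as a simple first-match lookup. This is an illustrative sketch, not the package's implementation: the types, the `PRECEDENCE` constant, and `resolveByPrecedence` are hypothetical names, and the sketch assumes that a source reporting `unspecified` falls through to the next source in the list.

```typescript
// Hypothetical types for illustration; not the package's actual exports.
type SignalState = 'allow' | 'deny' | 'unspecified';
type SourceName = 'tdmrep.json' | 'Content-Signal' | 'Content-Usage' | 'robots.txt';

// Highest precedence first, mirroring the documented order.
const PRECEDENCE: readonly SourceName[] = [
  'tdmrep.json',
  'Content-Signal',
  'Content-Usage',
  'robots.txt',
];

// Returns the state from the highest-precedence source that declared
// allow or deny for a purpose, together with the source that decided it.
// Assumption: 'unspecified' (or a missing source) falls through.
function resolveByPrecedence(
  perSource: Partial<Record<SourceName, SignalState>>,
): { state: SignalState; decidedBy?: SourceName } {
  for (const source of PRECEDENCE) {
    const state = perSource[source];
    if (state !== undefined && state !== 'unspecified') {
      return { state, decidedBy: source };
    }
  }
  return { state: 'unspecified' };
}

const result = resolveByPrecedence({
  'robots.txt': 'allow',
  'Content-Usage': 'deny',
});
console.log(result); // { state: 'deny', decidedBy: 'Content-Usage' }
```

Recording which source decided the outcome keeps the result usable as evidence, in line with the observation-only model.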

Three-state observation model

Every signal resolves to one of three states:

| State | Meaning |
| --- | --- |
| allow | Publisher explicitly permits the specified purpose |
| deny | Publisher explicitly denies the specified purpose |
| unspecified | No signal found for this purpose |

The unspecified state is not a default allow or deny; it means no signal was observed. Applications decide how to handle unspecified signals.
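
Because the package leaves the handling of unspecified to the application, a consumer typically needs an explicit policy for it. The following is a minimal sketch of one way to do that; `UnspecifiedPolicy` and `shouldUse` are hypothetical application-side names, not part of the package.

```typescript
type SignalState = 'allow' | 'deny' | 'unspecified';

// Application-level choice for purposes with no observed signal.
type UnspecifiedPolicy = 'proceed' | 'skip';

// Turns an observed state into an application decision.
// The observation itself is unchanged; only the caller's policy
// determines what happens when nothing was declared.
function shouldUse(state: SignalState, onUnspecified: UnspecifiedPolicy): boolean {
  switch (state) {
    case 'allow':
      return true;
    case 'deny':
      return false;
    case 'unspecified':
      return onUnspecified === 'proceed';
  }
}

// A conservative consumer skips content when no signal was observed.
console.log(shouldUse('unspecified', 'skip')); // false
```

Making the policy a required argument forces each call site to state how it treats missing signals instead of inheriting a silent default.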


Usage

observe-signals.ts
import {
  parseRobotsTxt,
  parseContentUsage,
  parseContentSignal,
  parseTdmRep,
  resolveSignals,
} from '@peac/mappings-content-signals';

// Parse individual sources (all receive pre-fetched input, no network I/O)
const robots = parseRobotsTxt(robotsTxtContent, 'PEACBot');
const aipref = parseContentUsage(contentUsageHeader);
const contentSignal = parseContentSignal(contentSignalHeader);
const tdmrep = parseTdmRep(tdmrepJsonContent);

// Resolve with precedence
const observation = resolveSignals({
  robotsTxt: robots,
  contentUsage: aipref,
  contentSignal: contentSignal,
  tdmRep: tdmrep,
});

// Result: { train: 'deny', search: 'allow', inference: 'unspecified', ... }
No network I/O

All parsers receive pre-fetched input as strings or objects. The package performs no network requests, no DNS lookups, and no file system access. Fetching the source content is the caller's responsibility.
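
Since fetching is left to the caller, the inputs have to be gathered before any parser runs. The sketch below shows one possible way to collect them; `collectInputs`, `Fetcher`, and `RawSignalInputs` are hypothetical names. It assumes robots.txt sits at the origin root, that tdmrep.json is served from `/.well-known/tdmrep.json` as described by the TDM Reservation Protocol, and that both headers arrive on the page response. The fetch function is injected so the sketch can run against the global `fetch` or a stub.

```typescript
// Minimal response shape this sketch relies on (a subset of the Fetch API).
interface FetchedResponse {
  ok: boolean;
  headers: { get(name: string): string | null };
  text(): Promise<string>;
}
type Fetcher = (url: string) => Promise<FetchedResponse>;

// Raw, unparsed inputs; null means the source was not available.
interface RawSignalInputs {
  robotsTxtContent: string | null;
  tdmrepJsonContent: string | null;
  contentUsageHeader: string | null;
  contentSignalHeader: string | null;
}

async function collectInputs(pageUrl: string, fetcher: Fetcher): Promise<RawSignalInputs> {
  const origin = new URL(pageUrl).origin;

  // Fetch a text resource, returning null for non-2xx responses.
  const bodyOrNull = async (url: string): Promise<string | null> => {
    const res = await fetcher(url);
    return res.ok ? res.text() : null;
  };

  const [page, robotsTxtContent, tdmrepJsonContent] = await Promise.all([
    fetcher(pageUrl),
    bodyOrNull(`${origin}/robots.txt`),
    bodyOrNull(`${origin}/.well-known/tdmrep.json`),
  ]);

  return {
    robotsTxtContent,
    tdmrepJsonContent,
    contentUsageHeader: page.headers.get('Content-Usage'),
    contentSignalHeader: page.headers.get('Content-Signal'),
  };
}

// Usage with the platform fetch: await collectInputs('https://example.com/article', fetch)
```

How missing sources (the null values here) should be passed to the parsers is not covered by this page, so the caller would need to check that before parsing.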


Canonical purposes

Signals map to PEAC's five canonical purposes:

| Purpose | Description |
| --- | --- |
| train | Use content for AI model training |
| search | Index and display in search results |
| user_action | Display in response to direct user request |
| inference | Use as context for AI inference |
| index | Crawl and index content metadata |
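
Combined with the three-state model, the five purposes suggest a result shape keyed by purpose. The following sketch models that shape on the application side; `Purpose`, `Observation`, and `makeObservation` are hypothetical names and may not match the types the package exports.

```typescript
// The five canonical purposes from the table above.
type Purpose = 'train' | 'search' | 'user_action' | 'inference' | 'index';
type SignalState = 'allow' | 'deny' | 'unspecified';

// One state per purpose.
type Observation = Record<Purpose, SignalState>;

// Builds a complete observation: every purpose starts as 'unspecified'
// and only the purposes with a declared signal are overridden.
function makeObservation(declared: Partial<Observation> = {}): Observation {
  return {
    train: 'unspecified',
    search: 'unspecified',
    user_action: 'unspecified',
    inference: 'unspecified',
    index: 'unspecified',
    ...declared,
  };
}

const observed = makeObservation({ train: 'deny', search: 'allow' });
console.log(observed.inference); // unspecified
```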

EU TDM compliance

The tdmrep.json source relates to the EU DSM Directive 2019/790 Article 4, which establishes text and data mining (TDM) rights and reservations. The W3C Community Group Final Report defines the machine-readable format for declaring these reservations.

PEAC's content signal observation records whether a TDM reservation was declared. It does not interpret or enforce the legal implications of the reservation. Legal compliance is the responsibility of the consuming application.
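
To make "records whether a TDM reservation was declared" concrete, here is a hedged sketch of reading a tdmrep.json document for one path. It is not the package's `parseTdmRep`. It assumes the rule shape described by the TDM Reservation Protocol (`location`, `tdm-reservation`, optional `tdm-policy`), takes the first matching rule, and uses a simplified matcher that supports only exact paths and a trailing `*`. `observeTdm` and its types are hypothetical names.

```typescript
// Assumed rule shape for entries in tdmrep.json.
interface TdmRule {
  location: string;
  'tdm-reservation': 0 | 1;
  'tdm-policy'?: string;
}

// What was declared for a path; a record of the declaration only.
interface TdmObservation {
  state: 'reserved' | 'not-reserved' | 'unspecified';
  policyUrl: string | null;
}

function observeTdm(jsonText: string, path: string): TdmObservation {
  const rules = JSON.parse(jsonText) as TdmRule[];

  // Simplified matching (assumption): exact path, or prefix when the
  // location ends with '*'. The first matching rule is used.
  const rule = rules.find((r) =>
    r.location.endsWith('*')
      ? path.startsWith(r.location.slice(0, -1))
      : path === r.location,
  );

  if (!rule) {
    return { state: 'unspecified', policyUrl: null };
  }
  return {
    state: rule['tdm-reservation'] === 1 ? 'reserved' : 'not-reserved',
    policyUrl: rule['tdm-policy'] ?? null,
  };
}
```

The function only reports what the publisher declared. Deciding what a reservation means legally, and whether to proceed, stays with the consuming application.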

