Skip to content

Database Schema Reference

SQLite database located at .data/database/library.db (path set in config/core.json).

Latest migration: M-068. All migrations are idempotent (IF NOT EXISTS guards). Schema changes are applied automatically on Engine startup.

Conventions: - UUIDs stored as TEXT - Booleans stored as INTEGER (0 = false, 1 = true) - Timestamps stored as TEXT in ISO-8601 format (2026-03-29T14:00:00Z) - Foreign keys enabled via PRAGMA foreign_keys = ON


Core Media

hubs

Represents a Series (user-facing) or Universe (ParentHub). Both are stored in the same table, distinguished by hub_type.

Column Type Notes
id TEXT UUID, primary key
hub_type TEXT "Hub" = Series, "ParentHub" = Universe
title TEXT Display title
wikidata_qid TEXT Wikidata entity identifier. Indexed.
parent_hub_id TEXT FK → hubs.id. NULL for top-level Universes and standalone Series.
media_type TEXT Primary media type for this hub
description TEXT Wikipedia or provider description
sort_title TEXT Normalized title for alphabetical sorting
created_at TEXT Timestamp
updated_at TEXT Timestamp
last_enriched_at TEXT Timestamp of last Wikidata enrichment

Indices: wikidata_qid, parent_hub_id, hub_type

works

A single title, independent of version or format.

Column Type Notes
id TEXT UUID, primary key
hub_id TEXT FK → hubs.id. The Series this work belongs to.
wikidata_qid TEXT Wikidata entity identifier. Indexed.
title TEXT Canonical display title
original_title TEXT Title in the work's source language (Phase 2 localization)
media_type TEXT Books, Audiobooks, Movies, TV, Music, Comics, Podcasts
sort_title TEXT Normalized for sorting
status TEXT Verified, Provisional, NeedsReview, Quarantined, Pending
created_at TEXT Timestamp
updated_at TEXT Timestamp
last_enriched_at TEXT Timestamp

Indices: wikidata_qid, hub_id, media_type, status

editions

A specific version of a Work (e.g., "4K HDR Blu-ray Remux", "First Edition Hardcover").

Column Type Notes
id TEXT UUID, primary key
work_id TEXT FK → works.id
title TEXT Edition label
created_at TEXT Timestamp

media_assets

A single file on disk.

Column Type Notes
id TEXT UUID, primary key
edition_id TEXT FK → editions.id
file_path TEXT Absolute path on disk
file_name TEXT Filename only
fingerprint TEXT SHA-256 hash. Used for deduplication and move detection.
file_size_bytes INTEGER
media_type TEXT Resolved media type
container TEXT File container format (e.g., mkv, epub, m4b)
duration_sec INTEGER For audio/video assets
ingested_at TEXT Timestamp
status TEXT Current pipeline status
staging_path TEXT Path within .data/staging/ while in-flight

Indices: fingerprint, edition_id, status

hub_items

Links works to hubs (Series to Universe relationships).

Column Type Notes
hub_id TEXT FK → hubs.id
work_id TEXT FK → works.id
sort_order INTEGER Position within the hub

Many-to-many cross-links between works and hubs for works that span multiple Series.

Column Type Notes
hub_id TEXT FK → hubs.id
work_id TEXT FK → works.id
link_type TEXT Relationship type (e.g., "adaptation", "companion")

Metadata

metadata_claims

Append-only log of every metadata value ever received for an entity, with its source and confidence.

Column Type Notes
id TEXT UUID, primary key
entity_id TEXT The work, edition, hub, or person this claim belongs to
entity_type TEXT "Work", "Hub", "Person", etc.
field_key TEXT Field identifier (e.g., "title", "author", "genre"). See MetadataFieldConstants.cs.
value TEXT Claim value
source_id TEXT Provider GUID
source_name TEXT Human-readable provider name
confidence REAL Score 0.0–1.0
source_language TEXT BCP-47 language code of this claim's value (Phase 6)
is_user_lock INTEGER 1 if user explicitly set this value (Tier A — always wins)
created_at TEXT Timestamp
decayed_at TEXT Timestamp when stale decay was applied. NULL if fresh.

Indices: entity_id + field_key, source_id, is_user_lock

canonical_values

The winning single-valued claim for each field after Priority Cascade resolution.

Column Type Notes
entity_id TEXT
entity_type TEXT
field_key TEXT
value TEXT Winning value
source_id TEXT Which provider won
confidence REAL Winning confidence score
resolved_at TEXT Timestamp of last resolution

Unique index: entity_id + field_key

canonical_value_arrays

The winning multi-valued claims (genres, authors, vibe tags, cast members).

Column Type Notes
entity_id TEXT
entity_type TEXT
field_key TEXT
values TEXT JSON array of winning values
source_id TEXT
resolved_at TEXT Timestamp

Unique index: entity_id + field_key

search_index

FTS5 full-text search index with trigram tokenizer (migration M-062). Handles CJK and languages without word boundaries. Short queries under 3 characters fall back to a LIKE scan.

Column Type Notes
entity_id TEXT UNINDEXED — used for JOIN back to source tables
title TEXT Primary display title
original_title TEXT Source-language title
alternate_titles TEXT Wikidata aliases and romanizations (e.g., "Sen to Chihiro no Kamikakushi")
author TEXT Author/creator names
description TEXT Full description text

Persons

persons

Wikidata-sourced person records. Always enriched from Wikidata; never manually created.

Column Type Notes
id TEXT UUID, primary key
wikidata_qid TEXT Wikidata identifier. Indexed.
name TEXT Display name
description TEXT Wikipedia short description
birth_date TEXT ISO-8601 date
death_date TEXT ISO-8601 date. NULL if living.
nationality TEXT
headshot_url TEXT Cached headshot image URL
last_enriched_at TEXT Timestamp. Records enriched within 30 days skip re-fetch.
last_revision_id TEXT Wikidata revision ID. Used for freshness checks before full property re-fetch.

Index: wikidata_qid

person_roles

Roles a person plays in the library (author, director, narrator, composer, etc.).

Column Type Notes
person_id TEXT FK → persons.id
role TEXT Wikidata property code (e.g., P50 = author, P57 = director, P161 = cast member)
work_id TEXT FK → works.id

Aggregated library presence — which media types a person appears in and how many works.

Column Type Notes
person_id TEXT FK → persons.id
media_type TEXT
work_count INTEGER

person_aliases

Pseudonyms and alternate names, including Wikidata-resolved aliases.

Column Type Notes
person_id TEXT FK → persons.id
alias TEXT Alternate name
alias_type TEXT "pseudonym", "birth_name", "wikidata_alias"
merge_target_id TEXT If this alias resolves to a different person record, the canonical person's id

pending_person_signals

Person names from file metadata or AI extraction awaiting standalone Wikidata reconciliation.

Column Type Notes
id TEXT UUID, primary key
entity_id TEXT The work or edition this name came from
name TEXT Person name to reconcile
signal_source TEXT "file_metadata" (confidence 0.80) or "ai_extracted" (confidence 0.75)
created_at TEXT Timestamp
resolved_at TEXT NULL until reconciliation completes

Links fictional characters to the real-world performers who portray them.

Column Type Notes
character_id TEXT FK → fictional_entities.id
person_id TEXT FK → persons.id
work_id TEXT FK → works.id. Scopes the link to a specific adaptation.
era_qualifier TEXT Temporal qualifier (e.g., "young", "2024 series") for era-correct actor matching

Universe Graph

fictional_entities

Characters, locations, factions, and other fictional elements within Universes.

Column Type Notes
id TEXT UUID, primary key
universe_qid TEXT The Universe (ParentHub) this entity belongs to
wikidata_qid TEXT Wikidata QID if the entity is notable enough to have one
entity_type TEXT "Character", "Location", "Faction", "Concept"
name TEXT Display name
description TEXT
p31_type TEXT Wikidata P31 (instance of) value. Used to detect animated character art.

Which works a fictional entity appears in.

Column Type Notes
entity_id TEXT FK → fictional_entities.id
work_id TEXT FK → works.id
appearance_type TEXT "primary", "supporting", "mention"

entity_relationships

Directed relationships between fictional entities (e.g., "Frodo" → parent_of → "Sam").

Column Type Notes
source_entity_id TEXT FK → fictional_entities.id
target_entity_id TEXT FK → fictional_entities.id
relationship_type TEXT Relationship label
temporal_qualifier TEXT Era or time scope for this relationship
lore_delta_type TEXT Canon discrepancy flag (e.g., "adaptation_change")

narrative_roots

Universe-level root records with Wikidata provenance.

Column Type Notes
id TEXT UUID, primary key
hub_id TEXT FK → hubs.id (the ParentHub)
wikidata_qid TEXT
franchise_qid TEXT Wikidata P8345 franchise identifier
series_qid TEXT Wikidata P179 series identifier

qid_labels

Cached Wikidata display labels for QIDs, avoiding repeated API lookups.

Column Type Notes
qid TEXT Primary key
label TEXT Display label in the configured metadata language
fetched_at TEXT Timestamp

Providers

provider_registry

Registered providers and their runtime state.

Column Type Notes
id TEXT GUID — matches provider_id in config files. Primary key.
name TEXT Display name
media_types TEXT JSON array of served media types
stage TEXT "Stage1" or "Stage2"
enabled INTEGER
last_checked_at TEXT Timestamp of last health check
health_status TEXT "Healthy", "Degraded", "Unavailable"

provider_config

Live provider configuration values (mirrors config/providers/*.json after load).

Column Type Notes
provider_id TEXT FK → provider_registry.id
config_key TEXT Configuration field name
config_value TEXT Configuration field value

provider_response_cache

Cached provider API responses to eliminate redundant calls.

Column Type Notes
id TEXT UUID, primary key
provider_id TEXT
cache_key TEXT Hashed request parameters
response_body TEXT Cached JSON response
cached_at TEXT Timestamp
expires_at TEXT TTL-based expiry timestamp

provider_health

Historical health check log per provider.

Column Type Notes
provider_id TEXT
checked_at TEXT Timestamp
status TEXT
latency_ms INTEGER

Security

api_keys

API keys for Engine authentication.

Column Type Notes
id TEXT UUID, primary key
key_hash TEXT SHA-256 hash of the key. The plaintext key is never stored after creation.
label TEXT Human-readable label
role TEXT "Administrator", "Curator", "Consumer"
created_at TEXT Timestamp
last_used_at TEXT Timestamp. NULL if never used.
revoked_at TEXT NULL if active. Timestamp if revoked.

profiles

User profiles for multi-user support.

Column Type Notes
id TEXT UUID, primary key
name TEXT Display name
pin_hash TEXT PIN hash. NULL if no PIN set.
role TEXT
created_at TEXT Timestamp

Operations

ingestion_log

One row per file processed through the ingestion pipeline.

Column Type Notes
id TEXT UUID, primary key
batch_id TEXT FK → ingestion_batches.id
asset_id TEXT FK → media_assets.id
file_path TEXT Original file path
fingerprint TEXT SHA-256 hash
outcome TEXT "Promoted", "Rejected", "NeedsReview", "Duplicate", "Failed"
outcome_reason TEXT Natural-language explanation for non-promoted outcomes
pipeline_state TEXT JSON snapshot of pipeline stage completion
created_at TEXT Timestamp

ingestion_batches

Groups ingestion_log rows into named runs.

Column Type Notes
id TEXT UUID, primary key
started_at TEXT Timestamp
completed_at TEXT NULL until batch finishes
file_count INTEGER Total files in batch
promoted_count INTEGER
rejected_count INTEGER

deferred_enrichment_queue

Works queued for background enrichment (Pass 2).

Column Type Notes
id TEXT UUID, primary key
entity_id TEXT
entity_type TEXT
priority INTEGER Lower = higher priority
queued_at TEXT Timestamp
started_at TEXT NULL until processing begins

review_queue

Items the Engine could not confidently match and that require human review.

Column Type Notes
id TEXT UUID, primary key
entity_id TEXT
entity_type TEXT
review_reason TEXT Why this item is in the queue
candidates TEXT JSON array of ranked candidates
created_at TEXT Timestamp
resolved_at TEXT NULL until resolved or dismissed
resolution_type TEXT "Selected", "Provisional", "Dismissed", "SkippedUniverse"

resolver_cache

Cached candidate lists from previous search/resolve operations.

Column Type Notes
cache_key TEXT Hashed query parameters. Primary key.
candidates TEXT JSON array
cached_at TEXT Timestamp

Content

image_cache

Tracks all images stored in .data/images/.

Column Type Notes
id TEXT UUID, primary key
entity_id TEXT
entity_type TEXT
image_type TEXT "CoverArt", "Headshot", "Banner", "Logo", "Backdrop"
file_path TEXT Absolute path on disk
source_url TEXT Original provider URL
provider_id TEXT
user_override INTEGER 1 if uploaded by user. Protected from orphan sweep.
phash TEXT Perceptual hash for visual similarity matching
created_at TEXT Timestamp
is_preferred INTEGER 1 if this is the selected image for display

search_results_cache

Cached results from Intent Search queries.

Column Type Notes
cache_key TEXT Hashed query. Primary key.
results TEXT JSON array of entity IDs
cached_at TEXT Timestamp

ui_settings_cache

Server-side cache of resolved UI settings per profile and device class.

Column Type Notes
profile_id TEXT
device_class TEXT
settings_json TEXT Resolved merged settings
computed_at TEXT Timestamp

bridge_ids

External provider identifiers (ISBN, ASIN, TMDB ID, MBID) used for cross-provider linking.

Column Type Notes
entity_id TEXT
entity_type TEXT
id_type TEXT "ISBN", "ASIN", "TMDB", "MBID", etc.
value TEXT The identifier value
source_id TEXT Which provider supplied this ID

Index: entity_id + id_type, value + id_type (for reverse lookup)


Reader

reader_bookmarks

Saved reading positions.

Column Type Notes
id TEXT UUID, primary key
asset_id TEXT FK → media_assets.id
profile_id TEXT FK → profiles.id
chapter_index INTEGER
cfi TEXT EPUB Canonical Fragment Identifier for precise position
created_at TEXT Timestamp

reader_highlights

Text highlights and annotations.

Column Type Notes
id TEXT UUID, primary key
asset_id TEXT
profile_id TEXT
chapter_index INTEGER
cfi_range TEXT CFI range for highlighted text
text TEXT Highlighted text content
note TEXT User annotation. NULL if no note.
color TEXT Highlight color
created_at TEXT Timestamp

reader_statistics

Aggregated reading session data.

Column Type Notes
asset_id TEXT
profile_id TEXT
total_seconds INTEGER Total reading time
completion_pct REAL 0.0–1.0
last_position_cfi TEXT Last known reading position
session_count INTEGER Number of reading sessions
last_opened_at TEXT Timestamp

alignment_jobs

Whispersync audio-to-text alignment jobs.

Column Type Notes
id TEXT UUID, primary key
asset_id TEXT
status TEXT "Pending", "Running", "Completed", "Failed"
alignment_data TEXT JSON alignment map
created_at TEXT Timestamp

User

user_states

Per-profile, per-asset playback and reading state.

Column Type Notes
profile_id TEXT
asset_id TEXT
state_type TEXT "Reading", "Watching", "Listening"
position TEXT Progress position (CFI for EPUB, seconds for audio/video)
updated_at TEXT Timestamp

user_taste_profiles

AI-generated per-user taste vectors, updated by the Taste Profiling feature.

Column Type Notes
profile_id TEXT Primary key. FK → profiles.id.
genre_weights TEXT JSON map of genre → weight
vibe_weights TEXT JSON map of vibe tag → weight
creator_weights TEXT JSON map of person QID → weight
computed_at TEXT Timestamp

Activity

system_activity

Rolling log of Engine actions. Pruned automatically per config/maintenance.json.

Column Type Notes
id TEXT UUID, primary key
run_id TEXT Groups related activities into a single run
activity_type TEXT Type code (e.g., "Ingestion", "Enrichment", "Writeback")
entity_id TEXT Related entity. NULL for system-level activities.
outcome TEXT "Success", "Failure", "Skipped"
message TEXT Human-readable description
created_at TEXT Timestamp

Index: run_id, activity_type, created_at

transaction_log

Ordered log of all database writes, used for debugging and rollback analysis.

Column Type Notes
id INTEGER Auto-increment primary key
table_name TEXT Which table was modified
row_id TEXT The affected row's primary key
operation TEXT "INSERT", "UPDATE", "DELETE"
old_values TEXT JSON snapshot before change. NULL for INSERT.
new_values TEXT JSON snapshot after change. NULL for DELETE.
created_at TEXT Timestamp