Retail Provider Reference¶

What is this document? A complete reference of every retail provider in the Tuvima Library pipeline — what they accept as search parameters, what they return, and how their output feeds into Stage 2 (Wikidata) resolution. This is the authoritative source for provider capabilities.

How Providers Fit Into the Pipeline¶

The hydration pipeline runs in two stages:

Stage 1 (Retail Identification): Retail providers search for the media item using file metadata. They return cover art, descriptions, ratings, and — critically — bridge IDs (ISBN, TMDB ID, etc.) that uniquely identify the item on external platforms.
Stage 2 (Wikidata Bridge): The Wikidata Reconciliation adapter uses those bridge IDs to resolve the item's Wikidata QID. Each bridge ID maps to a Wikidata property code (e.g. ISBN-13 maps to P212). If bridge resolution fails, Stage 2 falls back to title+author text search.

Retail providers are a rich data source for matching — descriptions, narrator data, ratings, and cover art similarity are all used to rank candidates against file metadata. Wikidata is the authority for final canonical values (title, author, year, genre, series).

Provider Summary¶

Provider	Media Types	Auth Required	Rate Limit	Language Strategy	Status
Apple API	Books, Audiobooks	None	500ms throttle	Localized (user language)	Active
TMDB	Movies, TV	Bearer token	250ms, max 2 concurrent	Localized (user language)	Active (requires key)
MusicBrainz	Music	None	1100ms, max 1 concurrent	Source (English only)	Active
Metron	Comics	Basic auth	500ms, max 1 concurrent	Source (English only)	Active (requires key)
Podcast Index	Podcasts	X-Auth-Key header	300ms	Source (English only)	Active (requires key)
Open Library	Books	None	500ms	Source (English only)	Disabled (config kept)
Google Books	Books	API key (query param)	200ms	Localized	Removed

Query Parameters — What We Send¶

Most providers only accept Title as a free-text search parameter. Author, Director, Narrator, and Artist are not sent to the provider API — they are used by the RetailMatchScoringService to rank results after they come back.

Per-Provider Search Strategies¶

Apple API¶

Strategy	Priority	Required Fields	URL Pattern	Media Types
ISBN Lookup	0 (highest)	`isbn`	`/lookup?isbn={isbn}`	Books
Apple ID Lookup	1	`apple_books_id`	`/lookup?id={apple_books_id}&country={country}&lang={lang}_{country}`	Books, Audiobooks
Ebook Search	2	`title`	`/search?term={title}&entity=ebook&limit={limit}&country={country}&lang={lang}_{country}`	Books
Audiobook Search	2	`title`	`/search?term={title}&entity=audiobook&limit={limit}&country={country}&lang={lang}_{country}`	Audiobooks

Notes: ISBN lookup is exact match (1 result). Title search returns up to 25 results, fetches top 5. Author is not a query parameter — used for post-search ranking only. Cover art URL requires regex transform to remove Apple's size constraints for full resolution.

TMDB¶

Strategy	Priority	Required Fields	URL Pattern	Media Types
Movie Search	1	`title`	`/search/movie?query={title}&year={year}&include_adult=false&language={lang}-{country}&page=1`	Movies
TV Search	1	`title`	`/search/tv?query={title}&first_air_date_year={year}&include_adult=false&language={lang}-{country}&page=1`	TV

Notes: Year is optional but included when available from file metadata. Director is not a query parameter — used for post-search ranking only. Returns poster and backdrop image paths (prepend https://image.tmdb.org/t/p/w500 or w1280).

MusicBrainz¶

Strategy	Priority	Required Fields	URL Pattern	Media Types
Release Search	1	`title`	`/release?query={title}&fmt=json&limit={limit}`	Music
Recording Search	2	`title`	`/recording?query={title}&fmt=json&limit={limit}`	Music

Notes: Title only. Artist, Album, and Year are not query parameters — used for post-search ranking. Cover art from Cover Art Archive (https://coverartarchive.org/release/{id}/front-250). Strict rate limit (1100ms between requests, max 1 concurrent) per MusicBrainz policy.

Metron¶

Strategy	Priority	Required Fields	URL Pattern	Media Types
ISBN Lookup	0 (highest)	`isbn`	`/issue/?isbn={isbn}`	Comics
Issue Search	1	`title`	`/issue/?series_name={title}&limit={limit}`	Comics
Series Search	2	`title`	`/series/?name={title}&limit={limit}`	Comics

Notes: ISBN lookup is exact match. Title is used as series_name parameter for issue search, name for series search. Author and Year are not query parameters. Returns rich comic metadata including publisher, credits, issue number, and series position.

Podcast Index¶

Strategy	Priority	Required Fields	URL Pattern	Media Types
Podcast Search	1	`title`	`/search/byterm?q={title}&max=5`	Podcasts
Episode Search	2	`title`	`/search/byterm?q={title}&type=episode&max=5`	Podcasts

Notes: Title only. Year is not a query parameter. Returns feed URL and podcast GUID as bridge identifiers. Runs in hydration Stage 2 (not Stage 1) — needs Wikidata QIDs first.

Open Library (Disabled)¶

Strategy	Priority	Required Fields	URL Pattern	Media Types
ISBN Search	1	`isbn`	`/search.json?isbn={isbn}&limit=3&fields=...`	Books
Title Search	2	`title`	`/search.json?q={title} {author}&limit={limit}&fields=...`	Books

Notes: The only provider whose title search also sends {author} in the query (combined as {title} {author}). ISBN search returns up to 3 results. Returns first_sentence as description (lower quality than dedicated description APIs). 14-day cache TTL. Currently disabled but config preserved for future use.

Response Fields — What We Extract¶

Each provider's config defines a field_mappings array that maps JSON response paths to claim keys. These become ProviderClaim objects (key, value, confidence) stored in metadata_claims.

Extraction Summary¶

Provider	Title	Author	Year	Cover Art	Description	Rating	Publisher	Series	Genre	Language
Apple API	--	--	--	0.85	0.85	0.70 (Books)	--	--	--	--
TMDB	0.85	--	0.90	0.90 (poster+backdrop)	0.85	0.80	--	--	0.85	0.85
MusicBrainz	0.80	0.80 (artist-credit)	0.85	0.70	--	--	--	--	--	--
Metron	0.85	0.80 (credits)	0.85	0.85	0.80	--	0.85	0.90	--	--
Podcast Index	0.80	0.75	--	0.75	0.80	--	--	--	--	0.80
Open Library	0.75	0.80	0.85	0.70	0.60	--	0.70	--	0.65	0.70

Numbers represent confidence values assigned to extracted claims. -- means the provider does not return that field.

Value Transforms¶

The ValueTransformRegistry applies transformations during extraction:

Transform	Purpose	Used By
`regex_replace`	Strip image size suffixes from cover URLs	Apple API
`strip_html`	Clean HTML tags from descriptions	Apple API, Metron, Open Library, Podcast Index
`to_string`	Convert numeric values (IDs, ratings) to strings	Apple API, TMDB, Metron, Podcast Index
`url_template`	Construct full image URLs from partial paths	TMDB, MusicBrainz, Open Library
`first_n_chars(4)`	Extract 4-digit year from date strings	TMDB, MusicBrainz, Metron
`prefer_isbn13`	Select ISBN-13 from array of ISBN formats	Open Library
`array_join`	Join array elements into comma-separated string	TMDB (genres), MusicBrainz (artists), Metron (credits), Open Library (genres)

Bridge IDs — What Feeds Stage 2¶

Bridge IDs are external platform identifiers that the Wikidata Reconciliation adapter uses to resolve the item's Wikidata QID. Each bridge ID maps to a Wikidata property code.

Bridge IDs by Provider¶

Provider	Bridge ID	Claim Key	Confidence	Wikidata Property
Apple API	Apple Books ID	`apple_books_id`	0.95	P6395
TMDB	TMDB ID	`tmdb_id`	1.0	P4947 (movies) / P4983 (TV)
MusicBrainz	MusicBrainz Release ID	`musicbrainz_id`	1.0	P436
MusicBrainz	ISRC	`isrc`	0.9	P1243
Metron	Comic Vine ID	`comic_vine_id`	0.95	P5905
Metron	GCD ID	`gcd_id`	0.95	P11957
Metron	ISBN	`isbn`	0.95	P212
Podcast Index	Feed URL	`feed_url`	1.0	(no P-code; text fallback)
Podcast Index	Podcast GUID	`podcast_guid`	1.0	P10553
Open Library	ISBN	`isbn`	0.90	P212 (ISBN-13) / P957 (ISBN-10)

Stage 2 Resolution Flow Per Bridge ID¶

Direct lookup: The Reconciliation adapter searches Wikidata using haswbstatement:{P-code}={value} (e.g. haswbstatement:P212=978-0441013593).
Edition awareness: For edition-aware media types (Books, Audiobooks, Movies, Comics, Music), the adapter checks P629 (edition_or_translation_of) on the resolved entity. If present, the entity is an edition — the adapter walks up to the work QID for hub grouping and keeps the edition QID separately.
TMDB media-type switch: tmdb_id uses P4947 for movies and P4983 for TV series — the property code is selected based on the effective media type after Stage 1.
ISBN edition filtering: When resolving ISBNs via P747 (has_edition_or_translation), audiobooks get audiobook-class editions only (Q122731938, Q106833962), books get non-audiobook editions. Other media types get all editions unfiltered.
Fallback: If no bridge ID resolves, Stage 2 falls back to FetchWorkAsync — title+author text search via wbsearchentities.

Preferred Bridge ID Order (per media type)¶

The hydration config specifies which bridge IDs to try first per media type:

Media Type	Preferred Order
Books	`isbn`
Audiobooks	`apple_books_id`, `isbn`, `asin`
Movies	`tmdb_id`, `imdb_id`
TV	`tmdb_id`, `imdb_id`
Music	`musicbrainz_id`, `isrc`
Comics	`comic_vine_id`, `gcd_id`, `isbn`
Podcasts	`podcast_guid`, `feed_url`

Retail Match Scoring¶

After a provider returns results, the RetailMatchScoringService scores each candidate against file metadata:

Scoring Weights¶

Field	Weight	How Scored
Title	0.45	Fuzzy string similarity (Levenshtein distance)
Author/Creator	0.35	Fuzzy string similarity
Year	0.10	Exact = 1.0, off-by-1 = 0.8, off-by-2+ = 0.3
Format	0.10	Exact media type match

Boosters¶

Booster	Condition	Boost
Cross-field	Narrator found in description, genre overlap	Variable
Cover art (strong)	pHash similarity > 0.8	+0.10
Cover art (moderate)	pHash similarity > 0.6	+0.05

Thresholds¶

Threshold	Value	Result
Auto-accept	>= 0.85	Item organized automatically, claims deposited
Ambiguous	0.50 - 0.85	Review queue (`RetailMatchAmbiguous`)
Failed	< 0.50 or no results	Review queue (`RetailMatchFailed`)

Cover Art Matching¶

Embedded cover art from the file is perceptually hashed (aHash — 64-bit fingerprint via 8x8 grayscale resize). When retail provider cover art is downloaded, it's hashed and compared. The Hamming distance produces a similarity score (0.0-1.0) that feeds into the scoring composite as a confidence boost.

Provider Configuration¶

All provider configs live in config/providers/ as individual JSON files. Each is self-contained with connection details, search strategies, response mappings, and bridge ID preferences. Adding a new REST+JSON provider is a zero-code operation: create a config file, restart the Engine.

Config File Structure¶

{
  "id": "UUID",
  "name": "provider_name",
  "display_name": "Human Name",
  "enabled": true,
  "version": "1.0",
  "domain": "media_domain",
  "media_types": ["Type1", "Type2"],
  "entity_types": ["Work", "MediaAsset"],
  "base_url": "https://api.example.com",
  "weight": 0.8,
  "language_strategy": "localized|source|both",
  "auth": { ... },
  "rate_limit": { "throttle_ms": 500, "max_concurrent": 2 },
  "search_strategies": [ ... ],
  "field_mappings": [ ... ],
  "preferred_bridge_ids": { "MediaType": ["id1", "id2"] }
}

See config/providers/ for complete examples of each provider.