Two meaningful changes since v2.1.0. tldl now detects new podcast episodes directly from RSS feeds with conditional GETs instead of relying on Podcast Index re-crawls, so episodes typically land in the queue within minutes of publication. A second fix catches episodes that get retitled or have their GUIDs regenerated after publication — a surprisingly common pattern in the wild.
What’s new
- RSS-first monitoring. The monitor now fetches RSS feeds directly with If-Modified-Since/If-None-Match headers, queues full episode metadata without a Podcast Index round-trip, and falls back to PI only on RSS errors. Detection lag drops from “hours” (PI re-crawl cadence) to “minutes” for feeds that update frequently.
- POST /admin/rebuild-index. Backfill endpoint now populates audioUrl on every existing index entry so the new dedup check works retroactively.
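
For readers unfamiliar with conditional GETs, here is a minimal sketch of the request pattern described above. It is not tldl's actual code: FeedCacheState, FeedCheck, conditionalHeaders, classifyResponse, and checkFeed are hypothetical names, and the sketch assumes a runtime with a global fetch. A 304 means the feed is unchanged, a 2xx means it should be parsed and its validators stored, and anything else is the kind of RSS error where a fallback to Podcast Index would apply.

```typescript
// Hypothetical per-feed cache state; field names are assumptions for illustration.
interface FeedCacheState {
  etag?: string;         // ETag header from the last successful fetch, if any
  lastModified?: string; // Last-Modified header from the last successful fetch, if any
}

type FeedCheck =
  | { kind: "unchanged" }
  | { kind: "changed"; body: string; next: FeedCacheState }
  | { kind: "error"; status: number };

// Build conditional-GET validators from whatever was stored last time.
function conditionalHeaders(state: FeedCacheState): Record<string, string> {
  const headers: Record<string, string> = {};
  if (state.etag) headers["If-None-Match"] = state.etag;
  if (state.lastModified) headers["If-Modified-Since"] = state.lastModified;
  return headers;
}

// Map an HTTP response onto the three outcomes the monitor cares about.
function classifyResponse(
  status: number,
  body: string,
  etag: string | null,
  lastModified: string | null,
): FeedCheck {
  if (status === 304) return { kind: "unchanged" };
  if (status >= 200 && status < 300) {
    return {
      kind: "changed",
      body,
      next: { etag: etag ?? undefined, lastModified: lastModified ?? undefined },
    };
  }
  return { kind: "error", status };
}

// Network wrapper; a thrown fetch error is reported as status 0.
async function checkFeed(url: string, state: FeedCacheState): Promise<FeedCheck> {
  try {
    const res = await fetch(url, { headers: conditionalHeaders(state) });
    const body = res.status === 304 ? "" : await res.text();
    return classifyResponse(
      res.status,
      body,
      res.headers.get("ETag"),
      res.headers.get("Last-Modified"),
    );
  } catch {
    return { kind: "error", status: 0 };
  }
}
```

Keeping the header building and response classification separate from the network call makes both easy to test without hitting a real feed.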
Fixes
- Silent duplicate episodes. Episodes that publishers edited after publishing (new title, regenerated GUID, or both) used to slip past dedup and get transcribed twice. A new audio-URL dedup signal (origin + pathname, query-stripped, lowercased) catches them. Confirmed against a real-world retitle where Lenny’s Podcast re-published an episode with a different title and GUID; in addition, 100 historical near-duplicates were silently deduped on the first force-check after deploy.
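
The dedup signal described above (origin + pathname, query-stripped, lowercased) can be sketched with the standard URL class. The function names audioUrlKey and isDuplicateAudio and the example URLs are assumptions for illustration, not tldl's actual identifiers.

```typescript
// Normalize an enclosure URL into a dedup key: origin + pathname, lowercased.
// The query string and fragment are dropped because they are not part of either.
// Returns null when the input is not a parseable absolute URL.
function audioUrlKey(raw: string): string | null {
  try {
    const u = new URL(raw);
    return (u.origin + u.pathname).toLowerCase();
  } catch {
    return null;
  }
}

// Hypothetical check against keys already present in the episode index.
function isDuplicateAudio(raw: string, seenKeys: Set<string>): boolean {
  const key = audioUrlKey(raw);
  return key !== null && seenKeys.has(key);
}

// Usage: a retitled episode with a new tracking parameter still matches.
const seen = new Set<string>();
const original = audioUrlKey("https://cdn.example.com/shows/ep-42.mp3?source=feed");
if (original !== null) seen.add(original);
const retitledIsDup = isDuplicateAudio("https://CDN.example.com/shows/EP-42.mp3?utm=new", seen);
```

Because the key ignores title and GUID entirely, it keeps matching when a publisher changes either one, as long as the audio file itself stays at the same path.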
Under the hood
- Queue messages carry full episode metadata when the source is RSS, so the consumer branches on rssSourced and skips Podcast Index + iTunes enrichment entirely on that path.
- New audioUrl field on EpisodeIndexEntry.
- Monitor cron cadence tuned to every 2 hours: RSS conditional GETs keep the feed-scan cost low, and most monitored feeds don’t publish often enough to justify the previous 30-minute cadence.
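
As a rough illustration of the consumer branch and the new index field, the sketch below models the queue message as a discriminated union on rssSourced. Only rssSourced, audioUrl, and EpisodeIndexEntry come from these notes; every other name and field (EpisodeMetadata, QueueMessage, Enricher, resolveEpisode, toIndexEntry, episodeId) is a guess at one plausible shape.

```typescript
// Assumed metadata shape carried on RSS-sourced messages.
interface EpisodeMetadata {
  title: string;
  guid: string;
  audioUrl: string;
}

// RSS-sourced messages carry full metadata; others carry only an id to enrich.
type QueueMessage =
  | { rssSourced: true; episode: EpisodeMetadata }
  | { rssSourced: false; episodeId: number };

// Index entry with the new audioUrl field (optional here so that entries
// written before the backfill still type-check).
interface EpisodeIndexEntry {
  guid: string;
  title: string;
  audioUrl?: string;
}

// Stand-in for the Podcast Index + iTunes enrichment step.
type Enricher = (episodeId: number) => Promise<EpisodeMetadata>;

// Branch on rssSourced: the RSS path never calls the enricher.
async function resolveEpisode(msg: QueueMessage, enrich: Enricher): Promise<EpisodeMetadata> {
  if (msg.rssSourced) return msg.episode;
  return enrich(msg.episodeId);
}

// Project resolved metadata into an index entry, including audioUrl.
function toIndexEntry(ep: EpisodeMetadata): EpisodeIndexEntry {
  return { guid: ep.guid, title: ep.title, audioUrl: ep.audioUrl };
}
```

Modeling the flag as a union discriminant lets the compiler enforce that metadata is present whenever rssSourced is true, so the enrichment-free path needs no runtime null checks.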