feat: thumbnail extraction for Reader — fixes all clients
All checks were successful
Security Checks / dockerfile-lint (push) Successful in 4s
Security Checks / dependency-audit (push) Successful in 13s
Security Checks / secret-scanning (push) Successful in 3s

Server-side (dashboard + iOS + any client):
- Added thumbnail column to reader_entries
- Worker extracts from media:thumbnail, media:content, enclosures, HTML img
- API returns thumbnail in EntryOut with & decoding
- Backfilled 260 existing entries

iOS:
- Prefers API thumbnail, falls back to client-side extraction
- Decodes HTML entities in URLs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Yusuf Suleman
2026-04-03 19:32:47 -05:00
parent 798ba17a93
commit a3eabf3e3b
4 changed files with 59 additions and 1 deletions

View File

@@ -40,6 +40,7 @@ class EntryOut(BaseModel):
status: str = "unread"
starred: bool = False
reading_time: int = 1
thumbnail: str | None = None
feed: FeedRef | None = None
class Config:
@@ -49,6 +50,10 @@ class EntryOut(BaseModel):
def from_entry(cls, entry: Entry) -> "EntryOut":
# Use full_content if available, otherwise RSS content
best_content = entry.full_content if entry.full_content else entry.content
# Extract thumbnail from stored field, or from content
thumb = entry.thumbnail
if not thumb:
thumb = cls._extract_thumbnail(entry.content or entry.full_content or "")
return cls(
id=entry.id,
title=entry.title,
@@ -60,9 +65,25 @@ class EntryOut(BaseModel):
status=entry.status,
starred=entry.starred,
reading_time=entry.reading_time,
thumbnail=thumb,
feed=FeedRef(id=entry.feed.id, title=entry.feed.title) if entry.feed else None,
)
@staticmethod
def _extract_thumbnail(html: str) -> str | None:
"""Extract first image URL from HTML content."""
if not html:
return None
import re
match = re.search(r'<img[^>]+src=["\']([^"\']+)["\']', html[:3000], re.IGNORECASE)
if match:
url = match.group(1).replace("&amp;", "&")
# Skip tiny tracking pixels and icons
if any(skip in url.lower() for skip in ["1x1", "pixel", "tracking", "spacer"]):
return None
return url
return None
class EntryListOut(BaseModel):
total: int