platform

yusiboyz/platform

Fork 0

Commit Graph

Author	SHA1	Message	Date
Yusuf Suleman	b179386a57	feat: brain PDF/image text extraction — pymupdf + tesseract OCR + vision API - PDF: extracts selectable text via pymupdf, falls back to Tesseract OCR for scanned docs - PDF: renders first page as screenshot thumbnail - Images: Tesseract OCR for text extraction, OpenAI vision API fallback for photos - Plain text files: direct decode - All extracted text stored in extracted_text field for search/embedding - Tested: PDF upload → text extracted → AI classified → searchable New deps: pymupdf, pytesseract, Pillow System dep: tesseract-ocr added to both Dockerfiles Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 18:49:04 -05:00
Yusuf Suleman	8275f3a71b	feat: brain service — self-contained second brain knowledge manager Full backend service with: - FastAPI REST API with CRUD, search, reprocess endpoints - PostgreSQL + pgvector for items and semantic search - Redis + RQ for background job processing - Meilisearch for fast keyword/filter search - Browserless/Chrome for JS rendering and screenshots - OpenAI structured output for AI classification - Local file storage with S3-ready abstraction - Gateway auth via X-Gateway-User-Id header - Own docker-compose stack (6 containers) Classification: fixed folders (Home/Family/Work/Travel/Knowledge/Faith/Projects) and fixed tags (28 predefined). AI assigns exactly 1 folder, 2-3 tags, title, summary, and confidence score per item. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 11:48:29 -05:00

Author

SHA1

Message

Date

Yusuf Suleman

b179386a57

feat: brain PDF/image text extraction — pymupdf + tesseract OCR + vision API

- PDF: extracts selectable text via pymupdf, falls back to Tesseract OCR for scanned docs
- PDF: renders first page as screenshot thumbnail
- Images: Tesseract OCR for text extraction, OpenAI vision API fallback for photos
- Plain text files: direct decode
- All extracted text stored in extracted_text field for search/embedding
- Tested: PDF upload → text extracted → AI classified → searchable

New deps: pymupdf, pytesseract, Pillow
System dep: tesseract-ocr added to both Dockerfiles

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-01 18:49:04 -05:00

Yusuf Suleman

8275f3a71b

feat: brain service — self-contained second brain knowledge manager

Full backend service with:
- FastAPI REST API with CRUD, search, reprocess endpoints
- PostgreSQL + pgvector for items and semantic search
- Redis + RQ for background job processing
- Meilisearch for fast keyword/filter search
- Browserless/Chrome for JS rendering and screenshots
- OpenAI structured output for AI classification
- Local file storage with S3-ready abstraction
- Gateway auth via X-Gateway-User-Id header
- Own docker-compose stack (6 containers)

Classification: fixed folders (Home/Family/Work/Travel/Knowledge/Faith/Projects)
and fixed tags (28 predefined). AI assigns exactly 1 folder, 2-3 tags, title,
summary, and confidence score per item.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-01 11:48:29 -05:00

2 Commits