File Processing | Sprout Documentation

Supported File Types

Category	Extensions	Max Size	Processing
Documents	PDF, DOC, DOCX, TXT, RTF	50 MB	Text extraction + OCR
Spreadsheets	XLS, XLSX, CSV	25 MB	Tabular data parsing
Presentations	PPT, PPTX	25 MB	Slide text extraction
Images	JPG, PNG, GIF, WEBP, TIFF	10 MB	OCR + visual analysis
Code	JS, TS, PY, GO, JAVA, etc.	5 MB	Syntax-aware parsing
Data	JSON, XML, YAML, MD	5 MB	Structured parsing
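
The limits in the table can be checked before upload. The following is a minimal sketch, not Sprout's actual validation code: the names `LIMITS`, `categoryFor`, and `checkUpload` are hypothetical, 1 MB is assumed to mean 1,048,576 bytes, and the Code row lists only the extensions named explicitly in the table.

```typescript
// Hypothetical lookup built from the table above; all names are illustrative.
type Category = "Documents" | "Spreadsheets" | "Presentations" | "Images" | "Code" | "Data";

const MB = 1024 * 1024; // assumption: 1 MB = 1,048,576 bytes

const LIMITS: Record<Category, { extensions: string[]; maxBytes: number }> = {
  Documents: { extensions: ["pdf", "doc", "docx", "txt", "rtf"], maxBytes: 50 * MB },
  Spreadsheets: { extensions: ["xls", "xlsx", "csv"], maxBytes: 25 * MB },
  Presentations: { extensions: ["ppt", "pptx"], maxBytes: 25 * MB },
  Images: { extensions: ["jpg", "png", "gif", "webp", "tiff"], maxBytes: 10 * MB },
  Code: { extensions: ["js", "ts", "py", "go", "java"], maxBytes: 5 * MB },
  Data: { extensions: ["json", "xml", "yaml", "md"], maxBytes: 5 * MB },
};

// Map a file name to its category by extension, or undefined if unsupported.
function categoryFor(fileName: string): Category | undefined {
  const parts = fileName.split(".");
  const ext = parts.length > 1 ? parts[parts.length - 1].toLowerCase() : "";
  const categories = Object.keys(LIMITS) as Category[];
  for (const category of categories) {
    if (LIMITS[category].extensions.indexOf(ext) !== -1) return category;
  }
  return undefined;
}

// Reject unsupported types and files above the category's size limit.
function checkUpload(fileName: string, sizeBytes: number): { ok: boolean; reason?: string } {
  const category = categoryFor(fileName);
  if (category === undefined) return { ok: false, reason: "unsupported file type" };
  if (sizeBytes > LIMITS[category].maxBytes) {
    return { ok: false, reason: "exceeds " + category + " limit" };
  }
  return { ok: true };
}
```

A check by extension is only a first pass; the pipeline below detects the real type from the file contents.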

Extraction Pipeline

Upload

File uploaded to Vercel Blob Storage (temporary, 24h TTL)

Detection

File type detection via magic numbers
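
Magic-number detection compares the first bytes of a file against known signatures. The sketch below is illustrative only: the signature values are the standard ones for each format, but the `SIGNATURES` table and `detectType` function are hypothetical and do not reflect the list Sprout actually uses.

```typescript
// Well-known leading byte signatures; an illustrative subset, not Sprout's list.
const SIGNATURES: { type: string; bytes: number[] }[] = [
  { type: "pdf", bytes: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
  { type: "png", bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { type: "jpg", bytes: [0xff, 0xd8, 0xff] },
  { type: "gif", bytes: [0x47, 0x49, 0x46, 0x38] }, // "GIF8"
  { type: "zip", bytes: [0x50, 0x4b, 0x03, 0x04] }, // DOCX, XLSX, PPTX are ZIP containers
];

// Return the first signature type whose bytes prefix the data, else "unknown".
function detectType(data: Uint8Array): string {
  for (const sig of SIGNATURES) {
    if (data.length < sig.bytes.length) continue;
    let match = true;
    for (let i = 0; i < sig.bytes.length; i++) {
      if (data[i] !== sig.bytes[i]) {
        match = false;
        break;
      }
    }
    if (match) return sig.type;
  }
  return "unknown";
}
```

Because DOCX, XLSX, and PPTX share the ZIP signature, a real detector would need to inspect the archive contents to tell them apart. Plain-text formats such as CSV or JSON have no signature at all and fall through to "unknown" here.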

Extraction

Content extracted using the processor that matches the detected type

Chunking

RecursiveCharacterTextSplitter (2000-token chunks, 200-token overlap)
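
To show what the overlap means, here is a simplified fixed-window chunker. It is not RecursiveCharacterTextSplitter: it ignores separators entirely and takes a pre-tokenized array, so how text becomes tokens is left as an assumption. The function name `chunkTokens` is hypothetical; the defaults mirror the 2000/200 settings above.

```typescript
// Simplified sliding-window chunker (NOT the recursive splitter itself).
// Each chunk starts (chunkSize - overlap) tokens after the previous one,
// so consecutive chunks share `overlap` tokens.
function chunkTokens(tokens: string[], chunkSize = 2000, overlap = 200): string[][] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be in [0, chunkSize)");
  }
  const step = chunkSize - overlap;
  const chunks: string[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize));
    // Stop once a window has reached the end of the input.
    if (start + chunkSize >= tokens.length) break;
  }
  return chunks;
}
```

With the default settings, a 4000-token input yields three chunks that start at tokens 0, 1800, and 3600, so the last 200 tokens of each chunk reappear at the start of the next.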

Embedding

Vector embedding via sentence-transformers/all-MiniLM-L6-v2

Content Extractors

// Content extraction by file type
PDF      → pdf-parse + Tesseract OCR (scanned)
Office   → mammoth (DOCX) / xlsx-populate (XLSX)
Images   → Hugging Face Vision API
           └─ Model: microsoft/trocr-large-printed
Code     → tree-sitter for syntax trees
General  → raw text extraction
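
The mapping above amounts to a dispatch table with a raw-text fallback. The sketch below only selects a processor label; it does not call any of the listed libraries. `ExtractorKind`, `KIND_BY_EXTENSION`, and `selectExtractor` are hypothetical names, and the extension-to-kind assignments are an assumption based on the listing.

```typescript
type ExtractorKind = "pdf" | "office" | "image" | "code" | "general";

// Processor labels taken from the listing above.
const EXTRACTORS: Record<ExtractorKind, string> = {
  pdf: "pdf-parse + Tesseract OCR",
  office: "mammoth / xlsx-populate",
  image: "microsoft/trocr-large-printed",
  code: "tree-sitter",
  general: "raw text extraction",
};

// Assumed extension-to-kind mapping; anything not listed falls back to "general".
const KIND_BY_EXTENSION: Record<string, ExtractorKind> = {
  pdf: "pdf",
  docx: "office",
  xlsx: "office",
  jpg: "image",
  png: "image",
  js: "code",
  ts: "code",
  py: "code",
};

function selectExtractor(extension: string): string {
  const ext = extension.toLowerCase();
  // hasOwnProperty guards against inherited keys such as "constructor".
  const known = Object.prototype.hasOwnProperty.call(KIND_BY_EXTENSION, ext);
  const kind: ExtractorKind = known ? KIND_BY_EXTENSION[ext] : "general";
  return EXTRACTORS[kind];
}
```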

OCR Capabilities

Engine

Tesseract.js + Hugging Face Vision

Languages

100+ languages supported

Printed Text

95%+ accuracy

Handwriting

85%+ accuracy

Special handling: table structure recognition, handwriting detection, receipt/invoice parsing, and ID document extraction.

Multi-modal Processing

When processing images, Sprout combines multiple analysis techniques:

// Image + Text combined analysis
1. Extract text via OCR
2. Generate caption via Salesforce/blip-image-captioning-base
3. Detect objects via facebook/detr-resnet-50
4. Combine context for AI model
5. Response includes visual and textual insights
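
Step 4 can be pictured as merging the three model outputs into one text block for the downstream model. This is a sketch under stated assumptions: the `ImageAnalysis` shape, the `buildImageContext` name, the 0.5 score threshold, and the output wording are all hypothetical, and the model calls themselves are not shown.

```typescript
// Hypothetical shapes for the outputs of steps 1-3.
interface DetectedObject {
  label: string;
  score: number; // detection confidence in [0, 1]
}

interface ImageAnalysis {
  ocrText: string;           // step 1: OCR
  caption: string;           // step 2: image captioning
  objects: DetectedObject[]; // step 3: object detection
}

// Step 4: combine the non-empty parts into one context string.
// minScore is an assumed threshold for dropping low-confidence detections.
function buildImageContext(analysis: ImageAnalysis, minScore = 0.5): string {
  const labels = analysis.objects
    .filter((o) => o.score >= minScore)
    .map((o) => o.label);
  const parts: string[] = [];
  const text = analysis.ocrText.trim();
  const caption = analysis.caption.trim();
  if (text) parts.push("Text in image: " + text);
  if (caption) parts.push("Caption: " + caption);
  if (labels.length > 0) parts.push("Objects: " + labels.join(", "));
  return parts.join("\n");
}
```

Empty parts are skipped so that an image without legible text, for example, does not add an empty line to the context.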
