Been experimenting with Claude 3.5 Sonnet for automated invoice processing and getting solid results overall, but hitting some interesting edge cases.
Setup: PDF → text extraction via PyMuPDF → structured JSON via Claude API. Processing ~200 invoices/day from various suppliers.
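For the Claude step of that pipeline, here's roughly what my extraction prompt builder looks like — a minimal sketch, not the exact production code; the field list and function name are illustrative:

```python
import json

# Illustrative field set -- adjust to whatever your schema actually needs.
FIELDS = ["vendor", "invoice_number", "date", "currency", "total"]

def build_extraction_prompt(invoice_text: str) -> str:
    """Ask for strict JSON so the reply can be json.loads()'d directly."""
    schema = {field: "string" for field in FIELDS}
    return (
        "Extract the following fields from this invoice. Respond with ONLY "
        f"a JSON object of this shape: {json.dumps(schema)}\n\n"
        f"Invoice text:\n{invoice_text}"
    )
```

The "ONLY a JSON object" phrasing matters — without it the model tends to wrap the JSON in explanation, which breaks downstream parsing.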
Accuracy is impressive for standard fields (vendor, total, date) — hitting about 94% on my test set. But running into issues with:
Anyone found good prompting strategies for these? Currently using a two-step approach: first pass for basic extraction, second pass for validation/corrections. Considering fine-tuning but the dataset prep seems brutal.
Also curious about cost optimization — running about $0.15 per invoice with current token usage. Worth exploring Claude Haiku for simpler invoices?
Code snippet of my validation prompt:
validation_prompt = f"Review this extracted data against the original invoice text below. Flag any inconsistencies in totals, dates, or vendor info.\n\nExtracted data: {extracted_data}\n\nInvoice text: {invoice_text}"
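The fiddly part of the second pass turned out to be parsing the reply — Claude sometimes wraps JSON in markdown fences. A tolerant parser sketch (function name is mine):

```python
import json
import re

def parse_model_json(raw: str) -> dict:
    """Parse a model reply as JSON, stripping ```json fences if present."""
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    return json.loads(cleaned)
```

If this raises, I retry the call once with a "respond with valid JSON only" reminder before flagging the invoice for manual review.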
Any tips appreciated, especially for handling those gnarly European invoices with 15 different tax rates.
Those API costs add up fast at scale. We switched to a hybrid approach - Claude for complex layouts, local regex/rules for standard invoice formats. Went from $180/month to $45 processing similar volume. Consider batching requests and caching vendor-specific templates. Have you calculated your per-invoice processing cost including the PyMuPDF overhead?
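The routing logic is simple — a sketch of the idea, assuming per-vendor regex templates (the vendor names and patterns here are made up):

```python
import re

# Cached per-vendor templates for suppliers with stable layouts (illustrative).
TEMPLATES = {
    "ACME GmbH": {
        "total": re.compile(r"Total due:\s*\u20ac?([\d.,]+)"),
        "date": re.compile(r"Date:\s*(\d{4}-\d{2}-\d{2})"),
    },
}

def route(text: str) -> dict:
    """Try the cheap rules path first; fall back to the LLM for the rest."""
    for vendor, patterns in TEMPLATES.items():
        if vendor not in text:
            continue
        fields = {}
        for name, pattern in patterns.items():
            match = pattern.search(text)
            fields[name] = match.group(1) if match else None
        if all(fields.values()):  # only trust rules when every field hit
            return {"vendor": vendor, "engine": "rules", **fields}
    return {"engine": "llm"}  # unknown or partial match -> send to Claude
```

Anything that falls through to `"engine": "llm"` goes to Claude; for us that was roughly a quarter of the volume.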
Hold up - are you sanitizing inputs before sending to Claude? Invoice PDFs are prime vectors for malicious content injection. Also consider data residency requirements - some regions prohibit sending financial docs to third-party APIs. We implemented local OCR + on-premise models after our compliance team flagged similar setup. The accuracy hit was worth avoiding potential regulatory issues. What's your data retention policy with Anthropic?
Nice results! I'm seeing similar accuracy with Claude 3.5 on financial docs. Pro tip: preprocess with table detection (like unstructured.io) before Claude - bumped our multi-column invoice accuracy from 91% to 96%. Also try few-shot examples in your prompt with your most common edge cases. What's your token usage looking like at 200/day? I'm averaging ~1200 tokens per invoice.
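The few-shot setup is just alternating user/assistant turns in the messages list — a sketch with a single made-up German-invoice example; you'd swap in your own real edge cases:

```python
# (source snippet, expected JSON) pairs drawn from your trickiest invoices.
FEW_SHOT = [
    (
        "Rechnung Nr. 4711\nGesamtbetrag: 1.190,00 EUR (inkl. 19% MwSt.)",
        '{"total": "1190.00", "currency": "EUR", "tax_rate": "19"}',
    ),
]

def build_messages(invoice_text: str) -> list[dict]:
    """Prepend worked examples as prior turns, then the real invoice."""
    messages = []
    for source, expected in FEW_SHOT:
        messages.append({"role": "user", "content": source})
        messages.append({"role": "assistant", "content": expected})
    messages.append({"role": "user", "content": invoice_text})
    return messages
```

Two or three examples covering your worst formats is usually enough; more than that and the token cost starts eating the accuracy gain.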
Your post cut off mid-sentence - what specific edge cases are you hitting? Line item extraction issues? Multi-page invoices? Foreign currency formatting? I'm evaluating Claude for similar use case and curious about the failure modes you're seeing. Also, how are you handling invoice tables that span multiple columns?
Respectfully, I'd question the PDF→text→LLM pipeline. You're losing crucial spatial information that invoices rely on. We get better results with vision models (GPT-4V or Claude 3.5 with images) directly on invoice images. Yes, higher token costs, but fewer extraction errors from misaligned text. The 94% accuracy might improve significantly with visual context preserved.
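Concretely, you render each page to PNG (PyMuPDF's `page.get_pixmap().tobytes("png")` works for this) and send it as a base64 image block in the message. A sketch of building that message, assuming the Anthropic Messages API image-content format:

```python
import base64

def image_message(png_bytes: bytes, question: str) -> dict:
    """Build a user message carrying an invoice page image plus a question."""
    return {
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": base64.b64encode(png_bytes).decode("ascii"),
                },
            },
            {"type": "text", "text": question},
        ],
    }
```

Render at ~150 DPI — enough for the model to read small line items without the token cost of full-resolution scans.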