Self-hosted vs API LLMs: Crunching the Numbers on Total Cost of Ownership

Gina R. · 2d ago

cost-optimization, llm-providers, architecture

Hey folks!

I've been knee-deep in evaluating whether to stick with OpenAI's API or pivot towards hosting a model like GPT-J (or even GPT-NeoX). The decision seems to hinge on more than just server costs.

For context, we run a text generation service handling roughly 500k queries per month. We currently pay about $5,000/mo for GPT-4 via the API, which works out to roughly $0.01 per query. I've read some promising posts about people self-hosting open models on GPUs like the A100, but I'm unsure about the hidden costs. It can't just be the AWS bill!

Is anyone who has done a full TCO analysis willing to share their insights? My list of considerations includes:

GPU/CPU cost for hosting (initial and ongoing)
Maintenance headaches (think: DevOps hours?)
Update cycles and keeping models fresh
Outages and how they impact service
API advantages (like latest model access and no maintenance)
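
To put the items above on one sheet, here is a minimal sketch. The $5,000/mo and 500k queries are the figures from this post; every self-hosting number (GPU bill, DevOps hours, hourly rate) is a placeholder assumption to be replaced with real quotes, and `MonthlyCost` is just an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCost:
    """One month of spend for a hosting option (all values in USD)."""
    infrastructure: float      # API bill, or GPU/CPU rental
    devops_hours: float = 0.0  # maintenance time spent per month
    hourly_rate: float = 0.0   # loaded cost of one engineer hour

    def total(self) -> float:
        """Infrastructure spend plus the cost of maintenance time."""
        return self.infrastructure + self.devops_hours * self.hourly_rate

    def per_query(self, queries: int) -> float:
        """Total monthly cost divided by the number of queries served."""
        return self.total() / queries


QUERIES_PER_MONTH = 500_000  # from the post

api = MonthlyCost(infrastructure=5_000)  # from the post

# Assumption: hypothetical self-hosting numbers, purely illustrative.
self_hosted = MonthlyCost(infrastructure=3_000, devops_hours=25, hourly_rate=80)

for name, option in (("API", api), ("Self-hosted", self_hosted)):
    print(f"{name}: ${option.total():,.0f}/mo, "
          f"${option.per_query(QUERIES_PER_MONTH):.4f}/query")
```

With these placeholder inputs the maintenance hours cancel out the cheaper infrastructure bill, which is exactly the kind of hidden cost the list is trying to capture. Outage cost and update cycles could be added as further fields in the same way.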

Would love to hear how others handle this calculus, especially if you've flipped from API to self-hosting!

7 Comments

Nora V · 2d ago

I'm curious if anyone's quantified the cost of downtime or service outages when self-hosting. With APIs, the SLA often guarantees a certain level of uptime, which can be worth the premium. What have folks experienced in terms of outage frequency and recovery time, especially working with models like GPT-NeoX?
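
One rough way to start quantifying this is to turn an uptime percentage into allowed downtime and then into lost revenue. This is only a sketch: the uptime tiers below are example values, not any provider's published SLA, and `revenue_per_hour` is a hypothetical input you would have to estimate for your own service.

```python
HOURS_PER_MONTH = 730  # average month: 365 * 24 / 12


def allowed_downtime_hours(uptime_percent: float,
                           hours: float = HOURS_PER_MONTH) -> float:
    """Hours of downtime per period that still meet the given uptime percentage."""
    return hours * (1 - uptime_percent / 100)


def outage_cost(downtime_hours: float, revenue_per_hour: float) -> float:
    """Revenue lost during an outage, assuming revenue is spread evenly over time."""
    return downtime_hours * revenue_per_hour


# Example uptime tiers (assumptions, not quoted from any real SLA).
for uptime in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_hours(uptime) * 60
    print(f"{uptime}% uptime allows about {minutes:.1f} min of downtime per month")
```

The even-revenue assumption is the weakest part: an outage at peak hours costs more than the same outage overnight, so treat the result as a lower bound for planning.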

Sam D. · 2d ago

In case you're looking for alternatives, we found Google's TPU pricing to be competitive for our workload compared to AWS. Also, using preemptible instances can significantly bring down costs if your application can tolerate some downtime. It's a trade-off but worth considering!

Prince H · 1d ago

It really comes down to your specific use case. We moved to self-hosting GPT-NeoX a few months ago, and while the initial GPU costs were hefty, we're now spending around $3,500/month on infrastructure compared to $6,000 when we used an API. The hidden costs are indeed in DevOps and model maintenance. We had a 2-day outage once after a bad update went live; the trade-off is having full control and customization.

Tom G · 1d ago

We made the switch from an API to self-hosting GPT models a few months ago. The biggest surprise was how much time our DevOps team now spends on system upkeep: easily 20-30 extra hours a month. Sure, there are savings on API costs, but they're offset by the need for skilled personnel who can manage and troubleshoot these systems effectively. If you're considering it, definitely weigh the human resource factor heavily!
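
To make the human resource factor concrete, here is a small sketch that prices the 20-30 extra hours and finds the hourly rate at which they would wipe out the infrastructure savings. The hourly rate is an assumption, and the $2,500 savings figure is borrowed from the $6,000 vs $3,500 numbers in another comment in this thread purely as an example input, not from our own setup.

```python
def devops_cost(hours: float, hourly_rate: float) -> float:
    """Monthly cost of the extra maintenance hours."""
    return hours * hourly_rate


def break_even_rate(monthly_savings: float, hours: float) -> float:
    """Hourly rate at which extra DevOps time cancels out the infrastructure savings."""
    return monthly_savings / hours


EXTRA_HOURS = (20, 30)     # range mentioned in this comment
HOURLY_RATE = 100.0        # assumption: loaded cost per engineer hour, USD
EXAMPLE_SAVINGS = 2_500.0  # assumption: $6,000 - $3,500 from another comment

for hours in EXTRA_HOURS:
    cost = devops_cost(hours, HOURLY_RATE)
    rate = break_even_rate(EXAMPLE_SAVINGS, hours)
    print(f"{hours} h/month costs ${cost:,.0f}; "
          f"savings vanish above ${rate:,.2f}/hour")
```

If the loaded hourly cost of the people doing the upkeep sits above the break-even rate, self-hosting is costing more than the API did, even though the infrastructure bill looks smaller.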

Li S. · 1d ago

Has anyone benchmarked the exact number of queries per second they're getting on a self-hosted setup vs. the API? Curious about throughput differences, especially around peak usage times. Also, for those who've chosen self-hosting, how do you manage security concerns, especially regarding data privacy on rented hardware?
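
For anyone willing to run a comparison, a minimal harness could look like the sketch below. `fake_generate` is a hypothetical stand-in that just sleeps; swap in your real API client call or self-hosted inference call. The 500k queries/month figure is from the original post.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call (hypothetical); sleeps to simulate latency."""
    time.sleep(0.01)
    return prompt.upper()


def measure_qps(call, prompts, workers: int = 8) -> float:
    """Run all prompts through `call` concurrently; return completed queries per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(call, prompts))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed


def average_qps(queries_per_month: int, days: int = 30) -> float:
    """Average load implied by a monthly query count."""
    return queries_per_month / (days * 24 * 3600)


measured = measure_qps(fake_generate, ["hello"] * 40)
print(f"measured: {measured:.1f} QPS")
print(f"average needed for 500k/month: {average_qps(500_000):.3f} QPS")
```

The monthly average comes out well under one query per second, so the interesting number is the peak, which can be many times the average. Running the harness with the same prompts and worker count against both backends keeps the comparison fair.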

Leo T · 17h ago

Have you factored in energy costs for on-premise hosting? We noticed a 20% overhead in power costs when we ran a similar setup in our own data center. Also, how do you plan to handle model updates? Keeping a model like GPT-NeoX optimal requires continuous fine-tuning, which can be another expense if you don't have the right experts.

Liam D. · 7h ago

We've been self-hosting GPT-J on A100s for a while now, and I'd say maintenance is definitely the hidden cost people don't initially consider. We're spending around 40% of our DevOps time on model tuning and scaling optimizations alone. Not to mention, every time a new model comes out, evaluating and integrating it is a whole project in itself. That said, owning our stack means we can fine-tune and pass model improvements on to clients faster.