12 alternatives to ExLlamaV2 in the infrastructure category
ExLlamaV2
A fast inference library for running LLMs locally on modern consumer-class GPUs (turboderp-org/exllamav2 on GitHub).
Train, deploy, observe, and evaluate LLMs from a single platform. Lower cost, lower latency, and dedicated support from Inference.net.
Cloudflare: powering the next generation of applications.
Bring your own code and run CPU, GPU, and data-intensive compute at scale. The serverless platform for AI and data teams.
High-throughput and memory-efficient inference and serving engine for large language models (LLMs). Deploy AI faster with state-of-the-art performance.
Create with AI or code, deploy instantly on production infrastructure. One platform to build and ship.
Cloud GPUs, on-demand clusters, private cloud, and hardware for AI training and inference. Run B200 and H100 GPUs, deploy quickly, and scale cost-effectively.
LLM inference in C/C++ (ggml-org/llama.cpp on GitHub).