Together Inference vs ExLlamaV2 — Features, Pricing & Reviews Compared

Together Inference

infrastructure

ExLlamaV2

infrastructure

Pain: 7/10013 integrations8 featuresSeries B

Pain: 1/10015 integrations10 featuresOther

The Bottom Line

Together Inference excels in providing a full-stack AI platform with its various features like GPU clusters and batch inference API, supported by significant funding of $533.5M in Series B. Meanwhile, ExLlamaV2, with $7.9B in other funding, focuses on local infrastructure for running large language models efficiently on consumer GPUs and offers extensive integration capabilities. Both tools cater to advanced AI application development but serve distinct community preferences and infrastructure needs.

Best for

Together Inference is the better choice when your team requires scalable AI solutions with real-time inference capabilities and extensive model support, particularly for data-driven decision support systems and personalized recommendation engines.

Best for

ExLlamaV2 is the better choice when the focus is on running large language models locally without reliance on cloud services, ideal for research and experimentation, or when seeking efficient integration with existing machine learning workflows.

Key Differences

1.Together Inference offers real-time natural language processing capabilities while ExLlamaV2 excels in local running of LLMs on consumer-grade GPUs.
2.Together Inference integrates with large-scale cloud platforms such as AWS and Google Cloud, whereas ExLlamaV2 provides integration with developer-friendly tools like Streamlit and FastAPI.
3.Together Inference has a smaller company size (~210 employees) and focuses on rapid scalability, in contrast to ExLlamaV2's larger company size (~6200 employees) which supports broader community and integration resources.
4.Pricing for Together Inference includes a subscription with tiered pricing starting at $0.06, whereas ExLlamaV2 operates on a tiered pricing model, with concerns noted around usage-based models possibly impacting costs.
5.Together Inference is backed by $533.5M in Series B funding, demonstrating strong investment confidence, compared to ExLlamaV2's $7.9B which supports substantial development and integration resources.

Verdict

Together Inference is suited for companies seeking a robust, scalable AI platform with cloud integration for real-time applications. It offers a cost-effective solution with its open-source base and tiered subscription model. ExLlamaV2 is ideal for teams emphasizing local infrastructure and integration for machine learning workflows, with its strong financial backing and extensive employee base providing key community and support advantages. Choose based on the need for cloud reliance or local deployment capabilities.

Overview

What each tool does and who it's for

Together Inference

Build what's next on the AI Native Cloud. Full-stack AI platform for inference, fine-tuning, and GPU clusters — powered by cutting-edge research.

Together Inference has been praised for its performance improvements and adaptability, specifically with its Aurora model, which offers faster decoding and continuously enhances itself over time. Users appreciate the open-source nature and contributions welcomed from the community, as well as expanding model support and improved efficiency. However, there are concerns about static draft models becoming less efficient with shifting traffic patterns, requiring frequent updates. Pricing sentiment isn't explicitly indicated, but the open-source aspect suggests positive reception in terms of cost-effectiveness. Overall, Together Inference holds a solid reputation for innovation and performance, especially in AI and coding spaces.

ExLlamaV2

A fast inference library for running LLMs locally on modern consumer-class GPUs - turboderp-org/exllamav2

While "ExLlamaV2" is not explicitly mentioned in the provided social mentions and reviews, the context around software development and tools highlights the strengths of integration with platforms like GitHub Copilot for efficient coding and workflow enhancements. Users generally appreciate tools that streamline processes and incorporate advanced features for complex tasks. The evolving nature of billing models, like the move to usage-based pricing for GitHub Copilot, indicates mixed feelings about pricing, with some users potentially wary of increased costs. Overall, software tools that improve developer productivity and offer seamless integration tend to have a positive reputation, though concerns around pricing changes can impact user sentiment.

Key Metrics

Mentions (30d)

Mention Velocity

How discussion volume is trending week-over-week

Together Inference

-50% vs last week

ExLlamaV2

-86% vs last week

Where People Discuss

Mention distribution across platforms

Together Inference

Twitter/X

66%

28%

YouTube

ExLlamaV2

Twitter/X

95%

YouTube

Community Sentiment

How developers feel about each tool based on mentions and reviews

Together Inference

3% positive96% neutral1% negative

ExLlamaV2

6% positive94% neutral0% negative

Pricing

Together Inference

subscription + tieredFree tier

Pricing found: $1.40, $4.40, $0.30, $0.06, $1.20

ExLlamaV2

tiered

Use Cases

When to use each tool

Together Inference (6)

Real-time natural language processingLarge-scale machine learning model deploymentInteractive AI applicationsData-driven decision support systemsAutomated content generationPersonalized recommendation systems

ExLlamaV2 (8)

Running large language models locally on consumer-grade hardwareIntegrating with existing machine learning workflows for inference tasksDeveloping and testing AI applications without relying on cloud servicesCreating custom AI solutions for specific business needsOptimizing model performance with dynamic batching and cachingConducting research and experimentation with LLMs in a controlled environmentBuilding prototypes for AI-driven applicationsFacilitating educational projects and learning about AI model deployment

Features

Only in Together Inference (8)

FlashAttention-4 for faster inferenceATLAS runtime-learning acceleratorsSelf-service NVIDIA GPU clustersBatch Inference APISupport for multiple model typesScalable architecture for large workloadsReal-time inference capabilitiesUser-friendly dashboard for monitoring

Only in ExLlamaV2 (10)

New generator with dynamic batching, smart prompt caching, K/V cache deduplication and simplified APIUh oh!Method 1: Install from sourceMethod 2: Install from release (with prebuilt extension)Method 3: Install from PyPIConversionEvaluationCommunityHuggingFace reposResources

Integrations

Only in Together Inference (13)

NVIDIA CUDATensorFlowPyTorchKubernetesDockerApache KafkaAWSGoogle Cloud PlatformMicrosoft AzureSlack for notificationsJupyter Notebooks for developmentGrafana for monitoringPrometheus for metrics collection

Only in ExLlamaV2 (15)

TabbyAPI for OpenAI-compatible API accessHugging Face Transformers for model compatibilityDocker for containerized deploymentsTensorFlow for additional model supportPyTorch for deep learning framework integrationFastAPI for building web applicationsFlask for lightweight web servicesStreamlit for creating interactive applicationsKubernetes for orchestration of deploymentsJupyter Notebooks for interactive developmentVS Code for integrated development environment supportGitHub Actions for CI/CD workflowsSlack for team notifications and updatesZapier for automation and integration with other appsRedis for caching and performance optimization

Developer Ecosystem

—

HuggingFace Models

Pain Points

Top complaints from reviews and social mentions

Together Inference

API costs (1)

ExLlamaV2

down (7)breaking (1)

Top Discussion Keywords

Most mentioned keywords from community discussions

Together Inference

API costs (1)

ExLlamaV2

down (7)breaking (1)

Product Screenshots

Together Inference

ExLlamaV2

What People Talk About

Most discussed topics from community mentions

Together Inference

model selection14

open source9

agents8

scalability8

accuracy7

performance7

RAG6

deployment5

ExLlamaV2

open source21

agents12

model selection10

performance5

security5

workflow5

streaming3

scalability2

Top Community Mentions

Highest-engagement mentions from the community

Together Inference

Introducing Mamba-3 🐍 Inference speeds are more i

Introducing Mamba-3 🐍 Inference speeds are more important than ever, driven by the rise in agents and inference-heavy RL rollouts. Linear models are

Twitter/Xby @togethercomputeneutral source

ExLlamaV2

Cooking up something new 🧑‍🍳 Join the waitlist for early access to technical preview of the GitHub Copilot app 👇 https://t.co/ODODKdvzOA https://t.co/1h7AJPAhiH

Twitter/Xby @github source

Company Intel

information technology & services

Industry

information technology & services

210

Employees

6,200

$533.5M

Funding

$7.9B

Series B

Stage

Other

Supported Languages & Categories

Shared (3)

AI/MLDevOpsDeveloper Tools

Only in ExLlamaV2 (2)

FinTechSecurity

Frequently Asked Questions

Is Together Inference or ExLlamaV2 better for real-time processing tasks?▼

Together Inference is better for real-time processing tasks due to its real-time inference capabilities and cloud-integrated infrastructure.

How does Together Inference pricing compare to ExLlamaV2?▼

Together Inference uses a subscription-based model with tiered options starting at $0.06, making it potentially more predictable, whereas ExLlamaV2 uses a tiered pricing model with concerns over usage-based pricing impacts.

Which has better community support, Together Inference or ExLlamaV2?▼

ExLlamaV2 likely has better community support due to its larger company size (~6200 employees) and broader integration capabilities with developer tools.

Can Together Inference and ExLlamaV2 be used together?▼

While there is no direct integration mentioned, developers can potentially use each tool's strengths in complementary capacities within a broader AI infrastructure strategy.

Which is easier to get started with, Together Inference or ExLlamaV2?▼

ExLlamaV2 may be easier to get started with for developers interested in local infrastructure and seamless integration with developer tools such as FastAPI and Streamlit.

View Together Inference Profile View ExLlamaV2 Profile