Unbabel

ai-translation, translation, tiered

Unbabel’s Language Operations Platform gives businesses the ability to thrive across cultures and geographies by eliminating language barriers.

Users generally praise Unbabel for its seamless integration and effectiveness at combining AI with human review to enhance translation quality. However, some complaints focus on occasional inaccuracies and a lack of support for certain niche languages. Pricing opinions are mixed; some see it as fair given the service quality, while others find it somewhat high. Overall, Unbabel holds a solid reputation for its innovative approach to translation tasks, although it may not be the perfect fit for all language needs.


Sentiment

0 positive

15 integrations, 3 features, Merger / Acquisition

Latest Videos

Unbabel | AI Translations Your Business Can Trust

Sep 13, 2023

LangOps Universe 2022

Oct 3, 2022


Features & Use Cases

Features

Customer portal
Editor interface
We take security seriously

Use Cases

Customer support in multiple languages
Localizing marketing content for global audiences
Translating product documentation and manuals
Facilitating multilingual communication in remote teams
Enhancing user experience on international websites
Improving accessibility for non-English speaking customers
Supporting global sales teams with translated materials
Translating social media content for broader reach

Company Intel

Industry

information technology & services

Employees

690

Funding Stage

Merger / Acquisition

Total Funding

$111.7M

Mentions by Platform

youtube

Unbabel AI


Pricing

tiered


Sentiment Overview

Positive: 0% (0)

Neutral: 100% (6)

Negative: 0% (0)

Recent Mentions

youtube

Unbabel AI


reddit @[unknown] 4/14/2026

We benchmarked TranslateGemma against 5 other LLMs on subtitle translation across 6 languages. At first glance the numbers told a clean story, but then human QA added a chapter. [D]

We evaluated six models on English subtitle translation into Spanish, Japanese, Korean, Thai, Chinese Simplified, and Chinese Traditional - 167 segments per language pair, scored with two reference-free QE metrics.

Models tested:

TranslateGemma-12b
claude-sonnet-4-6
deepseek-v3.2
gemini-3.1-flash-lite-preview
gpt-5.4-mini
gpt-5.4-nano

Scoring

We used MetricX-24 (lower = better) and COMETKiwi (higher = better) - both reference-free QE metrics. We also developed a combined score:

TQI = COMETKiwi × exp(−MetricX / 10)

The exponential decay term converts MetricX into a multiplicative fidelity penalty. When MetricX is near 0, TQI ≈ COMETKiwi. As MetricX grows, the penalty increases exponentially. TQI is our own metric, not an industry standard.

Top-level results (avg TQI across all 6 languages)

Rank  Model                          Avg TQI
#1    TranslateGemma-12b             0.6335
#2    gemini-3.1-flash-lite-preview  0.5981
#3    deepseek-v3.2                  0.5946
#4    claude-sonnet-4-6              0.5811
#5    gpt-5.4-mini                   0.5785
#6    gpt-5.4-nano                   0.5562

All models sit between 0.75-0.79 on COMETKiwi (fluency). Models diverge significantly on MetricX-24 fidelity scores - that's where the TQI separation comes from.

A few things worth discussing:

1. Metric-model affinity concern

One caveat worth noting: MetricX-24 is a Google metric and TranslateGemma is a Google model. COMETKiwi - from Unbabel - shows a noticeably smaller gap between TranslateGemma and the field. The direction of the result holds either way, but the size of the lead may be partially inflated by metric-model affinity.

2. Claude collapses in Japanese

claude-sonnet-4-6 ranked last (#6) in Japanese - MetricX 3.90, its worst result across all languages. Its COMETKiwi (0.79) was decent. Classic fluency-fidelity mismatch: output that sounds natural but drifts from source meaning.

3. Gemini Flash Lite outperforms full-sized frontier models

A "lite" model consistently ranked #2-3, beating Claude Sonnet and both GPT-5.4 variants across most languages.

4. TranslateGemma ranked #1 - then human QA found something the metrics had missed entirely

TranslateGemma topped every language. When our linguists reviewed the Traditional Chinese (zh-TW) output, the model was outputting Simplified Chinese for both zh-CN and zh-TW language codes. We then investigated community reports suggesting zh-Hant as the correct explicit tag for Traditional Chinese and retested with it. Result: 76% of segments still came back Simplified, 14% Traditional, 10% ambiguous (segments too short or script-neutral to classify).

https://preview.redd.it/h6gfrd0ew4vg1.jpg?width=773&format=pjpg&auto=webp&s=fbe0afae3831528440b956167456e94004bcbe09

MetricX-24 and COMETKiwi scored both outputs identically and highly - no indication of a problem from either metric.

As it turns out, this is a confirmed, publicly documented issue caused by training data bias: TranslateGemma's fine-tuning corpus is heavily skewed toward Simplified Chinese. The locale tags are accepted without error but not honored by the model's weights. This affects all model sizes (4B, 12B, 27B) - upgrading to a larger model size won't fix it, since the root cause is training data composition, not capacity.

A workaround exists (OpenCC s2twp post-processing), but standard QE metrics will look fine the whole time - that's exactly the problem for any pipeline relying on automated validation.

submitted by /u/ritis88
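The combined score defined in the post, TQI = COMETKiwi × exp(−MetricX / 10), can be sketched in a few lines of Python. This is only an illustration of the formula as stated. The function name `tqi` and its parameter names are assumptions made here, not the authors' code. The sample inputs are the Japanese figures the post quotes for claude-sonnet-4-6 (COMETKiwi 0.79, MetricX 3.90).

```python
import math


def tqi(comet_kiwi: float, metricx: float) -> float:
    """Combined score as stated in the post: COMETKiwi * exp(-MetricX / 10).

    The name and signature are illustrative assumptions, not the authors' code.
    """
    return comet_kiwi * math.exp(-metricx / 10)


# Figures quoted in the post for claude-sonnet-4-6 on Japanese.
claude_japanese = tqi(comet_kiwi=0.79, metricx=3.90)

# With MetricX at 0 the exponential term is 1, so TQI equals COMETKiwi.
no_penalty = tqi(comet_kiwi=0.79, metricx=0.0)

print(f"TQI with MetricX 3.90: {claude_japanese:.4f}")
print(f"TQI with MetricX 0.00: {no_penalty:.4f}")
```

Because the MetricX term is a multiplier between 0 and 1, two models with similar COMETKiwi scores can still separate clearly on TQI, which matches the post's remark that the spread comes from MetricX-24.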


Integrations

Zendesk, Salesforce, Slack, Shopify, WordPress, HubSpot, Microsoft Teams, Jira, Intercom, Google Workspace, Trello, Asana, Mailchimp, Zapier, API for custom integrations
