Databricks offers a unified platform for data, analytics and AI. Build better AI with a data-centric approach. Simplify ETL, data warehousing, governa
Users generally praise Databricks for its robust data processing capabilities and seamless integration with popular machine learning libraries, making it a popular choice among data scientists and engineers. However, some users note challenges in managing rapidly changing data and express a desire for more targeted resources for machine learning specialists, as most guidance is tailored more to software engineers. Pricing sentiment varies, with some users hinting at a higher cost that could be a barrier for smaller teams. Overall, Databricks maintains a strong reputation for its innovation in AI and data manipulation, though it could improve user support for specific use cases.
Mentions (30d): 1
Reviews: 0
Platforms: 2
Sentiment: 9% (1 positive)
Industry: information technology & services
Employees: 11,000
Funding Stage: Venture (Round not Specified)
Total Funding: $31.9B
Anthropic Announced vs current compute capacity (Sources Below)
Source list:

Google Cloud TPU deal: up to 1M TPUs, “well over 1 GW” expected online in 2026
https://www.anthropic.com/news/expanding-our-use-of-google-cloud-tpus-and-services
https://www.googlecloudpresscorner.com/2025-10-23-Anthropic-to-Expand-Use-of-Google-Cloud-TPUs-and-Services
(Anthropic)

Fluidstack / Anthropic $50B U.S. AI infrastructure: Texas + New York, sites coming online through 2026
https://www.anthropic.com/news/anthropic-invests-50-billion-in-american-ai-infrastructure
https://www.fluidstack.io/about-us/blog/fluidstack-selected-by-anthropic-to-deliver-custom-data-centers-in-the-us
(Anthropic)

Microsoft + NVIDIA deal: $30B Azure compute commitment + up to 1 GW additional capacity
https://blogs.microsoft.com/blog/2025/11/18/microsoft-nvidia-and-anthropic-announce-strategic-partnerships/
https://blogs.nvidia.com/blog/microsoft-nvidia-anthropic-announce-partnership/
(The Official Microsoft Blog)

Google + Broadcom next-gen TPU deal: multiple GW starting 2027; Broadcom SEC filing says ~3.5 GW
https://www.anthropic.com/news/google-broadcom-partnership-compute
https://investors.broadcom.com/static-files/c906d370-921b-4bc2-bb7b-57877dfcf1ae
(Anthropic)

Amazon / AWS deal: up to 5 GW, nearly 1 GW by end-2026
https://www.anthropic.com/news/anthropic-amazon-compute
(Anthropic)

AWS Project Rainier: operational now, nearly half a million Trainium2 chips; Claude expected on 1M+ Trainium2 chips
https://www.aboutamazon.com/news/aws/aws-project-rainier-ai-trainium-chips-compute-cluster
(Amazon News)

SpaceX / Colossus 1: all Colossus 1 compute, >300 MW, 220k+ NVIDIA GPUs within the month
https://www.anthropic.com/news/higher-limits-spacex
https://x.ai/news/anthropic-compute-partnership
(Anthropic)

Independent reporting for SpaceX deal
https://www.reuters.com/business/retail-consumer/anthropic-unveils-dreaming-feature-help-its-ai-agents-self-improve-2026-05-06/
(Reuters)

submitted by /u/Business_Garden_7771
Tips for BI analysis with Claude? My results so far are shockingly bad compared to general coding
I have a lot of hands-on experience with developing R pipelines to ingest large, live, very dirty datasets and produce relatively straightforward BI-type analyses. Trends, completion rates, revenue etc. I am currently working on a project with a small, live, moderately dirty dataset. The output should be simple analyses eg of lead quality, time to deal, revenue per product line. I am developing this project with Python and DuckDB. I am having incredible difficulty with getting Claude (Code) to coherently do this work, even when taking the pipeline design process step by step. I am always using Opus 4.7 High, and regularly experiencing Claude contradict clear instructions I gave it even within the last 5 minutes. It gives extremely generic names to variables and then very soon will completely misunderstand what the variables mean. It leaps to fixing problems without having any understanding of them and invents generic terminology that disagrees with the established project terms. My hypothesis is that this is an artifact of the data exploration. Inevitably as I explore the dirty data while building this pipeline I'm constantly uncovering new edge cases that need to be accounted for, and I guess this likely pollutes the context very quickly. Likely also Claude is more hesitant to codify "findings" than would be normal in a data pipeline, because it's engineered for more... deterministic (?) programming situations where findings are often meant to be fixed and forgotten. 
I am planning a few changes to my normal workflow:

1. Much smaller context window, potentially even clearing after every small adjustment to the pipeline.
2. Strictly aligning with enterprise-grade standards (e.g. OpenTelemetry, Databricks Medallions) even for this small project.
3. Developing an extremely strict and exhaustively clear variable naming structure so that as Claude writes the tokens for each variable it cannot avoid understanding its meaning (e.g. medallion___source_module___data_scope___data_qualifiers___stat_type___time_window).
4. Enforce constant linting of 2 and 3 through a hook.

Anything else that can be recommended? One thing I'm attempting to do is "go with the flow" and try to figure out what Claude "wants" to do, then strictly codify that... but it seems like most often Claude is just doing random things. Any advice for that?

submitted by /u/unwritten734
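The naming structure proposed in the post above could be checked mechanically by whatever script the hook runs. What follows is only a minimal sketch, assuming six segments separated by triple underscores in the order the post gives. The allowed layer names (bronze, silver, gold) follow the usual medallion convention, and the function name `check_name` is hypothetical, not something from the post.

```python
import re

# Segment order is taken from the post; everything else is illustrative.
SEGMENTS = (
    "medallion",
    "source_module",
    "data_scope",
    "data_qualifiers",
    "stat_type",
    "time_window",
)
MEDALLIONS = {"bronze", "silver", "gold"}  # assumed layer names
# Lowercase alphanumeric words joined by single underscores.
SEGMENT_RE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def check_name(name: str) -> list[str]:
    """Return a list of problems with `name`; an empty list means it passes."""
    parts = name.split("___")
    if len(parts) != len(SEGMENTS):
        return [f"expected {len(SEGMENTS)} segments, got {len(parts)}"]
    problems = []
    for label, value in zip(SEGMENTS, parts):
        if not SEGMENT_RE.fullmatch(value):
            problems.append(f"{label}: invalid segment {value!r}")
    if parts[0] not in MEDALLIONS:
        problems.append(f"medallion: {parts[0]!r} not in {sorted(MEDALLIONS)}")
    return problems
```

A hook could call `check_name` on every column or variable name the pipeline emits and fail when any list comes back non-empty. A name such as `gold___crm___leads___qualified_only___median_days_to_deal___last_90d` (made up for illustration) passes, while a generic `lead_quality` is rejected.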
Does Claude create graphic reports from spreadsheet data?
I am often trying to pull data from spreadsheets and make charts and graphs to better represent the data for others to understand. Does Claude handle this well? I used Databricks for this and it was awesome. I would love to use Claude for this so my monthly cost could cover other AI purposes beyond just graphing data. I have been on the fence about Claude and Claude Code, but I have heard they have removed some of the agentic features.

submitted by /u/nikkonine
How I Used Claude Code to build an AI Jobs Globe in One Day
Everyone wants to get into AI but nobody knows where the jobs actually are. So I mapped every AI job I could find onto a 3D globe: an interactive 3D globe that maps 15,352 AI job openings across 1,144 companies in 41 countries, all posted after February 2026. Here's how I did it with Claude Cowork and Claude Code:

Part 1: Claude Cowork (research + data pipeline)

Step 1: Ideation + a master list of 1,802 companies
Started with a vague hunch: "everyone knows AI jobs are exploding, nobody knows HOW exploding." Cowork helped me brainstorm into a concrete product, then we curated 1,802 AI companies across 3 reputation tiers (top brands like Google/Amazon, strong companies like Palantir/Databricks, emerging startups), categorized by country, industry, and tier.

Step 2: Scraped 15,352 AI jobs + geocoded 4,682 offices
Cowork wrote scripts using python-jobspy to pull listings from Indeed and LinkedIn for all 1,802 companies, handled batch runs, rate limiting, and dedup. For Chinese companies where Western boards don't work, it manually researched 122 entries. Filtered out internships and classified jobs into 4 AI types (technical / upskill / executive / AI-native). Then converted every "Mountain View, CA" string to lat/lon via Nominatim with caching + retry; 4,682 locations geocoded at 100% success.

Step 3: 1,594 company logos + 3-doc PRD with a "SIGINT terminal" design system
Cowork tried multiple logo sources (an open-source library at 16% match → Google Favicons API + DuckDuckGo fallback + manual domain lookups for the obvious ones), ending at 1,594 PNGs. Then wrote a full PRD split into frontend.md / logic.md / data.md covering UI, API, database. I uploaded a screenshot of an app called WORLDVIEW; Cowork created a "SIGINT Terminal" design system: monospace fonts, CRT scanlines, no rounded corners, government-monitoring-screen aesthetic.

Step 4: Supabase + GitHub setup, hand off to Claude Code
Cowork generated SQL schema + Python import script, set up a Supabase project, ran the migration, and imported all 3 tables (companies / offices / jobs) + 2 views, with zero errors across 21K+ rows. Used Desktop Commander (an MCP that controls your local terminal) to run gh repo create, copy 1,594 logos in, commit, push. Handed the 3 PRD files to a fresh Claude Code session.

Part 2: Claude Code (build + iterate + deploy)

Step 5: Stack pivot before writing a single line
The PRD said Three.js + NASA night-textures, but the visual reference was Bilawal Sidhu's WORLDVIEW. Claude Code researched his actual stack and pushed back: Three.js can't reach that quality; Bilawal uses CesiumJS + Google Photorealistic 3D Tiles (the photogrammetric 3D Earth product Bilawal himself helped build at Google Maps). I approved the pivot. The PRD got rewritten on the fly.

Step 6: Wired env + Vercel + GCP, built the whole frontend
Created .env, linked a Vercel project via the CLI, added all keys (Supabase + Google Maps API) to all 3 environments, enabled the Map Tiles API on GCP. Then built the entire app in vanilla JS + Vite + Cesium: photoreal 3D globe, 4,682 office spikes as glowing polylines (with city-clustering to fix the "50 SF companies stacking in 1 pixel" problem), full SIGINT chrome (topbar / TARGETS rail / detail panel / stats bar / scope mode). No framework. I never opened a code editor.

Step 7: Tight iteration + deploy loop
Every commit auto-deployed to Vercel in ~30 seconds. I dropped screenshots of whatever was wrong → Claude Code diagnosed, fixed, pushed, deployed, I tested live, repeated. Wired Vercel Web Analytics on both URLs at the end.

What's noteworthy about this workflow

Every commit auto-deployed. Screenshot → diagnosis → fix → push → live URL → next screenshot. Tight visual feedback loop, no manual deploys.

Background agents ran while I worked in the foreground. The two building-research agents wrote JSON I ingested without breaking flow. When one hit a monthly token cap mid-run, I just re-ran it the next day and merged the output.

Visual feedback via screenshots was the entire QA loop. The polyline alone went through 6 width/glow tunings (4 → 7 → 12 → 16 → 8 → 12 px) and a full 3D-cylinder experiment + revert, all driven by me dropping screenshots and Claude reading them.

I never wrote code. I'm a CPO, not an engineer. Cesium scene, Supabase queries, Vite config, scope-mode state machine, panel race-guard, pitch deck: all Claude Code. I was the design/PM brain pointing at "this looks wrong, fix that."

Three SOT documents kept everything coherent. The PRD drifted hard from the original plan (Three.js → Cesium pivot, scope mode invented mid-build, six pill swaps) but Claude Code maintained dated Recent-Changes logs in all three SOT files. At any point I could read frontend.md and the deployed site matched.

Try it here: https://ai-jobs-globe.vercel.app/

submitted by /u/Similar-Kangaroo-223
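Step 2 of the post above mentions geocoding via Nominatim "with caching + retry" but does not show the script. The following is only a rough sketch of that idea, not the author's code: the function names, the in-memory dict cache, the linear backoff, and the User-Agent string are all assumptions. It uses the public Nominatim search endpoint with the `q`, `format`, and `limit` parameters.

```python
import json
import time
import urllib.parse
import urllib.request

NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"


def fetch_nominatim(query: str):
    """Look up one place name; return (lat, lon) or None if nothing matched."""
    params = urllib.parse.urlencode({"q": query, "format": "json", "limit": 1})
    request = urllib.request.Request(
        f"{NOMINATIM_URL}?{params}",
        headers={"User-Agent": "ai-jobs-globe-sketch/0.1"},  # hypothetical identifier
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        results = json.load(response)
    if not results:
        return None
    return float(results[0]["lat"]), float(results[0]["lon"])


def geocode(query, cache, fetch=fetch_nominatim, retries=3, delay=1.0):
    """Return coordinates for `query`, checking `cache` first and retrying failures."""
    key = query.strip().lower()  # normalise so repeated strings hit the cache
    if key in cache:
        return cache[key]
    last_error = None
    for attempt in range(retries):
        try:
            cache[key] = fetch(query)
            return cache[key]
        except OSError as error:  # urllib.error.URLError is a subclass of OSError
            last_error = error
            if attempt < retries - 1:
                time.sleep(delay * (attempt + 1))  # linear backoff between attempts
    raise RuntimeError(f"geocoding failed for {query!r}") from last_error
```

Passing the fetcher in as a parameter keeps the cache and retry logic testable without network access. Because many offices share the same location string, the cache also cuts the number of requests sent to the geocoder.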
Pattern detection agents
Has anyone worked on creating a pattern detection agent of sorts? I am working on Databricks with a lot of data, and the data often changes weekly. I want to create an agent that detects any patterns (for example CPT shifting, DRG shifting, etc.) in the pharma data. The constraint is that I don't know what patterns will emerge each week in the new data, so there is no starting point. How do we create such an agent?

submitted by /u/Such_Rush_6956
I built a Claude Code toolkit for ML on Databricks, because all the tips out there are for software engineers, not ML engineers/ML data scientists.
Hey everyone, I've been using Claude Code for ML work on Databricks for a few months now and wanted to share something I put together that might help others in the same boat.

What I kept running into

If you've looked into Claude Code tips and best practices online, you'll notice almost all of them are geared toward software development: edit code, run tests, ship it. And that's great, but the ML workflow on Databricks is just... different. Your code doesn't run locally. Your laptop is CPU-only but your real training happens on a GPU cluster. You can't just run your script and see if it works, you have to get your code onto the cluster, submit a job, wait, then go fish out the metrics from MLflow. And if you've dealt with DBR 15+ quirks (Workspace path errors, wheel installation changes, stale pydantic caching), you know how much time you can lose on stuff that has nothing to do with your actual model.

The thing that bugged me most was that Claude would help me write great training code, and then I'd spend the next 15 minutes manually uploading, submitting, checking results, and copying metrics back so Claude knew what happened. It felt like I was the middleware.

What I ended up building

Over time I built up a set of Claude Code skills and agents that automate this loop. I finally cleaned them up and put them in a repo in case they're useful to anyone else: github.com/duonginspace/claude-code-databricks-ml

The highlights:

/run-on-databricks: builds your project as a wheel, uploads to DBFS, submits the job, waits, and pulls MLflow metrics back. One slash command instead of 5 manual steps.
/iterate: you say "try adding label smoothing" and Claude implements it, submits to Databricks, pulls results, compares with previous runs, and suggests what to try next.
/compare-runs: ranks your experiments, shows what helped and what hurt.
/init-databricks-ml: this is the one I wish I had when I started. It scaffolds a complete project with submit/pull scripts, Makefile, MCP config, and all the DBR 15+ workarounds already baked in.
/explore-data, /research-papers, /train-local: for the rest of the workflow (EDA, literature search, quick local smoke tests before burning GPU time).

There are also 3 agents that the skills delegate to (experiment runner, data analyst, research agent), a /commit command, and a status bar script that shows your context window usage, git branch, and rate limits.

What it actually gives you

Claude can finally close the loop. It doesn't just write your code and hand it back to you, it submits, tracks, and learns from results. You go from "copilot" to something closer to a junior researcher who can run experiments on their own.

You skip the Databricks onboarding tax. The DBR 15+ gotchas alone (DBFS vs Workspace paths, runtime wheel installation, stale module caching, MLflow experiment naming) cost me days to figure out. /init-databricks-ml handles all of it from day one.

Faster iteration cycles. Instead of context-switching between your editor, the Databricks UI, and MLflow every time you want to try something, you stay in the terminal. Say "try X" and come back to a comparison table.

Your experiments stay organized. Every run gets logged to MLflow automatically, and /compare-runs gives you a ranked summary instead of you eyeballing dashboards. It's easier to spot what's actually working.

Less wasted GPU time. /train-local lets you smoke-test on CPU before burning cluster hours, and the skills are structured to catch obvious issues early.

It's modular. You don't have to use everything. Install just the one skill you need, or the whole toolkit. They work independently.

Install

    git clone https://github.com/duonginspace/claude-code-databricks-ml.git
    cd claude-code-databricks-ml
    bash setup.sh

Copies everything to ~/.claude/. MIT licensed.

This is very much shaped by my own workflow, so it won't be perfect for everyone. But if you're doing ML on Databricks with Claude Code, or thinking about trying it, I hope it gives you a head start. Would love to hear how others are handling this, and happy to answer any questions.

submitted by /u/duongnguyen0512
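The kind of ranked summary the post attributes to /compare-runs can be illustrated without any Databricks or MLflow calls. This is a minimal sketch and not the toolkit's implementation. It assumes each run has already been pulled into a plain dict with a name and a metrics mapping, and the function `rank_runs`, the `val_accuracy` metric, and the run names used below are hypothetical.

```python
def rank_runs(runs, metric, baseline, higher_is_better=True):
    """Rank runs by `metric` and label each one relative to the baseline run."""
    by_name = {run["name"]: run for run in runs}
    base_value = by_name[baseline]["metrics"][metric]
    # Best run first: descending when higher is better, ascending otherwise.
    ranked = sorted(
        runs,
        key=lambda run: run["metrics"][metric],
        reverse=higher_is_better,
    )
    rows = []
    for run in ranked:
        value = run["metrics"][metric]
        delta = value - base_value
        improved = delta > 0 if higher_is_better else delta < 0
        if run["name"] == baseline:
            verdict = "baseline"
        elif improved:
            verdict = "helped"
        elif delta == 0:
            verdict = "no change"
        else:
            verdict = "hurt"
        rows.append((run["name"], value, round(delta, 4), verdict))
    return rows
```

Given a baseline run plus a `label_smoothing` run with higher `val_accuracy` and a `higher_lr` run with lower `val_accuracy`, the function returns the three runs best-first, tagged "helped", "baseline", and "hurt". In practice the dicts would be built from whatever the pull step retrieves from MLflow.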
Databricks uses a tiered pricing model. Visit their website for current pricing details.
Key features include: Unified, Scalable, Lakehouse, Delta Lake, Machine learning.
Databricks integrates with: Apache Spark, MLflow, Tableau, Power BI, Snowflake, AWS S3, Azure Data Lake Storage, Google Cloud Storage, Kafka, Datadog.
Based on 11 social mentions analyzed, 9% of sentiment is positive, 91% neutral, and 0% negative.
Ion Stoica, Co-founder at Anyscale / Databricks (3 mentions)

Strategic App Expansion and the Power of Proprietary Data | Ali Ghodsi at HumanX
Apr 10, 2026