Databricks Review — Features, Pricing & User Sentiment | Payloop

Databricks

ai-analytics, lakehouse, tiered

Databricks offers a unified platform for data, analytics and AI. Build better AI with a data-centric approach. Simplify ETL, data warehousing, governa

Users generally praise Databricks for its robust data processing capabilities and seamless integration with popular machine learning libraries, making it a popular choice among data scientists and engineers. However, some users note challenges in managing rapidly changing data and express a desire for more targeted resources for machine learning specialists, as most guidance is tailored more to software engineers. Pricing sentiment varies, with some users hinting at a higher cost that could be a barrier for smaller teams. Overall, Databricks maintains a strong reputation for its innovation in AI and data manipulation, though it could improve user support for specific use cases.

Mentions (30d): 1

Reviews: 0

Platforms: 2

Sentiment: 6% (1 positive)

15 integrations, 10 features, Venture (Round not Specified)

Voices Discussing Databricks

Matei Zaharia, CTO at Databricks: 29 mentions

Naveen Rao, VP of AI at Databricks: 8 mentions

Ion Stoica, Co-founder at Anyscale / Databricks: 7 mentions

Latest Videos

Coding Agent Support in Databricks AI Gateway

Apr 13, 2026

Gainwell Transforms Health Data with Databricks on AWS

Apr 10, 2026

Features & Use Cases

Features

Unified; Scalable; Lakehouse; Delta Lake; Machine learning; Don’t miss out, last chance to save 50%; The Databricks Platform; Modern applications need a lakebase; Build AI agents that work in the real world; Intelligent analytics for all

Company Intel

Industry: information technology & services

Employees: 11,000

Funding Stage: Venture (Round not Specified)

Total Funding: $31.9B

Top Mention

reddit @Similar-Kangaroo-2239 engagement 4/27/2026

How I Used Claude Code to build an AI Jobs Globe in One Day

Everyone wants to get into AI but nobody knows where the jobs actually are. So I mapped every AI job I could find onto a 3D globe for it. A 3D interactive globe that maps 15,352 AI job openings across 1,144 companies in 41 countries, all posted after February 2026. Here's how I do it with Claude Cowork and Claude Code:

# Part 1: Claude Cowork (research + data pipeline)

**Step 1: Ideation + a master list of 1,802 companies**

Started with a vague hunch: "everyone knows AI jobs are exploding, nobody knows HOW exploding." Cowork helped me brainstorm into a concrete product, then we curated 1,802 AI companies across 3 reputation tiers (top brands like Google/Amazon, strong companies like Palantir/Databricks, emerging startups), categorized by country, industry, and tier.

**Step 2: Scraped 15,352 AI jobs + geocoded 4,682 offices**

Cowork wrote scripts using `python-jobspy` to pull listings from Indeed and LinkedIn for all 1,802 companies, handled batch runs, rate limiting, and dedup. For Chinese companies where Western boards don't work, it manually researched 122 entries. Filtered out internships and classified jobs into 4 AI types (technical / upskill / executive / AI-native). Then converted every "Mountain View, CA" string to lat/lon via Nominatim with caching + retry: 4,682 locations geocoded at 100% success.

**Step 3: 1,594 company logos + 3-doc PRD with a "SIGINT terminal" design system**

Cowork tried multiple logo sources (an open-source library at 16% match → Google Favicons API + DuckDuckGo fallback + manual domain lookups for the obvious ones), ending at 1,594 PNGs. Then wrote a full PRD split into `frontend.md` / `logic.md` / `data.md` covering UI, API, database. I uploaded a screenshot of an app called WORLDVIEW; Cowork created a "SIGINT Terminal" design system: monospace fonts, CRT scanlines, no rounded corners, government-monitoring-screen aesthetic.

**Step 4: Supabase + GitHub setup, hand off to Claude Code**

Cowork generated SQL schema + Python import script, set up a Supabase project, ran the migration, and imported all 3 tables (companies / offices / jobs) + 2 views, with **zero errors across 21K+ rows**. Used Desktop Commander (an MCP that controls your local terminal) to run `gh repo create`, copy 1,594 logos in, commit, push. Handed the 3 PRD files to a fresh Claude Code session.

# Part 2: Claude Code (build + iterate + deploy)

**Step 5: Stack pivot before writing a single line**

The PRD said Three.js + NASA night-textures, but the visual reference was Bilawal Sidhu's WORLDVIEW. Claude Code researched his actual stack and pushed back: Three.js can't reach that quality; Bilawal uses **CesiumJS + Google Photorealistic 3D Tiles** (the photogrammetric 3D Earth product Bilawal himself helped build at Google Maps). I approved the pivot. The PRD got rewritten on the fly.

**Step 6: Wired env + Vercel + GCP, built the whole frontend**

Created `.env`, linked a Vercel project via the CLI, added all keys (Supabase + Google Maps API) to all 3 environments, enabled the Map Tiles API on GCP. Then built the entire app in vanilla JS + Vite + Cesium: photoreal 3D globe, 4,682 office spikes as glowing polylines (with city-clustering to fix the "50 SF companies stacking in 1 pixel" problem), full SIGINT chrome: topbar / TARGETS rail / detail panel / stats bar / scope mode. No framework. I never opened a code editor.

**Step 7: Tight iteration + deploy loop**

Every commit auto-deployed to Vercel in ~30 seconds. I dropped screenshots of whatever was wrong → Claude Code diagnosed, fixed, pushed, deployed, I tested live, repeated. Wired Vercel Web Analytics on both URLs at the end.

# What's noteworthy about this workflow

* **Every commit auto-deployed.** Screenshot → diagnosis → fix → push → live URL → next screenshot. Tight visual feedback loop, no manual deploys.
* **Background agents ran while I worked in the foreground.** The two building-research agents wrote JSON I ingested without breaking flow. When one hit a monthly token cap mid-run, I just re-ran it the next day and merged the output.
* **Visual feedback via screenshots was the entire QA loop.** The polyline alone went through 6 width/glow tunings (4 → 7 → 12 → 16 → 8 → 12 px) and a full 3D-cylinder experiment + revert, all driven by me dropping screenshots and Claude reading them.
* **I never wrote code.** I'm a CPO, not an engineer. Cesium scene, Supabase queries, Vite config, scope-mode state machine, panel race-guard, pitch deck: all Claude Code. I was the design/PM brain pointing at "this looks wrong, fix that."
* **Three SOT documents kept everything coherent.** The PRD drifted hard from the original plan (Three.js → Cesium pivot, scope mode invented mid-build, six pill swaps) but Claude Code maintained dated Recent-Changes logs in all three SOT files. At any point I could read `frontend.md` and the deployed site matched.

Try it here: [https://a
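The "caching + retry" geocoding step in the post can be sketched roughly as follows. This is a minimal illustration and not the author's script: `make_cached_geocoder` and `flaky_lookup` are hypothetical names, the lookup function is a stand-in for a real geocoding call (no network request is made here), and the coordinates are rough values for illustration only.

```python
import time
from typing import Callable, Dict, Optional, Tuple

LatLon = Tuple[float, float]


def make_cached_geocoder(
    lookup: Callable[[str], Optional[LatLon]],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[str], Optional[LatLon]]:
    """Wrap `lookup` with an in-memory cache and exponential-backoff retry."""
    cache: Dict[str, Optional[LatLon]] = {}

    def geocode(place: str) -> Optional[LatLon]:
        key = " ".join(place.lower().split())  # normalize case and whitespace
        if key in cache:
            return cache[key]
        for attempt in range(retries):
            try:
                cache[key] = lookup(place)
                return cache[key]
            except Exception:
                if attempt == retries - 1:
                    raise  # out of retries: surface the last error
                sleep(base_delay * 2 ** attempt)  # back off: 1s, 2s, 4s, ...
        return None  # only reached when retries == 0

    return geocode


# Hypothetical stand-in for a real geocoding call: fails once, then answers.
calls = {"n": 0}


def flaky_lookup(place: str) -> Optional[LatLon]:
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("simulated transient failure")
    return (37.39, -122.08)  # rough coordinates, for illustration only


geocode = make_cached_geocoder(flaky_lookup, sleep=lambda seconds: None)
first = geocode("Mountain View, CA")
second = geocode("mountain view,  CA")  # same normalized key, served from cache
```

Passing the lookup and the sleep function in as parameters keeps the sketch testable without network access; a real run would pass a function that calls the geocoding service and respects its rate limits.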

Mentions by Platform

youtube: Databricks AI (data privacy)

youtube: Databricks AI

youtube: Databricks AI (data privacy)

youtube: Databricks AI

youtube: Databricks AI (data privacy)

Pricing

tiered

Sentiment Overview

Positive: 6% (1)

Neutral: 94% (15)

Negative: 0% (0)
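The percentages follow from the mention counts in parentheses (1 + 15 + 0 = 16 mentions). A minimal sketch of the arithmetic, assuming the page rounds to the nearest whole percent (the rounding rule is an assumption, not something the page states):

```python
# Mention counts taken from the sentiment breakdown.
counts = {"positive": 1, "neutral": 15, "negative": 0}
total = sum(counts.values())  # 16


def share(n: int, total: int) -> int:
    """Percentage of `total`, rounded to the nearest whole percent."""
    return round(100 * n / total) if total else 0


shares = {label: share(n, total) for label, n in counts.items()}
print(shares)
```

With 16 mentions, one positive mention is 6.25% and fifteen neutral mentions are 93.75%, which round to the 6% and 94% shown.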

Common Pain Points

API costs (1)

Top Topics

data privacy (4), pricing (1), performance (1), api (1), scalability (1), ease of use (1), support (1), open source (1), migration (1), model selection (1), agents (1), cost optimization (1), workflow (1)

Recent Mentions

reddit @[unknown] 6/21/2026

Random ad badges (Samsung, Bajaj Finserv, etc.) getting injected into text inside Claude desktop app, not a browser extension, what is this??

So this is a weird one. I use the Claude desktop app (not the browser version) and for the past little while I've been noticing random little gray badges popping up mid-sentence in Claude's responses, stuff like "Samsung", "Smartprix", "Bajaj Finserv", "Gadgetwiser". They're literally inserted inside the text, like the AI typed a sentence and then someone slapped a little pill-shaped ad tag right in the middle of a word gap.

Here's the part that really threw me off. When I first noticed these, I figured maybe it was tied to a phone-shopping conversation I'd had with Claude earlier (was helping my dad pick out a phone under ₹25k), since the badges were brand names like Samsung. But then the exact same badges started showing up on a completely unrelated response, one that was just about how to download notebooks from a Databricks workspace. Nothing to do with phones, shopping, or finance at all. So it's not even consistently topic-matched, it's just inserting these badges somewhat randomly across totally different conversations.

I actually pointed this out directly to Claude in the chat and asked why it inserted "Bajaj Finserv" into one of its responses. It flat out said it didn't write that, that the phrase never appeared in its actual response, and that something must have altered the text after it was generated. Which honestly tracks with what I'm seeing, since it really does look like something is injecting these badges into the rendered output rather than Claude actually generating them.

Couple things that make this stranger:

It's happening in the desktop app, not a browser tab, so I don't think it's a normal Chrome extension doing this (pretty sure Electron apps don't run browser extensions the same way).

At first it seemed like it was reacting to content on screen, but since it also showed up on a totally unrelated Databricks response, I'm less sure now whether it's actually context-aware or just cycling through a fixed set of ad badges and dropping them in randomly.

I'm now assuming this is some kind of adware or ad injector running at the OS or network level, since it seems to affect content across an app where it really shouldn't be possible.

Has anyone run into this before? Any idea what kind of software does this kind of ad injection outside of a browser, and why it would show up in an Electron-based desktop app? I've checked Task Manager and nothing obviously sketchy is jumping out yet, but clearly something is intercepting rendered text somewhere. Would appreciate any help.

submitted by /u/Lost-Variation-4522

reddit @[unknown] 6/16/2026

Executive reports published to Github in html

I work at a major tech company. Because we have Databricks and other MCPs connected to Claude, we no longer depend as heavily on DS teams to run queries for us to get insights, whether it's product metrics or VOC. We can self-serve that. I've made a few reports, but I was wondering if anyone has outputs they really like visually that they can share. I'm terrible at prompting for the type of design I want because I only know it when I see it. I usually just try to show an example, but I'm looking for some of the best ones you've seen.

submitted by /u/AnonBB21

reddit @[unknown] 6/13/2026

I built all of Anthropic's 7 agent patterns

Anthropic's Building Effective Agents names seven patterns: prompt chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer, the augmented LLM, and the autonomous agent. Everyone quotes them, but I wanted to know two things. Do they actually hold up, and how little code does each one really take?

So I built all seven as declarative YAML agents (no SDK glue) and wrote 18 automated tests that drive each one and parse the real run transcripts, not screenshots. The 90-second video walks through the actual specs: the unit (a harness + a model + creds), swapping a model in one line, how `type: agent` sub-agents compile into a running graph, scaling with `max_sessions`, and putting guardrails (sandbox, cost budget) into the spec as policy instead of begging for them in a prompt.

What I found: All 7 patterns work end-to-end. Routing classified and handed off correctly, parallelization fanned out to 3 workers, evaluator-optimizer passed on round 1, and the autonomous agent wrote and verified its own file.

The thing I didn't expect: on easy prompts, the agents skipped the machinery and just answered. Simplicity wasn't something I configured, it emerged on its own. The hard part of agent design isn't adding orchestration, it's knowing when not to.

One real gotcha: the hardened sandbox blocked my autonomous agent from even launching (exit 71) until I scoped its paths, which is kind of the whole point. Control you can't prompt your way around.

I ran the whole thing on Omnigent (a meta-harness that sits above Claude Code / Codex / custom agents, open-sourced today: github.com/omnigent-ai/omnigent), with models served through Databricks. Happy to share the specs or the test harness if anyone wants to poke holes in the methodology.

submitted by /u/Limp-Park7849
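Two of the patterns named in the post, routing and parallelization, can be sketched in a few lines of plain Python. This is only an illustration of the control flow, not the author's YAML specs or Omnigent's format: the "models" are stub functions, and `route`, `fan_out`, `classify`, and the handler names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def route(query: str, classify: Callable[[str], str],
          handlers: Dict[str, Callable[[str], str]]) -> str:
    """Routing: classify the input, then hand off to one specialised handler."""
    label = classify(query)
    handler = handlers.get(label, handlers["default"])
    return handler(query)


def fan_out(task: str, workers: List[Callable[[str], str]]) -> List[str]:
    """Parallelization: run every worker on the same task, keep input order."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = [pool.submit(worker, task) for worker in workers]
        return [future.result() for future in futures]


# Stub "models": plain functions standing in for LLM calls.
def classify(query: str) -> str:
    return "billing" if "invoice" in query.lower() else "default"


handlers = {
    "billing": lambda q: f"[billing] {q}",
    "default": lambda q: f"[general] {q}",
}
workers = [lambda task, i=i: f"worker {i}: {task}" for i in range(3)]

routed = route("Where is my invoice?", classify, handlers)
results = fan_out("summarize the report", workers)
```

In a real system each stub would be a model call; the point of the sketch is that the orchestration itself is small, which matches the post's observation about how little code each pattern needs.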

reddit @[unknown] 6/13/2026

Megathread Summary: I Asked Multiple Reddit Communities How to Build a Living Memory/Context Engine for Business. Here's what everyone had to say.

I am trying to build a living memory/context engine for my business, something that can remember projects, decisions, timelines, risks, and conversations across emails, documents, notes, chats, and meetings. Since this is new territory for me, I asked several Reddit communities for advice. The responses were incredibly thoughtful, and many people shared architectures, engineering trade-offs, tools, and lessons learned from building similar systems. I consolidated the best ideas into a single summary. If you're exploring the same problem, especially if you're just getting started like me, I hope this will help.

Core Philosophies & Perspectives

Query-First Design: Do not build the storage layer first. Write out 20 real-world queries you will ask tomorrow and architect backward, because the retrieval interface shapes the system more than the storage layer.

Chief of Staff vs. Search Engine: The goal is not just retrieving raw data, but synthesis. Like Microsoft Clarity’s bulk insights, the system should process updates and proactively tell you what projects need attention, what changed, and what the blockers are.

The "Daily Mirror" Briefing: Focus on what the user needs to know at the start of the next session to continue without context loss, rather than striving for perfect archival completeness.

Four Separate Problems: Treating user queries as a single search issue will fail; "latest status" is a retrieval problem, "unresolved issues" is state tracking, "decisions made" is entity extraction, and "important updates" requires significance scoring.

Architecture & Strategies

Append-Only Event Logs First: Avoid starting with a massive knowledge graph or vector database. Ingest everything as a timestamped, append-only event log, and build the knowledge graph later as a derived query layer on top.

Artifact-Mediated Continuity: To prevent identity collapse over long timelines, separate retrieval (facts) from reconstruction (identity and working context). Use a "Principal-owned Artifact System" with files like MEMORY.md for project state, "Texture Packs" for behavior descriptions, and "Lane Files" structured around the Five W's.

Parallel Retrieval Paths: Pure vector search fails at scale. Run vector search (for semantic similarity) alongside a graph/relational lookup (for exact entities) in parallel, because neither covers the query surface alone. Hybrid search (semantic + BM25 keyword) is heavily recommended.

Split Memory by Lifespan & Namespace: Sector your memory from day one. Split durable facts (stable preferences, user info) from working context (recent events), applying different decay rates and routing queries to the appropriate layer.

Continuous Summarization: Instead of treating everything as unstructured documents, use an LLM pipeline to continuously extract structured facts from new inputs to update project briefs, decision logs, and risk trackers automatically.

The Hardest Engineering Challenges

Entity Resolution (The Silent Killer): Different sources will refer to the same thing differently (e.g., "Project X" vs "the X pilot"). Without an entity registry mapping aliases to canonical IDs before writing, your graph will become a mess of duplicates.

Ontology & Classification: The hardest part is often getting the system to universally understand the difference between a "decision", a "discussion", or a "risk" across varying data structures like emails versus meeting transcripts.

Temporal Relevance & Stale Context: A "decision" stays load-bearing for months, whereas a "status update" decays in days. If you don't encode decay rates and version records, stale facts will outrank fresh ones and confidently contradict recent updates.

Significance Scoring: Standard retrieval returns everything recent, not everything important. Write-time scoring fails because significance is retrospective; a better approach is "adaptive salience," where chunks gain weight when retrieved and decay when ignored.

Context Moodiness: Especially in greenfield projects, meaningful status updates can be muddied by confounding, irrelevant, or noisy data.

Tools & Tech Stack Recommendations

Storage / Databases: Vector stores like pgvector for semantic search, paired with key-value or relational databases for exact lookup. Airtable, Databricks, Notion, and Obsidian were also noted as strong foundational or single-source-of-truth layers.

AI Models & Agents: Claude Code, OpenAI Codex, Hermes-agent (by Nous Research), AsanaAI, and ClickUp Brain. Injecting local LLMs where appropriate can help cut down on continuous API costs.

Middleware & Pipelines: Kapex: Memory middleware built specifically to score node significance, governing lifecycle so resolved stuff fades and unresolved stuff persists. Sauna.ai: An engine built out of Wordware that fits this use case.

Automation: Make.com or n8n for routing deterministic logic and LLM reasoning.

The "Party Model": A CRM data integration framework
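Two of the ideas in the summary, the append-only event log and the entity registry that maps aliases to canonical IDs before writing, can be sketched together. This is a minimal in-memory illustration under assumed names (`Event`, `EntityRegistry`, `EventLog`, and the `project-x` ID are hypothetical, and the dates are made up); a real system would persist the log and derive richer views from it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    ts: datetime
    entity_id: str
    kind: str  # e.g. "decision", "status", "risk"
    text: str


class EntityRegistry:
    """Maps aliases such as 'the X pilot' to one canonical ID."""

    def __init__(self) -> None:
        self._aliases: Dict[str, str] = {}

    def register(self, canonical_id: str, *aliases: str) -> None:
        for name in (canonical_id, *aliases):
            self._aliases[name.strip().lower()] = canonical_id

    def resolve(self, name: str) -> Optional[str]:
        return self._aliases.get(name.strip().lower())


class EventLog:
    """Append-only: events are added, never updated or deleted."""

    def __init__(self, registry: EntityRegistry) -> None:
        self._registry = registry
        self._events: List[Event] = []

    def __len__(self) -> int:
        return len(self._events)

    def append(self, ts: datetime, entity_name: str, kind: str, text: str) -> Event:
        entity_id = self._registry.resolve(entity_name)
        if entity_id is None:
            # Refuse to write unresolved names so duplicates cannot creep in.
            raise KeyError(f"unknown entity: {entity_name!r}")
        event = Event(ts, entity_id, kind, text)
        self._events.append(event)
        return event

    def latest(self, entity_id: str, kind: str) -> Optional[Event]:
        """Derived query: the most recent event of one kind for one entity."""
        matches = [e for e in self._events
                   if e.entity_id == entity_id and e.kind == kind]
        return max(matches, key=lambda e: e.ts, default=None)


registry = EntityRegistry()
registry.register("project-x", "Project X", "the X pilot")

log = EventLog(registry)
log.append(datetime(2026, 1, 5), "Project X", "status", "kickoff done")
log.append(datetime(2026, 1, 12), "the X pilot", "status", "pilot delayed one week")
latest_status = log.latest("project-x", "status")
```

Both aliases land on the same canonical ID, so the "latest status" query sees one project rather than two.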

reddit @[unknown] 5/28/2026

AI doesn't have an intelligence problem. AI has a context problem (Is persistent memory a solution!?)

AI doesn't have an intelligence problem. AI has a context problem. This was said by Databricks co-founder and CEO Ali Ghodsi, who joined Jim Cramer on CNBC's Mad Money to discuss how context is the missing piece for enterprise AI agents to reach their potential. And this is what I have been building for 4 months!

I launched Graperoot (which I built using Claude Code) at the start of March with very messy code, but I posted it on Reddit and got so many users. With their feedback and continuous conversations, I was able to release a stable version.

TL;DR: Graperoot is an MCP-native tool that works with every AI coding tool. It creates a dependency graph of your codebase, extracts the relevant files with zero token usage, and passes them to Claude Code (this is called Pre-Injection using MCP tools). It reduces token usage by 50-80% in different scenarios. This is what we have tested ( https://graperoot.dev/benchmarks ).

Today, we hit 20k+ installs, and on the leaderboard ( https://graperoot.dev/leaderboard ) a single developer saved $10k in 2 months. I mean, it was crazy for me too that the tool I created out of personal frustration is saving actual money.

Well, go take a look at https://graperoot.dev. It is a free open source tool. Nothing to pay, just give feedback over Discord.

submitted by /u/intellinker
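The general idea of building a dependency graph and selecting relevant files from it can be sketched with Python's standard `ast` module. This is not Graperoot's implementation, which the post does not describe in detail; it is a toy version under assumed names (`imports_of`, `build_graph`, `related`) that works on an in-memory dictionary of Python sources rather than a real repository.

```python
import ast
from collections import deque
from typing import Dict, List, Set


def imports_of(source: str) -> Set[str]:
    """Top-level module names imported by a piece of Python source."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found


def build_graph(files: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each module to the project modules it imports (external ones dropped)."""
    project = set(files)
    return {
        name: sorted((imports_of(source) & project) - {name})
        for name, source in files.items()
    }


def related(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Breadth-first walk: `start` plus everything it depends on."""
    seen = [start]
    queue = deque([start])
    while queue:
        for dep in graph.get(queue.popleft(), []):
            if dep not in seen:
                seen.append(dep)
                queue.append(dep)
    return seen


# Hypothetical four-module project, given as module name -> source text.
files = {
    "app": "import db\nfrom utils import helper\nimport os\n",
    "db": "import utils\n",
    "utils": "",
    "unused": "import app\n",
}
graph = build_graph(files)
context_files = related(graph, "app")
```

Only the modules reachable from `app` are selected, so `unused` is left out of the context. The selection happens by static analysis, with no model call involved.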

reddit @[unknown] 5/19/2026

Anthropic Announced vs current compute capacity (Sources Below)

source list:

Google Cloud TPU deal: up to 1M TPUs, “well over 1 GW” expected online in 2026
https://www.anthropic.com/news/expanding-our-use-of-google-cloud-tpus-and-services
https://www.googlecloudpresscorner.com/2025-10-23-Anthropic-to-Expand-Use-of-Google-Cloud-TPUs-and-Services (Anthropic)

Fluidstack / Anthropic $50B U.S. AI infrastructure: Texas + New York, sites coming online through 2026
https://www.anthropic.com/news/anthropic-invests-50-billion-in-american-ai-infrastructure
https://www.fluidstack.io/about-us/blog/fluidstack-selected-by-anthropic-to-deliver-custom-data-centers-in-the-us (Anthropic)

Microsoft + NVIDIA deal: $30B Azure compute commitment + up to 1 GW additional capacity
https://blogs.microsoft.com/blog/2025/11/18/microsoft-nvidia-and-anthropic-announce-strategic-partnerships/
https://blogs.nvidia.com/blog/microsoft-nvidia-anthropic-announce-partnership/ (The Official Microsoft Blog)

Google + Broadcom next-gen TPU deal: multiple GW starting 2027; Broadcom SEC filing says ~3.5 GW
https://www.anthropic.com/news/google-broadcom-partnership-compute
https://investors.broadcom.com/static-files/c906d370-921b-4bc2-bb7b-57877dfcf1ae (Anthropic)

Amazon / AWS deal: up to 5 GW, nearly 1 GW by end-2026
https://www.anthropic.com/news/anthropic-amazon-compute (Anthropic)

AWS Project Rainier: operational now, nearly half a million Trainium2 chips; Claude expected on 1M+ Trainium2 chips
https://www.aboutamazon.com/news/aws/aws-project-rainier-ai-trainium-chips-compute-cluster (Amazon News)

SpaceX / Colossus 1: all Colossus 1 compute, >300 MW, 220k+ NVIDIA GPUs within the month
https://www.anthropic.com/news/higher-limits-spacex
https://x.ai/news/anthropic-compute-partnership (Anthropic)

Independent reporting for SpaceX deal
https://www.reuters.com/business/retail-consumer/anthropic-unveils-dreaming-feature-help-its-ai-agents-self-improve-2026-05-06/ (Reuters)

submitted by /u/Business_Garden_7771

reddit @[unknown] 5/18/2026

Tips for BI analysis with Claude? My results so far are shockingly bad compared to general coding

I have a lot of hands-on experience with developing R pipelines to ingest large, live, very dirty datasets and produce relatively straightforward BI-type analyses. Trends, completion rates, revenue etc.

I am currently working on a project with a small, live, moderately dirty dataset. The output should be simple analyses eg of lead quality, time to deal, revenue per product line. I am developing this project with Python and DuckDB.

I am having incredible difficulty with getting Claude (Code) to coherently do this work, even when taking the pipeline design process step by step. I am always using Opus 4.7 High, and regularly experiencing Claude contradict clear instructions I gave it even within the last 5 minutes. It gives extremely generic names to variables and then very soon will completely misunderstand what the variables mean. It leaps to fixing problems without having any understanding of them and invents generic terminology that disagrees with the established project terms.

My hypothesis is that this is an artifact of the data exploration. Inevitably as I explore the dirty data while building this pipeline I'm constantly uncovering new edge cases that need to be accounted for, and I guess this likely pollutes the context very quickly. Likely also Claude is more hesitant to codify "findings" than would be normal in a data pipeline, because it's engineered for more... deterministic (?) programming situations where findings are often meant to be fixed and forgotten.

I am planning a few changes to my normal workflow:

1. Much smaller context window, potentially even clearing after every small adjustment to the pipeline
2. Strictly aligning with enterprise-grade standards (eg OpenTelemetry, Databricks Medallions) even for this small project
3. Developing an extremely strict and exhaustively clear variable naming structure so that as Claude writes the tokens for each variable it cannot avoid understanding its meaning (eg medallion___source_module___data_scope___data_qualifiers___stat_type___time_window).
4. Enforce constant linting of 2 and 3 through a hook.

Anything else that can be recommended? One thing I'm attempting to do is "go with the flow" and try to figure out what Claude "wants" to do, then strictly codify that... but it seems like most often Claude is just doing random things. Any advice for that?

submitted by /u/unwritten734
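The naming structure the poster describes (six fields joined by triple underscores) lends itself to a small check that a lint hook could call. The sketch below is one possible reading of that convention, not the poster's code: `parse_name` is a hypothetical helper, the allowed medallion values assume the usual bronze/silver/gold layers, and the example variable name is invented for illustration.

```python
import re
from typing import Dict

FIELDS = ("medallion", "source_module", "data_scope",
          "data_qualifiers", "stat_type", "time_window")
MEDALLIONS = {"bronze", "silver", "gold"}  # assumed layer names
# One field: lowercase words joined by single underscores.
PART = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def parse_name(name: str) -> Dict[str, str]:
    """Split a variable name into its six fields or raise ValueError."""
    parts = name.split("___")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {name!r}")
    for field_name, part in zip(FIELDS, parts):
        if not PART.fullmatch(part):
            raise ValueError(f"bad {field_name} field {part!r} in {name!r}")
    if parts[0] not in MEDALLIONS:
        raise ValueError(f"unknown medallion {parts[0]!r} in {name!r}")
    return dict(zip(FIELDS, parts))


# Hypothetical example name following the six-field pattern.
parsed = parse_name("gold___crm___leads___qualified___count___last_30d")
```

A hook could run `parse_name` over every assigned variable or column name and fail the edit when it raises, which is roughly what item 4 of the list asks for.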

reddit @nikkonine6 engagement 4/28/2026

Does Claude create graphic reports from spreadsheet data?

I am often times trying to pull data from spreadsheets and making charts and graphs to better represent the data for others to understand. Does Claude handle this well? I used Databricks for this and it was awesome. I would love to use Claude for this so my monthly cost could be used for other AI purposes than just graph data. I have been on the fence about Claude and Claude code, but I have heard they have removed some of the agentic features.

reddit @[unknown] 4/16/2026

Pattern detection agents

Has anyone worked on creating a pattern detection agent of sorts? I am working on Databricks and there is a lot of data that we work with, and often the data keeps changing weekly. I want to create an agent that detects any patterns (for example CPT shifting, DRG shifting, etc.) in the PHARMA data. The constraint is that I don't know what patterns will emerge each week in the new data, so there is no starting point. How do we create such an agent?

submitted by /u/Such_Rush_6956
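One possible starting point for the "shifting" case in the question, offered only as a sketch and not as an answer from the thread, is to compare how the share of each code changes between two weeks and flag the large moves. The function names, the 5-point threshold, and the placeholder code labels below are all assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def shares(codes: Iterable[str]) -> Dict[str, float]:
    """Fraction of records carrying each code."""
    counts = Counter(codes)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


def share_shifts(prev: Iterable[str], curr: Iterable[str],
                 threshold: float = 0.05) -> Dict[str, float]:
    """Codes whose share moved by at least `threshold` between two weeks."""
    before, after = shares(prev), shares(curr)
    shifts = {
        code: after.get(code, 0.0) - before.get(code, 0.0)
        for code in set(before) | set(after)
    }
    return {code: delta for code, delta in sorted(shifts.items())
            if abs(delta) >= threshold}


# Placeholder labels standing in for real billing or diagnosis codes.
last_week = ["code_a"] * 50 + ["code_b"] * 50
this_week = ["code_a"] * 30 + ["code_b"] * 50 + ["code_c"] * 20
flagged = share_shifts(last_week, this_week)
```

Because the comparison runs over whatever codes appear, it does not need to know in advance which pattern will emerge; an agent could run it on each new week and then investigate only the flagged codes.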

reddit@[unknown]3/29/2026

I built a Claude Code toolkit for ML on Databricks, because all the tips out there are for software engineers, not ML engineers/ML data scientists.

Hey everyone, I've been using Claude Code for ML work on Databricks for a few months now and wanted to share something I put together that might help others in the same boat.

What I kept running into

If you've looked into Claude Code tips and best practices online, you'll notice almost all of them are geared toward software development: edit code, run tests, ship it. And that's great, but the ML workflow on Databricks is just... different. Your code doesn't run locally. Your laptop is CPU-only but your real training happens on a GPU cluster. You can't just run your script and see if it works, you have to get your code onto the cluster, submit a job, wait, then go fish out the metrics from MLflow. And if you've dealt with DBR 15+ quirks (Workspace path errors, wheel installation changes, stale pydantic caching), you know how much time you can lose on stuff that has nothing to do with your actual model.

The thing that bugged me most was that Claude would help me write great training code, and then I'd spend the next 15 minutes manually uploading, submitting, checking results, and copying metrics back so Claude knew what happened. It felt like I was the middleware.

What I ended up building

Over time I built up a set of Claude Code skills and agents that automate this loop. I finally cleaned them up and put them in a repo in case they're useful to anyone else: github.com/duonginspace/claude-code-databricks-ml

The highlights:

/run-on-databricks: builds your project as a wheel, uploads to DBFS, submits the job, waits, and pulls MLflow metrics back. One slash command instead of 5 manual steps.
/iterate: you say "try adding label smoothing" and Claude implements it, submits to Databricks, pulls results, compares with previous runs, and suggests what to try next.
/compare-runs: ranks your experiments, shows what helped and what hurt.
/init-databricks-ml: this is the one I wish I had when I started. It scaffolds a complete project with submit/pull scripts, Makefile, MCP config, and all the DBR 15+ workarounds already baked in.
/explore-data, /research-papers, /train-local: for the rest of the workflow (EDA, literature search, quick local smoke tests before burning GPU time).

There are also 3 agents that the skills delegate to (experiment runner, data analyst, research agent), a /commit command, and a status bar script that shows your context window usage, git branch, and rate limits.

What it actually gives you

Claude can finally close the loop. It doesn't just write your code and hand it back to you, it submits, tracks, and learns from results. You go from "copilot" to something closer to a junior researcher who can run experiments on their own.

You skip the Databricks onboarding tax. The DBR 15+ gotchas alone (DBFS vs Workspace paths, runtime wheel installation, stale module caching, MLflow experiment naming) cost me days to figure out. /init-databricks-ml handles all of it from day one.

Faster iteration cycles. Instead of context-switching between your editor, the Databricks UI, and MLflow every time you want to try something, you stay in the terminal. Say "try X" and come back to a comparison table.

Your experiments stay organized. Every run gets logged to MLflow automatically, and /compare-runs gives you a ranked summary instead of you eyeballing dashboards. It's easier to spot what's actually working.

Less wasted GPU time. /train-local lets you smoke-test on CPU before burning cluster hours, and the skills are structured to catch obvious issues early.

It's modular. You don't have to use everything. Install just the one skill you need, or the whole toolkit. They work independently.

Install

git clone https://github.com/duonginspace/claude-code-databricks-ml.git
cd claude-code-databricks-ml
bash setup.sh

Copies everything to ~/.claude/. MIT licensed.

This is very much shaped by my own workflow, so it won't be perfect for everyone. But if you're doing ML on Databricks with Claude Code, or thinking about trying it, I hope it gives you a head start. Would love to hear how others are handling this, and happy to answer any questions.

submitted by /u/duongnguyen0512
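
For readers wondering what the /compare-runs step in the post could amount to, here is a minimal Python sketch of ranking runs by a metric and reporting each run's change against a baseline. It is not the toolkit's actual implementation: the `Run` record, the `compare_runs` helper, the `val_acc` metric, and every number below are hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical record of one experiment run (not an MLflow type)."""
    name: str
    change: str      # what was tried in this run
    val_acc: float   # illustrative validation metric, higher is better


def compare_runs(runs, baseline_name):
    """Rank runs by val_acc (best first) and attach each run's delta vs. the baseline."""
    baseline = next(r for r in runs if r.name == baseline_name)
    ranked = sorted(runs, key=lambda r: r.val_acc, reverse=True)
    return [(r.name, r.change, round(r.val_acc - baseline.val_acc, 4)) for r in ranked]


# Made-up values, purely for illustration.
runs = [
    Run("run-1", "baseline", 0.900),
    Run("run-2", "label smoothing", 0.912),
    Run("run-3", "higher learning rate", 0.884),
]

for name, change, delta in compare_runs(runs, "run-1"):
    verdict = "helped" if delta > 0 else "hurt" if delta < 0 else "baseline"
    print(f"{name:6} {change:22} {delta:+.3f} {verdict}")
```

In a real setup the run records would presumably come from MLflow rather than a hard-coded list, and the choice of metric and baseline would depend on the experiment.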

pricing, performance, api, scalability

Integrations

Apache Spark, MLflow, Tableau, Power BI, Snowflake, AWS S3, Azure Data Lake Storage, Google Cloud Storage, Kafka, Datadog, Jupyter Notebooks, Airflow, Looker, Salesforce, Zendesk

Categories

AI/ML, FinTech, DevOps, Security, Analytics

Frequently Asked Questions

How much does Databricks cost?

Databricks uses a tiered pricing model. Visit their website for current pricing details.

What are the main features of Databricks?

Key features include: Unified, Scalable, Lakehouse, Delta Lake, Machine learning.

What is Databricks used for?

Databricks is commonly used for data, analytics, and AI workloads, including ETL and data warehousing.

What does Databricks integrate with?

Databricks integrates with: Apache Spark, MLflow, Tableau, Power BI, Snowflake, AWS S3, Azure Data Lake Storage, Google Cloud Storage, Kafka, Datadog.

What are common complaints about Databricks?

Based on user reviews and social mentions, the most common pain point is API costs.

What is the overall sentiment around Databricks?