FullStory AI

ai-cdp
session-replay
subscription + freemium + tiered
Free tier

Turn your data from a rear-view mirror into a forward-facing guidance system. Fullstory captures complete user behavioral context so your AI stack can

User feedback on "FullStory AI" cites its main strengths as user-friendly interfaces and robust data analysis capabilities. However, there is a lack of specific complaints or detailed user reviews in the available data. Pricing sentiment appears to be neutral with limited insights provided. Overall, while the tool is part of discussions in various contexts, the absence of focused reviews makes it challenging to fully gauge its reputation.

Mentions (30d)

3 this week

Reviews

Platforms

Sentiment

0 positive

Pain Score: 3/100
15 integrations
10 features
Venture (Round not Specified)

Latest Videos

Fullstory Guides and Surveys

Feb 18, 2026

Why Engineers Love Behavioral Data #fullstory #productionengineer #productdevelopment

Dec 11, 2025



AI Summary

Features & Use Cases

Features

Capture everything, miss nothing
Get instant AI-powered insights
Turn insights into in-product action
Drive measurable results across the entire customer journey
Faster resolution
Reduced friction
Increased conversions
Improved retention
Unlock deeper customer insights
Improve employee experience

Use Cases

By Industry

Company Intel

Industry

information technology & services

Employees

560

Funding Stage

Venture (Round not Specified)

Total Funding

$195.2M

Top Mention

reddit | @bisonbear283 engagement | 5/13/2026

Opus 4.7 Low Vs Medium Vs High Vs Xhigh Vs Max: the Reasoning Curve on 29 Real Tasks from an Open Source Repo

# TL;DR

I ran Opus 4.7 in Claude Code at all reasoning effort settings (low, medium, high, xhigh, and max) on the same 29 tasks from an open source repo (GraphQL-go-tools, in Go).

**On this slice, Opus 4.7 did not behave like a model where more reasoning effort had a linear correlation with more intelligence. In fact, the curve appears to peak at medium.**

If you think this is weird, I agree! This was the follow-up to a Zod run where Opus also looked non-monotonic. I reran the question on GraphQL-go-tools because I wanted a more discriminating repo slice and didn’t trust the fact that more reasoning != better outcomes. Running on the GraphQL repo helped clarify the result: Opus still did not show a simple higher-reasoning-is-better curve.

The contrast is GPT-5.5 in Codex, which overall *did* show the intuitive curve: more reasoning bought more semantic/review quality. That post is here: [https://www.stet.sh/blog/gpt-55-codex-graphql-reasoning-curve](https://www.stet.sh/blog/gpt-55-codex-graphql-reasoning-curve)

Medium has the best test pass rate, highest equivalence with the original human-authored changes, the best code-review pass rate, and the best aggregate craft/discipline rate. Low is cheaper and faster, but it drops too much correctness. High, xhigh, and max spend more time and money without beating medium on the metrics that matter.

More reasoning effort doesn't only cost more - it changes the way Claude works, but without reliably improving judgment. Xhigh inflates the test/fixture surface most. Max is busier overall and has the largest implementation-line footprint. But even though both are supposedly thinking more, neither produces "better" patches than medium.

One likely reason: Opus 4.7 uses adaptive thinking - the model already picks its own reasoning budget per task, so the effort knob biases an already-adaptive policy rather than buying more intelligence. More on this below.

An illuminating example is PR #1260.
After retry, medium recovered into a real patch. High and xhigh used their extra reasoning budget to dig up commit hashes from prior PRs and confidently declare "no work needed" - voluntarily ending the turn with no patch. Medium and max read the literal control flow and made the fix.

One broader takeaway for me: this should not have to be a one-off manual benchmark. If reasoning level changes the kind of patch an agent writes, the natural next step is to let the agent test and improve its own setup on real repo work.

*For this post, "equivalent" means the patch matched the intent of the merged human PR; "code-review pass" means an AI reviewer judged it acceptable; craft/discipline is a 0-4 maintainability/style rubric; footprint risk is how much extra code the agent touched relative to the human patch.*

I also made an interactive version with pretty charts and per-task drilldowns here: [https://stet.sh/blog/opus-47-graphql-reasoning-curve](https://stet.sh/blog/opus-47-graphql-reasoning-curve)

The data:

|Metric|Low|Medium|High|Xhigh|Max|
|:-|:-|:-|:-|:-|:-|
|All-task pass|23/29|28/29|26/29|25/29|27/29|
|Equivalent|10/29|14/29|12/29|11/29|13/29|
|Code-review pass|5/29|10/29|7/29|4/29|8/29|
|Code-review rubric mean|2.426|2.716|2.509|2.482|2.431|
|Footprint risk mean|0.155|0.189|0.206|0.238|0.227|
|All custom graders|2.598|2.759|2.670|2.669|2.690|
|Mean cost/task|$2.50|$3.15|$5.01|$6.51|$8.84|
|Mean duration/task|383.8s|450.7s|716.4s|803.8s|996.9s|
|Equivalent passes per dollar|0.138|0.153|0.083|0.058|0.051|

# Why I Ran This

After my last post comparing GPT-5.5 vs 5.4 vs Opus 4.7, I was curious how intra-model performance varied with reasoning effort. Doing research online, it's very very hard to gauge what *actual experience* is like when varying the reasoning levels, and how that applies to the work that I'm doing.
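The "Equivalent passes per dollar" row in the data table can be reproduced from two other rows. The sketch below assumes the metric is the equivalent-patch count divided by total spend (29 tasks times mean cost per task). The post does not state that formula; it is an assumption that happens to match the published figures to three decimals. The variable and function names are illustrative only.

```python
# Values copied from the post's data table.
TASKS = 29
equivalent = {"low": 10, "medium": 14, "high": 12, "xhigh": 11, "max": 13}
mean_cost_usd = {"low": 2.50, "medium": 3.15, "high": 5.01, "xhigh": 6.51, "max": 8.84}


def equivalent_per_dollar(level: str) -> float:
    """Equivalent patches per dollar spent (assumed formula, not stated in the post)."""
    total_cost = TASKS * mean_cost_usd[level]
    return equivalent[level] / total_cost


per_dollar = {level: round(equivalent_per_dollar(level), 3) for level in equivalent}
best_level = max(per_dollar, key=per_dollar.get)

for level, value in per_dollar.items():
    print(f"{level:>6}: {value:.3f}")
print("best value per dollar:", best_level)
```

Under this assumption, medium comes out ahead on value per dollar as well as on raw quality, which is consistent with the post's claim that the higher settings cost more without beating medium.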
I first ran this on Zod, and the result looked strange: tests were flat across low, medium, high, and xhigh, while the above-test quality signals moved around in mixed ways. Low, medium, high, and xhigh all landed at 12/28 test passes. But equivalence moved from 10/28 on low to 16/28 on medium, 13/28 on high, and 19/28 on xhigh; code-review pass moved from 4/27 to 10/27, 10/27, and 11/27.

That was interesting, but not clean enough to make a default-setting claim. It could have been a Zod-specific artifact, or a sign that Opus 4.7 does not have a simple "turn reasoning up" curve. So I reran the question on GraphQL-go-tools.

To separate vibes from reality, and figure out where the cost/performance sweet spot is for Opus 4.7, I wanted the same reasoning-effort question on a more discriminating repo slice. This is not meant to be a universal benchmark result - I don't have the funds or time to generate statistically significant data. The purpose is closer to "how should I choose the reasoning setting for real repo work?", with `GraphQL-Go-Tools` as the example repo.

Public benchmarks flatten the reviewer question that most SWEs actually care about: would I a
