We’re on a journey to advance and democratize artificial intelligence through open source and open science.
Users praise "Phi" for its robust community support and seamless integration with the Hugging Face ecosystem, making it a popular tool for leveraging machine learning models. Key strengths include lowering the barriers to entry in machine learning and efficient handling of extensive repositories of models. Some users express concern over the complexity of integrating large models and the occasional steep learning curve. Pricing sentiment appears positive, as many features are freely accessible, contributing to its strong reputation as a valuable open-source resource in the ML community.
Mentions (30d)
4
Avg Rating
4.0
1 reviews
Platforms
7
Sentiment
11%
13 positive
Users praise "Phi" for its robust community support and seamless integration with the Hugging Face ecosystem, making it a popular tool for leveraging machine learning models. Key strengths include lowering the barriers to entry in machine learning and efficient handling of extensive repositories of models. Some users express concern over the complexity of integrating large models and the occasional steep learning curve. Pricing sentiment appears positive, as many features are freely accessible, contributing to its strong reputation as a valuable open-source resource in the ML community.
Features
Use Cases
Industry
information technology & services
Employees
720
Funding Stage
Series D
Total Funding
$395.7M
Welcome to @OpenAI on @huggingface! https://t.co/HFjGP6RtjU
Welcome to @OpenAI on @huggingface! https://t.co/HFjGP6RtjU
View originalg2
What do you like best about Phi?The model is highly efficient for its size, outperform many models of its size. It is also cost effictive. It is available via microsft azure where they integrate well with tools. Review collected by and hosted on G2.com.What do you dislike about Phi?May not perform well as larger models like gpt 4 for complex task. Review collected by and hosted on G2.com.
We keep saying AI "understands" things. Does it? Or are we just pattern-matching our own anthropomorphism?
Every week there's a new paper or tweet claiming some model "understands" context, "reasons" about math, or "knows" what it doesn't know. But when you look closely, there's almost no consensus on what "understanding" even means — philosophically or empirically. Searle's Chinese Room argument is 40 years old and still hasn't been cleanly resolved. The "stochastic parrot" framing treats token prediction as the ceiling. Integrated Information Theory would say current architectures are near-zero in phi. And yet GPT-4 passes the bar exam. A few questions I've been sitting with: Is "understanding" even the right frame — or is it a folk-psychology term we're forcing onto a system that operates on completely different principles? Does it matter if a model "truly understands" if the outputs are indistinguishable from someone who does? Are we anthropomorphizing because it's useful shorthand — or because we genuinely don't have better language yet? I've been going deep on AI + philosophy of mind for a channel I run (@ContextByRaj on YouTube if you're into this space). But genuinely curious what this community thinks — especially people coming from ML or cognitive science backgrounds. Where do you land on this? submitted by /u/rajzzz_0 [link] [comments]
View originalThe Frontier-Only Narrative Is a Financing Story, Not an Architecture Story
The frontier-only narrative is an artifact of how AI infrastructure is being financed, not how production systems are being built. The setup. Q1 2026 disclosed $112B in hyperscaler capex in a single quarter, $650–725B in 2026 guidance, and Alphabet's first 100-year bond by a tech company since Motorola 1997 (see a0109). The story that underwrites that paper is: every query needs a bigger model. The architecture says the opposite. Microsoft's Phi-4 (14B parameters) exceeds its teacher GPT-4o on graduate STEM and competition math. Phi-4-reasoning is competitive with DeepSeek-R1 at roughly one-forty-eighth the parameter count. Claude Haiku 4.5 is positioned by Anthropic and AWS for "economically viable agent experiences." None of this is a benchmark teaser — it is the production toolkit, available today. Routing is the missing component. RouteLLM (UC Berkeley, Anyscale) demonstrated over 2x cost reduction without sacrificing response quality. AWS Bedrock Intelligent Prompt Routing — generally available, official, supported — claims up to 30% cost reduction within a single model family without compromising accuracy. The Flagship Tax (see a0085) didn't just die; it left a vacancy at the architecture layer. The bookkeeping nobody wants to do. Operator audits suggest 40–60% of token budgets in production LLM applications are waste, dominated by default-to-frontier routing. Roughly 37% of enterprises with production AI workloads run five or more models in their stack. The rest are still defaulting to one. Why the story isn't being told. Hundred-year bonds don't pencil out on "use less compute per query." They pencil out on "every query needs a bigger model." The opacity in the harness (see a0107) is the symptom; the underwriting is the disease. What you do Monday morning. Treat model selection as a dependency-graph decision, not a vendor decision. Add a complexity classifier. Default to small. Cascade up when verification fails. Instrument model-mix as a first-class production metric. Bottom line. You are not behind because you have not bought the biggest model. You are behind because you have not built the router. submitted by /u/gastao_s_s [link] [comments]
View originalImages 2.0 can generate this much movie-quality detail (Benjamin Poindexter concept)
Also, LIFE HACK: I find it is better to ask GPT to create the concept first as a text response (so that it is really elaborate) and then ask it to generate the image after, instead of asking it to generate an image with the idea from the get go. I asked GPT to create a fan made concept (movie screenshots collage style) of a Day in the Life of a Netflix show’s character (Benjamin Poindexter aka Bullseye from Netflix’s Daredevil). And then I told it to generate it based on its idea. I did not influence a thing. I told it to come up with the idea and to generate it. submitted by /u/nikkomercado [link] [comments]
View originalFirst time fine-tuning, need a sanity check — 3B or 7B for multi-task reasoning? [D]
Ok so this is my first post here, been lurking for a while. I’m about to start my first fine-tuning project and I don’t want to commit to the wrong direction so figured I’d ask. Background on me: I’m not from an ML background, self-taught, been working with LLMs through APIs for about a year. Hit the wall where prompt engineering isn’t enough anymore for what I’m trying to do, so now I need to actually fine-tune something. Here’s the task. I want the model to learn three related things: First, reading what’s actually going on underneath someone’s question. Like, when someone asks “should I quit my job” the real question is rarely about the job, it’s about identity or fear or something else. Training the model to see that underneath layer. Second, holding multiple perspectives at once without collapsing to one too early. A lot of questions have legitimate different angles and I want the model to not just pick one reflexively. Third, when the input is messy or has multiple tangled problems, figuring out which thread is actually the load-bearing one vs what’s noise. These three things feel related to me but they’re procedurally different. Same underlying skill (reading what’s really there) applied three ways. So the actual question: is 3B enough for this or do I need 7B? Was thinking Phi-4-mini for 3B or Qwen 2.5 7B otherwise. I have maybe 40-60k training examples I can generate (using a bigger model as teacher, sourcing from philosophy, psych case studies, strategy lit). Hardware is M4 Mac with 24gb unified. 3B fits comfortably with LoRA, 7B is tight but doable. Happy to rent gpu if needed. What I’m actually worried about: • Can 3B hold three related reasoning modes without confusing them on stuff that’s outside the training distribution • Does the “related but not identical” thing make this harder to train than if they were totally separate tasks • What do I not know that’s gonna bite me Not really looking for “just try both” type answers. More interested if anyone has actually done multi-task training on reasoning-ish data at this scale and can tell me where it went sideways. Any pointers appreciated, even just papers to read if the question is too vague. submitted by /u/retarded_770 [link] [comments]
View originalKIV: 1M token context window on a RTX 4070 (12GB VRAM), no retraining, drop-in HuggingFace cache replacement - Works with any model that uses DynamicCache [P]
Been working on this for a bit and figured it was ready to share. KIV (K-Indexed V Materialization) is a middleware layer that replaces the standard KV cache in HuggingFace transformers with a tiered retrieval system. The short version: it keeps recent tokens exact in VRAM, moves old K/V to system RAM, and uses K vectors as a search index to pull back only the ~256 most relevant V entries per decode step. Results on a 4070 12GB with Gemma 4 E2B (4-bit): 1M tokens, 12MB KIV VRAM overhead, ~6.5GB total GPU usage 4.1 tok/s at 1M context (8-10 tok/s on GPU time), 12.9 tok/s at 4K 70/70 needle-in-haystack tests passed across 4K-32K Perfect phonebook lookup (unique names) at 58K tokens Prefill at 1M takes about 4.3 minutes (one-time cost) Decode is near-constant regardless of context length The core finding that makes this work: K vectors are smooth and structured, which makes them great search indices. V vectors are high-entropy and chaotic, so don't try to compress them, just retrieve them on demand. Use K to decide which V entries deserve to exist in VRAM at any given step. No model weights are modified. No retraining or distillation. It hooks into the HuggingFace cache interface and registers a custom attention function. The model has no idea it's talking to a tiered memory system. Works with any model that uses DynamicCache. Tested on Gemma 4, Qwen2.5, TinyLlama, and Phi-3.5 across MQA/GQA/MHA. There are real limitations and I'm upfront about them in the repo. Bounded prefill loses some info for dense similar-looking data. Collision disambiguation doesn't work but that's the 4-bit 2B model struggling, not the cache. Two-hop reasoning fails for the same reason. CPU RAM scales linearly (5.8GB at 1M tokens). Still actively optimizing decode speed, especially at longer contexts. The current bottleneck is CPU-to-GPU transfer for retrieved tokens, not the model itself. Plenty of room to improve here. GitHub: github.com/Babyhamsta/KIV (can be installed as a local pip package, no official pip package yet) Happy to answer questions about the architecture or results. Would love to see what happens on bigger models with more VRAM if anyone wants to try it. submitted by /u/ThyGreatOof [link] [comments]
View originalAdditive vs Reductive Reasoning in AI Outputs (and why most “bad takes” are actually mode mismatches)
Additive vs Reductive Reasoning in AI Outputs (and why most “bad takes” are actually mode mismatches) A lot of disagreement with AI assistants isn’t about facts, it’s about reasoning mode. I’ve started noticing two distinct output behaviors: Additive Mode (local caution stacking) The model evaluates each component of an argument separately: • “this signal is not sufficient” • “this metric is noisy” • “this claim is unproven” • “this inference may not hold” Individually, these are correct. But collectively, they produce something distorted: A fragmented critique that never resolves into a single judgment. This is what people often experience as “nitpicky” or overly cautious. ⸻ Reductive Mode (global synthesis) Instead of evaluating each piece in isolation, the model compresses everything into a single integrated judgment: • What is the net direction of the evidence? • What interpretation survives all constraints simultaneously? • What is the simplest coherent explanation of the full set? This produces: A single structured conclusion with minimal internal fragmentation. ⸻ Example: AI “bubble” narrative (2025) Additive response • Repo activity ≠ systemic stress alone • Capex ≠ guaranteed ROI • Adoption ≠ uniform profitability → Therefore no strong conclusion possible Result: feels evasive, overqualified, disconnected. ⸻ Reductive response • Liquidity signals are weak structural predictors • Capex + infrastructure buildout is strong directional signal • Adoption trajectory confirms ongoing diffusion phase Net conclusion: “bubble pop” framing over-weighted financial noise and under-weighted structural deployment dynamics. Result: coherent macro interpretation. ⸻ Key insight Most disagreements with AI assistants come from mode mismatch, not disagreement about facts. • Users often ask for global interpretation • Models often respond with local epistemic audits ⸻ Implication Better calibration isn’t “more cautious vs more confident.” It’s: selecting the correct reasoning mode for the level of abstraction being requested. ⸻ Formalization (lightweight, usable) We can define this cleanly: Two output modes Additive Mode (A-mode) A reasoning process where: • Each evidence component e\_i is evaluated independently • Output structure is: O_A = \sum f(e_i) Properties: • high local correctness • low global resolution • tends toward caveated or non-committal conclusions ⸻ Reductive Mode (R-mode) A reasoning process where: • Evidence is integrated before evaluation • Output structure is: O_R = g(e_1, e_2, ..., e_n) Properties: • produces single coherent interpretation • higher risk of overcompression if poorly constrained • better for macro claims and narrative synthesis ⸻ Calibration function (the useful part) We can define mode selection as: M = \phi(Q, C, S) Where: • Q = question type (local vs global inference) • C = context complexity • S = stakes / need for precision Heuristic: • If Q = decomposition → use additive mode • If Q = interpretation → use reductive mode ⸻ submitted by /u/Harryinkman [link] [comments]
View originalBuilt a HIPAA compliant app w Claude!
Edit: I built a demo that's fully compliant -- full disclosure, I work at Xano. I love the product so much that I build independently all the time, check my profile! I recently worked on a project that was for the healthcare world. The project itself was a simple internal management system. What makes this unique is that it was nocode. For those that don't know, healthcare applications require compliance with HIPAA. Essentially, make your application secure. I used Bolt for the frontend and Xano for the backend. (First time using Bolt, but I'm experienced with Xano!!) We encrypted the db fields that were identified as PHI and we decrypted them when queried. We had RBAC middleware. Audit logs. All the compliance hoops. It was a lot, but in the age of AI, it's only getting easier to build. What I found interesting is that in the build process, Claude 4.6, while building on Xano, used conditional if statements more than I would have. For the en/decryption aspect, we pass in a string and return the respective value. It's either decrypted and readable, or it's decrypted and needs to be encrypted. For the individual fields of the records, Claude constructed a system to update the response var property by property. It checked if the title was empty, the name was empty, etc. Nothing wrong with robust checks. This is is somewhat appreciated. It's just a lot of looping and not wholly necessary. Instead, I would have just an expression and filters. Regardless, with minor prompting and construction, anything's possible. We also wrote our own unit tests using CC outside of Xano, although Xano does support testing and test suites of its own. Let me know if you have any questions on the app build, or what took the longest, etc. Just wanted to share that this was my first HIPAA build that I can now add to the books! submitted by /u/Dazzling_Abrocoma182 [link] [comments]
View originalllama-server -hf ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M openclaw onboard --non-interactive \ --auth-choice custom-api-key \ --custom-base-url "http://127.0.0.1:8080/v1" \ --custom-model-id "gg
llama-server -hf ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M openclaw onboard --non-interactive \ --auth-choice custom-api-key \ --custom-base-url "http://127.0.0.1:8080/v1" \ --custom-model-id "ggml-org-gemma-4-26b-a4b-gguf" \ --custom-api-key "llama.cpp" \ --secret-input-mode plaintext \ --custom-compatibility openai \ --accept-risk
View original@LottoLabs https://t.co/h2frA6iR2I
@LottoLabs https://t.co/h2frA6iR2I
View originalLet's go! https://t.co/HakmkNzDT2
Let's go! https://t.co/HakmkNzDT2
View originalModel weights are here: https://t.co/rQlfP51Db7!
Model weights are here: https://t.co/rQlfP51Db7!
View originaldo the right thing anon!
do the right thing anon!
View originalDoes System Architecture Affect Consciousness-Like Behavior in LLMs?
Not a philosophical essay. A practical question for developers building AI systems. Why...
View originalPhi uses a tiered pricing model. Visit their website for current pricing details.
Phi has an average rating of 4.0 out of 5 stars based on 1 reviews from G2, Capterra, and TrustRadius.
Key features include: memory/compute constrained environments;, latency bound scenarios;, strong reasoning (especially math and logic)., Information Reliability: Language models can generate nonsensical content or fabricate content that might sound reasonable but is inaccurate or outdated., Generation of Harmful Content: Developers should assess outputs for their context and use available safety classifiers or custom solutions appropriate for their use case., Misuse: Other forms of misuse such as fraud, spam, or malware production may be possible, and developers should ensure that their applications do not violate applicable laws and regulations., Inputs: Text. It is best suited for prompts using chat format., Context length: 4K tokens.
Phi is commonly used for: Customer support chatbots, Content generation for blogs and articles, Code generation and debugging assistance, Educational tutoring systems, Creative writing and storytelling, Data analysis and report generation.
Sarah Guo
Founder at Conviction
2 mentions
Phi integrates with: Azure AI Studio, Hugging Face Model Hub, Slack for team collaboration, Discord for community engagement, Jupyter Notebooks for data science, Web applications via REST APIs, Chatbot frameworks like Rasa, Voice assistants integration.
Based on user reviews and social mentions, the most common pain points are: usage monitoring, API costs, spending too much, breaking.
Based on 118 social mentions analyzed, 11% of sentiment is positive, 84% neutral, and 5% negative.