((((sandro.net))))

Thursday, March 5, 2026

Show HN: Open dataset of real-world LLM performance on Apple Silicon https://ift.tt/GVNcC9u

Show HN: Open dataset of real-world LLM performance on Apple Silicon

Why open source local AI benchmarking on Apple Silicon matters - and why your benchmark submission is more valuable than you think.

The narrative around AI has been almost entirely cloud-centric. You send a prompt to a data center, tokens come back, and you try not to think about the latency, cost, or privacy implications. For a long time, that was the only game in town.

Apple Silicon - from M1 through the M4 Pro/Max shipping today, with M5 on the horizon - has quietly become one of the most capable local AI compute platforms on the planet. The unified memory architecture means an M4 Max with 128GB can run models that would require a dedicated GPU workstation elsewhere. At laptop wattages. Offline. Without sending a single token to a third party.

This shift is legitimately great for all parties (except the cloud ones that want your money), but it comes with an unsolved problem: we don't have great, community-driven data on how these machines actually perform in the wild. That's why I built Anubis OSS.

The Fragmented Local LLM Ecosystem

If you've run local models on macOS, you've felt this friction. Chat wrappers like Ollama and LM Studio are great for conversation but not built for systematic testing. Hardware monitors like asitop show GPU utilization but have no concept of what model is loaded or what the prompt context is. Eval frameworks like promptfoo require terminal fluency that puts them out of reach for many practitioners.

None of these tools correlates hardware behavior with inference performance. You can watch your GPU spike during generation, but you can't easily answer: Is Gemma 3 12B Q4_K_M more watt-efficient than Mistral Small 3.1 on an M3 Pro? How does TTFT scale with context length on 32GB vs. 64GB? Anubis answers those questions.
It's a native SwiftUI app - no Electron, no Python runtime, no external dependencies - that runs benchmark sessions against any OpenAI-compatible backend (Ollama, LM Studio, mlx-lm, and more) while simultaneously pulling real hardware telemetry via IOReport: GPU/CPU utilization, power draw in watts, ANE activity, memory including Metal allocations, and thermal state.

Why the Open Dataset Is the Real Story

The leaderboard submissions aren't a scoreboard - they're the start of a real-world, community-sourced performance dataset across diverse Apple Silicon configs, model families, quantizations, and backends. This data is hard to get any other way. Formal chipmaker benchmarks are synthetic. Reviewer benchmarks cover a handful of models. Nobody has the hardware budget to run a full cross-product matrix. But collectively, the community does.

For backend developers, the dataset surfaces which chip/memory configurations are underperforming their theoretical bandwidth, where TTFT degrades under long contexts, and what the real-world power envelope looks like under sustained load. For quantization authors, it shows efficiency curves across real hardware, ANE utilization patterns, and whether a quantization actually reduces memory pressure or just parameter count.

Running a benchmark takes about two minutes. Submitting takes one click. Your hardware is probably underrepresented. The matrix of chip × memory × backend × thermal environment is enormous - every submission fills a cell nobody else may have covered.

The dataset is open. This isn't data disappearing into a corporate analytics pipeline. It's a community resource for anyone building tools, writing research, or optimizing for the platform.

Anubis OSS is working toward 75 GitHub stars to qualify for Homebrew Cask distribution, which would make installation dramatically easier. A star is a genuinely meaningful contribution.
- Download from the latest GitHub release - notarized macOS app, no build required
- Run a benchmark against any model in your preferred backend
- Submit results to the community leaderboard
- Star the repo at github.com/uncSoft/anubis-oss

https://ift.tt/63dQRAD March 4, 2026 at 11:44PM
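The TTFT and tokens-per-second figures the post talks about reduce to simple arithmetic over request timestamps. A minimal Python sketch of that arithmetic - illustrative only, not Anubis's actual SwiftUI code; `summarize_run` and its field names are made up here:

```python
def summarize_run(start: float, token_times: list[float]) -> dict:
    """Summarize one benchmark run given the request start time and the
    wall-clock timestamp at which each output token arrived."""
    if not token_times:
        raise ValueError("no tokens generated")
    # Time to first token: latency before generation begins.
    ttft = token_times[0] - start
    # Throughput measured over the generation window only, i.e. the
    # intervals between tokens (excluding TTFT); guard single-token runs.
    gen_window = token_times[-1] - token_times[0]
    tps = (len(token_times) - 1) / gen_window if gen_window > 0 else float("inf")
    return {
        "ttft_s": round(ttft, 3),
        "tokens": len(token_times),
        "tok_per_s": round(tps, 2),
    }

# Synthetic example: first token after 0.5 s, then 9 more tokens
# arriving at 25 ms intervals.
start = 100.0
times = [100.5 + 0.025 * i for i in range(10)]
print(summarize_run(start, times))
```

Separating TTFT from the steady-state token interval matters for the dataset's questions: long contexts mostly inflate the first number, while memory bandwidth and quantization mostly govern the second.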

Show HN: Nodepp – A C++ runtime for scripting at bare-metal speed https://ift.tt/Qc4Ky9u

Show HN: Nodepp – A C++ runtime for scripting at bare-metal speed https://ift.tt/ZXn2sYI March 4, 2026 at 11:06PM

Wednesday, March 4, 2026

Show HN: Upload test cases and get automated Playwright tests back https://ift.tt/ZN1xiXa

Show HN: Upload test cases and get automated Playwright tests back We built this service and would love honest feedback. https://instantqa.ai/ March 3, 2026 at 11:23PM

Tuesday, March 3, 2026

Show HN: A Puzzle Game Based on Non-Commutative Operations https://ift.tt/Qwusd4h

Show HN: A Puzzle Game Based on Non-Commutative Operations While solving a Skewb[ https://ift.tt/gJIUPiC ] cube, I thought it would be interesting to present its subproblems as puzzle games; one thing led to another, and here is the result. I definitely have some UX problems, so I'm looking for feedback and thoughts. The best part of this game is that level generation and difficulty analysis can be automated. I have 15 tested and 5 experimental levels here. I enjoy the 15th level the most; it has an intuitive solution. You can try the competitive mode with a friend; you just need to share the link with them. If I can bring the level count to thousands, I will add a ranking system. My mind keeps racing about the possibilities, but I can't quite prioritize at the moment. All kinds of feedback and collaboration requests are welcome! https://ift.tt/K3PHZxg March 3, 2026 at 12:20AM
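Non-commutativity, the core mechanic, is easy to see with permutations - the same algebraic structure underlying twisty puzzles like the Skewb: applying two moves in different orders yields different states. A small Python illustration (unrelated to the game's own code):

```python
def apply_perm(p, seq):
    """Rearrange seq so that position i receives seq[p[i]]."""
    return tuple(seq[i] for i in p)

swap = (1, 0, 2)   # exchange the first two items
cyc  = (1, 2, 0)   # cyclic left shift

x = ("a", "b", "c")
# The two application orders disagree, so the moves don't commute:
print(apply_perm(cyc, apply_perm(swap, x)))  # ('a', 'c', 'b')
print(apply_perm(swap, apply_perm(cyc, x)))  # ('c', 'b', 'a')
```

Because move order matters, the solver can't simply count moves of each type; the sequence itself is the puzzle, which is also what makes automated level generation and difficulty analysis tractable as a search over move sequences.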

Show HN: Giggles – A batteries-included React framework for TUIs https://ift.tt/LtoCgjI

Show HN: Giggles – A batteries-included React framework for TUIs

I built a framework that handles focus and input routing automatically for you -- something born out of the things that Ink leaves to you, and inspired by Charmbracelet's Bubble Tea.

- Hierarchical focus and input routing: the hard part of terminal UIs, solved. Define focus regions with useFocusScope and compose them freely -- a text input inside a list inside a panel just works. Each component owns its keys; unhandled keypresses bubble up to the right parent automatically. No global handler like useInput, no coordination code.
- 15 UI components: Select, TextInput, Autocomplete, Markdown, Modal, Viewport, CodeBlock (with diff support), VirtualList, CommandPalette, and more. Sensible defaults, render props for full customization.
- Terminal process control: spawn processes and stream output into your TUI with hooks like useSpawn and useShellOut; hand off to vim, less, or any external program and reclaim control cleanly when they exit.
- Screen navigation, a keybinding registry (expose a ? help menu for free), and theming included.
- React 19 compatible!

Docs and live interactive demos in your browser: https://ift.tt/Mrl3hGp

Quick start: npx create-giggles-app

https://ift.tt/DFcEGUm March 2, 2026 at 11:26PM
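The key-bubbling idea is framework-agnostic: each focus scope gets first refusal on a keypress, and anything unhandled climbs the scope tree. A language-neutral sketch of that dispatch logic in Python - purely illustrative, not Giggles' internals or API; the class and names here are invented:

```python
class FocusScope:
    """One node in a focus hierarchy; unhandled keys bubble to the parent."""

    def __init__(self, name, parent=None, handlers=None):
        self.name = name
        self.parent = parent
        self.handlers = handlers or {}   # key -> action name

    def dispatch(self, key):
        """Offer the key to this scope first, then bubble upward."""
        if key in self.handlers:
            return f"{self.name}:{self.handlers[key]}"
        if self.parent is not None:
            return self.parent.dispatch(key)
        return None  # fell off the top of the tree unhandled

# A text input inside a list inside a panel, as in the post's example.
panel = FocusScope("panel", handlers={"q": "quit"})
lst   = FocusScope("list", parent=panel, handlers={"j": "down", "k": "up"})
text  = FocusScope("input", parent=lst, handlers={"a": "insert"})

print(text.dispatch("a"))  # input:insert -- handled by the focused scope
print(text.dispatch("j"))  # list:down    -- bubbled one level up
print(text.dispatch("q"))  # panel:quit   -- bubbled to the outermost scope
```

The payoff is that no component needs to know its siblings' keybindings: the panel's quit key works no matter which descendant holds focus, which is exactly the coordination code a global input handler would otherwise force you to write.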

Monday, March 2, 2026

Sunday, March 1, 2026

Show HN: Built a tool that turns your GitHub commits into build-in-public posts https://ift.tt/ilL2mAC

Show HN: Built a tool that turns your GitHub commits into build-in-public posts

I kept failing at building in public for the same reason every time: not fear of judgment, just the blank page after a long day of shipping. Something always happened. But converting "refactored auth flow" or "fixed that edge case that's been annoying me for a week" into something worth posting felt like a second job on top of the actual job. So I'd skip it. Then skip it again. Then stop entirely.

The approach: connect your GitHub, it pulls recent commits and repo activity, and generates draft posts for multiple platforms in your tone - raw founder voice, not content creator polish. The idea is you're always starting from something real you actually did, not staring at a blank box trying to manufacture insight.

A few decisions I made consciously:

- Didn't want to build another scheduler. Hypefury/Typefully solve distribution. This solves the upstream problem: knowing what to say in the first place.
- Kept the output editable and minimal - 2-3 options per session, short, easy to tweak. Not trying to automate your voice, just unblock it.
- Free tier to start. Wanted real usage before charging anyone.

Still early. Roadmap includes better tone calibration, tighter commit parsing, and more platform targets. But I've been using it daily myself, which is the real test. Would love feedback, especially from anyone who's tried and failed at BIP consistency before.

https://www.smashlanding.xyz March 1, 2026 at 05:47AM
