((((sandro.net))))

Thursday, March 26, 2026

Show HN: Robust LLM Extractor for Websites in TypeScript https://ift.tt/dUWETFb

We've been building data pipelines that scrape websites and extract structured data for a while now. If you've done this, you know the drill: you write CSS selectors, the site changes its layout, everything breaks at 2am, and you spend your morning rewriting parsers.

LLMs seemed like the obvious fix: just throw the HTML at GPT and ask for JSON. Except in practice, it's more painful than that:

- Raw HTML is full of nav bars, footers, and tracking junk that eats your token budget. A typical product page is 80% noise.
- LLMs return malformed JSON more often than you'd expect, especially with nested arrays and complex schemas. One bad bracket and your pipeline crashes.
- Relative URLs, markdown-escaped links, tracking parameters: the "small" URL issues compound fast when you're processing thousands of pages.
- You end up writing the same boilerplate: HTML cleanup → markdown conversion → LLM call → JSON parsing → error recovery → schema validation. Over and over.

We got tired of rebuilding this stack for every project, so we extracted it into a library.

Lightfeed Extractor is a TypeScript library that handles the full pipeline from raw HTML to validated, structured data:

- Converts HTML to LLM-ready markdown with main content extraction (strips nav, headers, footers), optional image inclusion, and URL cleaning
- Works with any LangChain-compatible LLM (OpenAI, Gemini, Claude, Ollama, etc.)
- Uses Zod schemas for type-safe extraction with real validation
- Recovers partial data from malformed LLM output instead of failing entirely: if 19 out of 20 products parsed correctly, you get those 19
- Built-in browser automation via Playwright (local, serverless, or remote) with anti-bot patches
- Pairs with our browser agent (@lightfeed/browser-agent) for AI-driven page navigation before extraction

We use this ourselves in production at Lightfeed, and it's been solid enough that we decided to open-source it.
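To make the partial-recovery idea concrete, here is a minimal sketch of how one might salvage valid items from truncated or partly invalid LLM output. It is not Lightfeed Extractor's API or implementation: the `Product` shape, `isProduct`, `extractObjectCandidates`, and `recoverProducts` are hypothetical names, a hand-written type guard stands in for a Zod schema, and the sketch assumes the model was asked for a flat array of objects.

```typescript
// Hypothetical item shape; the library described above uses Zod schemas instead.
interface Product {
  name: string;
  price: number;
}

// Hand-rolled type guard standing in for real schema validation.
function isProduct(value: unknown): value is Product {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.name === "string" && typeof v.price === "number";
}

// Scan the raw text and collect every balanced top-level {...} span.
// Braces inside JSON strings are ignored; an object cut off by truncation
// never closes, so it is simply not collected.
function extractObjectCandidates(text: string): string[] {
  const candidates: string[] = [];
  let depth = 0;
  let start = -1;
  let inString = false;
  let escaped = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"' && depth > 0) {
      inString = true;
    } else if (ch === "{") {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === "}" && depth > 0) {
      depth--;
      if (depth === 0) {
        candidates.push(text.slice(start, i + 1));
        start = -1;
      }
    }
  }
  return candidates;
}

// Parse and validate each candidate on its own, so one bad item
// does not discard the rest.
function recoverProducts(raw: string): { items: Product[]; rejected: number } {
  const items: Product[] = [];
  let rejected = 0;
  for (const candidate of extractObjectCandidates(raw)) {
    try {
      const parsed: unknown = JSON.parse(candidate);
      if (isProduct(parsed)) items.push(parsed);
      else rejected++;
    } catch {
      rejected++;
    }
  }
  return { items, rejected };
}
```

Given output that is cut off mid-object and contains one item with a string where a number belongs, `recoverProducts` returns the items that parse and validate, and counts the complete-but-invalid ones as rejected. A production version would also need to handle nested wrapper objects and report why each item failed.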
GitHub: https://ift.tt/qrLFi7f
npm: npm install @lightfeed/extractor

Apache 2.0 licensed. Happy to answer questions or hear feedback.

https://ift.tt/qrLFi7f March 26, 2026 at 12:55AM

Show HN: Nit – I rebuilt Git in Zig to save AI agents 71% on tokens https://ift.tt/7GLcmXE

https://ift.tt/42DLcYt March 26, 2026 at 12:14AM

Wednesday, March 25, 2026

Show HN: PSFuturemail – Write a letter and forget it until it arrives https://ift.tt/QuIBPZY

I wanted a way to write letters to my future self and actually forget about them until they arrive. Existing options didn't quite work for me. Gmail lets you schedule emails, but seeing those scheduled drafts every time tempted me to peek, which broke the surprise. FutureMe has moved to a paid model, and most alternatives either lack encryption, feel limited, or don't allow editing after scheduling.

So I built PSFutureMail. It's a simple web app where you can write a letter and choose a delivery date anywhere from days to decades in the future. Letters are private and encrypted by default. You can edit or delete them at any time before delivery. Attachments are supported as well. There's also an option to publish letters anonymously so that others can read them.

The core idea is to make writing to your future self as easy as possible. Would love feedback: https://ift.tt/Ql2WVPK

https://ift.tt/FImkTz5 March 25, 2026 at 05:48AM

Show HN: Necessary Cuts – an interactive fiction fragment https://ift.tt/urRMP3i

I'm a platform engineer who wrote a literary novella that's getting published next year. I've also had an interest in interactive narratives for a while, so I decided to experiment. This was also heavily inspired by "playing" through Betwixt.

The result is a short interactive fragment (~5 minutes): ambient audio synced to prose, three scenes. I didn't want to build a true game as much as make an attempt at immersion. So I re-wrote a fragment set in the world of the book, and wired up the computer-y bits to it. The fragment is in second person (the novella isn't), because other POVs don't really work with the immersion angle.

For the technically interested, this is just vanilla JS and Web Audio API, no frameworks - as is the way of my people.

https://ift.tt/0SoAjC4 March 25, 2026 at 01:00AM

Show HN: DuckDB community extension for prefiltered HNSW using ACORN-1 https://ift.tt/meMuSZl

Hey folks! As someone doing hybrid search daily and wishing I could have a pgvector-like experience but with actual prefiltered approximate nearest neighbours, I decided to just take a punt on implementing ACORN on a fork of the DuckDB VSS extension. I had to make some changes to (vendored) usearch that I'm thinking of submitting upstream. But this does the business: approximate nearest neighbours with WHERE prefiltering.

https://ift.tt/IiTWOyd March 25, 2026 at 12:28AM
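For readers unfamiliar with why prefiltering matters, the sketch below contrasts post-filtering with pre-filtering. It uses exhaustive (exact) search over a small in-memory array, not HNSW or ACORN, and has nothing to do with the extension's actual code; the `Row` shape and all function names are hypothetical. It only illustrates the semantics: filtering after taking the top k can return fewer than k rows, while filtering first always searches among matching rows.

```typescript
// Hypothetical row shape for illustration only.
interface Row {
  id: number;
  category: string;
  embedding: number[];
}

// Squared Euclidean distance; enough for ranking, no square root needed.
function squaredDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Exact k-nearest-neighbour search by sorting a copy of the rows.
function nearest(rows: Row[], query: number[], k: number): Row[] {
  return [...rows]
    .sort(
      (x, y) =>
        squaredDistance(x.embedding, query) - squaredDistance(y.embedding, query)
    )
    .slice(0, k);
}

// Post-filtering: take the k nearest rows, then drop those failing the predicate.
function postFiltered(
  rows: Row[],
  query: number[],
  k: number,
  keep: (row: Row) => boolean
): Row[] {
  return nearest(rows, query, k).filter(keep);
}

// Pre-filtering: apply the predicate first, then take the k nearest survivors.
function preFiltered(
  rows: Row[],
  query: number[],
  k: number,
  keep: (row: Row) => boolean
): Row[] {
  return nearest(rows.filter(keep), query, k);
}
```

If the rows closest to the query all fail the predicate, `postFiltered` can come back empty even though matching rows exist, whereas `preFiltered` still returns the k closest matches. An index-based approach has to achieve the second behaviour without scanning every row, which is the hard part this sketch leaves out.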

Tuesday, March 24, 2026

Show HN: Danube – AI Tools Marketplace https://ift.tt/ALIMKV2

Hey HN, I built Danube, a marketplace where AI agents can discover and execute tools, and where developers can publish and monetize them.

I got tired of two things: giving my API keys directly to agents like OpenClaw (didn't feel secure), and having to set up all my MCP servers again every time I switched between Cursor, Claude Code, and other tools. Danube stores your credentials securely. Your agent calls the tool and never sees the keys. And since it's one MCP connection, you set it up once and it works across all your clients.

For devs who want to publish: you upload an OpenAPI spec or MCP server, optionally set pricing, and you're live. Agents can search and find your tools without users needing to manually configure anything.

100+ services work today. No signup required to browse. Would love to hear what tools you use most with AI agents. If anyone's interested in publishing a tool, happy to help get you set up.

https://danubeai.com March 24, 2026 at 08:41AM
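The "agent calls the tool and never sees the keys" pattern can be sketched as a small in-process broker. This is an illustration of the general idea only, not Danube's implementation or API: `CredentialBroker`, `ToolHandler`, and the example tool are hypothetical, and a real service would sit behind a network boundary with encrypted storage.

```typescript
// A tool handler receives the caller's arguments plus the secret,
// which the broker injects on its side.
type ToolHandler = (args: Record<string, string>, apiKey: string) => string;

// Hypothetical broker: holds secrets privately and exposes only call().
class CredentialBroker {
  private readonly secrets = new Map<string, string>();
  private readonly tools = new Map<string, ToolHandler>();

  // Done once by the user or tool publisher, never by the agent.
  register(toolName: string, apiKey: string, handler: ToolHandler): void {
    this.secrets.set(toolName, apiKey);
    this.tools.set(toolName, handler);
  }

  // The agent supplies a tool name and arguments; the key stays inside the broker.
  call(toolName: string, args: Record<string, string>): string {
    const handler = this.tools.get(toolName);
    const apiKey = this.secrets.get(toolName);
    if (handler === undefined || apiKey === undefined) {
      throw new Error(`Unknown tool: ${toolName}`);
    }
    return handler(args, apiKey);
  }
}
```

The agent only ever holds a reference to `call`, so the secret never appears in its prompt or tool arguments. The guarantee is only as strong as the handlers: a handler that echoes the key back in its result would leak it.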

Show HN: Kern – One agent. One folder. One mind. Every channel https://ift.tt/3nVJOHE

https://ift.tt/YSAsXk1 March 24, 2026 at 04:44AM
