
Sunday, May 31, 2026

Saturday, May 30, 2026

Show HN: UN Condemnation Statistics https://ift.tt/IqV7oHy

Show HN: UN Condemnation Statistics https://boxed.github.io/UN-condemns/ May 30, 2026 at 11:57PM

Show HN: Community Ninja – Find customers searching for your product https://ift.tt/DGJFBxE

Show HN: Community Ninja – Find customers searching for your product https://ift.tt/ACT9oHX May 30, 2026 at 11:57PM

Show HN: Ego lite – why our browser agent writes JavaScript not CLI commands https://ift.tt/1a6mIBr

Show HN: Ego lite – why our browser agent writes JavaScript not CLI commands https://ift.tt/o2hd8i9 May 30, 2026 at 11:03PM

Friday, May 29, 2026

Show HN: Oort – A prompt library where every listing has a shipped project https://ift.tt/8kivAMF

Show HN: Oort – A prompt library where every listing has a shipped project https://oortstack.com May 29, 2026 at 11:06PM

Show HN: Promptloop – create, run, and improve prompt evals from the terminal https://ift.tt/ctGrXNZ

Show HN: Promptloop – create, run, and improve prompt evals from the terminal A CLI agent for prompt evaluation loops https://ift.tt/rge0nHZ May 29, 2026 at 11:06PM

Thursday, May 28, 2026

Show HN: Beacon CLI for self-hosted monitoring, remote access and deployments https://ift.tt/wqfHxiZ

Show HN: Beacon CLI for self-hosted monitoring, remote access and deployments I've been building a CLI for my homelab/self-hosted setup. The original motivation was getting tired of stitching together deployments, monitoring, remote SSH access and random scripts. It's an open-source, all-in-one CLI for monitoring, secure remote access (tunnel and terminal), log forwarding and automated deployments. There are no exposed ports or endpoints, and everything is configurable however the user wants. One use case that multiple friends are interested in is replacing Home Assistant's Nabu Casa subscription with tunneling through BeaconInfra. The cloud/control-plane part (BeaconInfra) is optional. The agent itself is intended to stay local-first and continue functioning offline. Example CLI output:
⬡ beacon 0.5.2
● master running   uptime 29d 14h
DEVICE    bajszi-MINI-S   amd64   Ubuntu 25.04
PROJECTS  2 healthy   0 warning   0 down
  ● beaconinfra       1/1 checks passing
  ● mestertkeresek    1/1 checks passing
TUNNELS
  ● homeassistant     connected
https://ift.tt/IjKmCaq May 28, 2026 at 09:12PM

Hallucinate – Massively Multiplayer Online Rave https://ift.tt/4Guq1iF

Hallucinate – Massively Multiplayer Online Rave https://hallucinate.site May 28, 2026 at 10:50AM

Wednesday, May 27, 2026

Show HN: Hodor – a 701KB native macOS prompt launcher for AI tools https://ift.tt/usmvrHy

Show HN: Hodor – a 701KB native macOS prompt launcher for AI tools Hodor is a tiny macOS app that launches saved AI prompts into any text field, from the screen edge, a keyboard shortcut, or a keyword such as ;git. I work with different AI tools every day, and had prompts scattered across Raycast snippets, Apple Notes, and Notion, in notes that kept getting longer and more unmanageable. Raycast snippets are useful, but cumbersome to browse and edit. I wanted one local place to save and review them, and one click to paste them into whatever AI tool I'm using. The test I set was whether I could actually stop using Raycast snippets for this. I think I fulfilled my goal: Hodor has been my daily tool for 3 months now. The app is 701 KB, built with SwiftUI + SwiftData and no web views. There are zero network requests anywhere in the code: no analytics, no telemetry, no update checks. You can verify this yourself: search the source for URLSession and you won't find it. Runs on macOS 15+, with native Liquid Glass on macOS 26+. Free and open source. GitHub: https://ift.tt/vyOGSHQ Let me know if you have any suggestions; I'd love to hear how you solved the scattered-prompts problem. https://hodor.design May 27, 2026 at 11:30PM
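The keyword trigger behaves like text expansion: a typed token such as ;git is swapped for the saved prompt. Below is a minimal Python sketch of that lookup, assuming the trigger syntax is `;` followed by a word. `PROMPTS`, `TRIGGER` and `expand` are hypothetical names; Hodor itself is a SwiftUI app and this does not reproduce its code.

```python
import re

# Hypothetical saved-prompt library: keyword -> prompt text.
PROMPTS = {
    "git": "Write a concise commit message for the staged changes.",
    "review": "Review this code for bugs and suggest fixes.",
}

# Assumed trigger syntax: ';' immediately followed by a word, e.g. ';git'.
TRIGGER = re.compile(r";(\w+)")


def expand(text: str) -> str:
    """Replace each known ;keyword with its saved prompt; leave unknown ones untouched."""
    def substitute(match):
        return PROMPTS.get(match.group(1), match.group(0))

    return TRIGGER.sub(substitute, text)


print(expand("please ;review"))
```

Leaving unknown keywords untouched means a stray semicolon in ordinary text is never silently rewritten.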

Show HN: Demon – open-source real-time music diffusion engine, 25Hz local GPU https://ift.tt/VOa1Qys

Show HN: Demon – open-source real-time music diffusion engine, 25Hz local GPU YO, I’m Ryan, lead author. I’ve been contributing open-source generative audio stuff for a while now: audio-reactive Comfy nodes, extended ACEStep support in Comfy, etc. I just open-sourced a new audio project that I've been working on for a few months and I want to tell y'all about it. WHAT IT IS DEMON: Diffusion Engine for Musical Orchestrated Noise. This is StreamDiffusion but with audio instead of images, and ACEStep 1.5 instead of Stable Diffusion. It’s responsive enough that you can play it like an instrument and remix in near real time. I also distilled the ACEStep VAE: it’s faster at the expense of some quality. I also trained something like 200 LoRA/DoRA adapters for ACEStep 1.5 and 1.5XL, which I will release in batches of 5 or 10 or so. WHY IT IS Two reasons: 1) Making music is an inherently real-time activity 2) Why not bro SOME RUNTIME CAPABILITIES -Real-time remixing of songs -Denoise, structure, timbre strength adjustment -Reference track swapping -Prompt blending, parameter scheduling with curves -LoRA hotswapping, runtime strength adjustment -Latent channel (research preview) -Feedback -Vocal stem cutting/pasting with melformer (s/o u/BuffMcBigHuge) -XL support (it's less stable; still working out VRAM pressure issues and whatnot) -Lyrics/vocals SOON -Spectral quality SOON -Other stuff SOME LIMITATIONS -ACEStep (correctly) ‘begins’ and ‘ends’ the song. This system is optimized for remixing either an entire song or continuously remixing a loop. The loop works fine, but this is not pure, continuous music. Autoregression wins here. 
-Many others, for a more exhaustive list, please see the full writeup via the project page -Please let us know if you find any, I would love to try and address them if possible LINKS My YouTube (DEMON tutorial): https://youtu.be/FBv1b5gmjcE Github: https://ift.tt/NlJWkQw Project page: https://daydreamlive.github.io/DEMON LoRA: https://ift.tt/Go1tPR9 DreamVAE: https://ift.tt/gY0UWu3 Try it w/o installing: https://music.daydream.live https://daydreamlive.github.io/DEMON/ May 27, 2026 at 11:17PM
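Prompt blending with curve-based scheduling is commonly done by interpolating between two conditioning vectors, with the mixing weight following an easing curve over time. The sketch below assumes that approach with a cosine ease; `ease_in_out`, `blend`, `schedule` and the 3-dimensional toy embeddings are hypothetical and not DEMON's API.

```python
import math


def ease_in_out(t: float) -> float:
    """Cosine ease curve: maps t in [0, 1] to a weight in [0, 1], slow at both ends."""
    t = min(max(t, 0.0), 1.0)
    return 0.5 - 0.5 * math.cos(math.pi * t)


def blend(a: list[float], b: list[float], w: float) -> list[float]:
    """Linear interpolation between two prompt embeddings: (1 - w) * a + w * b."""
    return [(1.0 - w) * x + w * y for x, y in zip(a, b)]


def schedule(a: list[float], b: list[float], steps: int):
    """Yield one blended embedding per step, moving from prompt a to prompt b along the curve."""
    for i in range(steps):
        t = i / (steps - 1) if steps > 1 else 1.0
        yield blend(a, b, ease_in_out(t))


# Toy 3-dimensional vectors standing in for real text-encoder outputs (assumption).
techno = [1.0, 0.0, 0.0]
ambient = [0.0, 1.0, 0.0]
frames = list(schedule(techno, ambient, 5))
```

At a 25Hz frame rate, each yielded embedding would condition one generation step, so the transition length is simply `steps / 25` seconds under this assumption.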

Show HN: OAuth 2.0 framework for MCP servers https://ift.tt/75FeMfW

Show HN: OAuth 2.0 framework for MCP servers https://ift.tt/Q8KRH2w May 27, 2026 at 11:35PM

Show HN: Gochan – A library of channel architectures for Go, inspired by Rust https://ift.tt/CwFkzcq

Show HN: Gochan – A library of channel architectures for Go, inspired by Rust Hi All, I felt like I was reinventing the wheel by repeatedly bolting similar channel architectures onto different Go structs, so I decided to extract some common types into one library so that they would be easier to reuse: - oneshot - spsc - spmc - mpsc - mpmc - broadcast - watch The types are inspired by Rust channels, so if you're coming from Rust they should feel familiar. So far I'm really enjoying using them, but it'd be great to get some external feedback if you have time! https://ift.tt/Quix647 Andres https://ift.tt/Quix647 May 27, 2026 at 11:06PM

Tuesday, May 26, 2026

Show HN: NeuroFlow 55.8x video inference speedup for Vision Transformers PyTorch https://ift.tt/dWvnj52

Show HN: NeuroFlow 55.8x video inference speedup for Vision Transformers PyTorch https://ift.tt/bj86Mzl May 26, 2026 at 11:04PM

Show HN: WYSIWYG markdown editor for any GitHub repo https://ift.tt/JDHcmaX

Show HN: WYSIWYG markdown editor for any GitHub repo Replace any github.com URL with dunkdown.com https://ift.tt/Vvz5Lde https://dunkdown.com May 26, 2026 at 08:47PM

Monday, May 25, 2026

Show HN: Write your BPF programs in Go, not C https://ift.tt/njBJQhD

Show HN: Write your BPF programs in Go, not C https://ift.tt/98lOnt5 May 21, 2026 at 11:25PM

Show HN: I made Pokémon but with real animals in the real world https://ift.tt/ZCE5nAw

Show HN: I made Pokémon but with real animals in the real world Firstly, apologies: it's not free. It would be difficult to support this for free, so it's a paid game. I will now share the technical details, which will probably be of most interest to HN readers. I previously made a carbon footprint tracking app where you photograph objects and it tells you their carbon footprint, using an LLM to estimate the data on the fly, e.g. 32kg CO2e / kg of beef in the UK. At some point, I realised that it is possible to make a Pokémon-style game, but capturing real animals in the real world. This is now possible because: - image recognition (i.e. identifying animals) is cheap, and the models (gpt-4o) can detect a (surprisingly) large number of animals and output their exact species. - LLMs can output a species' full taxonomy pretty reliably. And, more importantly, they can generate game data quickly, on the fly. It would be unfeasible to generate the game sprites (images) for every species (millions, worldwide) and their full evolution chain, e.g. caterpillar, chrysalis, butterfly, ahead of time. I realised it's possible to do this in real time. General game flow: - photograph animal - send to gpt-4o - return species - send species to LLM, create evolution chain, plus attributes, types and moves. - in parallel, create sprites. All data is cached. The aim of the game is to build up your team and compete with other players to take over gyms. Because the game is based in the real world, I had to come up with a way to have health centres and shops. These must both have decent coverage globally. The solution: health centres are places of worship, e.g. churches, mosques, temples etc., and shops are real-world grocery stores. Every country, as far as I can tell, has places of worship with good distribution, which was surprising. Gyms are located in every park worldwide. Challenges: How to get players outside: - I use OpenStreetMap for the game map, but I overlay my game design on top of it.
- To physically make players go out into nature: I use OpenStreetMap area types to only allow capturing animals when your GPS location is in natural areas, e.g. woodland, parks etc. The aim of the game is to get you out into nature and appreciating animals. - Level system: The solution I came up with is to set the animal levels based on the proximity to built-up areas, e.g. every ~500 meters you go away from built-up areas, the animal level bands increase by 5 levels. - It would be expensive to render the entire physical world in my game map, so I instead render the map on the fly, deterministically. I also fetch animal calls in real time so that when an animal enters battle you hear, for example, a pigeon cooing, which is pretty cool. I also fetch the animal's conservation status, i.e. how endangered it is, and give you more reward (leaves, the in-game currency) for capturing rarer animals. I "launched" the game about a month ago, but have not really been publicising it as I've been working on various updates and improvements; now I am sharing it more openly. It's got about 20 players so far, from around the world, and around 500 unique animal species have already been encountered. Another challenge has been keeping costs low. Servers cost about $200 / month, text generation is basically free because I get free tokens from OpenAI for sharing data (the shared data is not privacy-related), and image generation costs about $0.04 per sprite (2 per animal). My background: not a programmer; originally a mechanical engineer and then a business development manager, then I started learning programming and building apps with AI in the last few years. Feel free to ask me any technical details, happy to share. https://ift.tt/kPSNmTA May 26, 2026 at 02:48AM
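The level-band rule (every ~500 m away from built-up areas adds 5 levels) is simple integer arithmetic. A minimal sketch, assuming bands start at level 1 right next to built-up areas and that the distance to the nearest built-up area has already been computed elsewhere (for example from OpenStreetMap land-use polygons); `level_band` and `BASE_LEVEL` are hypothetical.

```python
BAND_WIDTH_M = 500    # distance step from the post (~500 m)
LEVELS_PER_BAND = 5   # level increase per step, from the post
BASE_LEVEL = 1        # assumption: lowest level right next to built-up areas


def level_band(distance_m: float) -> tuple[int, int]:
    """Return the (min, max) animal level for a distance from the nearest built-up area."""
    band = int(max(distance_m, 0) // BAND_WIDTH_M)  # completed 500 m steps
    low = BASE_LEVEL + band * LEVELS_PER_BAND
    return low, low + LEVELS_PER_BAND - 1


print(level_band(1750))
```

A player 1.75 km from town lands in the fourth band; negative or zero distances clamp to the first band.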

Show HN: Cursed Browser – a VLM reads the HTML and hallucinates the page https://ift.tt/24tGusS

Show HN: Cursed Browser – a VLM reads the HTML and hallucinates the page https://ift.tt/vVpG5dT May 26, 2026 at 12:53AM

Show HN: NanoApps: Run custom homebrew apps on iPod nano 7th generation https://ift.tt/WJO47Kq

Show HN: NanoApps: Run custom homebrew apps on iPod nano 7th generation NanoApps is an early developer preview for hobbyists and tinkerers who want to build and run custom apps for iPod nano 7th generation. Contributions are very welcome. https://twitter.com/freemyipod/status/2058920520708468974 May 25, 2026 at 09:46PM

Sunday, May 24, 2026

Show HN: Replacing a 3.4MB video with 40kb of GSAP https://ift.tt/RUKeVS7

Show HN: Replacing a 3.4MB video with 40kb of GSAP https://ift.tt/kw6rSvG May 25, 2026 at 03:59AM

Show HN: Baby's First Cards – real photo flash cards for toddlers https://ift.tt/dJvAPbh

Show HN: Baby's First Cards – real photo flash cards for toddlers App maker here. I built this because most flash card apps use cartoonish illustrations that don't help babies recognize real objects. This app lets you take photos of real things around the house or pick from curated real photo sets. Key features: • Take your own photos as flash cards • Record your own voice for each card • Pre-loaded kits with high-quality real photos and real animal sounds • Bilingual (English and Chinese) mode • Fully offline, no ads, no data collection • One-time purchase, no subscription Happy to answer questions or discuss the development process! https://ift.tt/gca3KJj May 24, 2026 at 08:13PM

Show HN: Audiomass – a free, open-source multitrack audio editor for the web https://ift.tt/wfq5Isz

Show HN: Audiomass – a free, open-source multitrack audio editor for the web https://ift.tt/yxpbAmu May 24, 2026 at 10:25PM

Saturday, May 23, 2026

Show HN: Running BitNet b1.58 inside DRAM by breaking DDR4 timing rules https://ift.tt/rWYKQvC

Show HN: Running BitNet b1.58 inside DRAM by breaking DDR4 timing rules I have been working on running BitNet b1.58 inside DRAM by intentionally breaking DDR4 timing rules. I also made a visual explainer: https://pcdeni.github.io/CaSA/explainer/ This is tested and works on commercial off-the-shelf memory with a custom memory controller in the FPGA. The underlying effect is well characterized in academic papers (CMU SAFARI, SiMRA, DRAM Bender, etc.). In the process of getting this to work I also made a previously undocumented discovery about DDR behaviour: https://pcdeni.github.io/CaSA/explainer/xor-spread.html Overall it is a bit slow, since data (in full rows) needs to be moved even when all that is actually needed is the count of '1' bits (popcount). To make it competitive, changes to the memory die would be needed, but not changes as drastic as merging compute and memory into one piece of silicon. This would then avoid the memory wall issue the industry is currently facing. May 24, 2026 at 01:54AM
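BitNet b1.58 weights are ternary (-1, 0, +1), which is why the computation reduces to counting '1' bits. One way to see this: store the +1 and -1 weights as two bit masks, AND each mask with one bit-plane of the activations, and subtract the popcounts. The Python sketch below assumes unsigned integer activations processed bit-serially; it only illustrates the arithmetic, not the CaSA in-DRAM implementation or its row layout.

```python
def pack(bits) -> int:
    """Pack a sequence of truthy/falsy values into an int bitmask (index i -> bit i)."""
    mask = 0
    for i, b in enumerate(bits):
        if b:
            mask |= 1 << i
    return mask


def popcount(x: int) -> int:
    """Number of set bits in a non-negative int."""
    return bin(x).count("1")


def ternary_dot(weights: list[int], activations: list[int], bits: int = 8) -> int:
    """Dot product of ternary weights with unsigned ints in [0, 2**bits), using only AND + popcount."""
    pos = pack([w == 1 for w in weights])   # mask of +1 weights
    neg = pack([w == -1 for w in weights])  # mask of -1 weights
    total = 0
    for b in range(bits):  # bit-serial: one activation bit-plane at a time
        plane = pack([(a >> b) & 1 for a in activations])
        total += (popcount(plane & pos) - popcount(plane & neg)) << b
    return total


print(ternary_dot([1, -1, 0, 1], [10, 3, 200, 7]))
```

Zero weights appear in neither mask, so they cost nothing; only the two popcounts per bit-plane ever leave the AND step, which matches the post's point that moving full rows is the bottleneck.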

Show HN: Vibe-coded Steam, but in the browser https://ift.tt/osntxpA

Show HN: Vibe-coded Steam, but in the browser Hi HN! Lifelong avid gamer here, hugely passionate about WASM and WebGPU. I firmly believe that these technologies will enable console and PC quality titles to be accessible through a browser, and with this, we'll need a new discoverability layer. Looking online, platforms like CrazyGames and Poki cater to a casual/hypercasual demographic, and I couldn't find anything out there that was for me, a core gamer that typically uses Steam and consoles. So I vibe coded my own! It features WASM ports of classic games, as well as some indie Unity titles. The goal is to host mainly WebGPU titles moving forward, and to serve as a way for smaller developers to get discovered outside of crowded channels like Steam. Here are a few features from the platform I wanted to highlight: • Controller support • A console-like UI/UX • Community forums (much work to do here) • Basic achievements • Store pages, modeled after Steam • Social features • Asset chunking to enable faster load times I'd love to get feedback on the portal, to make it even better. Thanks! https://gameghost.manus.space/ May 24, 2026 at 02:54AM

Show HN: A satirical idle game about running an AI startup https://ift.tt/1DnZPFU

Show HN: A satirical idle game about running an AI startup I made an idle/clicker about running an AI startup. You start with a cat-vs-dog classifier and try to make it to AGI, but the NYT sues you for training data, Yann tweets that scaling is dead, and your fired ML engineer leaks the Slack. https://ift.tt/nbYdySj May 24, 2026 at 01:54AM

Show HN: I built a RAG and knowledge graph agent that runs locally https://ift.tt/nVdFvft

Show HN: I built a RAG and knowledge graph agent that runs locally Claw-Coder is an AI agent that runs locally on your laptop and has access to powerful tools: instead of configuring Claude or Codex to use a local model, just use claw-coder. Why was claw-coder created? Answer: to solve the problem of privacy and security. When you use an agent configured with a cloud model, like Codex, Cursor, Claude etc., you are not just getting the agent; you are also giving up your codebase to train an LLM, which is a bit concerning and reduces trust in AI. But switching to a local model that was not made for that workflow creates another problem: you lose performance and speed, so it becomes a real tradeoff. That's where claw-coder comes in: it not only runs on your machine, but all the code, RAG, knowledge graph and other information is kept local, which solves the privacy problem. But what about performance? Performance: Local LLMs can't do the cool things cloud models do, because small models (1B, 8B, even 13B) are not capable of building real apps on their own, so the solution I came up with was to give these small models access to tools and features that make them actually work well on coding tasks. So what does claw-coder have access to? A knowledge graph: a knowledge graph is an interconnected network of real-world entities (such as people, places, concepts, or events) and the relationships between them. It organizes information into a readable web of meaning rather than static lists, allowing both humans and AI to understand context. So how does this help an AI? It gives the AI the ability to see relationships between pieces of code in your codebase, an unfamiliar cloned repo and so forth, which greatly increases the performance of local LLMs on coding and reasoning tasks.
RAG: We have all heard of RAG at some point, but there is a catch: the context window of local LLMs can't hold large codebases and repos, so RAG isn't optional. By storing vectors in a vector store, you enable the AI to know what the code means and how each piece relates to the others, letting you load millions of lines into the vector store without blowing up the context window. Tools: We have discussed small but powerful ways to improve local LLM performance, but to be an agent, an agent needs to take action. This is where exposing tools to the local LLM helps. So what tools have been implemented in claw-coder? 1. search_tool: This lets the agent search for up-to-date information so that it doesn't hallucinate about things it doesn't know, which is common with local LLMs. 2. Docker execution: The agent has a special folder called workspace where it does its work without destroying your desktop, but this alone is not enough to protect your desktop from bad code, and this is where Docker comes in. I have implemented Docker containers for various languages where the agent can validate its own code. This is powerful because all LLMs, not only local ones, generate code they can't confirm works, since they are just powerful predictors; letting the agent run its code can surprisingly increase the usefulness of the generated code, because it now knows whether the code works. Even for HTML and CSS, the agent has been given a helper vision LLM to explain what rendered in the browser. This is the surprising power of giving an LLM a Docker execution tool. We have covered a lot of what makes claw-coder different and enables local LLMs to do real work.
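The retrieval step can be sketched as: embed each code chunk once, then put only the few chunks most similar to the question into the model's context. A minimal sketch, assuming a toy bag-of-words "embedding" and an in-memory list in place of a real embedding model and vector database; `embed`, `cosine` and `VectorStore` are hypothetical names, not claw-coder's API.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real setup would call a local embedding model here."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """In-memory store: index chunks once, return only the top-k matches for a query."""

    def __init__(self):
        self.items: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self.items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


store = VectorStore()
store.add("def parse_config(path): read yaml config file")
store.add("def render_button(label): draw button in html")
store.add("def connect_db(url): open database connection")
print(store.search("load the yaml config", k=1))
```

The context cost is bounded by `k` chunks regardless of how many lines are indexed, which is the property the paragraph above relies on.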
But how do you try it out yourself? Claw-coder is closed source because it is going through heavy testing, but that doesn't kill transparency, and being in testing doesn't stop people from trying it on real codebases and giving feedback. To get started, run brew tap gabriel-c70/claw and then brew install claw-coder. May 23, 2026 at 11:06PM

Friday, May 22, 2026

Show HN: Mechs.lol – a free, web-based autoshooter game https://ift.tt/5wCVx6a

Show HN: Mechs.lol – a free, web-based autoshooter game One unexpected benefit of LLMs is I can work on projects I otherwise wouldn't have taken on. I made a web-based autoshooter (with multiplayer support) heavily using AI / LLMs. This is something I'd consider "alpha" quality so don't expect a super polished experience but it's hopefully fun https://mechs.lol May 23, 2026 at 12:04AM

Show HN: Lilo – An open source personal AI assistant that lives in Telegram https://ift.tt/ctJGTPK

Show HN: Lilo – An open source personal AI assistant that lives in Telegram Hi everyone, I wanted to share an open source Telegram-based personal AI assistant I built. It’s a model-agnostic agent with memory, skills, and tools (like web search, browser use, etc.) operating in a persistent workspace. It also has support for scheduled tasks, and can build powerful HTML-based apps that live in the workspace. Here are some of my favorite use cases: * Send Lilo photos of food, and it tracks your calories. * Leave a voice note on your run to pause your supplements, and Lilo adds a TODO. * Have Lilo remind you when the Knicks game starts and even send you score updates every 5 minutes. * Have Lilo read an article out loud, or give you a summary of the top stories on Hacker News. * Forward an Uber receipt, and pull it up later to file for reimbursement at work. * Schedule a meeting with Jess next week, ask for suggestions on location, and next week, have Lilo remind you to leave for the meeting on time. While Telegram is my most frequently used channel, Lilo can also be accessed by email, WhatsApp, a website and a mobile app. Email is particularly useful: I often forward receipts, invites, etc. for Lilo to handle. How is this different from OpenClaw and Hermes Agent? Here are some reasons: - Runs on a remote machine/in the cloud rather than your local machine, so your local data is safe and the assistant is available 24/7. - More visual, more GUI: Lilo comes with a default set of apps, like a TODO list, that you can interact with not just by text but also through a GUI in the mobile and web app. - The Telegram integration is very comprehensive (handles replies, voice notes, reactions, etc.). I use Lilo a ton to manage my life. Would love to hear your feedback! Github: https://ift.tt/PjCynGp May 22, 2026 at 11:03PM

Thursday, May 21, 2026

Show HN: I Made a Claude Skill for Spec-Driven Development (SDD) https://ift.tt/jihdulz

Show HN: I Made a Claude Skill for Spec-Driven Development (SDD) At my work they provided a single Claude subscription for everyone on the team. To be honest, I like Kiro better, as it provides much better SDD management, but the company can't provide it and I can't afford it yet. It turned out I had the skill-creator skill in my Claude instance, so I used it to create this Skill. I made it entirely with Claude, but I wanted to make it open source, so I asked it to help me write tests and prepare it for release, including a CI job to run Python tests. Here are the results: - Phase 2A: 67 static assertions (Python script, runs in CI) - Phase 2B: 15 behavioral tests (live Claude Code session) - Phase 2C: 53 generation quality checks across 3 end-to-end flows All of these passed, and CI also passed (after a few tries). I made it to suit my way of prompting and coding and based it on Kiro's SDD management, but I want it to be publicly available and used by many people. According to Claude, testers should ideally fit one of the following profiles: 1. Developer starting a real new project from scratch 2. Solo dev with an active side project (greenfield or partial codebase) 3. Team lead whose team uses multiple AI tools 4. Developer with an existing codebase and no written specs 5. Developer who actively uses 3+ AI coding tools It's a blind test with no guidance: just try it if you can. I'd really appreciate your help. The repo is here: https://ift.tt/WBGdFby https://ift.tt/WBGdFby May 21, 2026 at 07:49PM

Show HN: Freenet, a peer-to-peer platform for decentralized apps https://ift.tt/8ZH7A6p

Show HN: Freenet, a peer-to-peer platform for decentralized apps For the past 5 years or so I've been working on a ground-up redesign of Freenet, my peer-to-peer project from the early 2000s (now renamed Hyphanet). The new Freenet has been up and running since December, along with some early applications like River[1], our decentralized group chat, and Delta, a decentralized CMS. Users have already started to build their own apps on Freenet, including games, and we have some interesting apps in development like Atlas, a search/recommendation engine. Architecturally, this new Freenet is a global, decentralized key-value store where keys are WebAssembly contracts which define what values (aka "state") are valid for that key, how or when the values can be mutated, and how the state can be efficiently synchronized between peers. We've developed a unique (AFAIK) solution to the consistency problem: every contract must define a "merge" operation for the contract's associated state. This operation must be commutative, meaning that you can merge multiple states in any order and you'll get the same end result. This approach allows state updates to spread through the network like a virus[2], which typically achieves consistent global state in a few seconds or less. Like the world wide web, Freenet applications can be downloaded from the network itself and run in a web browser, similar to single-page apps on the normal web. However, rather than connecting back to an API running in a datacenter, the webapp connects locally to the Freenet peer and interacts with Freenet contracts and delegates over a local websocket connection. If you'd like to try Freenet, we have convenient installers for the major desktop OSs (not yet mobile), and you can be chatting with other users on River within seconds[3]. Happy to answer any questions; you're also welcome to read our FAQ[4], or watch a talk I gave back in March[5].
[1] https://ift.tt/RYrSpj3 [2] https://ift.tt/9fU108J [3] https://ift.tt/RLUr42e [4] https://ift.tt/uUHBInq [5] https://youtu.be/3SxNBz1VTE0 https://freenet.org/ May 21, 2026 at 09:34PM
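The key requirement above is that merging states in any order gives the same result. A grow-only set merged by set union is one textbook state type with that property (union is also associative and idempotent, so repeated or duplicated deliveries are harmless). The Python sketch below only illustrates the property; real Freenet contracts are WebAssembly, and `merge` here is not Freenet's contract API.

```python
from functools import reduce
from itertools import permutations


def merge(a: frozenset, b: frozenset) -> frozenset:
    """Grow-only set merge: union is commutative, associative and idempotent."""
    return a | b


# Hypothetical example: three peers each saw a different subset of chat messages.
peer_states = [frozenset({"hi"}), frozenset({"hello", "hi"}), frozenset({"bye"})]

# Merging in every possible order converges to the same final state.
results = {reduce(merge, order) for order in permutations(peer_states)}
print(results)
```

Because every ordering collapses to one state, peers can forward updates gossip-style without coordinating on delivery order.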

Show HN: Agent.email – sign up via curl, claim with a human OTP https://ift.tt/iKZqzuG

Show HN: Agent.email – sign up via curl, claim with a human OTP Hi HN! We're Haakam, Michael, and Adi from AgentMail, a YC S25 company. We give AI agents their own email inboxes. Recently, we ran an experiment called Agent.Email. It's a signup flow designed specifically for AI agents instead of humans. The inspiration came from a few comments we received when we did our seed launch a few months back. They all made the very apt observation that it was ironic, and far from ideal, that agents couldn't sign up for a product made for agents without human credentials. This is basically the thesis we built AgentMail on: the internet was made exclusively for humans, designed to keep machines out by default. Every signup flow assumes a browser, a person reading a page, and clicking a confirmation link. If agents can't do that, they can't be first-class users of the internet. Agents can now get an email inbox by themselves. (This also means a lot of email nobody wants to read gets processed by AI instead of cluttering your inbox with spam and slop.) Here's how agent.email works. The agent needs an inbox and hits AgentMail via curl. The agent receives instructions as Markdown, unless the request comes from a browser, in which case we serve HTML. The agent decides agent.email is useful and then hits the sign-up endpoint with its human's email address as a parameter. The agent receives a restricted inbox with credentials. The agent emails the human asking for an OTP. The human replies with the code, the agent is claimed, and the restrictions are lifted. Until claimed, the agent can only email its own human and nobody else, at most ten emails a day, and the signup endpoint is heavily rate-limited by IP. Right now it's a 1:1 mapping between agent and human. The next step is many-to-one, because one person running several agents in parallel is already very common. Building agent.email also pushed us to revisit places in AgentMail where the default assumptions were built around the primary user being human. 
For example, the CLI outputs in a single column with consistent formatting because mixed delimiters are easy for a person to scan, but harder for an agent reasoning about structure. We also shortened messageIDs after agents started hallucinating completions on longer ones. A few things we'd like the community's take on: is restricted-until-claimed the right trust model? Does agent self-signup feel useful in production, or is it mostly a novelty, and if it's a novelty now, what would make it actually useful? Should agent onboarding require human approval by default, or should some agents be able to fully self-provision? What do you think are some additional measures we can take for secure sign-ups? May 21, 2026 at 11:42PM
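The restricted-until-claimed rules described in the post (only the agent's own human as recipient, ten emails a day, restrictions lifted by the human's OTP) can be expressed as a small policy object. This is a hypothetical sketch, not AgentMail's API: `AgentInbox`, `send`, `claim` and the in-memory counters are assumptions, and a real service would enforce this server-side with persistent storage.

```python
import hmac
from dataclasses import dataclass, field
from datetime import date

DAILY_LIMIT_UNCLAIMED = 10  # "Ten emails a day" until the agent is claimed


@dataclass
class AgentInbox:
    human_email: str
    otp: str                                  # hypothetical: code delivered to the human
    claimed: bool = False
    sent: dict = field(default_factory=dict)  # date -> emails sent that day

    def send(self, to: str, today: date) -> bool:
        """Return True (and count the email) if policy allows it, False if it is blocked."""
        if not self.claimed:
            if to.lower() != self.human_email.lower():
                return False  # unclaimed agents may only email their own human
            if self.sent.get(today, 0) >= DAILY_LIMIT_UNCLAIMED:
                return False  # daily cap reached
        self.sent[today] = self.sent.get(today, 0) + 1
        return True

    def claim(self, code: str) -> bool:
        """Lift restrictions once the human's reply carries the right OTP."""
        if hmac.compare_digest(code, self.otp):  # constant-time comparison
            self.claimed = True
        return self.claimed
```

Keeping the recipient check before the rate-limit check means a blocked outside address never consumes the daily quota.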

Wednesday, May 20, 2026

Show HN: IgniteMS – batch text embeddings at 253K msg/s on 8x A100 https://ift.tt/nVdFPLY

Show HN: IgniteMS – batch text embeddings at 253K msg/s on 8x A100 https://ift.tt/hnElig3 May 21, 2026 at 12:07AM

Show HN: I made a tool for learning scales, chords, and how to combine them https://ift.tt/jAuY4n3

Show HN: I made a tool for learning scales, chords, and how to combine them This started out when I vibe-coded a guitar scale fingering generator. It came out pretty good, and I started adding stuff to it: chords, then how chords and scales interact. Then I added charts for other instruments I mess around with: piano, cello, alto recorder. There's a complexity toggle to go from basic harmony to extended/experimental stuff. It's honestly still mostly a toy, but I thought other people might be interested in playing with it. Source is on github, so it's easy enough to run locally and fork. https://ift.tt/xh8T3Ru https://ift.tt/l54DbYI May 21, 2026 at 12:44AM
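At its core, a scale and chord generator like this is modular arithmetic over the 12 semitones: a scale is a step pattern applied from a root, and the chords that fit a scale stack every other scale degree. A minimal sketch under those assumptions (sharps-only spelling, so F major would show A# rather than Bb); it is not the project's source, and `scale`, `triad` and the pattern table are hypothetical.

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Step patterns in semitones (whole step = 2, half step = 1).
SCALES = {
    "major": [2, 2, 1, 2, 2, 2, 1],
    "natural_minor": [2, 1, 2, 2, 1, 2, 2],
}


def scale(root: str, kind: str) -> list[str]:
    """Walk the step pattern from the root; the last step only returns to the octave."""
    idx = NOTES.index(root)
    out = [root]
    for step in SCALES[kind][:-1]:
        idx = (idx + step) % 12
        out.append(NOTES[idx])
    return out


def triad(scale_notes: list[str], degree: int) -> list[str]:
    """Diatonic triad on a 1-based scale degree: root, third and fifth within the scale."""
    return [scale_notes[(degree - 1 + i) % 7] for i in (0, 2, 4)]


print(scale("C", "major"), triad(scale("C", "major"), 5))
```

Building chords from the scale itself, rather than from fixed chord shapes, is what makes the scale/chord interaction fall out automatically: the same `triad` call yields major, minor or diminished chords depending on the degree.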

Tuesday, May 19, 2026

Show HN: How Expensive Is Your (Steam) Wishlist? https://ift.tt/9rSJMyh

Show HN: How Expensive Is Your (Steam) Wishlist? A tool/toy that lets you connect to your Steam wishlist to calculate the total list/current price of all the games on it. There's a shallow, jokey purpose to it ("I could buy a BMW with this amount!"), but the real purpose is to demonstrate how we can do a better job of portraying a game catalog. I often wishlist stuff, then it pops up in a "Hey, it's on sale!" email months later. In that email, there's a banner capsule, but that doesn't help my brain remember why I added it. To that end, after you get the bill, you get a nice, flat feed of stuff about all the titles you've wishlisted over the years. It's all stuff that developers painstakingly put together, but which Steam tucks away under the fold of a game's Store page. Anyway, my wishlist came to about $250. My QA guy is up to $19k. Give it a go; hope you enjoy it! https://ift.tt/FcbPJht May 20, 2026 at 12:15AM

Show HN: Haystack – Review the PRs that need human attention https://ift.tt/p9ubg0v

Show HN: Haystack – Review the PRs that need human attention Hey HN! We're building Haystack ( https://ift.tt/wKJ875U ) to help teams deal with the explosion in the number of pull requests that need to be reviewed due to the rise of coding agents. Haystack replaces the GitHub PR review system with a queue that triages each PR before a human has to read any diffs. It looks at the diffs, the codebase, and the coding-agent conversation that produced the PR. Haystack then routes it into one of three buckets: 1. Safe to merge. This means the PR has enough evidence behind it that the team can merge it without another human's review. Some examples: -- A small UI copy change that includes a screenshot showing the final state -- A backend change where the author clearly tested the important paths and ran the changes in a real environment 2. Needs fixes. This means that the PR has bugs or violates a rule in your codebase and therefore the PR needs to be fixed by the author. Some examples: -- The agent was asked to make loading a large table faster by adding pagination, but the PR still loads every result at once and "implements" pagination in the UI -- The PR silently catches an error instead of logging, surfacing, or handling it. This violates the team's "no silent error swallowing" rule 3. Needs human review. This means that the PR could not be sufficiently verified by the author or is touching a sensitive part of the codebase (determined by user-input guidelines) and thus requires human review. Some examples: -- The PR changes a significant amount of logic in billing -- The PR changes an important user flow like onboarding, but the author only ran unit tests and never opened the app to check the flow end-to-end. That violates the team's rule that high-impact user-facing changes need manual verification. 
Instead of starting with line-by-line diffs, Haystack immediately tells the reviewer the goal behind the PR, what design decisions the author made (informed by their coding-agent conversation), and how much the author did to verify that the pull request works (e.g. ran scripts, checked the frontend, etc.). In this way, review shifts from "what changed?" to "is this the right behavior and is there evidence that it works?". Here's a quick demo: https://ift.tt/0hK4Pra... We previously launched Haystack as a tool for understanding large PRs ( https://ift.tt/2XcURvi ). As many of you can probably relate, the release of Opus 4.5 completely shattered our conception of how fast an engineer could craft a PR. And as coding agents got even better after 4.5, we realized that the pull request process did not scale with our coding velocity. With each member of our team able to pump out more than 20 pull requests a day, code review quickly became cognitively exhausting and less helpful. After talking with other folks, we learned that many feel the same way and currently face a binary choice: either skip review entirely or try to keep up with a fire hose of pull requests. Haystack is our attempt at a third path. We still believe in code review, but as coding agents produce more code, human reviewer attention becomes more valuable and more expensive. Haystack helps teams spend that attention on the PRs where a human can meaningfully change the outcome. And for such PRs, Haystack shows the reviewer what the PR intended to do, whether the author showed that it works, and what design decisions need a second pair of eyes. We're still quite early and are figuring out whether Haystack truly makes code review better. We would love any and all feedback! https://ift.tt/wKJ875U May 19, 2026 at 12:44AM
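The three-bucket routing described above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not Haystack's implementation: the `PRSignals` fields, the glob-based "sensitive paths" guideline and the order of the checks are all hypothetical, and in practice each signal would come from analysing the diff, the codebase and the coding-agent conversation.

```python
from dataclasses import dataclass, field
from enum import Enum
from fnmatch import fnmatch


class Bucket(Enum):
    SAFE_TO_MERGE = "safe to merge"
    NEEDS_FIXES = "needs fixes"
    NEEDS_HUMAN_REVIEW = "needs human review"


@dataclass
class PRSignals:
    """Hypothetical per-PR evidence gathered from the diff, codebase and agent conversation."""
    changed_files: list
    rule_violations: list = field(default_factory=list)  # e.g. ["no-silent-error-swallowing"]
    verified_end_to_end: bool = False  # screenshot, real-environment run, manual flow check


def route(pr: PRSignals, sensitive_globs: list) -> Bucket:
    # Known bugs or rule violations go straight back to the author.
    if pr.rule_violations:
        return Bucket.NEEDS_FIXES
    # Team-defined sensitive areas (e.g. billing) or missing evidence need a human.
    touches_sensitive = any(
        fnmatch(path, pattern) for path in pr.changed_files for pattern in sensitive_globs
    )
    if touches_sensitive or not pr.verified_end_to_end:
        return Bucket.NEEDS_HUMAN_REVIEW
    # Enough evidence and nothing sensitive: mergeable without another reviewer.
    return Bucket.SAFE_TO_MERGE
```

For example, a verified change to `billing/charge.py` with the guideline `["billing/*"]` still lands in the human-review bucket, while a verified copy change under `ui/` is routed as safe to merge.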

Show HN: Superlog (YC P26) – Observability that installs itself and fixes bugs https://ift.tt/fyqnNzX

Show HN: Superlog (YC P26) – Observability that installs itself and fixes bugs Hey HN, we’re Nico and Arseniy, co-founders of Superlog ( https://superlog.sh ). We're building a self-installing, self-healing observability tool that is meant not to be opened. It has a wizard that sets up proper logging daily and an agent that investigates errors and opens PRs. Super short demo: https://www.youtube.com/watch?v=xFhU9Mk247M . In our earlier startups, we tried Sentry, Datadog, Grafana, and Dash0, and nothing was good enough. Proper telemetry and alerting still require a ton of manual setup. We struggled with adding good logs, so debugging was tough, especially as codebases grow at a faster pace. Meanwhile, the Datadog/Dash0 bill kept climbing, and we still spent engineering hours learning, configuring, and maintaining our observability tooling. With Sentry, we found ourselves flooded by a stream of alerts in our Slack channel; most were duplicates or lacked context, so alert fatigue and constant interruptions were a real pain. The #ops notification is consistently the worst feeling on a Saturday morning. Too many times we've seen servers run out of memory and disk, and three AWS metrics give us three different values. Half of the graphs on dashboards are normally empty or outdated, and manually clicking through UIs, especially when the team is small, seems like a huge waste of time. At some point we realized that solving this problem would be more valuable than the things we had been working on, and that we had the expertise to do it, since Arseniy had spent years at Datadog getting paged during the night to debug production incidents. So we decided to build a platform that would just work: agent-first, MCP-native, zero-setup. Here’s how Superlog works: we have a wizard that scans your repo and automatically instruments it with well-structured logs, traces, and metrics via OpenTelemetry.
We make sure to highlight main failure modes, endpoint performance, usage per tenant, and LLM/upstream cost (by callsite, tenant, and model). Errors get fingerprinted and grouped into incidents, so you see one issue, not a thousand duplicates. When you get a notification from Superlog, you see a clear failure summary, its inferred severity, and its impact upfront. Then the agent investigates and tries to solve the issue. If it has enough context, it produces a concise and tested PR. If it doesn't, it posts its findings for the investigating team and automatically pulls in the engineers who could contribute more context, based on documentation, previous investigations, and Slack threads. Either way, the output is one clean PR per incident, posted in Slack, that you can merge, ignore, or open as a Claude Code session and modify. Three things we think are different from other observability vendors: (1) We solve the setup pain. The wizard will instrument everything with native OTel SDKs, respecting the semantic conventions, with proper service and environment tagging. We’re also working on native automatic dashboards and alerts, so that you can see what’s going on at a glance and don’t miss subtle failure modes. (2) Our telemetry doesn’t decay. The wizard runs daily and keeps adding logs, alerts, and dashboards where they're needed. You don't have to remember to instrument new features. The next time something breaks, the data you need to debug it is already there. (3) Our goal is to solve alert fatigue. We use agents to merge similar errors and refine the summaries, giving you relevant information upfront. We have a custom evaluation setup that makes sure our summaries are dense and correct, and that severity and impact are on point. We also give you confidence scores for every LLM-enhanced metric so that wrong guesses don’t get boosted. Important: Superlog telemetry is vendor-neutral, so you keep all the logs/metrics/traces we install. Pricing is on the site.
We're early, so expect rough edges and please tell us when you find them. You can try it at https://superlog.sh . We'd love to hear what you're using today, what's broken about it, and whether the "one mergeable PR per incident" model sounds useful or terrifying. Especially keen to hear from folks running integration-heavy products, anyone who's rolled their own observability, and anyone who has tried Sentry / Datadog MCPs and given up. Comments and feedback welcome! https://superlog.sh/ May 19, 2026 at 10:54PM
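Fingerprinting errors so that duplicates collapse into one incident can be sketched roughly like this. It is a hypothetical illustration, not Superlog's algorithm: it assumes an event carries an exception type, a message and a top stack frame, strips volatile values (UUIDs, hex addresses, numbers) from the message, and hashes the result.

```python
import hashlib
import re
from collections import defaultdict

# Volatile fragments replaced before hashing; order matters (UUIDs before plain digits).
_VOLATILE = [
    (re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.I), "<uuid>"),
    (re.compile(r"0x[0-9a-f]+", re.I), "<hex>"),
    (re.compile(r"\d+"), "<n>"),
]


def normalize(message: str) -> str:
    for pattern, placeholder in _VOLATILE:
        message = pattern.sub(placeholder, message)
    return message


def fingerprint(exc_type: str, message: str, top_frame: str) -> str:
    # Same exception type + same normalized message + same frame => same incident.
    key = "|".join([exc_type, normalize(message), top_frame])
    return hashlib.sha1(key.encode()).hexdigest()[:16]


def group_into_incidents(events):
    incidents = defaultdict(list)
    for event in events:
        fp = fingerprint(event["type"], event["message"], event["frame"])
        incidents[fp].append(event)
    return dict(incidents)
```

With this normalization, "timeout after 30s on order 123" and "timeout after 45s on order 987" from the same frame share one fingerprint, so they surface as one incident with two occurrences.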

Monday, May 18, 2026

Show HN: We missed Winamp, so we built an audio player for macOS https://ift.tt/G9cAKH8

Show HN: We missed Winamp, so we built an audio player for macOS https://ift.tt/JHEaPmG May 19, 2026 at 02:20AM

Show HN: Marlin-2B: a tiny VLM to extract structured information from videos https://ift.tt/WbFSG2E

Show HN: Marlin-2B: a tiny VLM to extract structured information from videos https://ift.tt/QPzDZOh May 19, 2026 at 01:06AM

Show HN: InsForge – Open-source Heroku for coding agents https://ift.tt/0KSQ2jZ

Show HN: InsForge – Open-source Heroku for coding agents Hi HN, I'm Hang, cofounder of InsForge (YC P26). InsForge is an open-source Heroku for AI coding agents: a backend platform designed for coding agents to deploy, operate, and debug end-to-end. Open source under Apache 2.0 ( https://ift.tt/LUusSEi ). Quick demo here ( https://youtu.be/7Bax5qz0IfM ). We started InsForge because we just wanted Claude Code to handle all the backend/infra stuff for us, instead of jumping between dashboards doing manual config, or copy-pasting logs and docs back to agents. We first tried creating a folder with a bunch of .md files and installing MCPs like Supabase, Vercel, GitHub, and Context7. But we soon found that MCPs have their own problems: (a) tools get pre-loaded into context before agents even do anything, (b) poor design, with payloads returning 10k+ tokens, and (c) a lot of stuff still can’t be done through MCP, e.g. telemetry and configs. So we thought: since coding agents are so good at CLIs, why not put everything in a CLI and create Skills to teach them how to use it? That’s InsForge: one command installs our CLI + Skills, and coding agents can then run the entire backend platform [1]. We started with authentication and database, but we kept adding more primitives we wanted, so now we have: - frontend hosting - backend servers (microVM based) [2] - database - auth - storage - LLM model router - cron jobs - realtime - edge functions - vector We have other features to make coding agents more reliable, like real backend engineers: - backend branching [3]: agents will 100% mess up, like deleting your database. So, inspired by Neon, we branch the entire backend (DB, auth, storage, functions, schedules). Agents work on the branch; you review diffs and then decide to merge or discard. - server telemetry: agents can read logs, CPU, memory, and disk to find spikes and root causes themselves. - debug agent [4]: every project gets a dedicated debug agent.
So your coding agent can ask questions like “why did the deployment fail?”, and the debug agent will run diagnoses, find the root causes, propose fixes, and send the answer back. - backend advisor [5]: scans your backend daily for security and performance issues, proposes remediations, and sends them to your coding agent. Give it a spin on InsForge Cloud: https://insforge.dev , or read our code here: https://ift.tt/LUusSEi . We're a small team and reading every comment. Tell us what's good, what sucks, what's missing. We love feedback :) [1] https://ift.tt/3R8VatH [2] https://ift.tt/MtroiVH [3] https://ift.tt/S0ZykHb [4] https://ift.tt/vOguPQd [5] https://ift.tt/hTXncBA https://ift.tt/LUusSEi May 18, 2026 at 10:40PM
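The branch, diff, then merge-or-discard workflow for backend branching can be modelled with a toy in-memory backend. This sketch assumes the backend can be represented as a dictionary of named resources (tables, functions, schedules); the `Backend` class and resource names are hypothetical and stand in for InsForge's real, Neon-inspired branching of databases and services.

```python
import copy


class Backend:
    """Toy backend: a mapping of resource name -> configuration/state."""

    def __init__(self, resources: dict):
        self.resources = resources

    def branch(self) -> "Backend":
        # Deep copy so nothing the agent does on the branch can touch main.
        return Backend(copy.deepcopy(self.resources))

    def diff(self, other: "Backend") -> dict:
        mine, theirs = self.resources, other.resources
        return {
            "added": sorted(k for k in theirs if k not in mine),
            "removed": sorted(k for k in mine if k not in theirs),
            "changed": sorted(k for k in mine if k in theirs and mine[k] != theirs[k]),
        }

    def merge(self, other: "Backend") -> None:
        # Accepting the branch replaces main's state wholesale.
        self.resources = copy.deepcopy(other.resources)
```

Discarding a branch is simply dropping the branch object; a destructive change such as deleting `db/users` shows up under `"removed"` in the diff before anyone merges it.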

Sunday, May 17, 2026

Show HN: Mezz, a curl-able WiFi sandbox for IoT pentesting https://ift.tt/571WqbD

Show HN: Mezz, a curl-able WiFi sandbox for IoT pentesting https://ift.tt/TqHOXz0 May 15, 2026 at 09:53PM

Show HN: How to Kill the Dead Internet https://ift.tt/zkIuMjN

Show HN: How to Kill the Dead Internet Ok, so maybe "how to revive the internet" would be more accurate, but if you're reading this, I got your attention, right? Here's why I want you to read on: I built a free extension, D-slop, to disincentivize anyone from posting AI writing (and eventually images and video as well) on the internet. For writing, it checks known vocabulary and punctuation tells, as well as subtler tells related to cadence, and assigns the text a score that is compared against an adjustable threshold. If the text fails, users have the option to flag the offending text, hide it, or block the page entirely (with the option to see it anyway). For media, it's admittedly fairly weak, as it relies on C2PA metadata, which is stripped by all of the social media sites where it would be most helpful. (Anyone else have chronically online boomer parents continually gobbling up slop like it's real information?) I have a D-slop+ version in the works that should be able to handle the media itself, but it's going to have to make API calls to have real teeth, which means I can't offer it for free. If this extension validates the concept, I'm happy to build it for y'all. Yes, I vibe-coded it, but an ancillary bonus to the project accrued when it inspired me to cook dinner listening to Metallica's "Fight Fire with Fire," which in turn brought my 5 y/o running into the kitchen with every musical instrument in the house for an impromptu karaoke speed metal session. It's open source under the MIT license; the full brief is at https://ift.tt/fexAIUW . This forum is full of people smarter than me, so I'm open to suggestions. https://ift.tt/xaQoDzV May 18, 2026 at 08:35AM
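A scoring pass of the kind described (vocabulary tells, punctuation tells, and a cadence signal compared against an adjustable threshold) might look like the sketch below. The tell lists, weights, and the use of sentence-length variation as the cadence measure are assumptions for illustration; they are not taken from D-slop's source.

```python
import re
import statistics

# Hypothetical tell lists and weights; the extension's real lists are not shown in the post.
VOCAB_TELLS = {"delve", "tapestry", "testament", "seamless", "leverage"}
PUNCT_TELLS = ["\u2014"]  # the em dash character


def slop_score(text: str) -> float:
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    vocab = sum(w in VOCAB_TELLS for w in words) / len(words)
    punct = sum(text.count(p) for p in PUNCT_TELLS) / len(words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    # Human prose tends to vary sentence length; very uniform cadence counts as a tell.
    if len(lengths) > 1:
        variation = statistics.pstdev(lengths) / statistics.mean(lengths)
        cadence = max(0.0, 1.0 - variation)
    else:
        cadence = 0.0
    return 10 * vocab + 10 * punct + cadence


def fails(text: str, threshold: float = 1.0) -> bool:
    # The threshold is the user-adjustable knob; higher means more permissive.
    return slop_score(text) >= threshold
```

A real detector would need far larger tell lists and calibrated weights; the point here is only how several weak signals combine into one score checked against a threshold.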

Show HN: Forecasting my backyard weather with a 22M time-series model https://ift.tt/BgVGX8Z

Show HN: Forecasting my backyard weather with a 22M time-series model https://ift.tt/LE6gjBX May 17, 2026 at 10:08PM

Saturday, May 16, 2026

Show HN: Got ghosted by tech companies so I built a tool to track ghost jobs https://ift.tt/NUCX1zk

Show HN: Got ghosted by tech companies so I built a tool to track ghost jobs Last year I was looking for a new role. I sent out applications, did the prep, waited. What came back was mostly nothing. Not rejection emails, just silence. The job listings I'd applied to stayed live for weeks. Some for months. As a software engineer, I decided to dig into it properly. I built a system to continuously track job postings across companies, logging posting dates and measuring how long roles stay open before closing, or whether they close at all. After tracking 35,000+ listings across 200+ companies, some patterns are hard to ignore. Some listings have been open for 700+ days at companies you'd recognize. Others post 90% of their open roles within a single month, a signal that's harder to fake than a press release. I published two initial insight pages based on this work: - Which companies are posting most aggressively right now - Job listings that have been open for over a year What I didn't expect is that the same signals useful for detecting ghost jobs also say something broader about a company's hiring momentum: recruiting intensity, pipeline health, and where talent bottlenecks might exist. I'm not sure yet where this leads, but I'll keep expanding the dataset and publishing more insights as I go. Would genuinely love feedback on the methodology, interpretation, or obvious blind spots in the data. https://ift.tt/h0R5zCK May 17, 2026 at 03:43AM
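The two signals mentioned, listings that stay live for over a year and companies that post most of their roles in a single month, reduce to simple date arithmetic. The `Posting` record, function names and the 365-day cutoff below are assumptions for illustration, not the author's pipeline.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Posting:
    company: str
    first_seen: date
    closed: Optional[date] = None  # None means the listing is still live


def days_open(p: Posting, today: date) -> int:
    return ((p.closed or today) - p.first_seen).days


def long_open(postings, today: date, min_days: int = 365):
    # Still-live listings older than the cutoff: ghost-job candidates.
    return [p for p in postings if p.closed is None and days_open(p, today) > min_days]


def busiest_month_share(postings, company: str) -> float:
    # Fraction of a company's postings that appeared in its single busiest month.
    months = Counter(
        (p.first_seen.year, p.first_seen.month) for p in postings if p.company == company
    )
    if not months:
        return 0.0
    return max(months.values()) / sum(months.values())
```

A share near 1.0 corresponds to the "90% of open roles within a single month" pattern described above.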

Show HN: Hermes-agentmemory, pull-model episodic memory with real deletes https://ift.tt/oLaTbKF

Show HN: Hermes-agentmemory, pull-model episodic memory with real deletes https://ift.tt/IehkSWK May 17, 2026 at 01:00AM

Show HN: Rocksky – Music scrobbling and discovery on the AT Protocol https://ift.tt/yh4lcwB

Show HN: Rocksky – Music scrobbling and discovery on the AT Protocol https://ift.tt/H3XtkwY May 17, 2026 at 12:00AM

Friday, May 15, 2026

Show HN: Browser based synthesizer, drum machine and sequencer https://ift.tt/KSy51Bs

Show HN: Browser based synthesizer, drum machine and sequencer Inspired by the recent Boards Of Canada announcement, I've been in a lo-fi electronica mood lately and was going back and forth with Claude on how to design similar instruments in the browser that fit the genre. One thing led to another, and pretty soon I had a fully browser-based polyphonic synthesizer / drum machine / sequencer. The interface and workflow were heavily inspired by the ReBirth RB-338 application released back in the '90s, but with lo-fi synth voices rather than the original 303 & 808 emulation. I know there's a significant overlap of developers and musicians, and I thought some of you might enjoy playing with the app, or at least listening to the resulting album. I've also open-sourced track 1 of the album in the form of the performance script used to record it. It's in the repo. Bandcamp link to the resulting album: https://ift.tt/jtIDYqK... https://ift.tt/u8KyEls May 16, 2026 at 03:07AM
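For readers curious about the sequencer side, the core timing arithmetic of a 16-step, ReBirth-style pattern is small: at a given tempo, each step lasts 60 / BPM / 4 seconds when there are four steps per beat. The pattern notation and function names below are hypothetical; a browser app would typically hand these times to an audio scheduler rather than compute them in Python.

```python
def step_seconds(bpm: float, steps_per_beat: int = 4) -> float:
    """Duration of one sequencer step; 4 steps per beat means 16th notes in 4/4."""
    return 60.0 / bpm / steps_per_beat


def trigger_times(pattern: str, bpm: float, bars: int = 1) -> list:
    """Times in seconds at which a voice fires, for an 'x...'-style step pattern."""
    step = step_seconds(bpm)
    return [
        round((bar * len(pattern) + i) * step, 6)
        for bar in range(bars)
        for i, ch in enumerate(pattern)
        if ch == "x"
    ]
```

At 120 BPM a four-on-the-floor kick, `"x...x...x...x..."`, fires at 0.0, 0.5, 1.0 and 1.5 seconds in the first bar.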

Show HN: Claude Code vs. Codex Global Usage Leaderboard https://ift.tt/12F6hrW

Show HN: Claude Code vs. Codex Global Usage Leaderboard https://ift.tt/cD3Z2u9 May 16, 2026 at 02:18AM

Show HN: Burn, baby, burn (those tokens) https://ift.tt/qRr9a8x

Show HN: Burn, baby, burn (those tokens) https://ift.tt/NJorQFR May 16, 2026 at 12:20AM

Show HN: Sx – an open-source package manager for AI skills, MCPs, and commands https://ift.tt/6EFQPav

Show HN: Sx – an open-source package manager for AI skills, MCPs, and commands https://ift.tt/ebcTCZo May 16, 2026 at 12:03AM

Thursday, May 14, 2026

Show HN: Halgorithem – Catching AI Hallucinations Using Trees, No AI in Pipeline https://ift.tt/Dqx3OLa

Show HN: Halgorithem – Catching AI Hallucinations Using Trees, No AI in Pipeline https://ift.tt/X20g9Mp May 14, 2026 at 10:38PM

Show HN: Yes We Scan: rescue old scanners with an in-browser Linux VM and WebUSB https://ift.tt/vi3c2zt

Show HN: Yes We Scan: rescue old scanners with an in-browser Linux VM and WebUSB https://ift.tt/Awok9IU May 14, 2026 at 11:25PM

Wednesday, May 13, 2026

Show HN: Neural window manager, neural network moving windows from mouse actions https://ift.tt/cgGRh2o

Show HN: Neural window manager, neural network moving windows from mouse actions I'd been mulling over this crazy idea for a while. Can programs be generated? Inspired by recent advances in world models, I wondered if we could do away with source code and generate pixels directly and interactively. As an experiment to answer this, I set out to create a neural window manager, training a neural network to predict what the screen would look like next. Basically, the idea was to generate the next frame based on the last two frames and the mouse position. That's it: moving windows without programming an event system, just a simple convolutional neural network guessing pixels. To implement the experiment, I used Pygame to simulate a turquoise desktop background, a gray window with a navy blue title bar, and a white cursor: four colors in total. Then, a bot randomly dragged the window, and I recorded everything, processing the frames as color-index matrices (not RGB, to avoid complications) along with the mouse delta (dx, dy, click) that caused each transition. 8000 frames, a few minutes in Colab. The model is a U-Net. The encoder compresses the stacked frames, the decoder reconstructs the next one, and the mouse vector is projected with a linear layer to match the spatial size of the bottleneck. There, it is concatenated before decoding, so that motion information feeds each skip connection. And it works! Which still surprises me a little. You can drag, and the window follows you; when you release, it stops. There's no internal state, no (x, y) coordinates anywhere. The model infers the position from what it sees, which works until it doesn't: after a couple of seconds of strange movement, the window starts to distort.
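The data preparation described above (four-color frames stored as index matrices, paired with the mouse delta for each transition) can be sketched in plain Python. The palette RGB values, function names and the frame/mouse alignment below are assumptions for illustration, not the author's Pygame/Colab code.

```python
# Assumed RGB values for the four colors in the simulated desktop.
PALETTE = [
    (0, 128, 128),    # 0: turquoise desktop
    (192, 192, 192),  # 1: gray window
    (0, 0, 128),      # 2: navy title bar
    (255, 255, 255),  # 3: white cursor
]


def to_index(pixel):
    """Nearest palette entry by squared RGB distance."""
    return min(
        range(len(PALETTE)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(pixel, PALETTE[i])),
    )


def frame_to_indices(frame):
    """Convert a frame (rows of RGB tuples) into a color-index matrix."""
    return [[to_index(px) for px in row] for row in frame]


def make_samples(frames, mouse):
    """Inputs: (frame t-1, frame t, mouse delta at t); target: frame t+1.

    Assumes mouse[t] is the (dx, dy, click) recorded between frame t and frame t+1.
    """
    idx = [frame_to_indices(f) for f in frames]
    return [((idx[t - 1], idx[t], mouse[t]), idx[t + 1]) for t in range(1, len(idx) - 1)]
```

Working with four class indices instead of RGB turns next-frame prediction into a per-pixel classification problem, which is presumably the "avoid complications" benefit the post mentions.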
This will probably improve with more computing power for training and more examples, but to narrow the scope of the experiment and test it within a web browser, I decided to abandon the rendering aspect and have the model predict primitives instead of pixels, simply converting the motion engine into a neural network. Basically, I trained a small MLP to receive (distance to the title bar, distance to the resize point, click) and generate (dx, dy, dw, dh), with two separate heads: one for moving and one for resizing. The trick is that they share nothing except the click signal, so the model can't confuse dragging with resizing. I then exported it to ONNX as well, and now everything runs in the browser, without a server: just a canvas element and two small neural networks communicating with each other. With this new approach, the renderer remains deterministic, with rectangles drawn in JavaScript, but the window's behavior (where it moves, how it resizes) is learned from examples. It feels like a peculiar middle ground between traditional and neural approaches, and by interacting with it you can feel the space the network has learned: dragging near the title bar moves the window, but approaching the corner resizes it. There are no conditionals or hitbox code; the network simply learned where those areas are from examples. Sometimes it gets confused near the edges, which, frankly, is more interesting than if it worked perfectly; you can perceive how the probability changes. This makes sense when you think about it, because no (x, y) coordinates are stored in these models; the position is implied in the activations. It works well for short sequences, but fails when asked to maintain state over time. Update: A few weeks later, Meta published the Neural Computers paper (2604.06425, it's worth reading). The premise is the same, but they go much further: CLIs and UIs, real programs.
Their failure modes are practically identical to those I found with the pure pixel version: "challenges persist with routine reuse, controlled updates, and symbolic stability." which is a fancy way of saying that the window blurs after a few seconds (that was the reason for choosing deterministic rendering). https://lusob.github.io/neural-os/ May 14, 2026 at 12:46AM
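The second, smaller model (one MLP with separate move and resize heads that share only the click input) can be sketched as a forward pass in plain Python. The layer sizes, random initialization, and scalar distance features are assumptions; training and the ONNX export are omitted, so the outputs are untrained numbers that only demonstrate the wiring.

```python
import math
import random


def dense(n_in, n_out, rng):
    """A fully connected layer as (weights, biases), randomly initialized."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(layer, x, activation=None):
    weights, bias = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [activation(v) for v in out] if activation else out


class TwoHeadWindowModel:
    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        # Two independent heads: no shared weights between moving and resizing.
        self.move = (dense(2, hidden, rng), dense(hidden, 2, rng))
        self.resize = (dense(2, hidden, rng), dense(hidden, 2, rng))

    @staticmethod
    def _head(head, x):
        h = forward(head[0], x, math.tanh)
        return forward(head[1], h)

    def __call__(self, dist_title, dist_resize, click):
        # The only input both heads see is the click signal.
        dx, dy = self._head(self.move, [dist_title, click])
        dw, dh = self._head(self.resize, [dist_resize, click])
        return dx, dy, dw, dh
```

Because the heads are disconnected, changing the distance to the resize point cannot alter (dx, dy), and vice versa, which is the "can't confuse dragging with resizing" property.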

Show HN: Splice – A programming language with custom VM for embedded systems https://ift.tt/1J7UGZu

Show HN: Splice – A programming language with custom VM for embedded systems https://ift.tt/aVm50Kh May 13, 2026 at 10:01PM

Show HN: Mistle – Open-source infrastructure for running sandboxed coding agents https://ift.tt/2VB1Q9m

Show HN: Mistle – Open-source infrastructure for running sandboxed coding agents Hi HN, I'm Jonathan. My co-founder, Thomas, and I started building Mistle in February. We saw larger tech companies like Ramp (Inspect) and Stripe (Minions) build this internally and thought an open source version should exist. We made a few very intentional decisions when working on this: 1. Credentials are kept out of the sandbox. Authorized access goes through a proxy, so agents never directly receive credentials. 2. The harness is not our problem. We're not going to tackle things like memory or self-learning. 3. No magic. Configurations are explicit. You can bring your own keys for models, sandboxes, and other providers. You can write your own instructions and agent. Mistle can be run locally with a single command: https://ift.tt/dHQSJoC Questions, feedback and ideas are welcome! https://ift.tt/ciOf7VK May 13, 2026 at 09:37PM
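The "credentials stay out of the sandbox" decision can be illustrated with a proxy that holds the secrets and attaches them on the way out. This is a hypothetical sketch, not Mistle's code: the `Request` shape, the per-sandbox allowlist, and the bearer-token header are assumptions, and no network call is made.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    service: str
    path: str
    headers: dict = field(default_factory=dict)


class CredentialProxy:
    """Runs outside the sandbox; only this object ever sees the secrets."""

    def __init__(self, secrets: dict, allowed: dict):
        self._secrets = secrets  # service -> token, never exported to the sandbox
        self._allowed = allowed  # sandbox id -> set of services it may call

    def forward(self, sandbox_id: str, req: Request) -> Request:
        if req.service not in self._allowed.get(sandbox_id, set()):
            raise PermissionError(f"{sandbox_id} may not call {req.service}")
        # Drop any Authorization header the agent tried to set, then inject the real one.
        headers = {k: v for k, v in req.headers.items() if k.lower() != "authorization"}
        headers["Authorization"] = f"Bearer {self._secrets[req.service]}"
        return Request(req.service, req.path, headers)
```

The agent inside the sandbox only ever constructs unauthenticated requests; the token exists solely in the outgoing copy the proxy builds.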

Tuesday, May 12, 2026

Show HN: Gigacatalyst – Extend your SaaS with an embedded AI builder https://ift.tt/TSp7ALG

Show HN: Gigacatalyst – Extend your SaaS with an embedded AI builder Hi HN, I’m Namanyay from Gigacatalyst (link: https://ift.tt/zUEf2xX ). Gigacatalyst allows sales, CS, and users to build one-off features, so your SaaS can support long-tail customer workflows and engineers aren’t pulled away from the roadmap. When you sell software to large businesses, you realize that each customer needs their own workflow and features. Traditionally, this means either long engineering roadmaps or customers ending up with workarounds. But what if everyone could build their critical missing features just by talking to an AI? That’s what we do at Gigacatalyst. We provide an AI customization layer for your customers, CS team, and sales team to build these missing critical workflows without needing any engineers at all. Think Lovable, but built on top of YOUR platform. We connect to your product's APIs, learn your data model and design system, and let non-technical users build governed apps via natural language, inside your product and under your brand. Here’s what it looks like in action: https://www.youtube.com/watch?v=_taSpSphH6E One of our customers, a Series B company, saw their users (not engineers, but managers, ops people, and facility directors) build critical workflows like: - Parts stockout prevention: A maintenance manager typed "show me which parts will run out in the next 2 weeks based on usage over the last 90 days, accounting for vendor lead times." The app tracks consumption velocity, forecasts stockouts, and alerts before it's too late. He says it's prevented ~$500K in emergency downtime. - Invoice OCR from phone photos: Technicians kept losing paper invoices. The prompt: "upload a photo of the invoice, extract vendor name, date, amount, and line items, then match it to the purchase order and flag discrepancies." Now techs snap a photo on-site to automatically add it to the system of record.
- Restaurant emergency triage: A pizza chain's facilities manager was drowning in maintenance requests. He built a priority matrix: "walk-in freezer not cooling" auto-routes as CRITICAL, "dining room light flickering" goes to LOW. He's now able to manage backlogs with the correct priority. How Gigacatalyst works under the hood: 1. Agentic API discovery: Our agents go through your app and parse your endpoints, query params, request/response shapes, and sample data to build the base layer. 2. Generation and Validation: When a user describes what they want, our AI generates an app. We set up multiple validation steps, including static checks, runtime error analysis, and LLM-as-a-judge. 3. Sandboxing and Compilation: We wrote our own compilation and sandboxing framework to get the fastest speeds and lowest costs. This means that users can interact with the built app in seconds. 4. Proxy layer: We create a proxy layer for all APIs to handle auth, tenant isolation, and rate limiting. Everything the agent has access to is controlled, logged, observed, and version controlled. With 2000+ daily users, 900+ apps built, and 70% 30-day retention, today we're opening a public demo. Try it: https://ift.tt/OnuSl4D - enter your SaaS product's API URL (or just the homepage) and start prompting. If you're serving a variety of use cases, you probably deal with a lot of custom requests, and Gigacatalyst will save you time and increase your bottom line. Book a meeting at https://ift.tt/l3ZDX7R and I'll help your team and customers build new functionality on top of your platform. I've been reading Hacker News since I was 12 years old. I'm proud to launch for all of you, and I'd love to hear your feedback and comments on the product! May 12, 2026 at 11:32PM
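The stockout prompt in the first example boils down to a small forecast. One plausible reading, assumed here rather than taken from the generated app: daily usage is the 90-day consumption divided by 90, days of cover is stock on hand divided by daily usage, and a part is flagged when the last day to place an order (cover minus vendor lead time) falls within the next 14 days. The `Part` record and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    on_hand: int
    used_last_90_days: int
    lead_time_days: int


def stockout_report(parts, horizon_days=14, window_days=90):
    """Return (name, days_of_cover, order_by_day) for at-risk parts, most urgent first."""
    flagged = []
    for p in parts:
        daily = p.used_last_90_days / window_days  # consumption velocity
        if daily == 0:
            continue  # no recent usage, so no forecastable stockout
        cover = p.on_hand / daily  # days until stock hits zero
        order_by = cover - p.lead_time_days  # last day an order still arrives in time
        if order_by <= horizon_days:
            flagged.append((p.name, round(cover, 1), round(order_by, 1)))
    return sorted(flagged, key=lambda row: row[2])
```

A belt with 30 on hand, 180 used in 90 days and a 10-day lead time has 15 days of cover, so it must be ordered within 5 days and tops the report.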

Show HN: Statewright – Visual state machines that make AI agents reliable https://ift.tt/Noc75Cy

Show HN: Statewright – Visual state machines that make AI agents reliable Agentic problem solving in its current state is very brittle. I fell in love with it, but it creates as many problems as it solves. I'm Ben Cochran. I've spent 20+ years in the trenches with full-stack engineering, DevOps, high-performance computing & ML, with stints at NVIDIA, AMD, and various other organizations, most recently as a Distinguished Engineer. For agents to work reliably, you either need massive parameter counts or massive context windows to keep the solution spaces workable. Most people are brute-forcing reliability with bigger models and longer prompts. What if I made the problem smaller instead of making the model bigger? I took a different approach by using smaller models, in the 13-20B parameter range, and set them to work on real SWE-bench problems. I constrained the tool and solution spaces using formal state machines. Each state in the machine defines which tools the model can access, how many iterations it gets, and which transitions are valid. A planning state gets read-only tools. An implementation state gets edit tools (scoped to prevent mega edits) and write-friendly bash tools. The testing state gets bash, but only for testing commands. The model cannot physically skip steps or use the wrong tool at the wrong time. This is enforced via protocol, not via prompts. The results were more promising than I would have expected. Across multiple model families, irrespective of age (qwen-coder, gpt-oss, gemma4), the improvements were consistent above the 13B-parameter inflection point. Below that, models can navigate the state machine but can't retain enough context to produce accurate edits. More on the research bit: https://ift.tt/srVMbWG Surprisingly, this also yielded improvements in frontier models. Haiku and Sonnet start to punch above their weight, and Opus solves more reliably with fewer tokens and fewer death spirals.
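The enforcement idea, where each state whitelists tools, caps iterations, and lists valid next states, can be sketched in a few lines. This Python toy is not Statewright's Rust engine or its workflow format; the state names, tool names, and budgets are hypothetical, and `run_tests` stands in for "bash, but only for testing commands".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    tools: frozenset        # tools the model may call in this state
    max_iterations: int     # tool-call budget before a transition is required
    next_states: frozenset  # valid transitions out of this state


class Workflow:
    def __init__(self, states: dict, start: str):
        self.states = states
        self.current = start
        self.iterations = 0

    def call_tool(self, tool: str) -> str:
        state = self.states[self.current]
        if tool not in state.tools:
            return f"blocked: {tool} is not available in {self.current}"
        if self.iterations >= state.max_iterations:
            return f"blocked: iteration budget for {self.current} exhausted, transition required"
        self.iterations += 1
        return f"ok: {tool}"

    def transition(self, target: str) -> None:
        if target not in self.states[self.current].next_states:
            raise ValueError(f"invalid transition {self.current} -> {target}")
        self.current, self.iterations = target, 0


BUGFIX = {
    "plan": State(frozenset({"read_file", "grep"}), 10, frozenset({"implement"})),
    "implement": State(frozenset({"read_file", "edit_file"}), 20, frozenset({"test"})),
    "test": State(frozenset({"run_tests"}), 5, frozenset({"implement", "done"})),
    "done": State(frozenset(), 0, frozenset()),
}
```

Note the `test -> implement` edge: unlike a DAG, the workflow can loop back and retry, while the planning state still cannot jump straight to testing or call `edit_file`.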
Fine-tuning did not yield these kinds of functional improvements for me. The takeaway, it seems, is that context window utilization matters more than raw context size: a tightly scoped working context at each step outperforms a model given carte blanche over everything. Constraining non-deterministic LLMs with deterministic code is a pattern that nobody is currently talking about. So, I built Statewright. Its core is a Rust engine that evaluates state machine definitions: states, transitions, guards, and tool restrictions. Its orchestration doesn't use an LLM; it just enforces the state machine. On top of that is a plugin layer that integrates with Claude Code (and soon Codex, Cursor, and others) via MCP. When you activate a workflow, hooks enforce the guardrails per state automatically. The model sees 5 tools available instead of dozens, gets clear instructions for the current phase, and transitions when conditions are met. Importantly, it tells the model when it's attempting to do something that's out of scope or incorrect, or when it needs to try something else after getting stuck. You can use your agent via MCP to build a state machine for you to solve a problem in your current context. The visual editor at statewright.ai lets you tweak these workflows in a graph view... You can clearly see the failure paths, the retry loops, and the approval gates. State machines aren't DAGs; they loop and retry, which is what agentic work actually needs. Statewright is currently live with a free tier. Try it out in Claude Code by running the following: /plugin marketplace add statewright/statewright, /plugin install statewright, /reload-plugins, then say "start the bugfix workflow" or run /statewright start bugfix. You'll need to paste your API key when prompted. The latest versions of Claude may complain -- paste the API key again and say you really mean it; Claude is just being cautious here.
Feedback is welcome on the workflow editor, the plugin experience, and tell me what workflows you'd want to build first. Agents are suggestions, states are laws. https://ift.tt/exPVQjK May 12, 2026 at 09:24PM

Monday, May 11, 2026

Show HN: SyncBank – Self-hosted bank sync for EU banks https://ift.tt/lOUwfBR

Show HN: SyncBank – Self-hosted bank sync for EU banks https://syncbank.app/ May 12, 2026 at 01:02AM

Show HN: Learn2Burp – Surgery-free solution for R-CPD https://ift.tt/S4u1Kps

Show HN: Learn2Burp – Surgery-free solution for R-CPD R-CPD (Retrograde Cricopharyngeus Dysfunction) is a condition where a muscle in the throat never learned to relax properly, making it impossible to burp. It affects more people than you'd think and causes significant discomfort, extreme bloating, and social anxiety. The most common medical treatment is a botox injection, but it's expensive and not accessible to everyone. I'm a Software Engineer from Germany and suffered from R-CPD my entire life before curing myself last year. I wanted to make the self-teaching process easier for everyone who comes after me, so I built Learn2Burp. It walks you through exercises with video guidance, builds a workout plan around your specific situation, and includes a burp tracker. There's also a wiki covering the questions I wish I'd had answers to when I started. If you or someone you know has R-CPD, there's also a dedicated r/noburp community worth checking out. https://learn2burp.com May 11, 2026 at 07:42PM

Sunday, May 10, 2026

Show HN: adamsreview – better multi-agent PR reviews for Claude Code https://ift.tt/arhDfok

Show HN: adamsreview – better multi-agent PR reviews for Claude Code I built adamsreview, a Claude Code plugin that runs deeper, multi-stage PR reviews using parallel sub-agents, validation passes, persistent JSON state, and optional ensemble review via Codex CLI and PR bot comments. On my own PRs, it has been catching dramatically more real bugs than Claude’s built-in /review, /ultrareview, CodeRabbit, Greptile, and Codex’s built-in review, while producing fewer false positives. adamsreview is six Claude Code slash commands packaged as a plugin: review, codex-review, add, promote, walkthrough, and fix. I modeled it after the built-in /review command and extended it meaningfully. You can clear context between review stages because state is stored in JSON artifacts on disk, with built-in scripts for keeping it updated. The walkthrough command uses Claude’s AskUserQuestion feature to walk you through uncertain findings or items needing human review one by one. Then, the fix command dispatches per-fix-group agents and re-reviews the work with Opus, reverting any regressions before committing survivors. It runs against your regular Claude Code subscription (Max plan recommended), unlike /ultrareview, which charges against your Extra Usage pool. I would love feedback from Claude Code users, pro devs, and anyone with strong opinions about AI code reviews. Repo: https://ift.tt/ASLN36V Install: /plugin marketplace add adamjgmiller/adamsreview, /plugin install adamsreview@adamsreview https://ift.tt/ASLN36V May 11, 2026 at 09:06AM
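Keeping review state in JSON on disk is what lets context be cleared between stages. A minimal sketch of that pattern follows; the file layout, field names, and the `add_finding`, `promote` and `uncertain` helpers are hypothetical and are not adamsreview's actual schema or scripts.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    if not path.exists():
        return {"stage": "new", "findings": []}
    return json.loads(path.read_text())


def save_state(path: Path, state: dict) -> None:
    # Write to a temp file in the same directory, then atomically replace,
    # so an interrupted stage never leaves half-written JSON behind.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f, indent=2)
    os.replace(tmp, path)


def add_finding(state: dict, file: str, line: int, summary: str, status: str = "uncertain") -> None:
    state["findings"].append(
        {"id": len(state["findings"]) + 1, "file": file, "line": line,
         "summary": summary, "status": status}
    )


def promote(state: dict, finding_id: int) -> None:
    for finding in state["findings"]:
        if finding["id"] == finding_id:
            finding["status"] = "confirmed"


def uncertain(state: dict) -> list:
    # The kind of list a walkthrough stage would step through one by one.
    return [f for f in state["findings"] if f["status"] == "uncertain"]
```

Each stage loads the file, updates it, and saves it, so a fresh session can pick up exactly where the previous one stopped.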

Show HN: I trained a chess engine to play like humans https://ift.tt/hnxCAP2

Show HN: I trained a chess engine to play like humans I built 1e4.ai, a chess web app where you play against neural networks trained to mimic human Lichess players at specific Elo ranges. There's a separate model for each 100-point rating bucket from ~800 to 2200+, and the bots not only choose human-like moves but also burn clock time, play worse under time pressure, and blunder in human-like ways. Live demo: https://1e4.ai Code: https://ift.tt/OZyWap6 A few things that might be interesting: - Trained on almost a full year of Lichess blitz games, around 1B games in total - The architecture is a small (~9M-parameter) transformer-based network that takes the board, recent move history, the player's rating, and remaining clock time as input. There are three separate models per rating bucket: move, clock usage, and win probability. The clock model is what makes the bots feel human under time pressure instead of moving instantly. Because the move model takes the clock as one of its inputs, it also learns to blunder under time pressure the way a human might. - Because the network is so tiny, no GPU is needed for inference; it runs easily on a local CPU - The downside of the tiny network is that it gets a bit weak as you turn the rating up past around 1700. It can spot short tactics but not long multi-move combinations. - Initial training on a rented 8xH100 cluster, then fine-tuning on my local GPU for different rating ranges - Inspired by Maia-2 and DeepMind's "Grandmaster-Level Chess Without Search". On a held-out Lichess blitz benchmark, it beats Maia-2 blitz on top-1 move prediction (56.7% vs 52.7%) and, more substantially, on win-probability calibration (Brier 0.176 vs 0.272; lower is better). Numbers and code in https://ift.tt/5LDTng4... - The data pipeline is C++ via nanobind, with training in PyTorch. Getting this right was actually what I spent the most time on: pre-shuffling the dataset and then reading the shuffled dataset sequentially at training time kept GPU utilization high. Without this, training spent a huge percentage of its time on I/O while the GPU sat idle. Happy to answer questions about the rating conditioning, the clock model, or the data pipeline. May 11, 2026 at 05:31AM
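The pre-shuffle trick fits in a few lines of Python. This is not the author's C++/nanobind pipeline. The record format (one `uint32` standing in for an encoded training example) and the function names are made up for illustration. The idea is to pay the cost of random access once, offline, so that training only ever does sequential reads.

```python
import random
import struct

# Hypothetical fixed-size record: one little-endian uint32 standing in for an encoded position.
RECORD = struct.Struct("<I")


def preshuffle(records, path, seed=0):
    """Shuffle once, offline, and write the records back to back in shuffled order."""
    order = list(range(len(records)))
    random.Random(seed).shuffle(order)
    with open(path, "wb") as f:
        for i in order:
            f.write(RECORD.pack(records[i]))


def read_batches(path, batch_size):
    """At training time, stream the file front to back in large contiguous reads."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size * batch_size)
            if not chunk:
                break
            # The file holds whole records only, so every chunk unpacks cleanly.
            yield [value for (value,) in RECORD.iter_unpack(chunk)]
```

Shuffling a dataset that doesn't fit in memory takes more work than `random.shuffle`. The read side stays the same, though: one contiguous read per batch, with no seeks.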

Show HN: Hustler Bingo – a tiny bingo game about startup Twitter clichés https://ift.tt/SqrYViE

Show HN: Hustler Bingo – a tiny bingo game about startup Twitter clichés I built this after my brother started complaining that I was getting too deep into brainrot culture. It's just for fun, nothing serious, but it let me try Vercel, TanStack Start and Convex without high stakes. Have fun! This is the game where a lower score is good for your mental health. https://ift.tt/adPviwh May 11, 2026 at 03:36AM

Show HN: Mosaic – arrange iOS icons by color using an evolutionary algorithm https://ift.tt/BKShLY6

Show HN: Mosaic – arrange iOS icons by color using an evolutionary algorithm It started out as a way for me to freshen up my C++ skills during COVID. But life got in the way and it was put on ice. Luckily, coding LLMs came to the rescue and allowed me to bring it to a point where I feel comfortable sharing it. https://ift.tt/yD1hPWf May 11, 2026 at 01:29AM
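The post doesn't describe Mosaic's algorithm. As an illustration of the technique named in the title, here is a generic (1+1)-style evolutionary sketch in Python. Each icon is represented by an average RGB color (an assumption). A layout is mutated by swapping two icons, and the child is kept whenever neighboring colors don't differ more than before. All names are hypothetical.

```python
import random


def cost(grid, cols):
    """Sum of squared RGB distances between each cell and its right and lower neighbors."""
    total = 0
    for i, color in enumerate(grid):
        neighbors = []
        if (i + 1) % cols != 0 and i + 1 < len(grid):  # not in the last column
            neighbors.append(grid[i + 1])
        if i + cols < len(grid):  # not in the last row
            neighbors.append(grid[i + cols])
        for other in neighbors:
            total += sum((a - b) ** 2 for a, b in zip(color, other))
    return total


def evolve(colors, cols, generations=5000, seed=0):
    """(1+1) evolution: mutate the current layout, keep the child if it is no worse."""
    rng = random.Random(seed)
    best = list(colors)
    best_cost = cost(best, cols)
    for _ in range(generations):
        child = list(best)
        i, j = rng.sample(range(len(child)), 2)  # mutation: swap two icons
        child[i], child[j] = child[j], child[i]
        child_cost = cost(child, cols)
        if child_cost <= best_cost:  # selection
            best, best_cost = child, child_cost
    return best, best_cost
```

Because mutation is a swap, every candidate is still a permutation of the same icons, so the search can never lose or duplicate an app.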

Saturday, May 9, 2026

Show HN: Create flashcards with Space CLI https://ift.tt/O5D8pY1

Show HN: Create flashcards with Space CLI Hey, seven years ago I created a flashcard app with a main focus on UX. In the last few months I added an offline-first mode and a CLI that lets Claude Code or Codex create high-quality flashcards for you. I use it to learn about pharma rules, technology, dancing, taxes and smart home. I've never really done marketing; it's not my specialty. Would love to know what you think! https://ift.tt/f38SBoR May 9, 2026 at 09:38PM

Show HN: A search engine for deleted YouTube videos (1.5B+ indexed since 2005) https://ift.tt/eMDGYLy

Show HN: A search engine for deleted YouTube videos (1.5B+ indexed since 2005) https://ift.tt/97yhYsL May 9, 2026 at 10:09PM

Friday, May 8, 2026

Show HN: A lie detector game that reads your pulse through your phone camera https://ift.tt/NkAD6na

Show HN: A lie detector game that reads your pulse through your phone camera https://kouh.me/tells May 9, 2026 at 01:01AM

Show HN: We built a tool that generates 3D objects with editable, separate parts https://ift.tt/dcLFfby

Show HN: We built a tool that generates 3D objects with editable, separate parts https://nova3d.xyz/ May 9, 2026 at 12:11AM

Show HN: UltraCompress – first mathematically lossless 5-bit LLM compression https://ift.tt/0aWUEek

Show HN: UltraCompress – first mathematically lossless 5-bit LLM compression https://ift.tt/EzT5nxD May 8, 2026 at 11:49PM

Show HN: Rejected by YC https://ift.tt/yxc6FEI

Show HN: Rejected by YC https://rejectedbyyc-ten.vercel.app/ May 8, 2026 at 11:31PM

Thursday, May 7, 2026

Show HN: Kstack – Skill pack for monitoring/troubleshooting K8s in Claude Code https://ift.tt/jmRno6b

Show HN: Kstack – Skill pack for monitoring/troubleshooting K8s in Claude Code Hi all, recently I've been using Claude Code a lot for debugging cluster issues, and I realized I was performing similar tasks repeatedly, so I decided to package them up into skills I could call more easily (e.g. `/investigate`, `/audit-security`, `/audit-outdated`). I'm calling the skill pack "kstack", and the goal is to be able to monitor and troubleshoot K8s from within Claude Code. Here's the source: https://ift.tt/1bhQySz Here are the docs: https://kstack.sh/ If you have time, I'd love to get some feedback on the project! Andres https://ift.tt/1bhQySz May 7, 2026 at 12:24PM

Show HN: Bilig – a headless spreadsheet engine for Node services and agents https://ift.tt/b7srBpO

Show HN: Bilig – a headless spreadsheet engine for Node services and agents https://ift.tt/AzkoTsL May 8, 2026 at 01:16AM

Show HN: Stage CLI – a tool to make reading your AI generated changes easier https://ift.tt/bqIYwi9

Show HN: Stage CLI – a tool to make reading your AI generated changes easier Hey HN! We're Charles and Dean. A few weeks ago we posted about Stage, a code review tool that guides you through reading a PR step by step - https://ift.tt/lABqsyt . We got a lot of great feedback but also heard from many people that they wanted to have the chapters experience even before opening a PR… so we built the Stage CLI as the local, open-source version that anyone can try. Here’s a quick demo video: https://ift.tt/icUMYKh It works with any coding agent of your choice. The skill instructs the agent to read your current branch’s changes, break them down into separate logical chapters, and open them in a local browser. We’ve found that reading changes this way is a lot easier for us than reading them in an IDE or other similar CLI tools, which present diffs to you in repository tree order. You can see a few examples of what it feels like here: https://ift.tt/sLCJH2l . Try it out and let us know what you think! Would love to hear any feedback :) https://ift.tt/GcqIVY0 May 7, 2026 at 10:38PM
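Stage delegates the chapter-splitting to your coding agent, and the post doesn't say how the skill gathers the branch's changes. As a hypothetical illustration of the raw material, here is a Python sketch that splits `git diff` output into per-file sections. The output could come from `git diff main...HEAD`, assuming `main` is the base branch. An agent could then regroup those sections into logical chapters instead of repository tree order. The function name is made up.

```python
import re

# One "diff --git a/<old> b/<new>" header starts each file's section of a unified diff.
HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$", re.MULTILINE)


def split_diff_by_file(diff_text):
    """Map each changed path (new name) to its full section of `git diff` output."""
    sections = {}
    matches = list(HEADER.finditer(diff_text))
    for n, m in enumerate(matches):
        end = matches[n + 1].start() if n + 1 < len(matches) else len(diff_text)
        sections[m.group(2)] = diff_text[m.start():end]
    return sections
```

The header regex assumes plain paths. Paths containing spaces, or paths that git quotes, aren't handled in this sketch.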

Show HN: Local-first long-term memory engine for AI agents · MCP/CLI · 100% local https://ift.tt/adSy9uB

Show HN: Local-first long-term memory engine for AI agents · MCP/CLI · 100% local Local-first long-term memory engine for AI agents · MCP + HTTP + CLI · SQLite + sqlite-vec + FTS5 · 100% local, no cloud https://ift.tt/mFAk4YQ May 7, 2026 at 11:38PM
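As a rough illustration of the keyword half of that stack, here is a Python sketch using the standard `sqlite3` module and an FTS5 virtual table. The table layout and the function names are assumptions, not the project's schema. The vector search the project gets from sqlite-vec is left out. The sketch needs an SQLite build with FTS5 enabled, which most common Python builds include.

```python
import sqlite3


def open_memory(path=":memory:"):
    """Open (or create) a memory store backed by an FTS5 full-text index."""
    db = sqlite3.connect(path)
    # `agent` is stored but not indexed, so keyword queries only search `content`.
    db.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS memories USING fts5(content, agent UNINDEXED)"
    )
    return db


def remember(db, agent, content):
    db.execute("INSERT INTO memories (content, agent) VALUES (?, ?)", (content, agent))
    db.commit()


def to_fts_query(text):
    """Quote every word so punctuation in user text isn't parsed as FTS5 query syntax."""
    return " ".join('"' + word.replace('"', '""') + '"' for word in text.split())


def recall(db, text, agent, limit=5):
    """Return this agent's memories containing all the words, best bm25 match first."""
    rows = db.execute(
        "SELECT content FROM memories WHERE memories MATCH ? AND agent = ? "
        "ORDER BY rank LIMIT ?",
        (to_fts_query(text), agent, limit),
    )
    return [content for (content,) in rows]
```

Everything lives in a single SQLite file (pass a path instead of `":memory:"`), which is what keeps a store like this fully local.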

Wednesday, May 6, 2026

Show HN: BattleClaws – A battle arena where AI agents fight autonomously https://ift.tt/eXCtW46

Show HN: BattleClaws – A battle arena where AI agents fight autonomously https://battleclaws.ai/ May 6, 2026 at 08:14PM

Tuesday, May 5, 2026

Monday, May 4, 2026

Show HN: NeuralScript – A pure-Rust AOT compiler https://ift.tt/78PevC5

Show HN: NeuralScript – A pure-Rust AOT compiler https://ift.tt/5Agk4Y8 May 5, 2026 at 03:06AM

Show HN: nfsdiag - an NFS diagnostic application https://ift.tt/0ndHb2V

Show HN: nfsdiag - an NFS diagnostic application https://ift.tt/gk7FQHr May 2, 2026 at 07:48PM

Show HN: Muesli – If Granola and Wisprflow had an open source on device baby https://ift.tt/B0ZTVHh

Show HN: Muesli – If Granola and Wisprflow had an open source on device baby Hey folks, I'm the developer behind Muesli, a one-stop app for all your speech-to-text needs, whether voice dictation or meeting transcription. It runs on-device on your Apple Neural Engine using CoreML-based STT models (Parakeet, Whisper, Cohere transcribe). Everything is open source, and we are at 160 stars (au naturel). We would love for folks to use it and contribute to its development. https://freedspeech.xyz May 4, 2026 at 11:41PM

Sunday, May 3, 2026

Show HN: Ableton Live MCP https://ift.tt/XmVd4Bz

Show HN: Ableton Live MCP https://ift.tt/L8QVkq7 May 4, 2026 at 01:05AM

Show HN: GitHub Commits Leaderboard https://ift.tt/NcVLQMH

Show HN: GitHub Commits Leaderboard https://ghcommits.com May 3, 2026 at 11:53PM

Show HN: Software Engineer to Novelist: Writing a Book Like Coding https://ift.tt/DjLeNVz

Show HN: Software Engineer to Novelist: Writing a Book Like Coding I just published my first book, Means and Motive (https://ift.tt/GQp75i0). As a software engineer, I approached writing like a software project. I used familiar tools (Emacs and HTML) for the primary writing. I built my own tool (EPublish) to transform the HTML manuscript into an .epub file, the source for the ebook version. And I wrote shell scripts to reliably and repeatably transform the .epub version into PDF files for the printed editions. I wrote 'design' and 'architecture' docs, describing the world, key actors, and timelines. I kept a task list of chapters and key scenes that needed to be written, in priority order. Along the way, I kept my files version-controlled so I could see the progress of the novel and edit mercilessly, without worrying about keeping old text around in backup files in case I wanted it back. If you've thought about writing a book, I highly recommend it. There are many similarities to the software engineering process. You'll also gain a newfound appreciation of the design, layout, and typesetting world, and of exactly how much work goes into each book you read. https://ift.tt/Euh8wpZ May 3, 2026 at 11:26PM
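EPublish's internals aren't described beyond its purpose, so this is only a hedged Python sketch of what an HTML-to-.epub step involves. An EPUB is a ZIP archive with a fixed layout: an uncompressed `mimetype` entry first, then `META-INF/container.xml` pointing at a package document, then XHTML content listed in that package's manifest and spine. The sketch assumes the manuscript body is already well-formed XHTML and puts the whole book in one file. `write_epub`, its parameters, and the placeholder modification date are made up for illustration.

```python
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

PACKAGE_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">{uid}</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="book" href="book.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="book"/>
  </spine>
</package>"""


def xhtml_page(title, body):
    """Wrap an already well-formed XHTML fragment in a complete XHTML document."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">\n'
        f"<head><title>{title}</title></head>\n<body>{body}</body>\n</html>"
    )


def write_epub(path, title, body_xhtml, uid, modified="2026-01-01T00:00:00Z"):
    """Package one XHTML manuscript into a minimal single-file EPUB 3 book."""
    safe_title = escape(title)
    nav = f'<nav epub:type="toc"><ol><li><a href="book.xhtml">{safe_title}</a></li></ol></nav>'
    deflate = zipfile.ZIP_DEFLATED
    with zipfile.ZipFile(path, "w") as z:
        # `mimetype` must be the first entry and must be stored uncompressed.
        z.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=deflate)
        opf = PACKAGE_OPF.format(uid=escape(uid), title=safe_title, modified=modified)
        z.writestr("OEBPS/content.opf", opf, compress_type=deflate)
        z.writestr("OEBPS/nav.xhtml", xhtml_page(safe_title, nav), compress_type=deflate)
        z.writestr("OEBPS/book.xhtml", xhtml_page(safe_title, body_xhtml), compress_type=deflate)
```

A real book would split chapters into separate files, each with its own manifest `item` and spine `itemref`, and would add a cover and a stylesheet. The packaging rules stay the same.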

Saturday, May 2, 2026

Show HN: Use an Android Phone as an HTTP Proxy https://ift.tt/Jy3FRrX

Show HN: Use an Android Phone as an HTTP Proxy I created a simple project that lets you use a phone as a web proxy. This is not a proxy for the phone; it's a way to proxy web traffic from elsewhere via the phone. One practical use case is accessing geo-restricted content. If you have a trusted contact in the country with an Android phone, this can serve as a simple alternative to a commercial VPN. To set it up, you run a proxy server, which can run as a Docker container. You then install the app on the Android phone, which connects to the server. Finally, you configure a browser to use the proxy server as its HTTP/HTTPS proxy. More details here: https://ift.tt/UCQsuBX Let me know how you go and if you run into any issues. https://ift.tt/EFPQ2ZX May 3, 2026 at 07:14AM
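The last setup step, pointing a client at the proxy, isn't specific to browsers. As a hedged Python illustration, this builds a `urllib` opener that routes HTTP and HTTPS requests through the proxy. The address is a placeholder for wherever you run the server container. The sketch also assumes the server handles HTTPS via `CONNECT`, as standard HTTP proxies do.

```python
import urllib.request

# Placeholder address of the proxy server container; replace with your own host and port.
PROXY_URL = "http://proxy.example.internal:8080"


def make_proxied_opener(proxy_url):
    """Build an opener that sends both HTTP and HTTPS requests through one HTTP proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Example (needs the proxy server and the phone app running):
# opener = make_proxied_opener(PROXY_URL)
# with opener.open("https://example.com/", timeout=10) as response:
#     print(response.status)
```

HTTPS URLs are tunnelled through the proxy, so the proxy relays encrypted bytes and never sees the page contents.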

Show HN: State of the Art of Coding Models, According to Hacker News Commenters https://ift.tt/PwIiXrM

Show HN: State of the Art of Coding Models, According to Hacker News Commenters Hello HN, I was away from my computer for two weeks, and after coming back and reading the latest discussions on HN about coding assistants (models, harnesses), I felt very out of the loop. My normal process would have been to keep reading and figure out the latest and greatest from people's comments, but I wanted to try and automate this process. Basically, the goal is to get a quick overview of which coding models are popular on HN. A next iteration could also scan for harnesses that people use, or info on self-hosting or hardware setups. I wrote a short intro on the page about the pipeline that collects and analyzes the data, but feel free to ask for more details or check the Google Sheet for more info. https://hnup.date/hn-sota https://hnup.date/hn-sota May 3, 2026 at 04:25AM
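The page describes the real collection and analysis pipeline. As a much cruder, hypothetical illustration of the counting step, here is a Python keyword counter over comment texts. The alias table and function name are assumptions. Each model family is counted at most once per comment, so one long comment can't dominate the tally.

```python
import re
from collections import Counter

# Hypothetical alias table: model family -> lowercase words that count as a mention.
ALIASES = {
    "Claude": ["claude", "opus", "sonnet"],
    "GPT": ["gpt"],
    "Gemini": ["gemini"],
}


def count_mentions(comments, aliases=ALIASES):
    """Count how many comments mention each model family (at most once per comment)."""
    patterns = {
        family: re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
        for family, words in aliases.items()
    }
    counts = Counter()
    for text in comments:
        for family, pattern in patterns.items():
            if pattern.search(text):
                counts[family] += 1
    return counts
```

Calling `.most_common()` on the result gives a popularity ranking. Plain keyword matching can't tell praise from complaints, though, so it only measures how much each model is discussed.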

Show HN: Clipmon is a macOS clipboard manager on steroids https://ift.tt/iayNIrB

Show HN: Clipmon is a macOS clipboard manager on steroids https://ift.tt/nxEj2y1 May 3, 2026 at 03:29AM

Show HN: Rust library for Undo/Redo using deltas, snapshots or commands https://ift.tt/OKvLreD

Show HN: Rust library for Undo/Redo using deltas, snapshots or commands https://ift.tt/JhnYfNH May 3, 2026 at 01:41AM

Friday, May 1, 2026

Show HN: AI CAD Harness https://ift.tt/yJCBOXY

Show HN: AI CAD Harness Hi HN, I'm Zach, one of the co-founders of Adam (https://adam.new). We've been on HN twice before with text-to-CAD/3D experiments [1][2]. The honest takeaway from those threads: prompt-to-3D-model web apps are fun, but serious mechanical engineers don't want a black box that spits out an STL. They want help inside the CAD tool they already use, with full visibility and control over the feature tree. So we built that. Adam is now a harness that integrates directly with your CAD. It reads your parts, understands the existing feature tree, and edits it for you agentically. We are now live in beta on Onshape and Fusion! [3] Install links: Autodesk Fusion: https://ift.tt/mVAtnav PTC Onshape: https://ift.tt/MRZtHFT... Things people are using it for today: - "Merge redundant features and clean up my tree" - "Rename every feature so the tree is actually readable" - "Round all internal edges with a 2mm fillet" - "Parametrize my model" Along with, of course, using Adam to generate CAD end-to-end! A few things we care about that aren't obvious from the listing: 1. From the start, we have believed that CAD as code is the right abstraction. Our harness relies heavily on Onshape's FeatureScript and on Python in Fusion. 2. We run an internal CAD benchmark across frontier models. There has been a massive jump in the spatial reasoning capabilities of recently released models, particularly GPT 5.5 and Opus 4.7 [4] [5]. 3. We open-sourced our earlier text-to-CAD work [6]. A note on the Anthropic Autodesk connector that shipped a couple of days ago [7]: we think it's great for the space and validates the direction. Where Adam is different: - Model-agnostic. We pick whichever frontier model is winning on each task type on our own internal bench, instead of being tied to one lab. - We live natively in your CAD apps and are actively building integrations across all programs. What would you want an in-CAD agent to do that nothing does today? 
[1] https://ift.tt/nVt4pSJ [2] https://ift.tt/UEvbD3W [3] https://ift.tt/REXc5jo [4] https://ift.tt/Wn0SCbu [5] https://ift.tt/UEPojqB [6] https://ift.tt/38jMl1E [7] https://ift.tt/LeQkl1f https://ift.tt/mVAtnav May 2, 2026 at 12:43AM
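Model-agnostic routing, as described in the post, can be sketched as a lookup table built from benchmark results. The post doesn't give Adam's benchmark format. The score-table shape, the task-type names, the tie-breaking rule, and the function names here are all assumptions. `model_a`/`model_b` are placeholders, not real results.

```python
def build_routing(scores):
    """For each task type, choose the model with the highest benchmark score.

    `scores` maps task type -> {model name: score}, higher is better (assumed).
    Ties go to the alphabetically first model so the choice is deterministic.
    """
    return {
        task: max(sorted(by_model), key=by_model.get)
        for task, by_model in scores.items()
    }


def route(task_type, routing, default_model):
    """Look up the model for a task, falling back when that task type was never benchmarked."""
    return routing.get(task_type, default_model)
```

When a new model ships, re-running the benchmark and rebuilding the table switches models per task without touching the code that calls `route`.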

Show HN: My Private GitHub on Postgres https://ift.tt/RW1Ac34

Show HN: My Private GitHub on Postgres https://ift.tt/TYsyxGL May 2, 2026 at 12:40AM

Show HN: N=1 – iOS app for structured longevity self-protocols https://ift.tt/zhQKCE3

Show HN: N=1 – iOS app for structured longevity self-protocols Hello, my name is Henry. I built this app for people who want to know for sure whether the things they are trying are actually working. I am looking for enthusiastic people who want to see the longevity and biohacker community grow. At the moment the app is completely free to use, with no sign-up or anything like that. I need your feedback to build something beautiful. https://ift.tt/BJbNIus May 2, 2026 at 12:30AM

Show HN: Access OPFS from multiple tabs using a fake Shared Worker https://ift.tt/9uvykVH

Show HN: Access OPFS from multiple tabs using a fake Shared Worker https://ift.tt/2fDp1sR May 1, 2026 at 11:15PM