covid20212022
Wednesday, May 13, 2026
Show HN: Neural window manager, neural network moving windows from mouse actions https://ift.tt/cgGRh2o
Show HN: Neural window manager, neural network moving windows from mouse actions I'd been mulling over this crazy idea for a while: can programs be generated? Inspired by recent advances in world models, I wondered if we could do away with source code and generate pixels directly and interactively. As an experiment to answer this, I set out to create a neural window manager, training a neural network to predict what the screen would look like next. Basically, the idea was to generate the next frame based on the last two frames and the mouse position. That's it: moving windows without programming an event system, just a simple convolutional neural network guessing pixels.

To implement the experiment, I used Pygame to simulate a turquoise desktop background, a gray window with a navy blue title bar, and a white cursor, four colors in total. Then a bot randomly dragged the window while I recorded everything, processing the frames as color-index matrices (not RGB, to avoid complications) along with the mouse delta (dx, dy, click) that caused each transition. 8,000 frames, a few minutes in Colab. The model is a U-Net: the encoder compresses the stacked frames, the decoder reconstructs the next one, and the mouse vector is projected with a linear layer to fit the spatial size of the bottleneck. There it is concatenated before decoding, so that motion information feeds each skip connection.

And it works! Which still surprises me a little. You can drag, and the window follows you; when you release, it stops. There's no internal state, no (x, y) coordinates anywhere. The model infers the position from what it sees, which works until it doesn't: after a couple of seconds of erratic movement, the window starts to distort. This would probably improve with more training compute and more examples, but to narrow the scope of the experiment and test it in a web browser, I decided to abandon the rendering aspect and have the model predict primitives instead of pixels, in effect turning just the motion engine into a neural network. Basically, I trained a small MLP to take (distance to the title bar, distance to the resize point, click) and produce (dx, dy, dw, dh), with two separate heads: one for moving and one for resizing. The trick is that they share nothing except the click signal, so the model can't confuse dragging with resizing. I then exported it to ONNX as well, and now everything runs in the browser, without a server, just a canvas element and two small neural networks communicating with each other.

With this new approach, the renderer remains deterministic, with rectangles drawn in JavaScript, but the window's behavior (where it moves, how it resizes) is learned from examples. It feels like a peculiar middle ground between traditional and neural: you can feel the space the network has learned by interacting with it. Dragging near the title bar moves the window, while approaching the corner resizes it. There are no conditionals or hitbox code; the network simply learned where those areas are from examples. Sometimes it gets confused near the edges, which, frankly, is more interesting than if it worked perfectly; you can perceive how the probability changes. This makes sense when you think about it, because no (x, y) coordinates are stored in these models; the position is implied in the activations. It works well for short sequences, but fails when asked to maintain state over time.
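If you want to picture the pixel version's conditioning, here is a minimal PyTorch sketch of the scheme described above: a two-frame encoder, the (dx, dy, click) vector projected to the bottleneck's spatial size, and skip connections into the decoder. The layer sizes and channel counts are my own illustrative assumptions, not the actual architecture.

    import torch
    import torch.nn as nn

    class NextFrameUNet(nn.Module):
        def __init__(self, n_colors=4, h=64, w=64):
            super().__init__()
            # Encoder: two previous frames, one-hot over the 4-color palette.
            self.enc1 = nn.Sequential(nn.Conv2d(2 * n_colors, 32, 3, stride=2, padding=1), nn.ReLU())
            self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
            # Project (dx, dy, click) to one plane matching the bottleneck's spatial size.
            self.bh, self.bw = h // 4, w // 4
            self.mouse_proj = nn.Linear(3, self.bh * self.bw)
            # Decoder: motion enters at the bottleneck and flows out through the skips.
            self.dec2 = nn.Sequential(nn.ConvTranspose2d(64 + 1, 32, 4, stride=2, padding=1), nn.ReLU())
            self.dec1 = nn.ConvTranspose2d(32 + 32, n_colors, 4, stride=2, padding=1)

        def forward(self, frames, mouse):
            # frames: (B, 2*n_colors, H, W) one-hot; mouse: (B, 3) = (dx, dy, click)
            s1 = self.enc1(frames)
            s2 = self.enc2(s1)
            m = self.mouse_proj(mouse).view(-1, 1, self.bh, self.bw)
            x = self.dec2(torch.cat([s2, m], dim=1))   # concatenate before decoding
            x = torch.cat([x, s1], dim=1)              # skip connection
            return self.dec1(x)                        # per-pixel logits over color indices

Training would then be plain cross-entropy between the predicted logits and the next frame's color indices.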
Update: A few weeks later, Meta published the Neural Computers paper (2604.06425, worth reading). The premise is the same, but they go much further: CLIs and UIs, real programs. Their failure modes are practically identical to those I found with the pure-pixel version: "challenges persist with routine reuse, controlled updates, and symbolic stability," which is a fancy way of saying that the window blurs after a few seconds (that was the reason for choosing deterministic rendering). https://lusob.github.io/neural-os/ May 14, 2026 at 12:46AM
Show HN: Splice – A programming language with custom VM for embedded systems https://ift.tt/1J7UGZu
Show HN: Splice – A programming language with custom VM for embedded systems https://ift.tt/aVm50Kh May 13, 2026 at 10:01PM
Show HN: Mistle – Open-source infrastructure for running sandboxed coding agents https://ift.tt/2VB1Q9m
Show HN: Mistle – Open-source infrastructure for running sandboxed coding agents Hi HN, I'm Jonathan. My co-founder, Thomas, and I started building Mistle in February. We saw larger tech companies like Ramp (Inspect) and Stripe (Minions) build this internally and thought an open-source version should exist. We made a few very intentional decisions when working on this:

1. Credentials are kept out of the sandbox. Authorized access goes through a proxy, so agents never directly receive credentials (a toy sketch of this pattern follows below).
2. The harness is not our problem. We're not going to tackle things like memory or self-learning.
3. No magic. Configurations are explicit. You can bring your own keys for models, sandboxes, and other providers. You can write your own instructions and agent.

Mistle can be run locally with a single command: https://ift.tt/dHQSJoC Questions, feedback, and ideas are welcome! https://ift.tt/ciOf7VK May 13, 2026 at 09:37PM
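To make decision 1 concrete, here is a toy sketch of the credential-injection pattern, not Mistle's actual code (the upstream URL and environment variable are made up): the sandboxed agent can only reach the proxy, which attaches the real token before forwarding.

    import os
    import urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer

    UPSTREAM = "https://api.example.com"   # hypothetical upstream API
    TOKEN = os.environ["UPSTREAM_TOKEN"]   # secret lives outside the sandbox

    class AuthProxy(BaseHTTPRequestHandler):
        def do_GET(self):
            # The agent's request arrives with no credentials attached.
            req = urllib.request.Request(UPSTREAM + self.path)
            req.add_header("Authorization", "Bearer " + TOKEN)  # injected here
            with urllib.request.urlopen(req) as resp:
                body = resp.read()
            self.send_response(resp.status)
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        # Network policy would restrict the sandbox to reaching 127.0.0.1:8080 only.
        HTTPServer(("127.0.0.1", 8080), AuthProxy).serve_forever()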
Tuesday, May 12, 2026
Show HN: Gigacatalyst – Extend your SaaS with an embedded AI builder https://ift.tt/TSp7ALG
Show HN: Gigacatalyst – Extend your SaaS with an embedded AI builder Hi HN, I'm Namanyay from Gigacatalyst (link: https://ift.tt/zUEf2xX ). Gigacatalyst allows sales, CS, and users to build one-off features, so your SaaS can support long-tail customer workflows and engineers aren't pulled away from the roadmap.

When you sell software to large businesses, you realize that each customer needs their own workflows and features. Traditionally, this means either long engineering roadmaps or customers resorting to workarounds. But what if everyone could build their critical missing features just by talking to an AI? That's what we do at Gigacatalyst. We provide an AI customization layer for your customers, CS team, and sales team to build these missing critical workflows without needing any engineers at all. Think Lovable, but built on top of YOUR platform. We connect to your product's APIs, learn your data model and design system, and let non-technical users build governed apps via natural language - inside your product, under your brand. Here's what it looks like in action: https://www.youtube.com/watch?v=_taSpSphH6E

One of our customers, a Series B company, saw their users (not engineers: managers, ops people, facility directors) build critical workflows like:

- Parts stockout prevention: A maintenance manager typed "show me which parts will run out in the next 2 weeks based on usage over the last 90 days, accounting for vendor lead times." The app tracks consumption velocity, forecasts stockouts, and alerts before it's too late. He says it has prevented ~$500K in emergency downtime.
- Invoice OCR from phone photos: Technicians kept losing paper invoices. The prompt: "upload a photo of the invoice, extract vendor name, date, amount, and line items, then match it to the purchase order and flag discrepancies." Now techs snap a photo on-site to automatically add it to the system of record.
- Restaurant emergency triage: A pizza chain's facilities manager was drowning in maintenance requests. He built a priority matrix: "walk-in freezer not cooling" auto-routes as CRITICAL, "dining room light flickering" goes to LOW. He can now manage the backlog with the correct priorities.

How Gigacatalyst works under the hood:

1. Agentic API discovery: Our agents go through your app and parse your endpoints, query params, request/response shapes, and sample data to build the base layer.
2. Generation and validation: When a user describes what they want, our AI generates an app. We run multiple validation steps, including static checks, runtime error analysis, and LLM-as-a-judge (see the sketch at the end of this post).
3. Sandboxing and compilation: We wrote our own compilation and sandboxing framework to get the fastest speeds and lowest costs. This means users can interact with the built app in seconds.
4. Proxy layer: We create a proxy layer for all APIs to handle auth, tenant isolation, and rate limiting. Everything the agent has access to is controlled, logged, observed, and version controlled.

After 2,000+ daily users, 900+ apps built, and 70% 30-day retention, today we're opening a public demo. Try it: https://ift.tt/OnuSl4D - enter your SaaS product's API URL (or just the homepage) and start prompting. If you're serving a variety of use cases, you probably deal with a lot of custom requests, and Gigacatalyst will save you time and increase your bottom line. Book a meeting at https://ift.tt/l3ZDX7R and I'll help your team and customers build new functionality on top of your platform. I've been reading Hacker News since I was 12 years old.
I'm proud to launch for all of you and I want to hear your feedback on my product in the comments! May 12, 2026 at 11:32PM
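As a rough illustration of step 2's staged validation, here is a sketch of the pattern, not Gigacatalyst's actual code: the generated apps presumably aren't Python, and llm_judge here is a stub standing in for a real judge model.

    import ast

    def llm_judge(source: str) -> str:
        # Stub: in practice this would prompt a model to grade the generated app.
        return "pass"

    def validate_generated_app(source: str) -> list[str]:
        errors = []
        # Stage 1: static checks (here, just "does it parse?").
        try:
            ast.parse(source)
        except SyntaxError as e:
            errors.append("static: " + str(e))
            return errors  # no point running broken code
        # Stage 2: runtime error analysis in a throwaway namespace.
        try:
            exec(compile(source, "<generated>", "exec"), {})
        except Exception as e:
            errors.append("runtime: " + str(e))
        # Stage 3: LLM-as-a-judge on whatever survived the cheap checks.
        verdict = llm_judge(source)
        if verdict != "pass":
            errors.append("judge: " + verdict)
        return errors

    print(validate_generated_app("total = sum(range(10))"))  # -> []

The ordering matters: cheap deterministic checks run first, so the expensive judge only sees candidates that already parse and execute.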
Show HN: Statewright – Visual state machines that make AI agents reliable https://ift.tt/Noc75Cy
Show HN: Statewright – Visual state machines that make AI agents reliable Agentic problem solving in its current state is very brittle. I fell in love with it, but it creates as many problems as it solves. I'm Ben Cochran; I spent 20+ years in the trenches with full-stack engineering, DevOps, high-performance computing, and ML, with stints at NVIDIA, AMD, and various other organizations, most recently as a Distinguished Engineer.

For agents to work reliably, you either need massive parameter counts or massive context windows to keep the solution spaces workable. Most people are brute-forcing reliability with bigger models and longer prompts. What if I made the problem smaller instead of making the model bigger? I took a different approach, using smaller models in the 13-20B parameter range and setting them to solving real SWE-bench problems. I constrained the tool and solution spaces using formal state machines. Each state in the machine defines which tools the model can access, how many iterations it gets, and which transitions are valid. A planning state gets read-only tools. An implementation state gets edit tools (scoped to prevent mega-edits) and write-friendly bash tools. The testing state gets bash, but only for testing commands. The model cannot physically skip steps or use the wrong tool at the wrong time; this is enforced via protocol, not via prompts.

The results were more promising than I expected. Across multiple model families, irrespective of age (qwen-coder, gpt-oss, gemma4), the improvements were consistent above the 13B-parameter inflection point. Below that, models can navigate the state machine but can't retain enough context to produce accurate edits. More on the research bit: https://ift.tt/srVMbWG Surprisingly, this yielded improvements in frontier models as well. Haiku and Sonnet start to punch above their weight, and Opus solves more reliably with fewer tokens and fewer death spirals. Fine-tuning did not yield these kinds of functional improvements for me. The takeaway, it seems, is that context window utilization matters more than raw context size: a tightly scoped working context at each step outperforms a model given carte blanche over everything. Constraining non-deterministic LLMs with deterministic code is a pattern that nobody is currently talking about.

So, I built Statewright. Its core is a Rust engine that evaluates state machine definitions: states, transitions, guards, and tool restrictions. The orchestration doesn't use an LLM; it just enforces the state machine. On top of that is a plugin layer that integrates with Claude Code (and soon Codex, Cursor, and others) via MCP. When you activate a workflow, hooks enforce the guardrails per state automatically. The model sees 5 tools available instead of dozens, gets clear instructions for the current phase, and transitions when conditions are met. Importantly, it tells the model when it's attempting to do something that isn't in scope or is incorrect, and when it needs to try something else after getting stuck. You can use your agent via MCP to build a state machine for you to solve a problem in your current context. The visual editor at statewright.ai lets you tweak these workflows in a graph view: you can clearly see the failure paths, the retry loops, and the approval gates. State machines aren't DAGs; they loop and retry, which is what agentic work actually needs.
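Here is a toy sketch of the per-state tool gating described above, assuming a plan -> implement -> test flow. Statewright's real engine is Rust, and these state and tool names are illustrative, not its actual definitions.

    STATES = {
        "plan":      {"tools": {"read_file", "grep"},            "next": {"implement"}},
        "implement": {"tools": {"read_file", "edit", "bash_rw"}, "next": {"test"}},
        "test":      {"tools": {"bash_test"},                    "next": {"implement", "done"}},
        "done":      {"tools": set(),                            "next": set()},
    }

    class Workflow:
        def __init__(self):
            self.state = "plan"

        def call_tool(self, tool):
            # Enforced by the protocol, not the prompt: a disallowed call never runs.
            allowed = STATES[self.state]["tools"]
            if tool not in allowed:
                raise PermissionError(f"{tool} unavailable in state {self.state}; allowed: {sorted(allowed)}")
            print(f"[{self.state}] running {tool}")

        def transition(self, to):
            if to not in STATES[self.state]["next"]:
                raise ValueError(f"illegal transition {self.state} -> {to}")
            self.state = to

    wf = Workflow()
    wf.call_tool("read_file")    # fine: planning is read-only
    wf.transition("implement")
    wf.call_tool("edit")         # fine: implementation may edit
    # wf.call_tool("bash_test")  # would raise: wrong tool for this state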
Statewright is currently live with a free tier. Try it out in Claude Code by running the following:

/plugin marketplace add statewright/statewright
/plugin install statewright
/reload-plugins

Then "start the bugfix workflow" or /statewright start bugfix. You'll need to paste your API key when prompted. The latest versions of Claude may complain; paste the API key again and say you really mean it, Claude is just being cautious here. Feedback is welcome on the workflow editor and the plugin experience, and tell me what workflows you'd want to build first. Agents are suggestions; states are laws. https://ift.tt/exPVQjK May 12, 2026 at 09:24PM
Monday, May 11, 2026
Show HN: SyncBank – Self-hosted bank sync for EU banks https://ift.tt/lOUwfBR
Show HN: SyncBank – Self-hosted bank sync for EU banks https://syncbank.app/ May 12, 2026 at 01:02AM
Show HN: Learn2Burp – Surgery-free solution for R-CPD https://ift.tt/S4u1Kps
Show HN: Learn2Burp – Surgery-free solution for R-CPD R-CPD (Retrograde Cricopharyngeus Dysfunction) is a condition where a muscle in the throat never learned to relax properly, making it impossible to burp. It affects more people than you'd think and causes significant discomfort, extreme bloating, and social anxiety. The most common medical treatment is a Botox injection, but it's expensive and not accessible to everyone. I'm a software engineer from Germany and suffered from R-CPD my entire life before curing myself last year.
I wanted to make the self-teaching process easier for everyone who comes after me, so I built Learn2Burp. It walks you through exercises with video guidance, builds a workout plan around your specific situation, and includes a burp tracker. There's also a wiki covering the questions I wish I'd had answers to when I started. If you or someone you know has R-CPD, the dedicated r/noburp community is also worth checking out. https://learn2burp.com May 11, 2026 at 07:42PM
Sunday, May 10, 2026
Show HN: adamsreview – better multi-agent PR reviews for Claude Code https://ift.tt/arhDfok
Show HN: adamsreview – better multi-agent PR reviews for Claude Code I built adamsreview, a Claude Code plugin that runs deeper, multi-stage PR reviews using parallel sub-agents, validation passes, persistent JSON state, and optional ensemble review via Codex CLI and PR bot comments. On my own PRs, it has been catching dramatically more real bugs than Claude’s built-in /review, /ultrareview, CodeRabbit, Greptile, and Codex’s built-in review, while producing fewer false positives. adamsreview is six Claude Code slash commands packaged as a plugin: review, codex-review, add, promote, walkthrough, and fix. I modeled it after the built-in /review command and extended it meaningfully. You can clear context between review stages because state is stored in JSON artifacts on disk, with built-in scripts for keeping it updated. The walkthrough command uses Claude’s AskUserQuestion feature to walk you through uncertain findings or items needing human review one by one. Then, the fix command dispatches per-fix-group agents and re-reviews the work with Opus, reverting any regressions before committing survivors. It runs against your regular Claude Code subscription (Max plan recommended), unlike /ultrareview, which charges against your Extra Usage pool. I would love feedback from Claude Code users, pro devs, and anyone with strong opinions about AI code reviews. Repo: https://ift.tt/ASLN36V Install:
/plugin marketplace add adamjgmiller/adamsreview
/plugin install adamsreview@adamsreview
https://ift.tt/ASLN36V May 11, 2026 at 09:06AM
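Purely as an illustration of the persistent-JSON-state idea mentioned above (this is not adamsreview's actual schema; every field name here is invented), the pattern that lets you clear context between review stages looks something like:

    import json, pathlib

    # A later stage, or a fresh agent, reloads exactly this file -
    # no chat history is needed to resume the review.
    state = {
        "stage": "validation",
        "findings": [
            {"id": "F1", "file": "src/api.py", "line": 88,
             "severity": "high", "status": "needs-human-review",
             "note": "response not checked before .json() call"},
        ],
    }

    path = pathlib.Path(".review/state.json")
    path.parent.mkdir(exist_ok=True)
    path.write_text(json.dumps(state, indent=2))
    resumed = json.loads(path.read_text())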
Show HN: I trained a chess engine to play like humans https://ift.tt/hnxCAP2
Show HN: I trained a chess engine to play like humans I built 1e4.ai - a chess web app where you play against neural networks trained to mimic human Lichess players at specific Elo ranges. There's a separate model for each 100-point rating bucket from ~800 to 2200+, and the bots not only choose human-like moves but also burn clock time, play worse under time pressure, and blunder in human-like ways. Live demo: https://1e4.ai
Code: https://ift.tt/OZyWap6 A few things that might be interesting:

- Trained on almost a full year of Lichess blitz games, around 1B total games.
- The architecture is a small (~9M parameters) transformer-based network that takes the board, recent move history, the player's rating, and remaining clock time as input (a toy sketch follows at the end of this post). There are three separate models per rating bucket: move, clock-usage, and win probability. The clock model is what makes the bots feel humanish under time pressure rather than instant. Because the move model takes the clock as one input parameter, it also learns to blunder under time pressure like a human might.
- Because the network is so tiny, no GPU is needed for inference - it runs easily on a local CPU.
- The downside of the tiny network is that it's a bit weak as you turn up the rating past around 1700. It can spot short tactics but not long multi-move combinations.
- Initial training was on a rented 8xH100 cluster, then fine-tunes on my local GPU for different rating ranges.
- Inspired by Maia-2 and DeepMind's "Grandmaster-Level Chess Without Search". On a held-out Lichess blitz benchmark, it beats Maia-2 blitz on top-1 move prediction (56.7% vs 52.7%) and pretty substantially on win-probability calibration (Brier 0.176 vs 0.272). Numbers and code in https://ift.tt/5LDTng4...
- The data pipeline is C++ via nanobind, then training with PyTorch. Getting this right was actually the thing I spent the most time on. Pre-shuffling the dataset and then being able to read the shuffled dataset sequentially at training time kept GPU utilization high; without this, a huge percentage of time went to I/O while the GPU sat idle.

Happy to answer questions about the rating conditioning, the clock model, or the data pipeline. May 11, 2026 at 05:31AM
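A hedged sketch of the conditioning described above: board planes, recent move history, rating, and clock all enter the move model as one sequence, so time pressure can shape the move distribution. The dimensions, move vocabulary, and pooling choice are my own guesses, not the 1e4.ai architecture.

    import torch
    import torch.nn as nn

    class MoveModel(nn.Module):
        def __init__(self, n_moves=4672, d=256, n_layers=4):
            super().__init__()
            self.board_embed = nn.Linear(8 * 8 * 12, d)  # 12 piece planes, flattened
            self.move_embed = nn.Embedding(n_moves, d)   # recent move history tokens
            self.cond = nn.Linear(2, d)                  # (rating, clock), normalized
            layer = nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
            self.head = nn.Linear(d, n_moves)            # logits over the move vocabulary

        def forward(self, board, history, rating, clock):
            # board: (B, 768); history: (B, T) move tokens; rating, clock: (B, 1) each
            cond = self.cond(torch.cat([rating, clock], dim=1)).unsqueeze(1)
            seq = torch.cat([self.board_embed(board).unsqueeze(1),
                             cond,
                             self.move_embed(history)], dim=1)
            return self.head(self.encoder(seq)[:, 0])    # read the prediction off the board slot

Because the clock sits in the same sequence as the position, the same weights can shift toward blunders as time runs low, which is the behavior the post describes.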
Show HN: Hustler Bingo – a tiny bingo game about startup Twitter clichés https://ift.tt/SqrYViE
Show HN: Hustler Bingo – a tiny bingo game about startup Twitter clichés I built this after my brother started complaining that I'd gotten too deep into brainrot culture. It's just for fun, nothing serious, but I was able to test Vercel, TanStack Start, and Convex without high stakes. Have fun! This is the game where a lower score is good for your mental health. https://ift.tt/adPviwh May 11, 2026 at 03:36AM
Show HN: Mosaic – arrange iOS icons by color using an evolutionary algorithm https://ift.tt/BKShLY6
Show HN: Mosaic – arrange iOS icons by color using an evolutionary algorithm It started out as a way for me to freshen up my C++ skills during COVID. But life got in the way and it was put on ice. Luckily, coding LLMs came to the rescue and allowed me to bring it to a point where I feel comfortable sharing it. https://ift.tt/yD1hPWf May 11, 2026 at 01:29AM
Saturday, May 9, 2026
Show HN: Create flashcards with Space CLI https://ift.tt/O5D8pY1
Show HN: Create flashcards with Space CLI Hey, seven years ago I created a flashcard app with a main focus on UX. In the last few months I added an offline-first mode and a CLI that allows Claude Code or Codex to create high-quality flashcards for you. I use it to learn about pharma rules, technology, dancing, taxes, and smart home topics. I never really did marketing; it's not my specialty. Would love to know what you think. https://ift.tt/f38SBoR May 9, 2026 at 09:38PM
Show HN: A search engine for deleted YouTube videos (1.5B+ indexed since 2005) https://ift.tt/eMDGYLy
Show HN: A search engine for deleted YouTube videos (1.5B+ indexed since 2005) https://ift.tt/97yhYsL May 9, 2026 at 10:09PM