Saturday, May 23, 2026

Show HN: I built a RAG and knowledge graph agent that runs locally https://ift.tt/nVdFvft

Show HN: I built a RAG and knowledge graph agent that runs locally Claw-Coder is an AI agent that runs locally on your laptop and has access to powerful tools. Instead of configuring Claude or Codex to use a local model, just use Claw-Coder. Why was Claw-Coder created? To solve the problem of privacy and security. When you use an agent backed by a cloud model, such as Codex, Cursor, or Claude, you are not just getting the agent; you may also be handing over your codebase to train an LLM, which is concerning and reduces trust in AI. But switching to a local model raises another problem: performance. A local model that was not made for this workflow costs you quality and speed, so it becomes a real tradeoff. That's where Claw-Coder comes in. It runs on your machine, and all the code, RAG data, and knowledge graph data stay local, which solves the privacy problem. But what about performance? Performance: Local LLMs are not built to do the things cloud models do. Small models (1B, 8B, even 13B) struggle to build real apps on their own, so the solution I came up with was to give these small models tools and features that make them actually perform well on coding tasks. So what does Claw-Coder have access to? A knowledge graph: A knowledge graph is an interconnected network of entities (such as people, places, concepts, or events) and the relationships between them. It organizes information into a web of meaning rather than static lists, allowing both humans and AI to understand context. How does this help an AI? It lets the AI see the relationships between pieces of code, whether in your own codebase or in an unfamiliar repo you just cloned, which substantially improves a local LLM's performance on coding and reasoning tasks. 
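The idea of relating pieces of code to each other can be illustrated with a very small call graph. This is only a sketch under assumptions, not how Claw-Coder works internally (the post does not say): it uses Python's standard ast module, and the names SOURCE, build_call_graph, and callers_of are hypothetical.

```python
import ast
from collections import defaultdict

# Hypothetical three-function "codebase" used only for illustration.
SOURCE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    return parse(load(path))
'''

def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the plain names it calls directly."""
    tree = ast.parse(source)
    graph: dict[str, set[str]] = defaultdict(set)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name]  # make sure every function appears as a node
            for child in ast.walk(node):
                # Only direct calls by name (foo(...)); method calls are skipped.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    graph[node.name].add(child.func.id)
    return dict(graph)

def callers_of(graph: dict[str, set[str]], target: str) -> list[str]:
    """Answer a relationship query: which functions call `target`?"""
    return sorted(fn for fn, callees in graph.items() if target in callees)

graph = build_call_graph(SOURCE)
print(callers_of(graph, "load"))
```

A graph like this lets an agent answer "what breaks if I change load?" by looking up edges instead of reading every file into the context window.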
RAG: We have all heard of RAG at some point, but there is a catch: the context window of a local LLM can't hold a large codebase or repo, so RAG isn't optional. By storing embeddings in a vector store, you let the AI retrieve the code relevant to a task and see how each piece relates to the others, so you can index millions of lines without blowing up the context window. Tools: So far we have discussed small but powerful ways to improve local LLM performance, but for an agent to be an agent, it needs to take action. This is where exposing tools to the local LLM helps. So what tools have been implemented in Claw-Coder? 1. search_tool: This lets the agent search for up-to-date information so that it doesn't hallucinate about things it doesn't know, which is common with local LLMs. 2. Docker execution: The agent has a special folder called workspace where it does its work without touching the rest of your desktop. But that alone is not enough to protect your machine from bad code, so this is where Docker comes in. I have implemented Docker containers for various languages in which the agent can validate its own code. This is powerful because all LLMs, not only local ones, generate code they can't confirm works, since they are just powerful predictors. Letting the agent run its code makes the generated code much more useful, because it now knows whether the code works or not. Even for HTML and CSS, the agent has been given a vision LLM to describe what was rendered in the browser. This is the surprising power of giving an LLM a Docker execution tool. We have now looked at how Claw-Coder is different and how it enables local LLMs to do real work. 
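The retrieval step described above can be sketched in a few lines. This is a toy under stated assumptions, not Claw-Coder's implementation: the "embedding" here is a plain bag-of-words count standing in for a real embedding model, and embed, cosine, and VectorStore are hypothetical names.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class VectorStore:
    """In-memory store of (chunk, vector) pairs with brute-force search."""
    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self.items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]

store = VectorStore()
for chunk in [
    "def connect_db(url): open a database connection",
    "def render_page(template): render html template",
    "def close_db(conn): close the database connection",
]:
    store.add(chunk)

# Only the top-k chunks are placed in the prompt, not the whole codebase.
top = store.search("open database connection", k=2)
print(top)
```

The point of the design is that prompt size depends on k, not on how many lines were indexed.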
But how do you actually try it out yourself? Claw-Coder is closed source because it is going through heavy testing, but that doesn't kill transparency, and the testing doesn't stop people from trying it on a real codebase and giving feedback. To get started, run: brew tap gabriel-c70/claw then brew install claw-coder May 23, 2026 at 11:06PM

Friday, May 22, 2026

Show HN: Mechs.lol – a free, web-based autoshooter game https://ift.tt/5wCVx6a

Show HN: Mechs.lol – a free, web-based autoshooter game One unexpected benefit of LLMs is I can work on projects I otherwise wouldn't have taken on. I made a web-based autoshooter (with multiplayer support) heavily using AI / LLMs. This is something I'd consider "alpha" quality so don't expect a super polished experience but it's hopefully fun https://mechs.lol May 23, 2026 at 12:04AM

Show HN: Lilo – An open source personal AI assistant that lives in Telegram https://ift.tt/ctJGTPK

Show HN: Lilo – An open source personal AI assistant that lives in Telegram Hi everyone, I wanted to share an open source Telegram-based personal AI assistant I built. It’s a model-agnostic agent with memory, skills, and tools (like web search, browser use, etc.) operating in a persistent workspace. It also supports scheduled tasks and can build powerful HTML-based apps that live in the workspace. Here are some of my favorite use cases:
* Send Lilo photos of food, and it tracks your calories.
* Leave a voice note on your run to pause your supplements, and Lilo adds a TODO.
* Have Lilo remind you when the Knicks game starts and even send you score updates every 5 minutes.
* Have Lilo read an article out loud. Or give you a summary of the top stories on Hacker News.
* Forward an Uber receipt, and pull it up later to file for a reimbursement at work.
* Schedule a meeting with Jess next week, ask Lilo for suggestions on location, and have it remind you next week to leave for the meeting on time.
While Telegram is my most frequently used channel, Lilo can also be accessed by email, WhatsApp, a website and a mobile app. Email is particularly useful: I often forward receipts, invites, etc. for Lilo to handle. How is this different from OpenClaw and Hermes Agent? Here are some reasons:
- Runs on a remote machine/in the cloud rather than your local machine - your local data is safe, and the assistant is available 24/7.
- More visual, more GUI - Lilo comes with a default set of apps, like a TODO list, that you can interact with not just by text but also with a GUI on the mobile and web app.
- The Telegram integration is very comprehensive (handles replies, voice notes, reactions, etc.).
I use Lilo a ton to manage my life. Would love to hear your feedback! GitHub: https://ift.tt/PjCynGp May 22, 2026 at 11:03PM

Thursday, May 21, 2026

Show HN: I Made a Claude Skill for Spec-Driven Development (SDD) https://ift.tt/jihdulz

Show HN: I Made a Claude Skill for Spec-Driven Development (SDD) At my work they provided a single Claude subscription for everyone on the team. To be honest, I like Kiro better, as it provides much better SDD management. But the company can't provide it and I can't afford it yet. It turns out I had the skill creator skill in my Claude instance, so I used it to create this skill. I made it entirely with Claude, but I wanted to make it open source, so I asked it to help me write tests and make preparations for it, including a CI job to run the Python tests. We got these results with it:
- Phase 2A: 67 static assertions (Python script, runs in CI)
- Phase 2B: 15 behavioral tests (live Claude Code session)
- Phase 2C: 53 generation quality checks across 3 end-to-end flows
All of these passed and the CI also passed (after a few tries). I made it to suit my way of prompting and coding and based it on Kiro's SDD management, but I want it to be publicly available and used by many people. According to Claude, some of the testers need to fit the following criteria:
1. Developer starting a real new project from scratch
2. Solo dev with an active side project (greenfield or partial codebase)
3. Team lead whose team uses multiple AI tools
4. Developer with an existing codebase and no written specs
5. Developer who actively uses 3+ AI coding tools
It's a blind test with no guidance; just try it if you can. I'd really appreciate your help. The repo is here: https://ift.tt/WBGdFby https://ift.tt/WBGdFby May 21, 2026 at 07:49PM

Show HN: Freenet, a peer-to-peer platform for decentralized apps https://ift.tt/8ZH7A6p

Show HN: Freenet, a peer-to-peer platform for decentralized apps For the past 5 years or so I've been working on a ground-up redesign of Freenet, my peer-to-peer project from the early 2000s (now renamed Hyphanet). The new Freenet has been up and running since December along with some early applications like River[1], our decentralized group chat, and Delta, a decentralized CMS. Users have already started to build their own apps on Freenet, including games, and we have some interesting apps in development like Atlas, a search/recommendation engine. Architecturally, this new Freenet is a global, decentralized key-value store where keys are WebAssembly contracts that define what values (aka "state") are valid for that key, how or when the values can be mutated, and how the state can be efficiently synchronized between peers. We've developed a unique (AFAIK) solution to the consistency problem: every contract must define a "merge" operation for the contract's associated state. This operation must be commutative, meaning that you can merge multiple states in any order and you'll get the same end result. This approach allows state updates to spread through the network like a virus[2], which typically achieves consistent global state in a few seconds or less. Like the world wide web, Freenet applications can be downloaded from the network itself and run in a web browser, similar to single-page apps on the normal web. However, rather than connecting back to an API running in a datacenter, the webapp connects locally to the Freenet peer and interacts with Freenet contracts and delegates over a local websocket connection. If you'd like to try Freenet, we have convenient installers for the major desktop OSs but not yet mobile, and you can be chatting with other users on River within seconds[3]. Happy to answer any questions; you're also welcome to read our FAQ[4] or watch a talk I gave back in March[5]. 
[1] https://ift.tt/RYrSpj3 [2] https://ift.tt/9fU108J [3] https://ift.tt/RLUr42e [4] https://ift.tt/uUHBInq [5] https://youtu.be/3SxNBz1VTE0 https://freenet.org/ May 21, 2026 at 09:34PM
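The order-independent "merge" requirement in the Freenet post above can be illustrated with the simplest possible case. This is a sketch in Python, not Freenet's contract API (real contracts are WebAssembly): the state is assumed to be a set of chat messages and merge is assumed to be set union, both chosen only for illustration.

```python
from functools import reduce
from itertools import permutations

def merge(a: frozenset, b: frozenset) -> frozenset:
    """Hypothetical merge: set union, which is commutative and associative."""
    return a | b

# Three state updates that different peers might receive in different orders.
updates = [
    frozenset({"alice: hi"}),
    frozenset({"bob: hello"}),
    frozenset({"alice: hi", "carol: hey"}),
]

# Apply the updates in every possible order and collect the distinct end states.
end_states = {reduce(merge, order, frozenset()) for order in permutations(updates)}

print(len(end_states))  # a single distinct end state means order did not matter
```

Because every ordering converges to the same state, peers can apply updates as they arrive without coordinating on a global order.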

Show HN: Agent.email – sign up via curl, claim with a human OTP https://ift.tt/iKZqzuG

Show HN: Agent.email – sign up via curl, claim with a human OTP Hi HN! We're Haakam, Michael, and Adi from AgentMail, a YC S25 company. We give AI agents their own email inboxes. Recently, we ran an experiment called Agent.Email. It's a signup flow designed specifically for AI agents instead of humans. The inspiration came from a few comments we received when we did our seed launch a few months back. They all made the very apt observation that it was ironic, and less than ideal, that agents could not sign up for a product made for agents without human credentials. This is basically the thesis we built AgentMail on: the internet was made exclusively for humans and designed to keep machines out by default. Every signup flow assumes a browser and a person reading a page and clicking a confirmation link. Unless agents can do that, they can't be first-class users of the internet. Agents can now get an email inbox by themselves. (This also means a lot of email nobody wants to read gets processed by AI instead of your inbox being cluttered with spam and slop.) Here's how agent.email works. Agent needs an inbox and hits AgentMail via curl. Agent receives instructions via MD unless the request comes from a browser, in which case we use HTML. Agent decides agent.email is useful and then hits the sign-up endpoint with its human's email as a parameter. Agent receives a restricted inbox with credentials. Agent emails the human asking for an OTP. Human replies with the code, and the agent is claimed and restrictions are lifted. Until claimed, the agent can only email its own human and nobody else. Ten emails a day, and the signup endpoint is rate-limited hard by IP. Right now it's a 1:1 mapping between agent and human. The next step is many-to-one, because one person running several agents in parallel is already very common. Building agent.email also pushed us to revisit places in AgentMail where the default assumptions were built around the primary user being human. 
For example, the CLI outputs in a single column with consistent formatting because mixed delimiters are easy for a person to scan, but harder for an agent reasoning about structure. We also shortened messageIDs after agents started hallucinating completions on longer ones. A few things we'd like the community's take on: is restricted-until-claimed the right trust model? Does agent self-signup feel useful in production, or is it mostly a novelty, and if it's a novelty now, what would make it actually useful? Should agent onboarding require human approval by default, or should some agents be able to fully self-provision? What do you think are some additional measures we can take for secure sign-ups? May 21, 2026 at 11:42PM
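The restricted-until-claimed model from the Agent.email post can be sketched as a small state object. Everything here is an assumption for illustration and not AgentMail's API: the AgentInbox class, its fields, and the reading that the ten-emails-a-day cap applies while the inbox is unclaimed.

```python
import hmac
from dataclasses import dataclass

# "Ten emails a day" from the post, assumed here to apply until the inbox is claimed.
DAILY_LIMIT_UNCLAIMED = 10

@dataclass
class AgentInbox:
    human_email: str       # the human given at sign-up
    otp: str               # hypothetical one-time code the human must send back
    claimed: bool = False
    sent_today: int = 0

    def can_send(self, recipient: str) -> bool:
        """Unclaimed inboxes may only email their own human, under the daily cap."""
        if self.claimed:
            return True
        return recipient == self.human_email and self.sent_today < DAILY_LIMIT_UNCLAIMED

    def send(self, recipient: str) -> bool:
        if not self.can_send(recipient):
            return False
        self.sent_today += 1
        return True

    def claim(self, code: str) -> bool:
        """Lift restrictions when the human's code matches (constant-time compare)."""
        if hmac.compare_digest(code, self.otp):
            self.claimed = True
        return self.claimed

inbox = AgentInbox(human_email="human@example.com", otp="493817")
blocked = inbox.send("stranger@example.com")   # refused while unclaimed
allowed = inbox.send("human@example.com")      # the only permitted recipient
inbox.claim("493817")
after_claim = inbox.send("stranger@example.com")
```

Keeping the check in one can_send method makes the trust model easy to change, for example if some agents were later allowed to self-provision.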

Wednesday, May 20, 2026

Show HN: IgniteMS – batch text embeddings at 253K msg/s on 8x A100 https://ift.tt/nVdFPLY

Show HN: IgniteMS – batch text embeddings at 253K msg/s on 8x A100 https://ift.tt/hnElig3 May 21, 2026 at 12:07AM

Show HN: I made a tool for learning scales, chords, and how to combine them https://ift.tt/jAuY4n3

Show HN: I made a tool for learning scales, chords, and how to combine them This started out when I vibe-coded a guitar scale fingering generator. It came out pretty good, and I started adding stuff to it: chords, then how chords and scales interact. Then I added charts for other instruments I mess around with: piano, cello, alto recorder. There's a complexity toggle to go from basic harmony to extended/experimental stuff. It's honestly still mostly a toy, but I thought other people might be interested in playing with it. Source is on github, so it's easy enough to run locally and fork. https://ift.tt/xh8T3Ru https://ift.tt/l54DbYI May 21, 2026 at 12:44AM

Tuesday, May 19, 2026

Show HN: How Expensive Is Your (Steam) Wishlist? https://ift.tt/9rSJMyh

Show HN: How Expensive Is Your (Steam) Wishlist? A tool/toy that lets you connect to your Steam wishlist to calculate the total list/current price of all the games on it. There's a shallow, jokey purpose to it ("I could buy a BMW with this amount!"), but the real purpose is to demonstrate how we can do a better job of portraying a game catalog. I often wishlist stuff, then it pops up in a "Hey, it's on sale!" email months later. In that email, there's a banner capsule, but that doesn't help my brain remember why I added it. To that end, after you get the bill, you get a nice, flat feed of stuff about all the titles you've wishlisted over the years. It's all stuff that developers painstakingly put together, but which Steam tucks away under the fold of a game's Store page. Anyway, my wishlist came to about $250. My QA guy is up to $19k. Give it a go; hope you enjoy it! https://ift.tt/FcbPJht May 20, 2026 at 12:15AM

Show HN: Haystack – Review the PRs that need human attention https://ift.tt/p9ubg0v

Show HN: Haystack – Review the PRs that need human attention Hey HN! We're building Haystack ( https://ift.tt/wKJ875U ) to help teams deal with the explosion in the number of pull requests that need to be reviewed due to the rise of coding agents. Haystack replaces the GitHub PR review system with a queue that triages each PR before a human has to read any diffs. It looks at the diffs, the codebase, and the coding-agent conversation that produced the PR. Haystack then routes it into one of three buckets:
1. Safe to merge. This means the PR has enough evidence behind it that the team can merge it without another human's review. Some examples:
-- A small UI copy change that includes a screenshot showing the final state
-- A backend change where the author clearly tested the important paths and ran the changes in a real environment
2. Needs fixes. This means that the PR has bugs or violates a rule in your codebase and therefore the PR needs to be fixed by the author. Some examples:
-- The agent was asked to make loading a large table faster by adding pagination, but the PR still loads every result at once and "implements" pagination in the UI
-- The PR silently catches an error instead of logging, surfacing, or handling it. This violates the team's "no silent error swallowing" rule
3. Needs human review. This means that the PR could not be sufficiently verified by the author or is touching a sensitive part of the codebase (determined by user-input guidelines) and thus requires human review. Some examples:
-- The PR changes a significant amount of logic in billing
-- The PR changes an important user flow like onboarding, but the author only ran unit tests and never opened the app to check the flow end-to-end. That violates the team's rule that high-impact user-facing changes need manual verification. 
Instead of starting with line-by-line diffs, Haystack immediately tells the reviewer the goal behind the PR, what design decisions the author made (informed by their coding-agent conversation), and how much the author did to verify that the pull request works (e.g. run scripts, checked the frontend, etc.). In this way, review shifts from "what changed?" to "is this the right behavior and is there evidence that it works?". Here's a quick demo: https://ift.tt/0hK4Pra... We previously launched Haystack as a tool for understanding large PRs ( https://ift.tt/2XcURvi ). As many of you can probably relate to, the release of Opus 4.5 completely shattered our conception of how fast an engineer could craft a PR. And as coding agents got even better from 4.5, we realized that pull requests did not scale along with our coding velocity. With each member of our team being able to pump out more than 20 pull requests a day, code review quickly became cognitively exhausting and less helpful. After talking with other folks, we learned many feel similarly, and currently face the binary option of either not doing review at all or trying to keep up with a fire hose of pull requests. Haystack is our attempt at a third path. We still believe in code review, but as coding agents produce more code, human reviewer attention becomes more valuable and more expensive. Haystack helps teams spend that attention on the PRs where a human can meaningfully change the outcome of that PR. And for such PRs, Haystack shows the reviewer what the PR intended to do, whether the author showed that it works, and what design decisions need a second pair of eyes. We're still quite early and are figuring out whether Haystack truly makes code review better. We would love any and all feedback! https://ift.tt/wKJ875U May 19, 2026 at 12:44AM
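The three-bucket routing described in the Haystack post can be sketched as a plain decision function. This is only an illustration under assumptions, not Haystack's logic: the post does not say how signals are computed or ordered, so the PullRequestSignals fields and the precedence below are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Bucket(Enum):
    SAFE_TO_MERGE = "safe to merge"
    NEEDS_FIXES = "needs fixes"
    NEEDS_HUMAN_REVIEW = "needs human review"

@dataclass
class PullRequestSignals:
    # Hypothetical signals a triage step might derive from the diff,
    # the codebase, and the coding-agent conversation.
    has_bug_or_code_rule_violation: bool
    touches_sensitive_area: bool      # e.g. billing, per team guidelines
    has_verification_evidence: bool   # e.g. screenshot, real-environment run

def triage(pr: PullRequestSignals) -> Bucket:
    """Route a PR; the ordering of checks is an assumption of this sketch."""
    if pr.has_bug_or_code_rule_violation:
        return Bucket.NEEDS_FIXES
    if pr.touches_sensitive_area or not pr.has_verification_evidence:
        return Bucket.NEEDS_HUMAN_REVIEW
    return Bucket.SAFE_TO_MERGE

copy_change = PullRequestSignals(False, False, True)   # UI copy change with screenshot
billing_change = PullRequestSignals(False, True, True) # significant billing logic
print(triage(copy_change).value, "/", triage(billing_change).value)
```

In practice the hard part is producing the signals, not the routing; the function only shows how the buckets relate to each other.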

Show HN: Superlog (YC P26) – Observability that installs itself and fixes bugs https://ift.tt/fyqnNzX

Show HN: Superlog (YC P26) – Observability that installs itself and fixes bugs Hey HN, we’re Nico and Arseniy, co-founders of Superlog ( https://superlog.sh ). We're building a self-installing, self-healing observability tool that is meant not to be opened. It has a wizard that runs daily to set up proper logging and an agent that investigates errors and opens PRs. Super short demo: https://www.youtube.com/watch?v=xFhU9Mk247M . In our earlier startups, we tried Sentry, Datadog, Grafana, and Dash0, and nothing was good enough. Proper telemetry and alerting still require a ton of manual setup. We struggled with adding good logs, so debugging was tough, especially as codebases grew at a faster pace. Meanwhile, the Datadog/Dash0 bill kept climbing, and we still spent engineering hours learning, configuring, and maintaining our observability tooling. With Sentry, we found ourselves flooded by a stream of alerts in our Slack channel; most were duplicates or lacked context, so alert fatigue and constant interrupts were a real pain. The #ops notification is consistently the worst feeling on a Saturday morning. Too many times we’ve seen servers run out of memory and disk, and three AWS metrics give us three different values. Half of the graphs on dashboards are normally empty or outdated, and manually clicking through UIs, especially when the team is small, seems like a huge waste of time. At some point we realized that solving this problem would be more valuable than the things we had been working on, and we had the expertise to do it, since Arseniy had spent years at Datadog, getting paged during the night to debug production incidents. So we decided to build a platform that would just work: agent-first, MCP-native, zero-setup. Here’s how Superlog works: we have a wizard that scans your repo and automatically instruments it with well-structured logs, traces and metrics via OpenTelemetry. 
We make sure to highlight main failure modes, endpoint performance, usage per tenant, and LLM/upstream cost (by callsite, tenant and model). Errors get fingerprinted and grouped into incidents, so you see one issue, not a thousand duplicates. When you get a notification from Superlog, you see a clear failure summary with its inferred severity and impact upfront. Then the agent investigates and tries to solve the issue. If it has enough context, it produces a concise and tested PR. If it doesn't, it posts its findings for the investigating team, and automatically pulls in the engineers who could contribute more context based on documentation, previous investigations and Slack threads. Either way the output is one clean PR per incident, posted in Slack, that you can merge, ignore, or open as a Claude Code session and modify. Three things we think are different from other observability vendors: (1) We solve the setup pain. The wizard will instrument everything with native OTel SDKs, respecting the semantic conventions, with proper service and environment tagging. We’re also working on native automatic dashboards and alerts, so that you can see what’s going on at a glance and don’t miss subtle failure modes. (2) Our telemetry doesn’t decay. The wizard runs daily, and keeps adding logs, alerts and dashboards where they're needed. You don't have to remember to instrument new features. The next time something breaks, the data you need to debug it is already there. (3) Our goal is to solve alert fatigue. We use agents to merge similar errors and refine the summaries, giving you relevant information upfront. We have a custom evaluation setup that makes sure that our summaries are dense and correct, and that severity and impact are on point. We also give you confidence scores for every LLM-enhanced metric so that wrong guesses don’t get boosted. Important: Superlog telemetry is vendor-neutral, so you keep all the logs/metrics/traces we install. Pricing is on the site. 
We're early, so expect rough edges and please tell us when you find them. You can try it at https://superlog.sh . We'd love to hear what you're using today, what's broken about it, and whether the "one mergeable PR per incident" model sounds useful or terrifying. Especially keen to hear from folks running integration-heavy products, anyone who's rolled their own observability, and anyone who has tried Sentry / Datadog MCPs and given up. Comments and feedback welcome! https://superlog.sh/ May 19, 2026 at 10:54PM
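The "fingerprinted and grouped into incidents" step from the Superlog post can be illustrated with a common, simple approach. This is a sketch under assumptions, not Superlog's algorithm (the post says agents also merge similar errors): the fingerprint function, its inputs, and the sample events are hypothetical.

```python
import hashlib
import re
from collections import defaultdict

def fingerprint(exc_type: str, message: str, top_frame: str) -> str:
    """Hash the error with variable parts (hex values, numbers) masked out."""
    normalized = re.sub(r"0x[0-9a-fA-F]+|\d+", "<n>", message)
    key = f"{exc_type}|{normalized}|{top_frame}"
    return hashlib.sha256(key.encode()).hexdigest()[:12]

# Hypothetical error events: (exception type, message, top stack frame).
events = [
    ("TimeoutError", "request 4821 timed out after 30s", "api/client.py:fetch"),
    ("TimeoutError", "request 977 timed out after 30s", "api/client.py:fetch"),
    ("KeyError", "missing tenant 12", "billing/usage.py:charge"),
]

# Group events that share a fingerprint into one incident.
incidents: dict[str, list[tuple[str, str, str]]] = defaultdict(list)
for event in events:
    incidents[fingerprint(*event)].append(event)

for fp, grouped in incidents.items():
    print(fp, len(grouped))
```

Masking request IDs and other numbers is what collapses the two timeouts into a single incident; errors whose text differs more substantially would need fuzzier matching.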

Monday, May 18, 2026

Show HN: We missed Winamp, so we built an audio player for macOS https://ift.tt/G9cAKH8

Show HN: We missed Winamp, so we built an audio player for macOS https://ift.tt/JHEaPmG May 19, 2026 at 02:20AM

Show HN: Marlin-2B: a tiny VLM to extract structured information from videos https://ift.tt/WbFSG2E

Show HN: Marlin-2B: a tiny VLM to extract structured information from videos https://ift.tt/QPzDZOh May 19, 2026 at 01:06AM