
Thursday, December 19, 2024

Show HN: Solution to RSI and poor Mac dictation, accurately transcribes "memcpy" https://ift.tt/GOiuVpY

About a year ago, my RSI had gotten to the point where I needed to take days off. I went down the rabbit hole of trackball mice, split keyboards, and dictation. Talon voice coding seemed like the dream solution I was looking for, but I just couldn't get it right (here's a link to the video I was inspired by: https://www.youtube.com/watch?v=0ZZb12Qp6-0 ). Around the same time, I increasingly adapted to new ways of life: tabbing, code suggestions, completions, chat-based coding (I know, I know), and brainstorming with tools like Copilot, Codeium, Claude, and then Cursor.

Mac's built-in dictation was my first attempt at a solution. It handled basic emails/messages fine but completely fell apart with technical terms. I couldn't build a workflow around it. "malloc," "memcpy," "Axum," "Tauri"… these always had to be edited, and my wrists would still protest.

I had been playing around with Whisper ever since it first came out (my work with MEL spectrograms and ECGs led me deep into OpenAI's Whisper). Once the whisper.cpp project got Metal support, it worked beautifully on Apple Silicon Macs.

What I ended up building

- Local dictation tool
- Works system-wide on Mac (any text input)
- Custom dictionaries for technical terms/jargon
- No cloud dependencies, no tracking, no signup, no subscriptions (one-time purchase)

Technical details

- Running Whisper Large Turbo locally (or quantized Whisper small/medium)
- Sits in the background, requests accessibility permissions, and can paste text anywhere
- Voice activity detection is experimental; I haven't nailed the parameters yet. This will help with long recordings/live transcriptions
- Sandboxed app, available through the Mac App Store

My usage stats (just personal benchmarks)

- Transcribed around 30k words over 2 weeks
- Averaging 150 WPM vs my usual 55-60 WPM (significantly lower when I need to think; it's not like one of those typing tests)
- I've spoken to it for about 4 hours, so roughly 6-8 hours saved
- Some occasional mistakes; there's a find and replace tool. I add random non-dictionary words for large prompts

https://ift.tt/uZ25NLc

December 19, 2024 at 07:09PM
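
To illustrate the custom dictionary and find-and-replace idea mentioned above, here is a minimal Python sketch of a post-processing pass over a transcript. It is not the app's actual implementation: the correction table, the misheard spellings in it, and the function names `build_pattern` and `apply_corrections` are all hypothetical, chosen only to show one way such a step could work.

```python
import re

# Hypothetical correction table: misheard phrase -> intended technical term.
# These entries are illustrative guesses, not taken from the app.
CORRECTIONS = {
    "mem copy": "memcpy",
    "mal lock": "malloc",
    "axiom": "Axum",
    "tori": "Tauri",
}


def build_pattern(corrections):
    """Compile one case-insensitive regex matching any key as a whole word/phrase."""
    # Longest phrases first so multi-word entries win over shorter overlaps.
    keys = sorted(corrections, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in keys)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def apply_corrections(text, corrections=CORRECTIONS):
    """Replace each misheard phrase in `text` with its dictionary term."""
    if not corrections:
        return text  # nothing to replace
    lookup = {k.lower(): v for k, v in corrections.items()}
    pattern = build_pattern(corrections)
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


if __name__ == "__main__":
    print(apply_corrections("Use mem copy after Mal Lock in the axiom handler"))
```

The word-boundary anchors keep the pass from rewriting longer words that merely contain an entry (for example, "axioms" is left alone), which matters when the same transcript mixes prose and jargon.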
