The last (software) piece of sorting out a local voice assistant is tying the openWakeWord piece to a local microphone + speaker, and thus back to Home Assistant. For that we use wyoming-satellite.

I’ve packaged that up - https://salsa.debian.org/noodles/wyoming-satellite - and then to run I do something like:

$ wyoming-satellite --name 'Living Room Satellite' \
    --uri 'tcp://[::]:10700' \
    --mic-command 'arecord -r 16000 -c 1 -f S16_LE -t raw -D plughw:CARD=CameraB409241,DEV=0' \
    --snd-command 'aplay -D plughw:CARD=UACDemoV10,DEV=0 -r 22050 -c 1 -f S16_LE -t raw' \
    --wake-uri tcp://[::1]:10400/ \
    --debug

That starts us listening for connections from Home Assistant on port 10700, uses the openWakeWord instance on localhost port 10400, uses aplay/arecord to talk to the local microphone and speaker, and gives us some debug output so we can see what’s going on.

And it turns out we need the debug. This setup is a bit too flaky for it to have ended up in regular use in our household. I’ve had some problems with reliable audio setup; you’ll note the Python is calling out to other tooling to grab audio, which feels a bit clunky to me and I don’t think is the actual problem, but the main audio for this host is hooked up to the TV (it’s a media box), so the setup for the voice assistant needs to be entirely separate. That means not plugging into Pipewire or similar, and instead giving direct access to wyoming-satellite. And sometimes having to deal with how to make the mixer happy + non-muted manually.

I’ve also had some issues with the USB microphone + speaker; I suspect a powered USB hub would help, and that’s on my list to try out.

When it does work I have sometimes found it necessary to speak more slowly, or enunciate my words more clearly. That’s probably something I could improve by switching from the base.en to small.en whisper.cpp model, but I’m waiting until I sort out the audio hardware issue before poking more.

Finally, the wake word detection is a little bit sensitive sometimes, as I mentioned in the previous post. To be honest I think it’s possible to deal with that, if I got the rest of the pieces working smoothly.

This has ended up sounding like a more negative post than I meant it to. Part of the issue in a resolution is finding enough free time to poke things (especially as it involves taking over the living room and saying “Hey Jarvis” a lot), part of it is no doubt my desire to actually hook up the pieces myself and understand what’s going on. Stay tuned and see if I ever manage to resolve it all!