onak 0.6.4 released

A bit delayed in terms of an announcement, but last month I tagged a new version of onak, my OpenPGP-compatible keyserver. It’s been 2 years since the last release, and this is largely a bunch of minor fixes to keep compilation happy under Debian trixie, with its more recent CMake + GCC versions.

OpenPGP v6 support, RFC9580, hasn’t made it. I’ve got a branch which adds it, but a lack of keys to do any proper testing with, and no X448 support implemented, mean I’m not ready to include it in a release yet. The plan is that’ll land for 0.7.0 (along with some backend work), but no idea when that might be.

Available locally or via GitHub.

0.6.4 - 7th September 2025

  • Fix building with CMake 4.0
  • Fixes for building with GCC 15
  • Rename keyd(ctl) to onak-keyd(ctl)

Local Voice Assistant Step 5: Remote Satellite

The last (software) piece of sorting out a local voice assistant is tying openWakeWord to a local microphone + speaker, and thus back to Home Assistant. For that we use wyoming-satellite.

I’ve packaged that up - https://salsa.debian.org/noodles/wyoming-satellite - and then to run I do something like:

$ wyoming-satellite --name 'Living Room Satellite' \
    --uri 'tcp://[::]:10700' \
    --mic-command 'arecord -r 16000 -c 1 -f S16_LE -t raw -D plughw:CARD=CameraB409241,DEV=0' \
    --snd-command 'aplay -D plughw:CARD=UACDemoV10,DEV=0 -r 22050 -c 1 -f S16_LE -t raw' \
    --wake-uri tcp://[::1]:10400/ \
    --debug

That starts us listening for connections from Home Assistant on port 10700, uses the openWakeWord instance on localhost port 10400, uses arecord/aplay to talk to the local microphone and speaker, and gives us some debug output so we can see what’s going on.
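
To run it persistently the natural thing is a systemd unit along the lines of the other Wyoming pieces. A rough, untested sketch of what that might look like (assuming the packaged binary lands in /usr/bin, adding SupplementaryGroups=audio so the dynamic user can reach the ALSA devices directly, and dropping --debug):

[Unit]
Description=Wyoming satellite
After=network.target sound.target

[Service]
Type=simple
DynamicUser=yes
SupplementaryGroups=audio
ExecStart=/usr/bin/wyoming-satellite --name 'Living Room Satellite' \
    --uri 'tcp://[::]:10700' \
    --mic-command 'arecord -r 16000 -c 1 -f S16_LE -t raw -D plughw:CARD=CameraB409241,DEV=0' \
    --snd-command 'aplay -D plughw:CARD=UACDemoV10,DEV=0 -r 22050 -c 1 -f S16_LE -t raw' \
    --wake-uri tcp://[::1]:10400/

[Install]
WantedBy=multi-user.target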

And it turns out we do need the debug output. This setup is a bit too flaky to have ended up in regular use in our household. I’ve had some problems getting the audio setup reliable; you’ll note the Python is calling out to other tooling to grab audio, which feels a bit clunky to me but I don’t think is the actual problem. The main audio for this host is hooked up to the TV (it’s a media box), so the setup for the voice assistant needs to be entirely separate. That means not plugging into PipeWire or similar, and instead giving wyoming-satellite direct access to the hardware. It also sometimes means having to manually sort out the mixer levels and unmute things.
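
For reference, the manual mixer poking looks something like this - the card names match the arecord/aplay devices above, but the control names (‘PCM’, ‘Mic’) are whatever amixer scontrols reports for your particular hardware, so treat them as illustrative:

$ amixer -D hw:CARD=UACDemoV10 scontrols
$ amixer -D hw:CARD=UACDemoV10 sset 'PCM' 90% unmute
$ amixer -D hw:CARD=CameraB409241 sset 'Mic' 80% cap
$ sudo alsactl store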

I’ve also had some issues with the USB microphone + speaker; I suspect a powered USB hub would help, and that’s on my list to try out.

When it does work I have sometimes found it necessary to speak more slowly, or enunciate my words more clearly. That’s probably something I could improve by switching from the base.en to the small.en whisper.cpp model, but I’m waiting until I sort out the audio hardware issue before poking at it further.

Finally, the wake word detection is a little bit sensitive at times, as I mentioned in the previous post. To be honest I think that’s something I can deal with, once I’ve got the rest of the pieces working smoothly.

This has ended up sounding like a more negative post than I meant it to. Part of the issue in resolving things is finding enough free time to poke at them (especially as it involves taking over the living room and saying “Hey Jarvis” a lot); part of it is no doubt my desire to actually hook up the pieces myself and understand what’s going on. Stay tuned and see if I ever manage to resolve it all!

Local Voice Assistant Step 4: openWakeWord

People keep asking me when I’ll write the next instalment of my local voice assistant journey. I didn’t mean for it to be so long since the last one; things have been busier than I’d like. Anyway. Last time we’d built Tensorflow, so now it’s time to sort out openWakeWord. As a reminder, we’re trying to put a local voice satellite on my living room Debian media machine.

The point of openWakeWord is to run on the machine the microphone is connected to, listening for the wake phrase (“Hey Jarvis” in my case), and only then calling back to the central server to do a speech to text operation. It’s wrapped up for Wyoming as wyoming-openwakeword.

Of course I’ve packaged it up - available at https://salsa.debian.org/noodles/wyoming-openwakeword. Trixie only released yesterday, so I’m still running all of this on bookworm. That means you need python3-wyoming from trixie - 1.6.0-1 will install fine without needing to be rebuilt - and the python3-tflite-runtime we built last time.
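
In practice that means manually installing a few .debs alongside each other; something like the following, with the filenames obviously depending on exactly what you’ve built or downloaded:

$ sudo dpkg -i python3-wyoming_1.6.0-1_all.deb \
    python3-tflite-runtime_2.15.1-1_amd64.deb \
    wyoming-openwakeword_*.deb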

Like the other pieces, I’m not sure how this could land in Debian; it’s unclear to me that the pre-trained models provided would be accepted in main.

As usual I start it with a systemd unit file dropped in /etc/systemd/system/wyoming-openwakeword.service:

[Unit]
Description=Wyoming OpenWakeWord server
After=network.target

[Service]
Type=simple
DynamicUser=yes
ExecStart=/usr/bin/wyoming-openwakeword --uri tcp://[::1]:10400/ --preload-model 'hey_jarvis' --threshold 0.8

MemoryDenyWriteExecute=false
ProtectControlGroups=true
PrivateDevices=false
ProtectKernelTunables=true
ProtectSystem=true
RestrictRealtime=true
RestrictNamespaces=true

[Install]
WantedBy=multi-user.target
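
After dropping that in place it’s the usual systemd dance to pick it up and start it:

$ sudo systemctl daemon-reload
$ sudo systemctl enable --now wyoming-openwakeword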

I’m still playing with the threshold level. It defaults to 0.5, but the device lives under the TV and seems to get a bit confused by it sometimes. There’s some talk about using speex for noise suppression, but I haven’t explored that yet (it’s yet another Python module binding to C libraries that I’d have to look at).

This is a short one; next post is actually building the local satellite on top to tie everything together.

Local Voice Assistant Step 3: A Detour into Tensorflow

To build our local voice satellite on a Debian system rather than using the ATOM Echo device we need something that can handle the wake word component; the piece that means we only send audio to the Home Assistant server for processing by whisper.cpp when we’ve detected someone is trying to talk to us.

openWakeWord seems to be one of the better ways to do this, and is well supported. However. It relies on TensorFlow Lite (now LiteRT) which is a complicated mess of machine learning code. tflite-runtime is available from PyPI, but that’s prebuilt and we’re trying to avoid that.

Despite initial impressions that building TensorFlow would be complicated to deal with - Bazel is an immediate warning sign - it turns out to be incredibly simple to build your own .deb:

$ wget -O tensorflow-v2.15.1.tar.gz https://github.com/tensorflow/tensorflow/archive/refs/tags/v2.15.1.tar.gz
…
$ tar -axf tensorflow-v2.15.1.tar.gz
$ cd tensorflow-2.15.1/
$ BUILD_NUM_JOBS=$(nproc) BUILD_DEB=y tensorflow/lite/tools/pip_package/build_pip_package_with_cmake.sh
…
$ find . -name '*.deb'
./tensorflow/lite/tools/pip_package/gen/tflite_pip/python3-tflite-runtime-dbgsym_2.15.1-1_amd64.deb
./tensorflow/lite/tools/pip_package/gen/tflite_pip/python3-tflite-runtime_2.15.1-1_amd64.deb
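
Installing the resulting package is then just a dpkg -i away:

$ sudo dpkg -i ./tensorflow/lite/tools/pip_package/gen/tflite_pip/python3-tflite-runtime_2.15.1-1_amd64.deb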

That simple build is hiding an awful lot of complexity, however. In particular, a large number of third-party projects are downloaded in the background (and compiled, to be fair, rather than pulled in as binary artefacts).

We can build the main C++ wrapper .so directly with cmake, allowing us to investigate a bit more:

$ mkdir tf-build
$ cd tf-build/
$ cmake \
    -DCMAKE_C_FLAGS="-I/usr/include/python3.11" \
    -DCMAKE_CXX_FLAGS="-I/usr/include/python3.11" \
    ../tensorflow-2.15.1/tensorflow/lite/
$ cmake --build . -t _pywrap_tensorflow_interpreter_wrapper
…
[100%] Built target _pywrap_tensorflow_interpreter_wrapper
$ ldd _pywrap_tensorflow_interpreter_wrapper.so
    linux-vdso.so.1 (0x00007ffec9588000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f22d00d0000)
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f22cf600000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f22d00b0000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f22cf81f000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f22d01d1000)

Looking at the CMake output we can see that pthreadpool, FXdiv, FP16 + PSimd are all downloaded, and they seem to have ways to point to a local copy. That seems positive.
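
I haven’t tried it end to end, but based on the *_SOURCE_DIR cache variables that show up in the CMake machinery it looks like something along these lines should let the build pick up local checkouts instead (variable names as I spotted them, so treat this as a sketch rather than a recipe):

$ cmake \
    -DCMAKE_C_FLAGS="-I/usr/include/python3.11" \
    -DCMAKE_CXX_FLAGS="-I/usr/include/python3.11" \
    -DPTHREADPOOL_SOURCE_DIR=/path/to/pthreadpool \
    -DFXDIV_SOURCE_DIR=/path/to/FXdiv \
    -DFP16_SOURCE_DIR=/path/to/FP16 \
    -DPSIMD_SOURCE_DIR=/path/to/psimd \
    ../tensorflow-2.15.1/tensorflow/lite/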

However, there are even more hidden dependencies, which we can see if we look in the _deps/ subdirectory of the build tree. These don’t appear to be as easy to override, and not all of them have packages already in Debian.

First, the ones that seem to be available: abseil-cpp, cpuinfo, eigen, farmhash, flatbuffers, gemmlowp, ruy + xnnpack

(lots of credit to the Debian Deep Learning Team for these, and in particular Mo Zhou)

Dependencies I couldn’t see existing packages for are: OouraFFT, ml_dtypes & neon2sse.

At this point I just used the package I built with the initial steps above. I live in hope someone will eventually package this properly for Debian, or that I’ll find the time to try and help out, but that’s not going to be today.

I wish upstream developers made it easier to use system copies of their library dependencies. I wish library developers made it easier to build and install system copies of their work. pkgconf is not new tech these days (pkg-config appears to date back to 2000), and has decent support in CMake. I get that there can be issues with incompatibilities even in minor releases, or awkwardness in doing builds of multiple connected projects, but at least give me the option to do so.

Local Voice Assistant Step 2: Speech to Text and back

Having set up an ATOM Echo Voice Satellite and hooked it up to Home Assistant we now need to actually do something with the captured audio. Home Assistant largely deals with voice assistants using the Wyoming Protocol, which describes itself as essentially JSONL + PCM audio. It works nicely, in that everything can exist as separate modules that just communicate over network sockets, and there are a whole bunch of Python implementations of the pieces necessary.

The first bit I looked at was speech to text; how do I get what I say to the voice satellite into something that Home Assistant can try and parse? There is a nice self-contained speech recognition tool called whisper.cpp, which is a low-dependency implementation of inference using OpenAI’s Whisper model. This is wrapped up for Wyoming as part of wyoming-whisper-cpp. Here we get into something that unfortunately seems common in this space: the repo contains a forked copy of whisper.cpp with enough differences that I couldn’t trivially make it work with regular whisper.cpp. That means missing out on new development and potential improvements (the fork appears to be at v1.5.4, upstream is up to v1.7.5 at the time of writing). However it was possible to get up and running easily enough.

[I note there is a Wyoming Whisper API client that can use the whisper.cpp server, and that might be a cleaner way to go in the future, especially if whisper.cpp ends up in Debian.]

I stated previously that I wanted all of this to be as cleanly installed on Debian stable as possible. Given most of this isn’t packaged, that’s meant I’ve packaged things up as I go. I’m not at the stage where anything is suitable for upload to Debian proper, but equally I’ve tried to make them a reasonable starting point. No pre-built binaries available, just Salsa git repos; https://salsa.debian.org/noodles/wyoming-whisper-cpp in this case. You need python3-wyoming from trixie if you’re building for bookworm, but it doesn’t need to be rebuilt.

You need a Whisper model that’s been converted to ggml format; they can be found on Hugging Face. I’ve ended up using the base.en model. In random testing I found small.en gave more accurate results but took a little longer; it doesn’t seem to make much of a difference for voice control rather than plain transcription.
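
If you want to grab a model directly, the whisper.cpp project publishes the converted files; as far as I can tell this is the location its own download script pulls from, so something like the following should do it:

$ wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin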

[One of the open questions about uploading this to Debian is around the use of a prebuilt AI model. I don’t know what the right answer is here, and whether the voice infrastructure could ever be part of Debian proper, but the current discussion on the interpretation of the DFSG on AI models is very relevant.]

I run this in the same container as my Home Assistant install, using a systemd unit file dropped in /etc/systemd/system/wyoming-whisper-cpp.service:

[Unit]
Description=Wyoming whisper.cpp server
After=network.target

[Service]
Type=simple
DynamicUser=yes
ExecStart=wyoming-whisper-cpp --uri tcp://localhost:10030 --model base.en

MemoryDenyWriteExecute=false
ProtectControlGroups=true
PrivateDevices=false
ProtectKernelTunables=true
ProtectSystem=true
RestrictRealtime=true
RestrictNamespaces=true

[Install]
WantedBy=multi-user.target

It needs the Wyoming Protocol integration enabled in Home Assistant; you can “Add Entry” and enter localhost + 10030 for host + port and it’ll get added. Then in the Voice Assistant configuration there’ll be a whisper.cpp option available.

Text to speech turns out to be weirdly harder. The right answer is something like Wyoming Piper, but that turns out to be hard on bookworm. I’ll come back to that in a future post. For now I took the easy option and used the built in “Google Translate” option in Home Assistant. That needed an extra stanza in configuration.yaml that wasn’t entirely obvious:

media_source:

With this, and the ATOM voice satellite, I could now do basic voice control of my Home Assistant setup, with everything except the text-to-speech piece happening locally! Things such as “Hey Jarvis, turn on the study light” work out of the box. I haven’t yet got into defining my own phrases, partly because I know some of the things I want (“What time is it?”) are already added in later Home Assistant versions than the one I’m running.
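
For reference, my understanding is that defining your own phrase on newer Home Assistant versions is a YAML file under config/custom_sentences/en/ plus an intent_script stanza in configuration.yaml; I haven’t tried this on the version I’m running, so treat it as a rough sketch:

# config/custom_sentences/en/time.yaml
language: "en"
intents:
  WhatTimeIsIt:
    data:
      - sentences:
          - "what time is it"

# configuration.yaml
intent_script:
  WhatTimeIsIt:
    speech:
      text: "It's {{ now().strftime('%H:%M') }}"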

Overall I found this initially complicated to set up, given my self-imposed constraints about actually understanding the building blocks and compiling them myself, but I’ve been pretty impressed with the work that’s gone into it all. Next step: running a voice satellite on a Debian box.
