RIP Brenda McDowell

My mother died earlier this month. She’d been diagnosed with cancer back in February 2022 and had been through major surgery and a couple of rounds of chemotherapy, so it wasn’t a complete surprise even if it was faster at the end than expected. That doesn’t make it easy, but I’m glad to be able to say that her immediate family were all with her at home at the end.

I was touched by the number of people who turned up, both to the wake and the subsequent funeral ceremony. Mum had done a lot throughout her life and was settled in Newry, and it was nice to see how many folk wanted to pay their respects. It was also lovely to hear from some old school friends who had fond memories of her.

There are many things I could say about her, but I don’t feel that here is the place to do so. My father and brother did excellent jobs at eulogies at the funeral. However, while I blog less about life things than I did in the past, I did not want it to go unmarked here. She was my Mum, I loved her, and I am sad she is gone.

Repurposing my C.H.I.P.

Way back at DebConf16 Gunnar managed to arrange for a number of Next Thing Co. C.H.I.P. boards to be distributed to those who were interested. I was lucky enough to be amongst those who received one, but I have to confess after some initial experimentation it ended up sitting in its box unused.

The reasons for that were varied; partly about not being quite sure what best to do with it, partly due to a number of limitations it had, partly because NTC sadly went insolvent and there was less momentum around the hardware. I’ve always meant to go back to it, poking it every now and then but never completing a project. I’m finally almost there, and I figure I should write some of it up.

TL;DR: My C.H.I.P. is currently running a mainline Linux 6.3 kernel with only a few DTS patches, an upstream u-boot v2022.01 with a couple of minor patches and an unmodified Debian bullseye armhf userspace.

Storage

The main issue with the C.H.I.P. is that it uses MLC NAND, in particular mine has an 8GB H27QCG8T2E5R. That ended up unsupported in Linux, with the UBIFS folk disallowing operation on MLC devices. There’s been subsequent work to enable an “SLC emulation” mode which makes the device more reliable at the cost of losing capacity by pairing up writes/reads in cells (AFAICT). Some of this hit for the H27UCG8T2ETR in 5.16 kernels, but I definitely did some experimentation with 5.17 without having much success. I should maybe go back and try again, but I ended up going a different route.

It turned out that BytePorter had documented how to add a microSD slot to the NTC C.H.I.P., using just a microSD to full SD card adapter. Every microSD card I buy seems to come with one of these, so I had plenty lying around to test with. I started with ensuring the kernel could see it ok (by modifying the device tree), but once that was all confirmed I went further and built a more modern u-boot that talked to the SD card, and defaulted to booting off it. That meant no more relying on the internal NAND at all!

I do see some flakiness with the SD card, which is possibly down to the dodgy way it’s hooked up (I should probably do a basic PCB layout with JLCPCB instead). That’s mostly been mitigated by forcing it into 1-bit mode instead of 4-bit mode (I tried lowering the frequency too, but that didn’t make a difference).

The problem manifests as:

sunxi-mmc 1c11000.mmc: data error, sending stop command

and then all storage access freezing (existing logins still work, if the program you’re trying to run is in cache). I can’t find a conclusive software solution to this; I’m pretty sure it’s the hardware, but I don’t understand why the recovery doesn’t generally work.

Random power offs

After I had storage working I’d see random hangs or power offs. It wasn’t quite clear what was going on. So I started trying to work out how to find out the CPU temperature, in case it was overheating. It turns out the temperature sensor on the R8 is part of the touchscreen driver, and I’d taken my usual approach of turning off all the drivers I didn’t think I’d need. Enabling it (CONFIG_TOUCHSCREEN_SUN4I) gave temperature readings and seemed to help somewhat with stability, though not completely.
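Once a temperature driver is enabled, the readings normally show up through the kernel's hwmon sysfs interface. A minimal sketch of polling them follows, assuming the standard `/sys/class/hwmon/hwmon*/temp*_input` layout (values in millidegrees Celsius). The function name and the `root` parameter are illustrative, not something from the C.H.I.P. tooling.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"<chip name>/<sensor file>": degrees Celsius} for each hwmon temperature input."""
    temps = {}
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        # Fall back to the directory name if the driver exposes no "name" file.
        name = name_file.read_text().strip() if name_file.exists() else chip.name
        for sensor in sorted(chip.glob("temp*_input")):
            try:
                millidegrees = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor exists but cannot be read right now
            temps[f"{name}/{sensor.name}"] = millidegrees / 1000.0
    return temps


if __name__ == "__main__":
    for label, celsius in read_hwmon_temps().items():
        print(f"{label}: {celsius:.1f}°C")
```

Running it from cron or a loop and logging the output is a cheap way to see whether a hang correlates with a temperature spike.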

Next I ended up looking at the AXP209 PMIC. There were various scripts still installed (I’d started out with the NTC Debian install and slowly upgraded it to bullseye while stripping away the obvious pieces I didn’t need) and a start-up script called enable-no-limit. This turned out to not be running (some sort of expectation of i2c-dev being loaded and another failing check), but looking at the script and the data sheet revealed the issue.

The AXP209 can cope with 3 power sources; an external DC source, a Li-battery, and finally a USB port. I was powering my board via the USB port, using a charger rated for 2A. It turns out that the AXP209 defaults to limiting USB current to 900mA, and that with wifi active and the CPU busy the C.H.I.P. can rise above that. At which point the AXP shuts everything down. Armed with that info I was able to understand what the power scripts were doing and which bit I needed - i2cset -f -y 0 0x34 0x30 0x03 to set no limit and disable the auto-power off. Additionally I also discovered that the AXP209 had a built in temperature sensor as well, so I added support for that via iio-hwmon.
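To make the `0x03` value less magic, here is a rough decode of register 0x30. The bit layout is an assumption taken from one reading of the AXP209 data sheet (bits 1:0 select the VBUS current limit, bit 6 enables the VBUS voltage-hold limit), so check it against your own copy before relying on it. The function and key names are made up for illustration.

```python
# Assumed layout of AXP209 register 0x30 (VBUS power path management):
#   bits 1:0 - VBUS current limit selection
#   bit 6    - VBUS "VHOLD" voltage limiting enable
VBUS_CURRENT_LIMIT = {
    0b00: "900mA",
    0b01: "500mA",
    0b10: "100mA",
    0b11: "no limit",
}


def decode_vbus_register(value):
    """Decode the fields of interest from an 8-bit register 0x30 value."""
    if not 0 <= value <= 0xFF:
        raise ValueError("register value must fit in 8 bits")
    return {
        "current_limit": VBUS_CURRENT_LIMIT[value & 0b11],
        "vhold_limit_enabled": bool(value & (1 << 6)),
    }


if __name__ == "__main__":
    print(decode_vbus_register(0x03))
```

Under that assumed layout, writing `0x03` selects "no limit" and leaves the voltage-hold bit clear, which matches what the i2cset command above is meant to achieve.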

WiFi

WiFi on the C.H.I.P. is provided by an RTL8723BS SDIO attached device. It’s terrible (and not just here, I had an x86 based device with one where it also sucked). Thankfully there’s a driver in staging in the kernel these days, but I’ve still found it can fall out with my house setup and end up connecting to a further away AP, which then results in lots of retries, dropped frames and CPU consumption. Nailing it to the AP on the other side of the wall from where it is helps. I haven’t done any serious testing with the Bluetooth other than checking it’s detected and can scan ok.

Patches

I patched u-boot v2022.01 (which shows you how long ago I was trying this out) with the following to enable boot from external SD:

u-boot C.H.I.P. external SD patch
diff --git a/arch/arm/dts/sun5i-r8-chip.dts b/arch/arm/dts/sun5i-r8-chip.dts
index 879a4b0f3b..1cb3a754d6 100644
--- a/arch/arm/dts/sun5i-r8-chip.dts
+++ b/arch/arm/dts/sun5i-r8-chip.dts
@@ -84,6 +84,13 @@
 		reset-gpios = <&pio 2 19 GPIO_ACTIVE_LOW>; /* PC19 */
 	};
 
+	mmc2_pins_e: mmc2@0 {
+		pins = "PE4", "PE5", "PE6", "PE7", "PE8", "PE9";
+		function = "mmc2";
+		drive-strength = <30>;
+		bias-pull-up;
+	};
+
 	onewire {
 		compatible = "w1-gpio";
 		gpios = <&pio 3 2 GPIO_ACTIVE_HIGH>; /* PD2 */
@@ -175,6 +182,16 @@
 	status = "okay";
 };
 
+&mmc2 {
+	pinctrl-names = "default";
+	pinctrl-0 = <&mmc2_pins_e>;
+	vmmc-supply = <&reg_vcc3v3>;
+	vqmmc-supply = <&reg_vcc3v3>;
+	bus-width = <4>;
+	broken-cd;
+	status = "okay";
+};
+
 &ohci0 {
 	status = "okay";
 };
diff --git a/arch/arm/include/asm/arch-sunxi/gpio.h b/arch/arm/include/asm/arch-sunxi/gpio.h
index f3ab1aea0e..c0dfd85a6c 100644
--- a/arch/arm/include/asm/arch-sunxi/gpio.h
+++ b/arch/arm/include/asm/arch-sunxi/gpio.h
@@ -167,6 +167,7 @@ enum sunxi_gpio_number {
 
 #define SUN8I_GPE_TWI2		3
 #define SUN50I_GPE_TWI2		3
+#define SUNXI_GPE_SDC2		4
 
 #define SUNXI_GPF_SDC0		2
 #define SUNXI_GPF_UART0		4
diff --git a/board/sunxi/board.c b/board/sunxi/board.c
index fdbcd40269..f538cb7e20 100644
--- a/board/sunxi/board.c
+++ b/board/sunxi/board.c
@@ -433,9 +433,9 @@ static void mmc_pinmux_setup(int sdc)
 			sunxi_gpio_set_drv(pin, 2);
 		}
 #elif defined(CONFIG_MACH_SUN5I)
-		/* SDC2: PC6-PC15 */
-		for (pin = SUNXI_GPC(6); pin <= SUNXI_GPC(15); pin++) {
-			sunxi_gpio_set_cfgpin(pin, SUNXI_GPC_SDC2);
+		/* SDC2: PE4-PE9 */
+		for (pin = SUNXI_GPE(4); pin <= SUNXI_GPE(9); pin++) {
+			sunxi_gpio_set_cfgpin(pin, SUNXI_GPE_SDC2);
 			sunxi_gpio_set_pull(pin, SUNXI_GPIO_PULL_UP);
 			sunxi_gpio_set_drv(pin, 2);
 		}


I’ve sent some patches for the kernel device tree upstream - there’s an outstanding issue with the Bluetooth wake GPIO causing the serial port not to probe(!) that I need to resolve before sending a v2, but what’s there works for me.

The only remaining piece is a patch to enable the external SD for Linux; I don’t think it’s appropriate to send upstream but it’s fairly basic. This limits the bus to 1 bit rather than the 4 bits it’s capable of, as mentioned above.

Linux C.H.I.P. external SD DTS patch
```diff
diff --git a/arch/arm/boot/dts/sun5i-r8-chip.dts b/arch/arm/boot/dts/sun5i-r8-chip.dts
index fd37bd1f3920..2b5aa4952620 100644
--- a/arch/arm/boot/dts/sun5i-r8-chip.dts
+++ b/arch/arm/boot/dts/sun5i-r8-chip.dts
@@ -163,6 +163,17 @@ &mmc0 {
 	status = "okay";
 };
 
+&mmc2 {
+	pinctrl-names = "default";
+	pinctrl-0 = <&mmc2_4bit_pe_pins>;
+	vmmc-supply = <&reg_vcc3v3>;
+	vqmmc-supply = <&reg_vcc3v3>;
+	bus-width = <1>;
+	non-removable;
+	disable-wp;
+	status = "okay";
+};
+
 &ohci0 {
 	status = "okay";
 };
```


As for what I’m doing with it, I think that’ll have to be a separate post.

Buttering up my storage

(TL;DR: I’ve been trying out btrfs in some places instead of ext4, I’ve hit absolutely zero issues and there are a few features that make me plan to use it more.)

Despite (or perhaps because of) working on storage products for a reasonable chunk of my career I have tended towards a conservative approach to my filesystems. By the time I came to Linux ext2 was well established, the move to ext3 was a logical one (the joys of added journalling for faster recovery after unclean shutdowns) and for a long time my default stack has been MD raid with LVM2 on top and then ext4 as the filesystem.

I’ve dabbled with other filesystems; I ran XFS for a while on my VDR machine, and also when I had a large tradspool with INN, but never really had a hard requirement for it. I’ve ended up adminning a machine that had JFS in the past, largely for historical reasons, but don’t really remember any issues (vague recollections of NFS problems but that might just have been NFS being NFS).

However. ZFS has gathered itself a significant fan base and that makes me wonder about what it can offer and whether I want that. Firstly, let’s be clear that I’m never going to run a primary filesystem that isn’t part of the mainline kernel. So ZFS itself is out, because I run Linux. So what do I want that I can’t get with ext4? Firstly, I’d like data checksumming. As storage gets larger there’s a bigger chance of silent data corruption and while I have backups of the important stuff that doesn’t help if you don’t know you need to use them. Secondly, these days I have machines running containers, VMs, or with lots of source checkouts with a reasonable amount of overlap in their data. Disk space has got cheaper, but I’d still like to be able to do some sort of deduplication of common blocks.

So, I’ve been trying out btrfs. When I installed my desktop I went with btrfs for / and /home (I kept /boot as ext4). The thought process was that this was a local machine (so easy access if it all went wrong) and I take regular backups (so if it all went wrong I could recover). That was a year and a half ago and it’s been pretty dull; I mostly forget I’m running btrfs instead of ext4. This is on a machine that tracks Debian testing, so currently on kernel 6.1 but originally installed with 5.10. So it seems modern btrfs is reasonably stable for a machine that isn’t driven especially hard. Good start.

The fact I forget what filesystem I’m running points to the fact that I’m not actually doing anything special here. I get the advantage of data checksumming, but not much else. 2 things spring to mind. Firstly, I don’t do snapshots. Given I run testing it might be wiser if I did take a snapshot before every apt-get upgrade, and I have a friend who does just that, but even when I’ve run unstable I’ve never had a machine get itself into a state that I couldn’t recover so I haven’t spent time investigating. I note Ubuntu has apt-btrfs-snapshot but it doesn’t seem to have any updates for years.

The other thing I didn’t do when I installed my desktop is take advantage of subvolumes. I’m still trying to get my head around exactly what I want them for, but they provide a partial replacement for LVM when it comes to carving up disk space. Instead of the separate / and /home LVs I created I could have created a single LV that would have a single btrfs filesystem on it. / and /home would then be separate subvolumes, allowing me to snapshot each individually. Quotas can also be applied separately so there’s still the potential to prevent one subvolume taking all available space.

Encouraged by the lack of hassle with my desktop I decided to try moving my sbuild machine over to use btrfs for its build chroots. For Reasons this is a VM kindly hosted by a friend, rather than something local. To be honest these days I would probably go for local hosting, but it works and there’s no strong reason to move. The point is it’s remote, and so if migrating went wrong and I had to ask for assistance I’d be bothering someone who’s doing me a favour as it is.

The build VM is, of course, running LVM, and there was luckily some free space available. I’m reasonably sure the underlying storage involves spinning rust, so I did a laborious set of pvmove commands to make sure all the available space was at the start of the PV, and created a new btrfs volume there. I was advised that while btrfs-convert would do the job it was better to create a fresh filesystem where possible. This time I did create an initial root subvolume.

Configuring up sbuild was then much simpler than I’d expected. My setup originally started out as a set of tarballs for the chroots that would get untarred + used for the builds, which is pretty slow. Once overlayfs was mature enough I switched to that. I’d had a conversation with Enrico about his nspawn/btrfs setup, but it turned out Russ Allbery had written an excellent set of instructions on sbuild with btrfs. I tweaked my existing setup based on his details, and I was in business. Each chroot is a separate subvolume - I don’t actually end up having to mount them individually, but it means that only the chroot in use gets snapshotted. For example during a build the following can be observed:

# btrfs subvolume list /
ID 257 gen 111534 top level 5 path root
ID 271 gen 111525 top level 257 path srv/chroot/unstable-amd64-sbuild
ID 275 gen 27873 top level 257 path srv/chroot/bullseye-amd64-sbuild
ID 276 gen 27873 top level 257 path srv/chroot/buster-amd64-sbuild
ID 343 gen 111533 top level 257 path srv/chroot/snapshots/unstable-amd64-sbuild-328059a0-e74b-4d9f-be70-24b59ccba121

I was a little confused about whether I’d got something wrong because the snapshot top level is listed as 257 rather than 271, but digging further with btrfs subvolume show on the 2 mounted directories correctly showed the snapshot had a parent equal to the chroot, not /.
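For anyone wanting to script around this output, a small parser is enough. This is only a sketch against the line format shown above; the `Subvolume` type and function name are illustrative. As far as I understand it, "top level" is the ID of the subvolume that contains the path rather than the source of a snapshot, which is why the snapshot reports 257.

```python
import re
from typing import NamedTuple


class Subvolume(NamedTuple):
    subvol_id: int
    generation: int
    top_level: int
    path: str


# Matches lines such as: "ID 257 gen 111534 top level 5 path root"
LINE_RE = re.compile(r"^ID (\d+) gen (\d+) top level (\d+) path (.+)$")


def parse_subvolume_list(output):
    """Parse the text printed by `btrfs subvolume list` into Subvolume tuples."""
    subvolumes = []
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            subvol_id, generation, top_level, path = match.groups()
            subvolumes.append(
                Subvolume(int(subvol_id), int(generation), int(top_level), path)
            )
    return subvolumes


SAMPLE = """\
ID 257 gen 111534 top level 5 path root
ID 271 gen 111525 top level 257 path srv/chroot/unstable-amd64-sbuild
ID 275 gen 27873 top level 257 path srv/chroot/bullseye-amd64-sbuild
ID 276 gen 27873 top level 257 path srv/chroot/buster-amd64-sbuild
ID 343 gen 111533 top level 257 path srv/chroot/snapshots/unstable-amd64-sbuild-328059a0-e74b-4d9f-be70-24b59ccba121
"""

if __name__ == "__main__":
    for subvolume in parse_subvolume_list(SAMPLE):
        if "/snapshots/" in subvolume.path:
            print(f"in-flight build snapshot: ID {subvolume.subvol_id} at {subvolume.path}")
```

Filtering on the `snapshots/` path component like this is one way to spot leftover build snapshots after a crashed build.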

As a final step I ran jdupes via jdupes -1Br / to deduplicate things across the filesystem. It didn’t end up providing a significant saving unfortunately - I guess there’s a reasonable amount of change between Debian releases - but I then tried it on my desktop, which tends to have a large number of similar source trees checked out. There I managed to save about 5% on /home, which didn’t seem too shabby.
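If you want a feel for how much duplication a tree holds before letting a tool loose on it, the basic idea can be sketched in a few lines: group files by size, then by a hash of their contents. This is not how jdupes is implemented and it changes nothing on disk; it only reports candidate groups. The function names are illustrative.

```python
import hashlib
import os
from collections import defaultdict


def _sha256(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_files(root):
    """Return lists of paths under root whose contents are byte-for-byte identical."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)

    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue  # only hash files that share a size with another file
        for path in paths:
            by_digest[(size, _sha256(path))].append(path)

    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Summing the sizes of all but one file in each group gives an upper bound on what whole-file deduplication could save.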

The sbuild setup has been in place for a couple of months now, and I’ve run quite a few builds on it while preparing for the freeze. So I’m fairly confident in the stability of the setup and my next move is to transition my local house server over to btrfs for its containers (which all run under systemd-nspawn). Those are generally running a Debian stable base so there should be a decent amount of commonality for deduping.

I’m not saying I’m yet at the point where I’ll default to btrfs on new installs, but I’m definitely looking at it for situations where I think I can get benefits from deduplication, or being able to divide up disk space without hard partitioning space.

(And, just to answer the worry I had when I started, I’ve got nowhere near ENOSPC problems, but I believe they’re handled much more gracefully these days. And my experience of ZFS when it got above 90% utilization was far from ideal too.)

Fixing mobile viewing

It was brought to my attention recently that the mobile viewing experience of this blog was not exactly what I’d hope for. In my poor defence I proof read on my desktop and the only time I see my posts on mobile is via FreshRSS. Also my UX ability sucks.

Anyway. I’ve updated the “theme” to a more recent version of minima and tried to make sure I haven’t broken it all in the process (I did break tagging, but then I fixed it again). I double checked the generated feed to confirm it was the same (other than some re-tagging I did), so hopefully I haven’t flooded anyone’s feed.

Hopefully I can go back to ignoring the underlying blog engine for another 5+ years. If not I’ll have to take a closer look at Enrico’s staticsite.

First impressions of the VisionFive 2

(Image: VisionFive 2 packaging)

Back in September last year I chose to back the StarFive VisionFive 2 on Kickstarter. I don’t have a particular use in mind for it, but I felt it was one of the first RISC-V systems that were relatively capable (mentally I have it as somewhere between a Raspberry Pi 3 + a Pi 4). In particular it’s a quad 1.5GHz 64-bit RISC-V core with 8G RAM, USB3, GigE ethernet and a single M.2 PCIe slot. More than ample as a personal machine for playing around with RISC-V and doing local builds. I ended up paying £67 for the Early Bird variant (dual GigE ethernet rather than 1 x 100Mb and 1 x GigE). A couple of weeks ago I got an email with a tracking number and last week it finally turned up.

Being impatient the first thing I did was plug it into a monitor, connect up a keyboard, and power it on. Nothing except some flashing lights. Looking at the boot selector DIP switches suggested it was configured to boot from UART, so I flipped them to (what I thought was) the flash setting. It wasn’t - turns out the “ON” marking on the switches represents logic 0 and it was correctly set up when I got it. I went to read the documentation which talked about writing an image to a MicroSD card, but also had details of the UART connection. Wanting to make sure the device was at least doing something before I actually tried an OS on it I hooked up a USB/serial dongle and powered the board up again. Success! U-Boot appeared and I could interact with it.

I went to the VisionFive2 Debian page and proceeded to torrent the Image-69 image, writing it to a MicroSD card and inserting it in the slot on the bottom of the board. It booted fine. I can’t even tell you what graphical environment it booted up because I don’t remember; it worked fine though (at 1080p, I’ve seen reports that 4K screens will make it croak).

Poking around the image revealed that it’s built off a snapshot.debian.org snapshot from 20220616T194833Z, which is a little dated at this point but I understand the rationale behind picking something that works and sticking with it. The kernel is of course a vendor special, based on 5.15.0. Further investigation revealed that the entire X/graphics stack is living in /usr/local, which isn’t overly surprising; it’s Imagination based. I was pleasantly surprised to discover there is work to upstream the Imagination support, but I’m not planning to run the board with a monitor attached so it’s not a high priority for me.

Having discovered all that I decided to see how well a “clean” Debian unstable install from Debian Ports would go. I had a spare Intel Optane lying around (it’s a stupid 22110 M.2 which is too long for any machine I own), so I put it in the slot on the bottom of the board. To my surprise it Just Worked and was detected ok:

# lspci
0000:00:00.0 PCI bridge: PLDA XpressRich-AXI Ref Design (rev 02)
0000:01:00.0 USB controller: VIA Technologies, Inc. VL805/806 xHCI USB 3.0 Controller (rev 01)
0001:00:00.0 PCI bridge: PLDA XpressRich-AXI Ref Design (rev 02)
0001:01:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [Optane]

I created a single partition with an ext4 filesystem (initially tried btrfs, but the StarFive kernel doesn’t support it), and kicked off a debootstrap with:

# mkfs -t ext4 /dev/nvme0n1p1
# mount /dev/nvme0n1p1 /mnt
# debootstrap --keyring=/etc/apt/trusted.gpg.d/debian-ports-archive-2023.gpg \
	unstable /mnt https://deb.debian.org/debian-ports

The u-boot setup has a convoluted set of vendor scripts that eventually ends up reading a /boot/extlinux/extlinux.conf config from /dev/mmcblk1p2, so I added an additional entry there using the StarFive kernel but pointing to the NVMe device for /. Made sure to set a root password (not that I’ve been bitten by that before, too many times), and rebooted. Success! Well. Sort of. I hit a bunch of problems with having a getty running on ttyS0 as well as one running on hvc0. The second turns out to be a console device from the RISC-V SBI. I did a systemctl mask serial-getty@hvc0.service which made things a bit happier, but I was still seeing odd behaviour and output. Turned out I needed to rebuild the initramfs as well; the StarFive one was using Plymouth and doing some other stuff that seemed to be confusing matters. An update-initramfs -k 5.15.0-starfive -c built me a new one and everything was happy.

Next problem; the StarFive kernel doesn’t have IPv6 support. StarFive are good citizens and make their 5.15 kernel tree available, so I grabbed it, fed it the existing config, and tweaked some options (including adding IPV6 and SECCOMP, which chrony wanted). Slight hiccup when it turned out trying to do things like make sound modular caused it to fail to compile, and having to backport the fix that allowed the use of GCC 12 (as present in sid), but it got there. So I got cocky and tried to update it to the latest 5.15.94. A few manual merge fixups (which I may or may not have got right, but it compiles and boots for me), and success. Timings:

$ time make -j 4 bindeb-pkg
… [linux-image-5.15.94-00787-g1fbe8ac32aa8]
real	37m0.134s
user	117m27.392s
sys	6m49.804s
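Those three numbers also say something about how well the four cores were kept busy: CPU time (user + sys) divided by wall-clock time gives the average number of cores in use. A quick sketch of that arithmetic, with made-up helper names:

```python
import re


def parse_duration(text):
    """Convert a bash `time` duration such as '37m0.134s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def effective_parallelism(real, user, system):
    """Average number of busy cores: (user + sys) CPU time over wall-clock time."""
    return (parse_duration(user) + parse_duration(system)) / parse_duration(real)


if __name__ == "__main__":
    cores = effective_parallelism("37m0.134s", "117m27.392s", "6m49.804s")
    print(f"average cores busy: {cores:.2f}")
```

For the build above this works out to roughly 3.4 cores busy on average out of the 4 requested with -j 4.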

On the subject of kernels I am pleased to note that there are efforts to upstream the VisionFive 2 support, with what appears to be multiple members of StarFive engaging in multiple patch submission rounds. It’s really great to see this and I look forward to being able to run an unmodified mainline kernel on my board.

Niggles? I have a few. The provided u-boot doesn’t have NVMe support enabled, so at present I need to keep a MicroSD card to boot off, even though root is on an SSD. I’m also seeing some errors in dmesg from the SSD:

[155933.434038] nvme nvme0: I/O 436 QID 4 timeout, completion polled
[156173.351166] nvme nvme0: I/O 48 QID 3 timeout, completion polled
[156346.228993] nvme nvme0: I/O 108 QID 3 timeout, completion polled

It doesn’t seem to cause any actual issues, and it could be the SSD, the 5.15 kernel or an actual hardware thing - I’ll keep an eye on it (I will probably end up with a different SSD that actually fits, so that’ll provide another data point).
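Keeping an eye on it is easier with a little tallying. The sketch below counts these messages per device and queue from saved kernel log text. The regular expression is written against the three lines shown above and may need adjusting for other kernel versions; the function name is illustrative.

```python
import re
from collections import Counter

# Written against lines like:
#   [155933.434038] nvme nvme0: I/O 436 QID 4 timeout, completion polled
TIMEOUT_RE = re.compile(r"nvme (\S+): I/O (\d+) QID (\d+) timeout, completion polled")


def count_polled_timeouts(log_text):
    """Count 'timeout, completion polled' messages keyed by (device, queue id)."""
    counts = Counter()
    for match in TIMEOUT_RE.finditer(log_text):
        device, _tag, qid = match.groups()
        counts[(device, int(qid))] += 1
    return counts


SAMPLE = """\
[155933.434038] nvme nvme0: I/O 436 QID 4 timeout, completion polled
[156173.351166] nvme nvme0: I/O 48 QID 3 timeout, completion polled
[156346.228993] nvme nvme0: I/O 108 QID 3 timeout, completion polled
"""

if __name__ == "__main__":
    for (device, qid), count in sorted(count_polled_timeouts(SAMPLE).items()):
        print(f"{device} queue {qid}: {count} polled timeout(s)")
```

Comparing the tallies before and after swapping the SSD would show whether the drive or the board is the common factor.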

More annoying is the temperature the CPU seems to run at. There’s no heatsink or fan, just the metal heatspreader on top of the CPU, and in normal idle operation it sits at around 60°C. Compiling a kernel it hit 90°C before I stopped the job and sorted out some additional cooling in the form of a desk fan, which kept it at just over 30°C.

(Image: Bare VisionFive 2 SBC board with a small desk fan pointed at it)

I haven’t seen any actual stability problems, but I wouldn’t want to run for any length of time like that. I’ve ordered a heatsink and also realised that the board supports a Raspberry Pi style PoE “Hat”, so I’ve got one of those that includes a fan ordered (I am a complete convert to PoE especially for small systems like this).

With the desk fan setup I’ve been able to run the board for extended periods under load (I did a full recompile of the Debian 6.1.12-1 kernel package and it took about 10 hours). The M.2 slot is unfortunately only a single PCIe v2 lane, and my testing topped out at about 180MB/s. IIRC that is about half what the slot should be capable of, and less than a 10th of what the SSD can do. Ethernet testing with iPerf3 sustained about 941Mb/s, so basically maxing out the port. The board as a whole isn’t going to set any speed records, but it’s perfectly usable, and pretty impressive for the price point.
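For context on those two figures, here is the back-of-envelope arithmetic. It assumes the usual published numbers: a PCIe v2 lane signals at 5 GT/s with 8b/10b encoding, and a gigabit Ethernet frame with a 1500 byte MTU carries 38 bytes of framing (header, FCS, preamble and inter-frame gap), with TCP timestamps taking 12 bytes of options. The function names are made up for the example.

```python
def pcie_gen2_lane_bytes_per_second():
    """Line rate of one PCIe v2 lane after 8b/10b encoding, before packet overhead."""
    transfers_per_second = 5_000_000_000  # 5 GT/s, one bit per transfer
    data_bits_per_second = transfers_per_second * 8 // 10  # 8b/10b encoding
    return data_bits_per_second // 8


def gige_tcp_goodput_mbps(mtu=1500, ip_header=20, tcp_header=20, tcp_options=12):
    """Best-case TCP payload rate over gigabit Ethernet, in Mb/s."""
    payload = mtu - ip_header - tcp_header - tcp_options
    # Ethernet header (14) + FCS (4) + preamble (8) + inter-frame gap (12)
    on_wire = mtu + 14 + 4 + 8 + 12
    return 1000 * payload / on_wire


if __name__ == "__main__":
    lane = pcie_gen2_lane_bytes_per_second()
    print(f"PCIe v2 x1 line rate: {lane / 1e6:.0f} MB/s")
    print(f"180 MB/s is {180e6 / lane:.0%} of that")
    print(f"GigE best-case TCP goodput: {gige_tcp_goodput_mbps():.1f} Mb/s")
```

The lane figure is a line rate, so the practical ceiling is lower once PCIe packet overhead is included. On the Ethernet side, the 941Mb/s iPerf3 result is essentially the theoretical best case for TCP with a standard MTU.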

On the Debian side I’ve not hit any surprises. There’s work going on to move RISC-V to a proper release architecture, and I’m hoping to be able to help out with that, but the version of unstable I installed from the ports infrastructure has looked just like any other Debian install. Which is what you want. And that pretty much sums up my overall experience of the VisionFive 2; it’s not noticeably different than any other single board computer. That’s a good thing, FWIW, and once the kernel support lands properly upstream (it’ll be post 6.3 at least it seems) it’ll be a boring mainline supported platform that just happens to be RISC-V.
