Learning to love Ansible

This post attempts to chart my journey towards getting usefully started with Ansible to manage my system configurations. It’s a high level discussion of how I went about doing so and what I got out of it, rather than including any actual config snippets - there are plenty of great resources out there that handle the actual practicalities of getting started much better than I could.

I’ve been convinced about the merits of configuration management for machines for a while now; I remember conversations about producing an appropriate set of recipes to reproduce our haphazard development environment reliably over 4 years ago. That never really got dealt with before I left, and as managing systems hasn’t been part of my day job since then I never got around to doing more than working my way through the Puppet Learning VM. I do, however, continue to run a number of different Linux machines - a few VMs, a hosted dedicated server and a few physical machines at home and my parents’. In particular I have a VM which handles my parents’ email, and I thought that was a good candidate for trying to properly manage. It’s backed up, but it would be nice to be able to redeploy that setup easily if I wanted to move provider, or do hosting for other domains in their own VMs.

I picked Ansible, largely because I wanted something lightweight and the agentless design appealed to me. All I really need to do is ensure Python is on the host I want to manage and everything else I can bootstrap using Ansible itself. Plus it meant I could use the version from Debian testing on my laptop and not require backports on the stable machines I wanted to manage.

My first attempt was to write a single Ansible YAML file which did all the appropriate things for the email VM; installed Exim/Apache/Roundcube, created users, made sure the appropriate SSH keys were in place, installed configuration files, etc, etc. This did the job, but I found myself thinking it was no better than writing a shell script to do the same things.

Things got a lot better when instead of concentrating on a single host I looked at what commonality was shared between hosts. I started with simple things; Debian is my default distro so I created an Ansible role debian-system which configured up APT and ensured package updates were installed. Then I added a task to setup my own account and install my SSH keys. I was then able to deploy those 2 basic steps across a dozen different machine instances. At one point I got an ARM64 VM from Scaleway to play with, and it was great to be able to just add it to my Ansible hosts file and run the playbook against it to get my basic system setup.

Adding email configuration got trickier. In addition to my parents’ email VM I have my own email hosted elsewhere (along with a whole bunch of other users) and the needs of both systems are different. Sitting down and trying to manage both configurations sensibly forced me to do some rationalisation of the systems, pulling out the commonality and then templating the differences. Additionally I ended up using the lineinfile module to edit the Debian supplied configurations, rather than rolling out my own config files. This helped ensure more common components between systems. There were also a bunch of differences that had grown out of the fact each system was maintained by hand - I had about 4 copies of each Let’s Encrypt certificate rather than just putting one copy in /etc/ssl and pointing everything at that. They weren’t even in the same places on different systems. I unified these sorts of things as I came across them.

Throughout the process of this rationalisation I was able to easily test using containers. I wrote an Ansible role to create systemd-nspawn based containers, doing all of the LVM + debootstrap work required to produce a system which could then be managed by Ansible. I then pointed the same configuration as I was using for the email VM at this container, and could verify at each step along the way that the results were what I expected. It was still a little nerve-racking when I switched over the live email config to be managed by Ansible, but it went without a hitch as hoped.

I still have a lot more configuration to switch to being managed by Ansible, especially on the machines which handle a greater number of services, but it’s already proved extremely useful. To prepare for a jessie to stretch upgrade I fired up a stretch container and pointed the Ansible config at it. Most things just worked and the minor issues I was able to fix up in that instance leaving me confident that the live system could be upgraded smoothly. Or when I want to roll out a new SSH key I can just add it to the Ansible setup, and then kick off an update. No need to worry about whether I’ve updated it everywhere, or correctly removed the old one.

So I’m a convert; things were a bit more difficult by starting with existing machines that I didn’t want too much disruption on, but going forward I’ll be using Ansible to roll out any new machines or services I need, and expect that I’ll find that new deployment to be much easier now I have a firm grasp on the tools available.