Using Postgresql repmgr to manage replication/failover
(and with an existing DB!)
Recently I started to play with repmgr. This was mostly so I could feel better at how Postgresql handles replication, but more so I wanted a way to have an easy failover/switch to primary system. Repmgr can do all this (and more!) very easily and quite painlessly, even with servers with existing data. The docs aren’t perfect, so here are my notes for when I tested this myself.
Jank I had to deal with not clear in docs
- Had to put entries in pg_hba.conf higher than most rules for it to work.
- Because I had an existing setup and did not want to clone, I had to:
- Force a register with
repmgr -f /etc/repmgr.conf standby register --force.
repmgr -f /etc/repmgr.conf standby followto start following the primary.
- Force a register with
Changes to make if running from systemd
After reading this github issue, I learned that I had to add these entries to repmgr.conf to use systemd startup/stop instead of the default
service_start_command='sudo systemctl start firstname.lastname@example.org' service_stop_command='sudo systemctl stop email@example.com' service_restart_command='sudo systemctl restart firstname.lastname@example.org' service_reload_command='sudo systemctl reload email@example.com'
AND because I want the
postgres user to restart itself on repmgr changes remotely, I had to give it specific commands in the conf, and allow
postgres user to run these commands with
postgres ALL=(ALL) NOPASSWD: /usr/bin/systemctl start firstname.lastname@example.org postgres ALL=(ALL) NOPASSWD: /usr/bin/systemctl stop email@example.com postgres ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart firstname.lastname@example.org postgres ALL=(ALL) NOPASSWD: /usr/bin/systemctl reload email@example.com
We use replication slots to handle WAL sending/receiving, and repmgr can support this by just adding a config edit to repmgr.conf. I did this after the fact and when I did a switchover, it automatically made the
repmgr_slot_2 slot. Pretty great that they considered that, and not restricted to the initial follow.
That’s more or less it!
Assuming the two servers have private keys exchanged for ssh, upgrading a standby to primary was as easy as
repmgr -f /etc/repmgr.conf standby switchover. It’s fun!
Aside from the
pg_hba.conf weirdness I had to deal with, the docs for getting this set up was really smooth. I was able to figure out how to add to my existing replication setup with minimal jank, and I’m really excited to start using a witness server to automate a failover with
repmgrd (will update when I do). Also, the commands are VERY English/verbose and within an hour or so I felt pretty confident with what verbs/terminology meant.
If you have a Postgresql replication setup and want to feel better about a bit more control/automation, you should give it a shot! Let me know about your successes and failures
- Automatic failover with a witness server https://repmgr.org/docs/current/repmgrd-automatic-failover.html