Using Postgresql repmgr to manage replication/failover

(and with an existing DB!)

(Last Updated: 2022-09-12 15:27:00 -0400 EDT)

Recently I started to play with repmgr. This was mostly so I could feel better at how Postgresql handles replication, but more so I wanted a way to have an easy failover/switch to primary system. Repmgr can do all this (and more!) very easily and quite painlessly, even with servers with existing data. The docs aren’t perfect, so here are my notes for when I tested this myself.

https://repmgr.org/docs/current/getting-started.html

Jank I had to deal with not clear in docs

Changes to make if running from systemd

After reading this github issue, I learned that I had to add these entries to repmgr.conf to use systemd startup/stop instead of the default pg_ctl:

service_start_command='sudo systemctl start postgresql@14-main.service'
service_stop_command='sudo systemctl stop postgresql@14-main.service'
service_restart_command='sudo systemctl restart postgresql@14-main.service'
service_reload_command='sudo systemctl reload postgresql@14-main.service'

AND because I want the postgres user to restart itself on repmgr changes remotely, I had to give it specific commands in the conf, and allow postgres user to run these commands with sudo:

postgres ALL=(ALL) NOPASSWD: /usr/bin/systemctl start postgresql@14-main.service
postgres ALL=(ALL) NOPASSWD: /usr/bin/systemctl stop postgresql@14-main.service
postgres ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart postgresql@14-main.service
postgres ALL=(ALL) NOPASSWD: /usr/bin/systemctl reload postgresql@14-main.service

Replication Slots

We use replication slots to handle WAL sending/receiving, and repmgr can support this by just adding a config edit to repmgr.conf. I did this after the fact and when I did a switchover, it automatically made the repmgr_slot_2 slot. Pretty great that they considered that, and not restricted to the initial follow.

That’s more or less it!

Assuming the two servers have private keys exchanged for ssh, upgrading a standby to primary was as easy as repmgr -f /etc/repmgr.conf standby switchover. It’s fun!

Final Thoughts

Aside from the pg_hba.conf weirdness I had to deal with, the docs for getting this set up was really smooth. I was able to figure out how to add to my existing replication setup with minimal jank, and I’m really excited to start using a witness server to automate a failover with repmgrd (will update when I do). Also, the commands are VERY English/verbose and within an hour or so I felt pretty confident with what verbs/terminology meant.

If you have a Postgresql replication setup and want to feel better about a bit more control/automation, you should give it a shot! Let me know about your successes and failures

Next Steps