PostgreSQL high availability made charmingly easy

Tags: PostgreSQL, RDBMS

In a previous blog, we talked about patterns to run a database in a highly available manner. In this blog, we present our open source recipe for PostgreSQL high availability.

Benefits of a highly available PostgreSQL 

A good recipe should always start by presenting the benefits of the dish. Therefore, we will start this section with a brief introduction to the benefits of using PostgreSQL and why high availability is often a mandate.

Benefits of consuming the elephant’s services

PostgreSQL has been successfully used for more than three decades in all IT sectors. Its maturity and its vibrant community consistently make it a first-choice database among developers, as the latest survey from StackOverflow shows. PostgreSQL is also known for its excellent support of the SQL standard and for its versatility, with more than 1,000 extensions available.

Now that we have an overview of PostgreSQL, we will discuss why running only one instance of PostgreSQL is generally a bad idea.

Why one is not enough  

Reports like ManageForce's and Phenom's estimate the cost of an outage at around $500,000 per hour, on average. A database outage often cascades to the applications that consume it and degrades the dependent services. This makes a database outage one of the worst scenarios for any company.

Unfortunately, there is more to outages than the impact on revenue. According to IDC's 2021 Worldwide State of Data Protection and DR Survey, an outage can also lead to lost productivity, data loss and reputation damage.

This is why running a highly available PostgreSQL deployment is beneficial for your company. In the next section, we will discuss the components we chose at Canonical to automate the deployment of highly available PostgreSQL clusters.

The open-source ingredients

The following diagram provides an overview of the components we use:

The following table lists the versions we will use:

Component         Purpose/Functions                                      Version (major)
PostgreSQL        Database server.                                       14
PgBouncer         Connection pooling and failover.                       1
Patroni           Availability and replication manager.                  3
pgBackRest        Backup and restore.                                    2
Charmed operator  Software operator for PostgreSQL to help us            14/stable
                  automate tasks like deployment and scaling.
Juju              Operator framework allowing our PostgreSQL operator    3.4
                  to run on top of many clouds, including K8s,
                  OpenStack, AWS, Azure and GCP.
LXD               Manager of virtual machines and containers.            5.21

Let’s cover how to install these components.

Our recipe for PostgreSQL high availability

Creating a playground

We will start by creating a virtual machine to serve as a sandbox. In this blog, we will use LXD to create our VM. So let’s start by installing LXD:

sudo snap install lxd --channel=5.21/stable
sudo lxd init --auto

Let’s now create the VM and connect to it:

sudo lxc launch ubuntu:24.04 pg --vm -c limits.cpu=4 -c limits.memory=12GiB
sudo lxc exec pg bash
su -l ubuntu

Preparing the environment

Now that we have a playground, the first step is to install Juju, our operator framework:

sudo snap install juju --channel=3.4/stable
mkdir -p /home/ubuntu/.local/share

Next, we will install LXD again, this time inside the newly created VM. Here it will emulate a local cloud by spawning containers as Juju needs them.

sudo snap install lxd --channel=5.21/stable
sudo lxd init --auto

We will also disable IPv6 as follows:

sudo lxc network set lxdbr0 ipv6.address none
sudo iptables -P FORWARD ACCEPT

The next step is to bootstrap Juju by performing the following commands:

juju bootstrap
Clouds
aws
aws-china
aws-gov
azure
azure-china
equinix
google
localhost
oracle

Select a cloud [localhost]: localhost

Enter a name for the Controller [localhost-localhost]: demo-postgres

We will start by creating a model (think of it as a namespace) to host our PostgreSQL instances.

juju add-model mymodel

Deploying our first PostgreSQL instance

Finally, we can start deploying a single PostgreSQL instance: 

juju deploy postgresql db1 --channel 14/stable --config profile=testing

Please note that, the first time you run the above command, Juju might take several minutes to download the required charm (the software operator) and its dependencies.

In order to check the status of the deployment, you can use the following command:

juju status 
# You can type 'juju status --watch 1s' to continuously monitor the output

For more details about what is happening in the background, you can type:

juju debug-log --tail --replay

After a few minutes (your mileage may vary), typing juju status should give you an output similar to the following:

Model    Controller     Cloud/Region         Version  SLA          Timestamp
mymodel  demo-postgres  localhost/localhost  3.4.5    unsupported  08:07:27Z

App  Version  Status  Scale  Charm       Channel    Rev  Exposed  Message
db1  14.11    active      1  postgresql  14/stable  429  no       

Unit    Workload  Agent  Machine  Public address  Ports     Message
db1/0*  active    idle   0        10.234.206.17   5432/tcp  Primary

Machine  State    Address        Inst id        Base          AZ  Message
0        started  10.234.206.17  juju-92c34d-0  ubuntu@22.04      Running

And our first elephant is ready!

Scaling the number of instances

As we said earlier, running a single instance is not a good idea. So we will now explore another neat feature of our charms: on-demand scaling. Adding replicas to PostgreSQL is as simple as running the following command:

juju add-unit -n 2 db1

After a few minutes, running juju status should yield an output similar to the following:

Model    Controller     Cloud/Region         Version  SLA          Timestamp
mymodel  demo-postgres  localhost/localhost  3.4.5    unsupported  08:13:19Z

App  Version  Status  Scale  Charm       Channel    Rev  Exposed  Message
db1  14.11    active      3  postgresql  14/stable  429  no       

Unit    Workload  Agent  Machine  Public address  Ports     Message
db1/0*  active    idle   0        10.234.206.17   5432/tcp  Primary
db1/1   active    idle   1        10.234.206.62   5432/tcp  
db1/2   active    idle   2        10.234.206.141  5432/tcp  

Machine  State    Address         Inst id        Base          AZ  Message
0        started  10.234.206.17   juju-92c34d-0  ubuntu@22.04      Running
1        started  10.234.206.62   juju-92c34d-1  ubuntu@22.04      Running
2        started  10.234.206.141  juju-92c34d-2  ubuntu@22.04      Running
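If you ever need to locate the primary unit from a script rather than by eye, juju status can also emit machine-readable output via --format=json. The following is a minimal sketch, assuming the JSON layout shown in the inlined sample (in a real script you would feed it the command's actual stdout, whose exact schema may vary across Juju versions):

```python
import json

# Sample modelled on `juju status --format=json` output (layout assumed);
# in practice you would read this from the command's stdout.
sample = json.loads("""
{
  "applications": {
    "db1": {
      "units": {
        "db1/0": {"public-address": "10.234.206.17",
                  "workload-status": {"current": "active", "message": "Primary"}},
        "db1/1": {"public-address": "10.234.206.62",
                  "workload-status": {"current": "active", "message": ""}}
      }
    }
  }
}
""")

def primary_address(status, app="db1"):
    """Return the public address of the unit whose workload message is 'Primary'."""
    for unit in status["applications"][app]["units"].values():
        if unit.get("workload-status", {}).get("message") == "Primary":
            return unit["public-address"]
    return None

print(primary_address(sample))  # 10.234.206.17
```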

You can also inspect the containers created by LXD by typing the following command:

lxc list

The result should look like the following:

+---------------+---------+-----------------------+------+-----------+------+
|     NAME      |  STATE  |         IPV4          | IPV6 |   TYPE    | S... |
+---------------+---------+-----------------------+------+-----------+------+
| juju-92c34d-0 | RUNNING | 10.234.206.17 (eth0)  |      | CONTAINER | 0    |
+---------------+---------+-----------------------+------+-----------+------+
| juju-92c34d-1 | RUNNING | 10.234.206.62 (eth0)  |      | CONTAINER | 0    |
+---------------+---------+-----------------------+------+-----------+------+
| juju-92c34d-2 | RUNNING | 10.234.206.141 (eth0) |      | CONTAINER | 0    |
+---------------+---------+-----------------------+------+-----------+------+
| juju-a8b5cf-0 | RUNNING | 10.234.206.198 (eth0) |      | CONTAINER | 0    |
+---------------+---------+-----------------------+------+-----------+------+

Deploying our database proxy

Let’s deploy the PgBouncer component next, using the following command:

juju deploy pgbouncer pgb1 --channel 1/stable

After a few minutes, running juju status should show a new application:

Model    Controller     Cloud/Region         Version  SLA          Timestamp
mymodel  demo-postgres  localhost/localhost  3.4.5    unsupported  08:27:13Z

App   Version  Status   Scale  Charm       Channel    Rev  Exposed  Message
db1   14.11    active       3  postgresql  14/stable  429  no       
pgb1           unknown      0  pgbouncer   1/stable   278  no       

Unit    Workload  Agent  Machine  Public address  Ports     Message
db1/0*  active    idle   0        10.234.206.17   5432/tcp  Primary
db1/1   active    idle   1        10.234.206.62   5432/tcp  
db1/2   active    idle   2        10.234.206.141  5432/tcp  

Machine  State    Address         Inst id        Base          AZ  Message
0        started  10.234.206.17   juju-92c34d-0  ubuntu@22.04      Running
1        started  10.234.206.62   juju-92c34d-1  ubuntu@22.04      Running
2        started  10.234.206.141  juju-92c34d-2  ubuntu@22.04      Running

As you might have noticed, PgBouncer shows unknown as its Status. Do not worry: this is expected. PgBouncer is a subordinate charm that runs inside the same system container as the application it is related to. Therefore, PgBouncer will only be activated when it is used.

Deploying a test application

In order to emulate an application that uses PostgreSQL, we will use the Data Integrator charm:

juju deploy data-integrator di1 --config database-name=testdb

After a few minutes, running juju status should yield an output similar to the following:

Model    Controller     Cloud/Region         Version  SLA          Timestamp
mymodel  demo-postgres  localhost/localhost  3.4.5    unsupported  09:08:49Z

App   Version  Status   Scale  Charm            Channel        Rev  Exposed .
db1   14.11    active       3  postgresql       14/stable      429  no       
di1            blocked      1  data-integrator  latest/stable   27  no      .
pgb1           unknown      0  pgbouncer        1/stable       278  no       

Unit    Workload  Agent  Machine  Public address  Ports     Message
db1/0*  active    idle   0        10.234.206.17   5432/tcp  Primary
db1/1   active    idle   1        10.234.206.62   5432/tcp  
db1/2   active    idle   2        10.234.206.141  5432/tcp  
di1/0*  blocked   idle   3        10.234.206.184            Please relate...

Machine  State    Address         Inst id        Base          AZ  Message
0        started  10.234.206.17   juju-92c34d-0  ubuntu@22.04      Running
1        started  10.234.206.62   juju-92c34d-1  ubuntu@22.04      Running
2        started  10.234.206.141  juju-92c34d-2  ubuntu@22.04      Running
3        started  10.234.206.184  juju-92c34d-3  ubuntu@22.04      Running

Wiring everything together

Juju provides a powerful integration abstraction (a.k.a. relation) that allows you to establish a communication link between two workloads (e.g. PgBouncer and the PostgreSQL server).

Therefore, relating PgBouncer to PostgreSQL is as simple as running the following command:

juju relate db1 pgb1

And relating the Data Integrator to PgBouncer is as simple as the previous operation:

juju relate di1 pgb1

After a few minutes, running juju status should yield an output similar to the following:

Model    Controller     Cloud/Region         Version  SLA          Timestamp
mymodel  demo-postgres  localhost/localhost  3.4.5    unsupported  09:15:02Z

App   Version  Status  Scale  Charm            Channel        Rev  Exposed  
db1   14.11    active      3  postgresql       14/stable      429  no       
di1            active      1  data-integrator  latest/stable   27  no       
pgb1  1.21.0   active      1  pgbouncer        1/stable       278  no       

Unit       Workload  Agent  Machine  Public address  Ports     Message
db1/0*     active    idle   0        10.234.206.17   5432/tcp  Primary
db1/1      active    idle   1        10.234.206.62   5432/tcp  
db1/2      active    idle   2        10.234.206.141  5432/tcp  
di1/0*     active    idle   3        10.234.206.184            
  pgb1/0*  active    idle            10.234.206.184  6432/tcp  

Machine  State    Address         Inst id        Base          AZ  Message
0        started  10.234.206.17   juju-92c34d-0  ubuntu@22.04      Running
1        started  10.234.206.62   juju-92c34d-1  ubuntu@22.04      Running
2        started  10.234.206.141  juju-92c34d-2  ubuntu@22.04      Running
3        started  10.234.206.184  juju-92c34d-3  ubuntu@22.04      Running

And voila! The communication links are now established between:

  • PgBouncer and PostgreSQL
  • Data Integrator and PgBouncer

Now that we have the full stack ready, let’s play with it.

Time for tasting

Talking to the elephant

We will continue our journey by communicating with the PostgreSQL server. The following command will create a user and display its credentials:

juju run di1/leader get-credentials

The command will display a username and password in the following format:

ok: "True"
postgresql:
  data: '{"database": "testdb", "external-node-connectivity": "true", "requested-secrets":
    "[\"username\", \"password\", \"tls\", \"tls-ca\", \"uris\"]"}'
  database: testdb
  endpoints: 10.234.206.184:6432
  password: <password>
  subordinated: "true"
  username: <username>
  version: "14.11"
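For reference, a client application would typically assemble these fields into a libpq-style connection URI. The helper below is a hypothetical sketch (the function name and the sample values are ours, not part of the charm's output); note that the credentials are percent-encoded in case they contain reserved characters:

```python
from urllib.parse import quote

def build_uri(endpoint, username, password, database):
    """Assemble a libpq-style connection URI from the get-credentials fields."""
    # Percent-encode the credentials so reserved characters (/, @, :) stay safe
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"postgresql://{user}:{pwd}@{endpoint}/{database}"

# Hypothetical values standing in for the real command output
uri = build_uri("10.234.206.184:6432", "relation_id_7", "s3cr3t/pw", "testdb")
print(uri)  # postgresql://relation_id_7:s3cr3t%2Fpw@10.234.206.184:6432/testdb
```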

We can now connect to the deployed PostgreSQL by using the following commands:

juju ssh db1/0
psql --host=<ip address of db1/0 as displayed in the juju status output> \
     --username=<previously provided username> \
     --password testdb

After typing the previously provided password, we get a prompt where we can run queries against our PostgreSQL deployment. For example, we can issue the following query:

testdb=> SELECT VERSION();

You can exit the prompt by typing \q and you can exit the system container by typing exit.

Messing, kindly, with the elephant

High availability is not only about deploying several replicas of PostgreSQL. It is also about providing automatic failover when a problem affects one of the replicas. Let's check what our charmed operator does for us when we simulate a failure in one of the PostgreSQL units.

From now on, we recommend executing the upcoming instructions in a new terminal/tab so that you can follow the behaviour of our charmed operator in the first one.

We will simulate a first issue in the PostgreSQL primary by doing the following:

# Connect to the system container hosting the primary elephant
juju ssh db1/0
# Display all running processes
ps -edf
# Terminate PostgreSQL by killing the parent postgres process
sudo kill -9 <pid of the parent postgres process>
ps -edf

Patroni should restart all the PostgreSQL processes shortly after they are killed. As a consumer of PostgreSQL, you will not notice any issue.
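On the application side, riding out such a blip usually amounts to retrying the connection, since Patroni needs a few seconds to restart the processes. The following is a driver-agnostic sketch (connect_with_retry and the stub are ours; the connect argument stands in for your driver's connect call, e.g. psycopg2.connect, and the caught error class should be adapted to your driver):

```python
import time

def connect_with_retry(connect, attempts=5, delay=1.0):
    """Call `connect` until it succeeds, sleeping between failures.

    During a Patroni restart or a failover, connection attempts can fail
    transiently; retrying with a short delay masks the blip.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return connect()
        except OSError as exc:  # adapt to your driver's error class
            last_error = exc
            time.sleep(delay)
    raise last_error

# Demo with a stub that fails twice before succeeding
calls = {"n": 0}
def flaky_connect():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionRefusedError("server restarting")
    return "connection-handle"

print(connect_with_retry(flaky_connect, attempts=5, delay=0.01))  # connection-handle
```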

We will now try to abruptly stop the system container where the primary PostgreSQL is running. This can emulate a crash of a server or a sudden network isolation of the primary.

exit
juju status
# Note the IP address of the primary PostgreSQL
# Then note the Inst id corresponding to the primary (in the Machine section of juju status' output)

lxc stop --force --timeout 0 <Inst id>
# The above will abruptly stop the Primary’s host

lxc list

By checking juju status, you can see that an automatic failover happened and that our cluster self-healed after only a few seconds!

Typing juju status should present an output similar to the following:

Model    Controller     Cloud/Region         Version  SLA          Timestamp
mymodel  demo-postgres  localhost/localhost  3.4.5    unsupported  09:47:52Z

App   Version  Status  Scale  Charm            Channel        Rev  Exposed  
db1   14.11    active    2/3  postgresql       14/stable      429  no       
di1            active      1  data-integrator  latest/stable   27  no       
pgb1  1.21.0   active      1  pgbouncer        1/stable       278  no       

Unit       Workload  Agent  Machine  Public address  Ports     Message
db1/0      unknown   lost   0        10.234.206.17   5432/tcp  agent lost... 
db1/1      active    idle   1        10.234.206.62   5432/tcp  Primary
db1/2*     active    idle   2        10.234.206.141  5432/tcp  
di1/0*     active    idle   3        10.234.206.184            
  pgb1/0*  active    idle            10.234.206.184  6432/tcp  

Machine  State    Address         Inst id        Base          AZ  Message
0        down     10.234.206.17   juju-92c34d-0  ubuntu@22.04      Running
1        started  10.234.206.62   juju-92c34d-1  ubuntu@22.04      Running
2        started  10.234.206.141  juju-92c34d-2  ubuntu@22.04      Running
3        started  10.234.206.184  juju-92c34d-3  ubuntu@22.04      Running

Contact us for more

Canonical can help you secure and automate the deployment, maintenance and upgrades of your PostgreSQL databases across private and public clouds. Let us know more about your use cases and needs by contacting our experts.

