LXD Clusters: A Primer

Since its inception, LXD has been striving to offer a fresh and intuitive user experience for machine containers. LXD instances can be managed over the network through a REST API and a single command line tool. For large-scale LXD deployments, OpenStack has been the standard approach: using nova-lxd, lightweight containers replace traditional KVM virtual machines, enabling bare metal performance and very high workload density. Of course, OpenStack itself offers a very wide spectrum of functionality, and it demands resources and expertise. So today, if you are looking for a simple and comprehensive way to manage LXD across multiple hosts without adopting an Infrastructure as a Service platform, you are in for a treat.

LXD 3.0 introduces native support for LXD clusters. Effectively, LXD now crosses server and VM boundaries, enabling uniform management of instances with the lxc client or the REST API. To keep its control plane highly available, LXD makes the shared cluster state fault tolerant using the Raft consensus algorithm. Clustering allows us to combine LXD with low-level components, like heterogeneous bare-metal and virtualized compute resources, shared scale-out storage pools and overlay networking, building specialized infrastructure on demand. Whether optimizing for an edge environment, an HPC cluster or a lightweight public cloud abstraction, clustering plays a key role. Let’s quickly design and build a small cluster and see how it works.
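
The lxc command line is a thin client over that same REST API, so the cluster can be inspected either way. As a quick illustration (once the cluster we build below is up), lxc query issues raw API requests against the clustering endpoints; the exact fields returned will vary with the LXD version:

$ lxc query /1.0/cluster           # clustering status over the REST API
$ lxc query /1.0/cluster/members   # list the cluster members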

There are three main dimensions we need to consider for our LXD cluster:

  1. The number of compute nodes
  2. The type and quantity of available storage
  3. The container networking

A minimal cluster requires at least three host nodes. We may choose to use bare-metal servers or virtual machines as hosts. In the latter case, it is beneficial for the VMs to reside on three different hypervisors for better fault tolerance. For storage, LXD has a powerful driver back-end enabling it to manage multiple storage pools, both host-local (zfs, lvm, dir, btrfs) and shared (ceph). VXLAN-based overlay networking as well as “flat” bridged/macvlan networks with native VLAN segmentation are supported. It’s important to note that the storage and networking decisions apply to every node joining the cluster and thus need to be homogeneous.
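
To illustrate this homogeneity: once a cluster is formed, a cluster-wide resource such as a new storage pool has to be defined on every member (with any member-specific configuration, like the source device) before it is created for the cluster as a whole. A minimal sketch, assuming three members named node1 to node3 and a hypothetical pool called data:

$ lxc storage create data zfs source=/dev/vdc --target node1   # stage the definition on each member
$ lxc storage create data zfs source=/dev/vdc --target node2
$ lxc storage create data zfs source=/dev/vdc --target node3
$ lxc storage create data zfs                                  # then create the pool cluster-wide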

For this walkthrough, we are using MAAS 2.3.1 and carving out three VMs from KVM Pods, each with two local storage volumes: 8 GB for the root filesystem and 6 GB for the LXD storage pool. We configure each VM with a bridged interface (br0) and “Auto assign” IP mode.

We deploy Ubuntu 16.04.3 on all the VMs and we are now ready to bootstrap our LXD cluster. We will remove the LXD 2.x packages that come by default with Xenial, install ZFS for our storage pools, install the latest LXD 3.0 as a snap and go through the interactive LXD initialization process.

ubuntu@a0-dc1-001:~$ sudo apt-get purge lxd lxd-client -y
ubuntu@a0-dc1-001:~$ sudo apt-get install zfsutils-linux -y
ubuntu@a0-dc1-001:~$ sudo snap install lxd
ubuntu@a0-dc1-001:~$ lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: yes
What name should be used to identify this node in the cluster? [default=a0-dc1-001]:
What IP address or DNS name should be used to reach this node? [default=10.0.0.74]:
Are you joining an existing cluster? (yes/no) [default=no]:
Setup password authentication on the cluster? (yes/no) [default=yes]:
Trust password for new clients:
Again:
Do you want to configure a new local storage pool? (yes/no) [default=yes]:
Name of the storage backend to use (btrfs, dir, lvm, zfs) [default=zfs]:
Create a new ZFS pool? (yes/no) [default=yes]:
Would you like to use an existing block device? (yes/no) [default=no]: yes
Path to the existing block device: /dev/vdb
Do you want to configure a new remote storage pool? (yes/no) [default=no]:
Would you like to connect to a MAAS server? (yes/no) [default=no]: yes
What's the name of this host in MAAS? [default=a0-dc1-001]:
URL of your MAAS server (e.g. http://1.2.3.4:5240/MAAS): http://192.168.1.71:5240/MAAS
API key for your MAAS server: rygVZup7mbmtJhWh3V:vwEffNYyrf4whrmb3U:AyUvwtMY7wtUgRn9tN7wrrnpyQ6tq7d
Would you like to create a new network bridge? (yes/no) [default=yes]: no
Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: yes
Name of the existing bridge or host interface: br0
Is this interface connected to your MAAS server? (yes/no) [default=yes]: 
MAAS IPv4 subnet name for this interface (empty for no subnet): 10.0.0.0/24
MAAS IPv6 subnet name for this interface (empty for no subnet): 
Would you like stale cached images to be updated automatically? (yes/no) [default=yes] 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: yes
config:
  core.https_address: 10.0.0.74:8443
  core.trust_password: cluster
  maas.api.key: rygVZup7mbmtJhWh3V:vwEffNYyrf4whrmb3U:AyUvwtMY7wtUgRn9tN7wrrnpyQ6tq7d
  maas.api.url: http://192.168.1.71:5240/MAAS
cluster:
  server_name: a0-dc1-001
  enabled: true
  cluster_address: ""
  cluster_certificate: ""
  cluster_password: ""
networks: []
storage_pools:
- config:
    source: /dev/vdb
  description: ""
  name: local
  driver: zfs
profiles:
- config: {}
  description: ""
  devices:
    eth0:
      maas.subnet.ipv4: 10.0.0.0/24
      name: eth0
      nictype: bridged
      parent: br0
      type: nic
    root:
      path: /
      pool: local
      type: disk
  name: default

The last step of the initialization allows us to produce a preseed file that can be used for future, automated bootstrapping. As we will see later, to join subsequent nodes we will need to use a modified preseed file. The cluster is identified by a unique fingerprint, which can be retrieved with:

$ lxc info  | grep certificate_fingerprint
 certificate_fingerprint: 7041ee3446493744092424409a54767b4c8458e4c0f4d4d3742347f4f74dc4ba

Now we need to join the other two nodes to the cluster. For each one of them, do the following:

$ sudo apt-get purge lxd lxd-client -y 
$ sudo apt-get install zfsutils-linux -y
$ sudo snap install lxd
$ sudo lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: yes
What name should be used to identify this node in the cluster? [default=a0-dc1-002]: 
What IP address or DNS name should be used to reach this node? [default=fe80::5054:ff:fe9a:1d78]: 10.0.0.76
Are you joining an existing cluster? (yes/no) [default=no]: yes
IP address or FQDN of an existing cluster node: 10.0.0.74
Cluster fingerprint: 7041ee3446493744092424409a54767b4c8458e4c0f4d4d3742347f4f74dc4ba
You can validate this fingerpring by running "lxc info" locally on an existing node.
Is this the correct fingerprint? (yes/no) [default=no]: yes
Cluster trust password: 
All existing data is lost when joining a cluster, continue? (yes/no) [default=no] yes
Choose the local disk or dataset for storage pool "local" (empty for loop disk): /dev/vdb
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: yes
config:
  core.https_address: 10.0.0.76:8443
cluster:
  server_name: a0-dc1-002
  enabled: true
  cluster_address: 10.0.0.74:8443
  cluster_certificate: |
    -----BEGIN CERTIFICATE-----
    MIIFTDCCAzSgAwIBA[...]U6Qw==
    -----END CERTIFICATE-----
  cluster_password: cluster
networks: []
storage_pools:
- config:
    source: /dev/vdb
  description: ""
  name: local
  driver: zfs
profiles:
- config: {}
  description: ""
  devices: {}
  name: default

Our second node is ready! We also have a new preseed file that can be used to automate joining further nodes; of course, core.https_address and server_name will need to be adapted on a per-node basis (see the sketch after the listings below). The cluster is not fully formed yet (we need to set up a third node to reach quorum), but we can already review the status of networking and storage:

ubuntu@a0-dc1-002:~$ lxc network list
+------+----------+---------+-------------+---------+-------+
| NAME |   TYPE   | MANAGED | DESCRIPTION | USED BY | STATE |
+------+----------+---------+-------------+---------+-------+
| br0  | bridge   | NO      |             | 0       |       |
+------+----------+---------+-------------+---------+-------+
| eth0 | physical | NO      |             | 0       |       |
+------+----------+---------+-------------+---------+-------+

ubuntu@a0-dc1-002:~$ lxc storage list
+-------+-------------+--------+---------+---------+
| NAME  | DESCRIPTION | DRIVER |  STATE  | USED BY |
+-------+-------------+--------+---------+---------+
| local |             | zfs    | CREATED | 1       |
+-------+-------------+--------+---------+---------+
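
To automate joining additional nodes, the adapted preseed can be fed straight to lxd init instead of answering the interactive questions. A minimal sketch, assuming the per-node preseed has been saved as join.yaml (an illustrative file name):

$ sudo snap install lxd
$ cat join.yaml | sudo lxd init --preseed   # non-interactive join using the adapted preseed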

After we repeat the previous configuration process for the third node, we query the cluster’s state:

ubuntu@a0-dc1-003:~$ lxc cluster list
+------------+------------------------+----------+--------+-------------------+
|    NAME    |          URL           | DATABASE | STATE  |      MESSAGE      |
+------------+------------------------+----------+--------+-------------------+
| a0-dc1-001 | https://10.0.0.74:8443 | YES      | ONLINE | fully operational |
+------------+------------------------+----------+--------+-------------------+
| a0-dc1-002 | https://10.0.0.76:8443 | YES      | ONLINE | fully operational |
+------------+------------------------+----------+--------+-------------------+
| a0-dc1-003 | https://10.0.0.80:8443 | YES      | ONLINE | fully operational |
+------------+------------------------+----------+--------+-------------------+

If we need to remove a node from the cluster (ensuring first that at least three nodes remain active at any time), we can simply do:

$ lxc cluster remove [node name]

If the node is unavailable and cannot be removed cleanly, we need to force its removal:

$ lxc cluster remove --force [node name]

When launching a new container, LXD automatically selects a host node from the entire cluster, providing automatic load balancing. Let’s launch a few containers:

ubuntu@a0-dc1-001:~$ for i in $(seq 1 3); do lxc launch ubuntu:x "c${i}"; done
Creating c1
Starting c1
Creating c2
Starting c2
Creating c3
Starting c3

ubuntu@a0-dc1-003:~$ lxc list
+------+---------+------------------+------+------------+-----------+------------+
| NAME |  STATE  |       IPV4       | IPV6 |    TYPE    | SNAPSHOTS |  LOCATION  |
+------+---------+------------------+------+------------+-----------+------------+
| c1   | RUNNING | 10.0.0.91 (eth0) |      | PERSISTENT | 0         | a0-dc1-001 |
+------+---------+------------------+------+------------+-----------+------------+
| c2   | RUNNING | 10.0.0.92 (eth0) |      | PERSISTENT | 0         | a0-dc1-002 |
+------+---------+------------------+------+------------+-----------+------------+
| c3   | RUNNING | 10.0.0.93 (eth0) |      | PERSISTENT | 0         | a0-dc1-003 |
+------+---------+------------------+------+------------+-----------+------------+

LXD has spread the three containers across the three hosts. And because we have integrated LXD with MAAS and are using MAAS for DNS, we get automatic DNS updates:

ubuntu@a0-dc1-003:~$ host c1
c1.maas has address 10.0.0.91

The auto-placement of a container can be overridden by providing the target host:

ubuntu@a0-dc1-002:~$ lxc launch --target a0-dc1-003 ubuntu:x c4-on-host-003
Creating c4-on-host-003
Starting c4-on-host-003

ubuntu@a0-dc1-002:~$ lxc list
+----------------+---------+------------------+------+------------+-----------+------------+
|      NAME      |  STATE  |       IPV4       | IPV6 |    TYPE    | SNAPSHOTS |  LOCATION  |
+----------------+---------+------------------+------+------------+-----------+------------+
| c1             | RUNNING | 10.0.0.91 (eth0) |      | PERSISTENT | 0         | a0-dc1-001 |
+----------------+---------+------------------+------+------------+-----------+------------+
| c2             | RUNNING | 10.0.0.92 (eth0) |      | PERSISTENT | 0         | a0-dc1-002 |
+----------------+---------+------------------+------+------------+-----------+------------+
| c3             | RUNNING | 10.0.0.93 (eth0) |      | PERSISTENT | 0         | a0-dc1-003 |
+----------------+---------+------------------+------+------------+-----------+------------+
| c4-on-host-003 | RUNNING | 10.0.0.94 (eth0) |      | PERSISTENT | 0         | a0-dc1-003 |
+----------------+---------+------------------+------+------------+-----------+------------+

A running container can be accessed and operated from any node:

ubuntu@a0-dc1-002:~$ lxc exec c1 -- ip -o a
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
1: lo    inet6 ::1/128 scope host \       valid_lft forever preferred_lft forever
4: eth0    inet 10.0.0.91/24 brd 10.0.0.255 scope global eth0\       valid_lft forever preferred_lft forever
4: eth0    inet6 fe80::216:3eff:fec1:8626/64 scope link \       valid_lft forever preferred_lft forever
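
Containers can also be relocated between cluster members, which is the basis for the workload mobility mentioned below. A hedged sketch, moving c1 onto a0-dc1-002 (in LXD 3.0 the container generally needs to be stopped before the move):

$ lxc stop c1
$ lxc move c1 --target a0-dc1-002   # relocate the container to another cluster member
$ lxc start c1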

LXD clustering enables effortless management of machine containers, scaling linearly on top of any substrate (bare metal, virtualized, private or public cloud), allowing easy workload mobility and simplified operations. We have taken the first steps to explore this powerful new feature. Try it and join the vibrant community.
