Replacing OSD disks

This guide shows how to recreate a Ceph OSD disk within a Charmed Ceph deployment. It does so by combining the remove-disk and add-disk actions while preserving the OSD ID. Preserving the ID is typically desirable because operators become accustomed to certain OSDs having specific roles.

Note:

This method makes use of the ceph-osd charm’s remove-disk action, which appeared in the charm’s quincy/stable channel. There is a pre-Quincy version of this page available.

Identifying the target OSD

We’ll check the OSD tree to map OSDs to their host machines:

juju ssh ceph-mon/leader sudo ceph osd tree

Sample output:

ID   CLASS  WEIGHT   TYPE NAME               STATUS     REWEIGHT  PRI-AFF
 -1         0.11198  root default                                        
 -7         0.00980      host direct-ghost                               
  4    hdd  0.00980          osd.4                  up   1.00000  1.00000
 -9         0.00980      host famous-cattle                              
  3    hdd  0.00980          osd.3                  up   1.00000  1.00000
  5    ssd  0.00980          osd.5                  up   1.00000  1.00000
 -5         0.07280      host osd-01                                     
  0    hdd  0.07280          osd.0                  up   1.00000  1.00000
 -3         0.00980      host sure-tarpon                                
  1    hdd  0.00980          osd.1                  up   1.00000  1.00000
-11         0.00980      host valued-fly                                 
  2    hdd  0.00980          osd.2                  up   1.00000  1.00000

Let’s assume that we want to replace osd.5. As shown in the output, it’s hosted on the machine famous-cattle.
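On a larger cluster, reading the tree by eye gets tedious. The following is a minimal sketch of the same lookup done with awk. It assumes the column layout shown in the sample output above (every OSD row has a CLASS value), and osd_host is a hypothetical helper name, not part of Ceph or Juju.

```shell
# osd_host OSD_ID
# Reads `ceph osd tree` output on stdin and prints the host bucket that
# contains osd.OSD_ID. Prints nothing if the OSD is not found.
osd_host() {
    awk -v target="osd.$1" '
        $3 == "host" { host = $4 }          # bucket rows: ID WEIGHT TYPE NAME
        $4 == target { print host; exit }   # OSD rows: ID CLASS WEIGHT NAME ...
    '
}

# Possible usage against a live deployment:
#   juju ssh ceph-mon/leader sudo ceph osd tree | osd_host 5
```

The function only remembers the most recent host row, which works because the tree lists each host immediately before its OSDs.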

Next, we check which unit is deployed on that machine:

juju status

Sample output:

Unit         Workload  Agent  Machine  Public address  Ports  Message
...
ceph-osd/1   active    idle   4        192.168.122.8          Unit is ready (2 OSD)
...

Machine  State    Address         Inst id        Series  AZ       Message
...
4        started  192.168.122.8   famous-cattle  jammy   default  Deployed
...

In this case, ceph-osd/1 is the target unit.
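The same mapping can be sketched in shell. The example below assumes the tabular juju status layout shown above, where the machine's Inst id column matches the host name from the OSD tree (this depends on the cloud provider and may not hold everywhere). unit_on_host is a hypothetical helper name.

```shell
# unit_on_host HOST [APP]
# Reads `juju status` tabular output on stdin and prints the units of APP
# (default: ceph-osd) placed on the machine whose "Inst id" equals HOST.
unit_on_host() {
    awk -v host="$1" -v app="${2:-ceph-osd}" '
        $1 == "Unit"    { section = "unit"; next }
        $1 == "Machine" { section = "machine"; next }
        NF == 0         { section = ""; next }   # a blank line ends a table
        section == "unit" && index($1, app "/") == 1 {
            name = $1
            sub(/\*$/, "", name)                 # leader units carry a trailing "*"
            units[$4] = units[$4] name "\n"      # column 4 is the machine number
        }
        section == "machine" && $4 == host { machine = $1 }
        END { if (machine != "") printf "%s", units[machine] }
    '
}

# Possible usage against a live deployment:
#   juju status | unit_on_host famous-cattle
```

Units are collected first and resolved at the end because juju status prints the Unit table before the Machine table.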

Therefore, the target OSD can be identified by the following properties:

OSD_UNIT=ceph-osd/1
OSD=osd.5
OSD_ID=5
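These properties are used as shell variables in the commands that follow. As a small sketch, the numeric ID can be derived from the OSD name rather than typed twice, which avoids the two values drifting apart:

```shell
OSD_UNIT=ceph-osd/1
OSD=osd.5
OSD_ID=${OSD#osd.}    # strips the leading "osd." and leaves the numeric ID

echo "unit=$OSD_UNIT osd=$OSD id=$OSD_ID"
# prints: unit=ceph-osd/1 osd=osd.5 id=5
```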

Replacing the disk

We’ll start by removing the disk. The command to run is the following:

juju run-action $OSD_UNIT --wait remove-disk osd-ids=$OSD

If successful, the output should contain the following:

1 disk(s) was removed
To replace them, run:
juju run-action ceph-osd/1 add-disk osd-devices=/dev/vdb osd-ids=osd.5

This includes the instructions on how to replace the disk. The important bit is that the OSD ID can be recycled since we didn’t use the purge flag during removal.

Now, let’s assume that we are replacing the disk in order to add a bcache device to make things faster. We can do this easily with the add-disk action.

For example, if the caching device is /dev/pmem0 and the backing device is the disk we kept (i.e. /dev/vdb), we can define the following properties:

OSD_CACHE_DEVICE=/dev/pmem0
OSD_BACKING_DEVICE=/dev/vdb

Thus, we can run the following to finally replace the disk:

juju run-action --wait $OSD_UNIT add-disk osd-devices=$OSD_BACKING_DEVICE cache-devices=$OSD_CACHE_DEVICE osd-ids=$OSD

This will create a new OSD on a bcache device built from the caching and backing devices, and will reuse the OSD ID of the original disk.
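Since the add-disk command is assembled from four shell variables, an unset variable would silently produce an empty argument. The sketch below prints the command for review instead of running it and stops early if any variable is missing. add_disk_command is a hypothetical helper name, not part of Juju or the charm.

```shell
# add_disk_command
# Prints (does not run) the add-disk invocation built from OSD_UNIT,
# OSD_BACKING_DEVICE, OSD_CACHE_DEVICE and OSD.
add_disk_command() {
    # ${VAR:?message} aborts with an error if VAR is unset or empty.
    : "${OSD_UNIT:?OSD_UNIT is not set}"
    : "${OSD_BACKING_DEVICE:?OSD_BACKING_DEVICE is not set}"
    : "${OSD_CACHE_DEVICE:?OSD_CACHE_DEVICE is not set}"
    : "${OSD:?OSD is not set}"
    printf 'juju run-action --wait %s add-disk osd-devices=%s cache-devices=%s osd-ids=%s\n' \
        "$OSD_UNIT" "$OSD_BACKING_DEVICE" "$OSD_CACHE_DEVICE" "$OSD"
}
```

With the variables defined earlier, calling add_disk_command prints the same command line shown above, which can then be inspected and run by hand.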

We can check that this is the case by running the following:

juju ssh $OSD_UNIT -- sudo ceph-volume lvm list

And checking that the output contains:

====== osd.5 =======

  [block]       /dev/ceph-55245f64-30ab-4544-88d7-dbdfb88521c3/osd-block-55245f64-30ab-4544-88d7-dbdfb88521c3

      block device              /dev/ceph-55245f64-30ab-4544-88d7-dbdfb88521c3/osd-block-55245f64-30ab-4544-88d7-dbdfb88521c3
      block uuid                i4vyZI-7m5D-36kp-Duy9-kx52-0UvG-Hz16SG
      cephx lockbox secret      
      cluster fsid              35a48462-7125-11ed-8c2f-e78ff76f8d8b
      cluster name              ceph
      crush device class        
      encrypted                 0
      osd fsid                  55245f64-30ab-4544-88d7-dbdfb88521c3
      osd id                    5
      osdspec affinity          
      type                      block
      vdo                       0
      devices                   /dev/bcache0

In other words, the OSD ID is the same (osd.5) and the device is using bcache.
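This check can also be sketched as a small filter over the ceph-volume output. It assumes the report layout shown above (an "osd.N" heading line followed by indented key/value rows), and osd_devices is a hypothetical helper name.

```shell
# osd_devices OSD_ID
# Reads `ceph-volume lvm list` output on stdin and prints the "devices"
# value(s) listed under the "osd.OSD_ID" heading.
osd_devices() {
    awk -v target="osd.$1" '
        /^=+ osd\.[0-9]+ =+/ { in_osd = ($2 == target); next }   # heading line
        in_osd && $1 == "devices" { print $2 }
    '
}

# Possible usage against a live deployment:
#   juju ssh $OSD_UNIT -- sudo ceph-volume lvm list | osd_devices $OSD_ID
# A result starting with /dev/bcache indicates the OSD sits on a bcache device.
```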

We can further check that the cluster shows the right number of OSDs by running:

juju status

And checking that the unit ceph-osd/1 shows 2 OSDs.
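As a rough sketch, the OSD count can be pulled out of the unit's status message. This assumes the message has the form "Unit is ready (N OSD)" as in the sample output earlier, and unit_osd_count is a hypothetical helper name.

```shell
# unit_osd_count UNIT
# Reads `juju status` tabular output on stdin and prints the number N from
# the "(N OSD)" part of UNIT's status message.
unit_osd_count() {
    awk -v unit="$1" '
        { name = $1; sub(/\*$/, "", name) }      # leader units carry a trailing "*"
        name == unit && match($0, /\([0-9]+ OSD\)/) {
            # skip "(" at the start and drop " OSD)" at the end
            print substr($0, RSTART + 1, RLENGTH - 6)
            exit
        }
    '
}

# Possible usage against a live deployment:
#   juju status | unit_osd_count ceph-osd/1
```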
