What is Ceph?

Ceph is an open source software-defined storage solution designed to address the block, file and object storage needs of modern enterprises. Its highly scalable architecture has made it a common choice for high-growth block storage, object stores, and data lakes. Ceph provides reliable and scalable storage while keeping capital and operating expenditure (CAPEX and OPEX) in line with underlying commodity hardware prices.


Production-worthy Ceph storage

Ceph makes it possible to decouple data from physical storage hardware using software abstraction layers, which provides unparalleled scaling and fault management capabilities. This makes Ceph ideal for cloud, OpenStack, Kubernetes, and other microservice and container-based workloads, because it can store large volumes of data effectively.

The main advantage of Ceph is that it provides interfaces for multiple storage types within a single cluster, eliminating the need for multiple storage solutions or any specialised hardware, thus reducing management overheads. Use cases for Ceph range from private cloud infrastructure (both hyper-converged and disaggregated) to big data analytics and rich media, or as an alternative to public cloud storage.


What is a Ceph cluster?

A Ceph storage cluster consists of the following types of daemons:

  • Cluster monitors (ceph-mon) that maintain the map of the cluster state, keep track of active and failed cluster nodes, cluster configuration, and data placement, and manage authentication.
  • Managers (ceph-mgr) that maintain cluster runtime metrics, enable dashboarding capabilities, and provide an interface to external monitoring systems.
  • Object storage devices (ceph-osd) that store data in the Ceph cluster and handle data replication, erasure coding, recovery, and rebalancing. Conceptually, an OSD can be thought of as a slice of CPU/RAM and the underlying SSD or HDD.
  • RADOS Gateways (ceph-rgw) that provide object storage APIs (Swift and S3) via HTTP/HTTPS.
  • Metadata servers (ceph-mds) that store metadata for the Ceph File System, mapping filenames and directories of the file system to RADOS objects and enabling the use of POSIX semantics to access the files.
  • iSCSI Gateways (ceph-iscsi) that provide iSCSI targets for traditional block storage workloads such as VMware or Windows Server.

Ceph stores data as objects within logical storage pools. A Ceph cluster can have multiple pools, each tuned to different performance or capacity use cases. To scale efficiently and handle rebalancing and recovery, Ceph shards each pool into placement groups (PGs). An object is assigned to a placement group by hashing its name, and the CRUSH algorithm then calculates which Ceph OSDs should store that placement group.
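The two-step lookup described above (object to placement group, then placement group to OSDs) can be illustrated with a short Python sketch. This is a simplified model under stated assumptions, not Ceph's implementation: it uses SHA-256 where Ceph uses its own hash function, and it substitutes rendezvous hashing for CRUSH, which in a real cluster also takes device weights and the failure-domain hierarchy into account. All function names here are hypothetical.

```python
from __future__ import annotations

import hashlib


def stable_hash(text: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def object_to_pg(pool_id: int, object_name: str, pg_num: int) -> str:
    """Step 1: hash the object name onto one of the pool's placement groups."""
    return f"{pool_id}.{stable_hash(object_name) % pg_num:x}"


def pg_to_osds(pg_id: str, osd_ids: list[int], replicas: int) -> list[int]:
    """Step 2: pick `replicas` distinct OSDs for a placement group.

    Stand-in for CRUSH (an assumption of this sketch): rendezvous hashing
    ranks every OSD by hash(pg, osd) and keeps the highest-ranked ones.
    """
    ranked = sorted(
        osd_ids,
        key=lambda osd: stable_hash(f"{pg_id}:{osd}"),
        reverse=True,
    )
    return ranked[:replicas]


def locate(pool_id: int, object_name: str, pg_num: int,
           osd_ids: list[int], replicas: int = 3) -> tuple[str, list[int]]:
    """Return the placement group and the OSDs that would hold the object."""
    pg_id = object_to_pg(pool_id, object_name, pg_num)
    return pg_id, pg_to_osds(pg_id, osd_ids, replicas)


if __name__ == "__main__":
    osds = list(range(12))  # hypothetical cluster with OSDs 0..11
    for name in ("vm-disk-0001", "photo.jpg"):
        pg, acting = locate(1, name, 128, osds)
        print(f"{name} -> pg {pg} -> osds {acting}")
```

Because placement is computed rather than looked up in a central table, any client holding the cluster map can find an object's location on its own. The sketch also shows why the indirection through placement groups helps: when an OSD is added or removed, only the placement groups whose OSD set changes need to move, rather than every object being tracked individually.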


Ceph features

  • Thin provisioning of block storage for disk usage optimisation
  • Partial or complete reads and writes, and atomic transactions
  • Replication and erasure coding for data protection
  • Snapshot history, cloning and layering support
  • POSIX file system semantics support
  • Object level key-value mappings
  • Swift and AWS S3 object API compatibility
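
To make the trade-off between the two data protection schemes listed above concrete, here is a rough capacity calculation in Python. It is a back-of-the-envelope sketch: the function names and the 120 TB figure are illustrative assumptions, and the arithmetic ignores metadata overhead and the free-space headroom a real cluster needs.

```python
def replicated_usable(raw_tb: float, copies: int) -> float:
    """Usable capacity when every object is stored `copies` times."""
    return raw_tb / copies


def erasure_coded_usable(raw_tb: float, k: int, m: int) -> float:
    """Usable capacity when each object is split into k data + m coding chunks."""
    return raw_tb * k / (k + m)


if __name__ == "__main__":
    raw = 120.0  # hypothetical raw capacity in TB

    # Three copies: survives the loss of 2 copies, one third of raw is usable.
    print(replicated_usable(raw, copies=3))      # 40.0

    # k=4, m=2: survives the loss of 2 chunks, two thirds of raw is usable.
    print(erasure_coded_usable(raw, k=4, m=2))   # 80.0
```

In this example both schemes tolerate two simultaneous failures, but erasure coding yields twice the usable capacity, at the cost of extra computation on writes and recovery.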

Companies using Ceph

Ceph is used across a broad range of industries, from academia to telecommunications and cloud service providers. It is particularly favoured for its flexibility, scalability, and robustness.


Community and governance

Ceph was initially created by Sage Weil as part of his doctoral dissertation at the University of California, Santa Cruz and evolved from a file system prototype to a fully functional open source storage platform.

Ubuntu was an early supporter of Ceph and its community. That support continues today as Canonical maintains premier member status and serves on the governing board of the Ceph Foundation.

Multiple companies contribute to Ceph, with many more playing a part in the broader community.