Backup and restore Cassandra data
Medusa is a Cassandra backup and restore tool. It’s packaged with K8ssandra Operator and supports a variety of backends.
These instructions use a local MinIO bucket as an example.
Supported object storage types for backups
Supported in K8ssandra Operator’s Medusa:
You can deploy Medusa on all Cassandra datacenters in the cluster through the addition of settings in the
K8ssandraCluster definition. Example:
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: demo
spec:
  cassandra:
    ...
    ...
  medusa:
    storageProperties:
      # Can be either of local, google_storage, azure_blobs, s3, s3_compatible, s3_rgw or ibm_storage
      storageProvider: s3_compatible
      # Name of the secret containing the credentials file to access the backup storage backend
      storageSecretRef:
        name: medusa-bucket-key
      # Name of the storage bucket
      bucketName: k8ssandra-medusa
      # Prefix for this cluster in the storage bucket directory structure, used for multitenancy
      prefix: test
      # Host to connect to the storage backend (Omitted for GCS, S3, Azure and local).
      host: minio.minio.svc.cluster.local
      # Port to connect to the storage backend (Omitted for GCS, S3, Azure and local).
      port: 9000
      # Region of the storage bucket
      # region: us-east-1

      # Whether or not to use SSL to connect to the storage backend
      secure: false

      # Maximum backup age that the purge process should observe.
      # 0 equals unlimited
      # maxBackupAge: 0

      # Maximum number of backups to keep (used by the purge process).
      # 0 equals unlimited
      # maxBackupCount: 0

      # AWS Profile to use for authentication.
      # apiProfile:
      # transferMaxBandwidth: 50MB/s

      # Number of concurrent uploads.
      # Helps maximizing the speed of uploads but puts more pressure on the network.
      # Defaults to 1.
      # concurrentTransfers: 1

      # File size in bytes over which cloud specific cli tools are used for transfer.
      # Defaults to 100 MB.
      # multiPartUploadThreshold: 104857600

      # Age after which orphan sstables can be deleted from the storage backend.
      # Protects from race conditions between purge and ongoing backups.
      # Defaults to 10 days.
      # backupGracePeriodInDays: 10

      # Pod storage settings to use for local storage (testing only)
      # podStorage:
      #   storageClassName: standard
      #   accessModes:
      #     - ReadWriteOnce
      #   size: 100Mi
The definition above requires a secret named medusa-bucket-key to exist in the target namespace before the K8ssandraCluster object is created. Use the following format for this secret:
apiVersion: v1
kind: Secret
metadata:
  name: medusa-bucket-key
type: Opaque
stringData:
  # Note that this currently has to be set to credentials!
  credentials: |-
    [default]
    aws_access_key_id = minio_key
    aws_secret_access_key = minio_secret
The secret should always use the key credentials, as shown in the example above. Under that key, provide the credential values in the file format that Medusa expects for the chosen storage backend. Refer to the Medusa documentation to find out which file format should be used for each supported storage backend.
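As an alternative to writing the Secret manifest by hand, the same secret could be created from a local credentials file. The following is a minimal sketch and is not taken from the K8ssandra documentation: the function name create_medusa_secret and its two arguments are hypothetical, while `kubectl create secret generic` with `--from-file=<key>=<path>` is standard kubectl. The key is pinned to credentials, matching the requirement described above.

```shell
# Hypothetical helper: create the medusa-bucket-key secret from a local file.
# $1 = path to a credentials file in the format Medusa expects for the backend
# $2 = namespace in which the K8ssandraCluster will be created
create_medusa_secret() {
  creds_file="$1"
  namespace="$2"
  # --from-file=credentials=<path> stores the file content under the
  # "credentials" key, whatever the local file is called.
  kubectl create secret generic medusa-bucket-key \
    --from-file=credentials="$creds_file" \
    --namespace "$namespace"
}
```

A call might look like `create_medusa_secret ./medusa_credentials k8ssandra-operator`, where both the file path and the namespace are illustrative values.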
A successful deployment should inject a new init container named
medusa-restore and a new container named
medusa in the Cassandra STS pods.
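One way to confirm the injection is to list the container names of a Cassandra pod and look for the two Medusa entries. The sketch below assumes kubectl access to the cluster; the function name medusa_containers_present is hypothetical, and the JSONPath expressions rely only on the standard Pod spec fields initContainers and containers.

```shell
# Hypothetical helper: check that a pod has the medusa-restore init container
# and the medusa container. $1 = pod name, $2 = namespace.
medusa_containers_present() {
  pod="$1"
  namespace="$2"
  # Space-separated names of init containers and regular containers.
  init_names=$(kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{.spec.initContainers[*].name}')
  main_names=$(kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{.spec.containers[*].name}')
  # Pad with spaces so only whole names match.
  case " $init_names " in
    *" medusa-restore "*) ;;
    *) echo "missing init container medusa-restore" >&2; return 1 ;;
  esac
  case " $main_names " in
    *" medusa "*) ;;
    *) echo "missing container medusa" >&2; return 1 ;;
  esac
  echo "ok"
}
```

For the demo cluster used on this page, a call could be `medusa_containers_present demo-dc1-default-sts-0 k8ssandra-operator`, assuming that namespace is where the cluster was deployed.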
Creating a Backup
To perform a backup of a Cassandra datacenter, create the following custom resource in the namespace where K8ssandra was deployed:
apiVersion: medusa.k8ssandra.io/v1alpha1
kind: CassandraBackup
metadata:
  name: medusa-backup1
spec:
  cassandraDatacenter: dc1
  name: medusa-backup1
The metadata.name value can match the spec.name value for convenience, but this is not mandatory. spec.name identifies the backup in the storage backend, while metadata.name is the name of the CassandraBackup custom resource in Kubernetes.
Checking Backup Completion
K8ssandra Operator will detect the
CassandraBackup object creation and trigger a backup asynchronously.
To monitor the backup completion, check if the
finishTime value isn’t empty in the CassandraBackup object status. Example:
% kubectl get cassandrabackup/medusa-backup1 -o yaml
kind: CassandraBackup
metadata:
  name: medusa-backup1
spec:
  backupType: differential
  cassandraDatacenter: dc1
  name: medusa-backup1
status:
  ...
  ...
  finishTime: "2022-01-06T16:34:35Z"
  finished:
  - demo-dc1-default-sts-0
  - demo-dc1-default-sts-1
  - demo-dc1-default-sts-2
  startTime: "2022-01-06T16:34:30Z"
All pods that have completed the backup are listed under finished in the status, as in the example above.
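Because the backup runs asynchronously, checking finishTime by hand can be replaced with a small polling loop. The following is a sketch under stated assumptions: the function name wait_for_finish, its argument order, and the default of 60 attempts at 5-second intervals are all hypothetical. It only relies on `kubectl get ... -o jsonpath`, which prints an empty string while status.finishTime is unset. Since the CassandraRestore status also exposes finishTime, the same loop should work for restores by passing cassandrarestore as the resource kind.

```shell
# Hypothetical helper: poll a resource until .status.finishTime is set.
# $1 = resource kind (e.g. cassandrabackup), $2 = resource name,
# $3 = namespace, $4 = max attempts (default 60), $5 = delay in seconds (default 5)
wait_for_finish() {
  kind="$1"
  name="$2"
  namespace="$3"
  tries="${4:-60}"
  delay="${5:-5}"
  attempt=0
  while [ "$attempt" -lt "$tries" ]; do
    # Empty output means the operation has not finished yet.
    finish=$(kubectl get "$kind/$name" -n "$namespace" \
      -o jsonpath='{.status.finishTime}')
    if [ -n "$finish" ]; then
      echo "$finish"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "timed out waiting for $kind/$name" >&2
  return 1
}
```

For the backup created earlier, a call could be `wait_for_finish cassandrabackup medusa-backup1 k8ssandra-operator`, assuming that namespace is where K8ssandra was deployed. The function prints the finish timestamp on success and returns a non-zero status on timeout.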
Restoring a Backup
To restore an existing backup for a Cassandra datacenter, create the following custom resource in the namespace where K8ssandra was deployed. Example:
apiVersion: medusa.k8ssandra.io/v1alpha1
kind: CassandraRestore
metadata:
  name: restore-backup1
  namespace: k8ssandra-operator
spec:
  cassandraDatacenter:
    name: dc1
    clusterName: demo
  backup: medusa-backup1
  inPlace: true
  shutdown: true
spec.backup value should match the CassandraBackup
Once the K8ssandra Operator detects the CassandraRestore object creation, it orchestrates the shutdown of all Cassandra pods, and the medusa-restore container performs the actual data restore when the pods restart.
Checking Restore Completion
To monitor the restore completion, check if the
finishTime value isn’t empty in the
CassandraRestore object status. Example:
% kubectl get cassandrarestore/restore-backup1 -o yaml
apiVersion: medusa.k8ssandra.io/v1alpha1
kind: CassandraRestore
metadata:
  name: restore-backup1
spec:
  backup: medusa-backup1
  cassandraDatacenter:
    clusterName: demo
    name: dc1
  inPlace: true
  shutdown: true
status:
  datacenterStopped: "2022-01-06T16:45:09Z"
  finishTime: "2022-01-06T16:48:23Z"
  restoreKey: ec5b35c1-f2fe-4465-a74f-e29aa1d467ff
  startTime: "2022-01-06T16:44:53Z"
See the following Custom Resource Definition (CRD) reference topics: