Deleted Clusters¶
| Metric | Target |
|---|---|
| RPO | 1.5 hours. Backups happen every hour and can take tens of minutes to complete. |
| RTO | 4 hours |
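The RPO figure follows from the backup schedule. The sketch below spells out that arithmetic under two assumptions that are not stated on this page: a backup captures the data as of roughly the moment it starts, and "tens of minutes" is taken as 30 minutes. The function name `worst_case_rpo` is made up for illustration.

```python
from datetime import timedelta


def worst_case_rpo(backup_interval, backup_duration):
    """Worst-case data loss when a cluster is lost just before a running backup completes.

    The newest usable backup is then the previous one, which started one full
    interval earlier, so the loss is the interval plus the backup duration.
    """
    return backup_interval + backup_duration


# Hourly backups; 30 minutes is an assumed value for "tens of minutes".
rpo = worst_case_rpo(timedelta(hours=1), timedelta(minutes=30))
print(rpo)  # 1:30:00
```

If backups regularly take longer than the assumed 30 minutes, the same calculation shows the 1.5 hour target no longer holds.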
Accidental cluster deletion via API¶
Either a Crate superuser or the customer has accidentally deleted a cluster that should not have been deleted. Because the cluster was removed completely (including its backups), a special procedure must be followed to perform the restore.
Support article: https://github.com/crate/support/issues/283.
Accidental cluster deletion in Kubernetes¶
A cluster was deleted directly in Kubernetes. This differs from a deletion via the API: the cluster is still present in brain, and the backups have not been removed from S3.
If the CrateDB CRD has not been lost:¶
Support article: https://github.com/crate/support/issues/271.
If the Project/CrateDB CRD has been lost¶
We regularly back up the projects and cratedbs on all of our production clusters. The manifests, and in part also the secrets, are committed to the crate GitHub repository holzhammer. holzhammer also backs up some of the cloud namespaces, e.g. cloud-app, templates.
As soon as crate-operator, k8s-operator and replicator are running, the projects/cratedbs can be applied for disaster recovery. Mileage varies depending on the D/R scenario you are facing.
Support article: https://github.com/crate/support/issues/285.
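As a rough illustration of re-applying backed-up manifests, the sketch below orders a set of manifests so that Project resources come before CrateDB resources and prints one `kubectl apply -f` command per file. The ordering is an assumption, not something this page states; it is based on the brain manifests on this page, where the CrateDB resource lives in a namespace named after the project ID. The file names and the helper names `manifest_kind` and `order_manifests` are hypothetical; follow the support article for the actual procedure.

```python
import re

# Matches the top-level "kind:" field of a Kubernetes manifest.
KIND_RE = re.compile(r"^kind:\s*(\S+)\s*$", re.MULTILINE)

# Assumed order: a Project before the CrateDB resources in its namespace.
APPLY_ORDER = {"Project": 0, "CrateDB": 1}


def manifest_kind(text):
    """Return the value of the top-level kind field, or None if there is none."""
    match = KIND_RE.search(text)
    return match.group(1) if match else None


def order_manifests(manifests):
    """Return file names of Project/CrateDB manifests in the assumed apply order.

    `manifests` maps a file name to the manifest text; other kinds are skipped.
    """
    ranked = []
    for file_name, text in manifests.items():
        kind = manifest_kind(text)
        if kind in APPLY_ORDER:
            ranked.append((APPLY_ORDER[kind], file_name))
    return [file_name for _rank, file_name in sorted(ranked)]


# Hypothetical file names and minimal manifest bodies, for illustration only.
backup = {
    "cratedb-example.yaml": "apiVersion: cloud.crate.io/v1\nkind: CrateDB\n",
    "project-example.yaml": "apiVersion: cloud.crate.io/v1\nkind: Project\n",
    "other.yaml": "apiVersion: v1\nkind: ConfigMap\n",
}

for file_name in order_manifests(backup):
    print(f"kubectl apply -f {file_name}")
```

The script only prints the commands rather than running them, so the list can be reviewed against the D/R scenario at hand before anything is applied.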
A special note about brain¶
brain is a special case because of the chicken & egg problem. The information about
this cluster is also held in brain itself, so it’s not possible to easily re-create
the CrateDB CRD for it as mentioned in the instructions above.
To avoid a headache, the brain CRD is listed below, but also saved with the current
settings at https://github.com/crate/holzhammer/blob/master/manifests/k8s.aks1.westeurope.azure/2cba0c6b-c390-443b-b007-9137a55018b6/cratedb-3b511b64-07c5-4926-86ee-99fbcf4eda15.yaml:
apiVersion: cloud.crate.io/v1
kind: CrateDB
metadata:
  labels:
    cloud.crate.io/channel: stable
    cloud.crate.io/organization-id: 21095b48-a276-46a7-95fd-64479f4d409d
    cloud.crate.io/organization-name: crate-io
    cloud.crate.io/project-id: 2cba0c6b-c390-443b-b007-9137a55018b6
    cloud.crate.io/project-name: brain-production
  name: 3b511b64-07c5-4926-86ee-99fbcf4eda15
  namespace: 2cba0c6b-c390-443b-b007-9137a55018b6
spec:
  backups:
    aws:
      accessKeyId:
        secretKeyRef:
          key: access-key-id
          name: aws-backup-credentials
      basePath: 2cba0c6b-c390-443b-b007-9137a55018b6
      bucket:
        secretKeyRef:
          key: bucket
          name: aws-backup-credentials
      cron: 17 * * * *
      region:
        secretKeyRef:
          key: region
          name: aws-backup-credentials
      secretAccessKey:
        secretKeyRef:
          key: secret-access-key
          name: aws-backup-credentials
  cluster:
    externalDNS: brainprod.aks1.westeurope.azure.cratedb.net.
    imageRegistry: crate
    license:
      secretKeyRef:
        key: license-key
        name: cratedb-license
    name: brainprod
    settings:
      cluster.routing.allocation.awareness.attributes: zone
      cluster.routing.allocation.awareness.force.zone.values: 1,2,3
    ssl:
      keystore:
        secretKeyRef:
          key: keystore.jks
          name: keystore-letsencrypt-3b511b64-07c5-4926-86ee-99fbcf4eda15
      keystoreKeyPassword:
        secretKeyRef:
          key: keystore-key-password
          name: keystore-passwords
      keystorePassword:
        secretKeyRef:
          key: keystore-password
          name: keystore-passwords
    version: 4.6.4
  nodes:
    data:
    - name: hot
      replicas: 3
      resources:
        cpus: 2
        disk:
          count: 1
          size: "237000000000"
          storageClass: crate-premium
        heapRatio: 0.25
        memory: "16106127360"
  users:
  - name: prod
    password:
      secretKeyRef:
        key: password
        name: user-password-3b511b64-07c5-4926-86ee-99fbcf4eda15-0
And the project:
apiVersion: cloud.crate.io/v1
kind: Project
metadata:
  name: 2cba0c6b-c390-443b-b007-9137a55018b6
spec:
  credentials:
    aws:
      accessKeyId: [CREATE A NEW ONE]
      bucket: cratedb-backup-westeurope-azure
      region: eu-west-1
      secretAccessKey: [CREATE A NEW ONE]
  organization:
    id: 21095b48-a276-46a7-95fd-64479f4d409d
    name: crate-io
  project:
    id: 2cba0c6b-c390-443b-b007-9137a55018b6
    name: brain-production
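The CrateDB manifest above takes several values from Kubernetes Secrets through `secretKeyRef` entries, and the secrets are only partially backed up. It can therefore help to list which secrets the manifest expects before applying it. The sketch below walks an already parsed manifest and collects those references. It works on a hand-written Python dict excerpt of the spec rather than loading the YAML file, and the helper name `collect_secret_refs` is made up for illustration.

```python
def collect_secret_refs(node, path=""):
    """Return (secret name, key, manifest path) tuples for every secretKeyRef found."""
    refs = []
    if isinstance(node, dict):
        for field, value in node.items():
            child_path = f"{path}.{field}" if path else field
            if field == "secretKeyRef" and isinstance(value, dict):
                # Record where the reference sits, without the secretKeyRef suffix.
                refs.append((value.get("name"), value.get("key"), path))
            else:
                refs.extend(collect_secret_refs(value, child_path))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            refs.extend(collect_secret_refs(item, f"{path}[{index}]"))
    return refs


# Excerpt of the brain CrateDB spec, written out as a Python dict.
spec_excerpt = {
    "backups": {"aws": {
        "accessKeyId": {"secretKeyRef": {"key": "access-key-id", "name": "aws-backup-credentials"}},
        "bucket": {"secretKeyRef": {"key": "bucket", "name": "aws-backup-credentials"}},
    }},
    "cluster": {"license": {"secretKeyRef": {"key": "license-key", "name": "cratedb-license"}}},
    "users": [{"name": "prod", "password": {"secretKeyRef": {
        "key": "password",
        "name": "user-password-3b511b64-07c5-4926-86ee-99fbcf4eda15-0",
    }}}],
}

refs = collect_secret_refs(spec_excerpt)
secret_names = sorted({name for name, _key, _where in refs})
for name in secret_names:
    print(name)
```

Each printed name can then be checked in the project namespace with `kubectl get secret <name> -n <namespace>` before the CrateDB resource is applied.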