CrateDB Cloud Stack

Metric    Target
RPO       0 (no data kept in the cloud stack itself)
RTO       1 hour

Recovery of CrateDB Cloud apps depends on which k8s cluster needs to be recovered. The main “production” cluster (for https://console.cratedb.cloud) is aks1.westeurope.azure. This cluster has the following applications deployed:

  1. cloud - basic setup for the cloud stack (namespaces, some secrets, …).

  2. cloud-agent - an API receiving commands to perform operations on clusters and projects.

  3. cloud-api - the main API.

  4. cloud-gateway - The OAuth2 gateway that is used by ingress-nginx to perform auth.

  5. cloud-k8s-operator - The k8s operator responsible for CrateDB Cloud Projects.

  6. cloud-telemetry - Prometheus and Alertmanager.

  7. cloud-ui - self-explanatory

  8. crate-operator - The k8s operator responsible for CrateDB Cloud Clusters.

Other clusters (e.g. aks1.eastus2.azure, eks1.eu-west-1.aws) contain the following:

  1. cloud

  2. cloud-k8s-operator

  3. cloud-telemetry

  4. crate-operator

If any of these services is lost, recover it by re-deploying the right version from the corresponding Jenkins job for that app.

cloud must be deployed first; the other apps can then follow in any order.
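
The ordering rule can be sketched as a small helper. This is only an illustration, not part of the actual tooling: the function name deployment_order and the two list constants are hypothetical, and the app names are copied from the lists above. It does not trigger any Jenkins job; it only computes the order in which the jobs would be run.

```python
# Apps per cluster type, copied from the lists in this document.
MAIN_CLUSTER_APPS = [
    "cloud", "cloud-agent", "cloud-api", "cloud-gateway",
    "cloud-k8s-operator", "cloud-telemetry", "cloud-ui", "crate-operator",
]
OTHER_CLUSTER_APPS = [
    "cloud", "cloud-k8s-operator", "cloud-telemetry", "crate-operator",
]


def deployment_order(apps):
    """Return the apps with 'cloud' first (hypothetical helper).

    The remaining apps keep the order in which they were given,
    since any order is acceptable once 'cloud' is deployed.
    """
    if "cloud" not in apps:
        raise ValueError("'cloud' must be part of the deployment")
    return ["cloud"] + [app for app in apps if app != "cloud"]


if __name__ == "__main__":
    for step, app in enumerate(deployment_order(OTHER_CLUSTER_APPS), start=1):
        print(f"{step}. deploy {app}")
```

Keeping the rule in one function makes it harder to start an operator or API deployment before the namespaces and secrets from cloud exist.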

Which versions to deploy?

If an app has been lost completely, or the cluster is being re-deployed from scratch, it might not be clear which version needs to be deployed. Fortunately, deployments always happen shortly after a tag has been created, so it is safe to assume that the latest tag in Git is the right version to deploy.
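
One way to look up the latest tag is sketched below. It assumes a local clone of the app's repository with all tags fetched, and that "latest" means the most recently created tag. The function names and the repository path are hypothetical; the git invocation uses the standard `git tag --list --sort=-creatordate` options.

```python
import subprocess
from typing import Optional


def latest_tag_from_listing(listing: str) -> Optional[str]:
    """Return the first non-empty line of a newest-first tag listing."""
    for line in listing.splitlines():
        tag = line.strip()
        if tag:
            return tag
    return None


def latest_git_tag(repo_path: str) -> Optional[str]:
    """Ask git for the tags of a local clone, newest first, and return the top one.

    Assumes the clone at repo_path is up to date (e.g. after `git fetch --tags`).
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "tag", "--list", "--sort=-creatordate"],
        capture_output=True,
        text=True,
        check=True,
    )
    return latest_tag_from_listing(result.stdout)


if __name__ == "__main__":
    # "." is a placeholder; point this at the clone of the app to recover.
    print(latest_git_tag("."))
```

Sorting by creation date rather than by version string avoids assumptions about the tag naming scheme, which is not specified here.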