Headline

CCEP: 3

Title: Using and encoding IDs

Version: 1

Author: Markus Holtermann

Date: 2019-03-11

Status: Rejected

Important

This CCEP was rejected!

While there are no queries in cloud-app that rely on e.g. the cluster ID, there are parts in cloud-fe that rely on the first eight characters of the string or hex encoding of a cluster ID.

Additionally, when one wants to query the telemetry database manually and look for logs or metrics for a specific cluster, using the base64 encoded form will cause additional mental burden and overhead.
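The dependency on the first eight characters can be illustrated with a short sketch. It uses the example UUID from this CCEP; the variable names are illustrative only and are not taken from cloud-fe.

```python
import base64
import uuid

# Example UUID taken from this CCEP.
cluster_id = uuid.UUID("7570842a-fbec-40d3-ae25-3f1c2bda326b")

string_form = str(cluster_id)
hex_form = cluster_id.hex
base64_form = base64.urlsafe_b64encode(cluster_id.bytes).decode("ascii").rstrip("=")

# The string and hex encodings share the same first eight characters ...
short_from_string = string_form[:8]
short_from_hex = hex_form[:8]

# ... but the base64 encoding starts differently, so code that matches on
# that prefix would no longer find the cluster.
short_from_base64 = base64_form[:8]

print(short_from_string, short_from_hex, short_from_base64)
# prints: 7570842a 7570842a dXCEKvvs
```

The same mismatch applies to manual queries: a prefix copied from the string or hex form cannot be used to search for records keyed by the base64 form.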

Introduction

Many items in CrateDB Cloud need to be identified. To do so, version 4 (random) UUIDs as specified in RFC 4122 are used.

This CCEP is about how UUIDs should be encoded and used throughout the CrateDB Cloud product.

Proposal

Encoding formats

One can create a UUID 4 in Python using this code snippet:

>>> import uuid
>>> uuid.uuid4()
UUID('7570842a-fbec-40d3-ae25-3f1c2bda326b')

Given this UUID, there are 3 encoding forms used throughout CrateDB Cloud:

  1. The string encoding:

    >>> str(uuid.UUID('7570842a-fbec-40d3-ae25-3f1c2bda326b'))
    '7570842a-fbec-40d3-ae25-3f1c2bda326b'
    
  2. The hex encoding:

    >>> uuid.UUID('7570842a-fbec-40d3-ae25-3f1c2bda326b').hex
    '7570842afbec40d3ae253f1c2bda326b'
    
  3. The base64 encoding:

    >>> import base64
    >>> base64.urlsafe_b64encode(uuid.UUID('7570842a-fbec-40d3-ae25-3f1c2bda326b').bytes).decode().rstrip("=")
    'dXCEKvvsQNOuJT8cK9oyaw'
    

Similarly, decoding of these forms works like this:

  1. The string decoding:

    >>> uuid.UUID('7570842a-fbec-40d3-ae25-3f1c2bda326b')
    UUID('7570842a-fbec-40d3-ae25-3f1c2bda326b')
    
  2. The hex decoding:

    >>> uuid.UUID('7570842afbec40d3ae253f1c2bda326b')
    UUID('7570842a-fbec-40d3-ae25-3f1c2bda326b')
    
  3. The base64 decoding:

    >>> uuid.UUID(bytes=base64.urlsafe_b64decode("dXCEKvvsQNOuJT8cK9oyaw" + "=="))
    UUID('7570842a-fbec-40d3-ae25-3f1c2bda326b')
    

While encoding and decoding the base64 form is more complex, it reduces the encoded form to 22 characters instead of 32 (hex encoding) or 36 (string encoding).
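The base64 round trip can be wrapped in two small helpers. This is a minimal sketch: the function names encode_base64 and decode_base64 are hypothetical and not part of any CrateDB Cloud code base. Instead of hard-coding "==", the decoder computes the padding from the input length.

```python
import base64
import uuid


def encode_base64(value: uuid.UUID) -> str:
    """Return the unpadded, URL-safe base64 form of a UUID."""
    return base64.urlsafe_b64encode(value.bytes).decode("ascii").rstrip("=")


def decode_base64(encoded: str) -> uuid.UUID:
    """Restore the stripped padding and parse the 16 raw bytes as a UUID."""
    padding = "=" * (-len(encoded) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(encoded + padding))


example = uuid.UUID("7570842a-fbec-40d3-ae25-3f1c2bda326b")
encoded = encode_base64(example)

# Lengths of the string, hex and base64 encodings, in that order.
lengths = (len(str(example)), len(example.hex), len(encoded))
print(encoded, lengths)
# prints: dXCEKvvsQNOuJT8cK9oyaw (36, 32, 22)
```

Decoding the result with decode_base64(encoded) yields the original UUID again.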

Where to use

Generally, UUIDs must be used as primary identifier/primary key for each object in the main application database, Brain.

Note

The only objects not using UUIDs at this time are Roles, some telemetry-related models, and some many-to-many intermediate objects. However, for roles there’s an idea to remove that database model and rely on the Python enum.Enum only. Many-to-many objects define their primary key as a combination of the referenced objects. Similarly, telemetry-related objects define multi-column primary keys that include the UUIDs of related objects.

Which encoding to use

Given the 3 encodings outlined above, here’s how and when each should be used:

  • The primary key of an object in Brain should always be stored in the string encoding. The same applies to foreign keys.

  • Kubernetes pod names, statefulset names, deployment names, etc. should use the base64 encoding. The ID will end up in the file name of log files written on the Kubernetes hosts. From there, a pod’s name, and with it the base64-encoded primary key, will end up in telemetry database records for logs and metrics.

  • The cratedb.cloud/resource-id label should always use the hex encoding.
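The mapping above can be summarised in a short sketch that derives all three encodings from a single UUID. The function name encodings_by_use and the dictionary keys are hypothetical and serve only to label the three usages from the list.

```python
import base64
import uuid


def encodings_by_use(value: uuid.UUID) -> dict:
    """Map each usage named in this CCEP to the encoding proposed for it."""
    base64_form = base64.urlsafe_b64encode(value.bytes).decode("ascii").rstrip("=")
    return {
        # Primary and foreign keys in Brain: string encoding.
        "brain_key": str(value),
        # ID component of Kubernetes object names: base64 encoding.
        "kubernetes_name_part": base64_form,
        # Value of the cratedb.cloud/resource-id label: hex encoding.
        "resource_id_label": value.hex,
    }


forms = encodings_by_use(uuid.UUID("7570842a-fbec-40d3-ae25-3f1c2bda326b"))
for use, encoded in forms.items():
    print(use, encoded)
```

All three values describe the same object, so any of them can be decoded back to the same UUID.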