========
Headline
========

+-------------+------------------------------------------------+
| CCEP        | 3                                              |
+-------------+------------------------------------------------+
| Title       | Using and encoding IDs                         |
+-------------+------------------------------------------------+
| Version     | 1                                              |
+-------------+------------------------------------------------+
| Author      | Markus Holtermann                              |
+-------------+------------------------------------------------+
| Date        | 2019-03-11                                     |
+-------------+------------------------------------------------+
| Status      | Rejected                                       |
+-------------+------------------------------------------------+

.. important::

   This CCEP was rejected!

   While there are no queries in ``cloud-app`` that rely on e.g. the cluster
   ID, there *are* parts in ``cloud-fe`` that rely on the first eight
   characters of the *string* or *hex* encoding of a cluster ID.

   Additionally, when one wants to query the telemetry database manually and
   look for logs or metrics for a specific cluster, using the base64 encoded
   form adds mental burden and overhead.


Introduction
============

There are loads of items in CrateDB Cloud that need to be identified. To do
so, version 4 (random) UUIDs as defined in :rfc:`4122` are used.

This CCEP is about how UUIDs should be encoded and used throughout the
CrateDB Cloud product.


Proposal
========

Encoding formats
----------------

One can create a UUID 4 in Python using this code snippet::

    >>> import uuid
    >>> u = uuid.uuid4()
    >>> u
    UUID('7570842a-fbec-40d3-ae25-3f1c2bda326b')

Given this UUID, there are three encoding forms used throughout CrateDB
Cloud:

#. The *string* encoding::

       >>> str(u)
       '7570842a-fbec-40d3-ae25-3f1c2bda326b'

#. The *hex* encoding::

       >>> u.hex
       '7570842afbec40d3ae253f1c2bda326b'

#. The *base64* encoding::

       >>> import base64
       >>> base64.urlsafe_b64encode(u.bytes).decode().rstrip("=")
       'dXCEKvvsQNOuJT8cK9oyaw'

Similarly, decoding of these forms works like this:

#. The *string* decoding::

       >>> uuid.UUID('7570842a-fbec-40d3-ae25-3f1c2bda326b')
       UUID('7570842a-fbec-40d3-ae25-3f1c2bda326b')

#. The *hex* decoding::

       >>> uuid.UUID('7570842afbec40d3ae253f1c2bda326b')
       UUID('7570842a-fbec-40d3-ae25-3f1c2bda326b')

#. The *base64* decoding::

       >>> uuid.UUID(bytes=base64.urlsafe_b64decode("dXCEKvvsQNOuJT8cK9oyaw" + "=="))
       UUID('7570842a-fbec-40d3-ae25-3f1c2bda326b')

While encoding and decoding of the base64 form is more complex, it reduces
the encoded form to 22 characters instead of 32 (hex encoding) or 36 (string
encoding).

Where to use
------------

Generally, UUIDs must be used as the primary identifier/primary key for each
object in the main application database, ``Brain``.

.. note::

   The only objects at this time not using UUIDs are ``Role``\s, some
   telemetry related models and some many-to-many intermediate objects.
   However, for roles there's an idea to remove that database model and rely
   on the Python ``enum.Enum`` only. Many-to-many objects define their
   primary key as a combination of the referenced objects. The same holds
   for telemetry related objects, which define multi-column primary keys
   that include the UUIDs of related objects.

Which encoding to use
---------------------

Given the three encodings outlined above, here's how and when each should be
used:

* The **primary key** of an object in ``Brain`` should always be stored in
  the *string* encoding. The same applies to **foreign keys**.

* Kubernetes **pod names**, **statefulset names**, **deployment names**, etc.
  should use the *base64* encoding. The ID will end up in the file name of
  log files written on the Kubernetes hosts. From there, a pod's name, and
  with it the base64 encoded primary key, will end up in telemetry database
  records for logs and metrics.

* The ``cratedb.cloud/resource-id`` label should always use the *hex*
  encoding.
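
The three encodings and their decoders can be collected into a few small
helpers. The following is only a minimal sketch: the function names
(``to_string``, ``to_hex``, ``to_base64``, ``from_base64``, ``parse_any``) are
hypothetical and not part of any CrateDB Cloud code base, and telling the
forms apart by their length is an assumption that only holds for the three
forms described in this CCEP.

```python
import base64
import uuid


def to_string(u: uuid.UUID) -> str:
    """String encoding, 36 characters (proposed for primary/foreign keys)."""
    return str(u)


def to_hex(u: uuid.UUID) -> str:
    """Hex encoding, 32 characters (proposed for the resource-id label)."""
    return u.hex


def to_base64(u: uuid.UUID) -> str:
    """URL-safe base64 encoding without padding, 22 characters."""
    return base64.urlsafe_b64encode(u.bytes).decode("ascii").rstrip("=")


def from_base64(value: str) -> uuid.UUID:
    """Decode the unpadded base64 form by restoring the stripped padding."""
    padded = value + "=" * (-len(value) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


def parse_any(value: str) -> uuid.UUID:
    """Hypothetical helper: pick the decoder based on the input length."""
    if len(value) == 22:
        return from_base64(value)
    # uuid.UUID() accepts both the hyphenated string and the plain hex form.
    return uuid.UUID(value)


example = uuid.UUID("7570842a-fbec-40d3-ae25-3f1c2bda326b")
encoded = (to_string(example), to_hex(example), to_base64(example))
# All three encoded forms decode back to the same UUID.
round_trip_ok = all(parse_any(e) == example for e in encoded)
```

Computing the padding from the length, rather than hard-coding ``"=="``,
keeps ``from_base64`` correct should the helper ever be fed input of a
different length.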