========
Headline
========

+-------------+------------------------------------------------+
| CCEP        | 3                                              |
+-------------+------------------------------------------------+
| Title       | Using and encoding IDs                         |
+-------------+------------------------------------------------+
| Version     | 1                                              |
+-------------+------------------------------------------------+
| Author      | Markus Holtermann                              |
+-------------+------------------------------------------------+
| Date        | 2019-03-11                                     |
+-------------+------------------------------------------------+
| Status      | Rejected                                       |
+-------------+------------------------------------------------+

.. important::

    This CCEP was rejected!

    While there are no queries in ``cloud-app`` that rely on e.g. the cluster
    ID, there *are* parts in ``cloud-fe`` that rely on the first eight
    characters of the *string* or *hex* encoding of a cluster ID.

    Additionally, when one wants to query the telemetry database manually and
    look for logs or metrics for a specific cluster, using the base64 encoded
    form will cause additional mental burden and overhead.
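
The prefix concern can be illustrated with a minimal sketch. It reuses the example UUID from the proposal below; the variable names are illustrative only and are not taken from ``cloud-fe``. The *string* and *hex* encodings share their first eight characters, while the *base64* encoding starts with entirely different characters, so any code matching on that prefix would break.

```python
import base64
import uuid

# Example UUID taken from the "Encoding formats" section of this CCEP.
u = uuid.UUID("7570842a-fbec-40d3-ae25-3f1c2bda326b")

# First eight characters of each encoding form.
string_prefix = str(u)[:8]
hex_prefix = u.hex[:8]
b64_prefix = base64.urlsafe_b64encode(u.bytes).decode("ascii").rstrip("=")[:8]

print(string_prefix, hex_prefix, b64_prefix)
```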

Introduction
============

Many objects in CrateDB Cloud need a unique identifier. For this purpose,
CrateDB Cloud uses version 4 (random) UUIDs as defined in :rfc:`4122`.

This CCEP describes how UUIDs should be encoded and used throughout the
CrateDB Cloud product.

Proposal
========

Encoding formats
----------------

One can create a UUID 4 in Python using this code snippet::

    >>> import uuid
    >>> uuid.uuid4()
    UUID('7570842a-fbec-40d3-ae25-3f1c2bda326b')

Given this UUID, bound to the name ``u`` below, there are 3 encoding forms
used throughout CrateDB Cloud:

#. The *string* encoding::

       >>> u = uuid.UUID('7570842a-fbec-40d3-ae25-3f1c2bda326b')
       >>> str(u)
       '7570842a-fbec-40d3-ae25-3f1c2bda326b'

#. The *hex* encoding::

       >>> u.hex
       '7570842afbec40d3ae253f1c2bda326b'

#. The *base64* encoding (URL-safe alphabet, padding stripped)::

       >>> import base64
       >>> base64.urlsafe_b64encode(u.bytes).decode().rstrip("=")
       'dXCEKvvsQNOuJT8cK9oyaw'

Similarly, decoding of these forms works like this:

#. The *string* decoding::

       >>> uuid.UUID('7570842a-fbec-40d3-ae25-3f1c2bda326b')
       UUID('7570842a-fbec-40d3-ae25-3f1c2bda326b')

#. The *hex* decoding::

       >>> uuid.UUID('7570842afbec40d3ae253f1c2bda326b')
       UUID('7570842a-fbec-40d3-ae25-3f1c2bda326b')

#. The *base64* decoding::

       >>> uuid.UUID(bytes=base64.urlsafe_b64decode("dXCEKvvsQNOuJT8cK9oyaw" + "=="))
       UUID('7570842a-fbec-40d3-ae25-3f1c2bda326b')

While encoding and decoding the base64 form is more complex, it reduces the
encoded form to 22 characters, compared to 32 (hex encoding) or 36 (string
encoding).
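
The base64 round trip can be wrapped in two small helpers. This is a minimal sketch, not code from the CrateDB Cloud code base: the function names ``encode_uuid_b64`` and ``decode_uuid_b64`` are hypothetical. The decoder computes the stripped padding instead of hard-coding ``"=="``, which is equivalent for 16-byte UUIDs.

```python
import base64
import uuid


def encode_uuid_b64(value: uuid.UUID) -> str:
    """Return the URL-safe base64 form of a UUID without padding."""
    return base64.urlsafe_b64encode(value.bytes).decode("ascii").rstrip("=")


def decode_uuid_b64(text: str) -> uuid.UUID:
    """Rebuild a UUID from its unpadded URL-safe base64 form."""
    # Restore the padding that the encoder stripped. For the 22-character
    # form of a UUID this always yields "==".
    padding = "=" * (-len(text) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(text + padding))


u = uuid.UUID("7570842a-fbec-40d3-ae25-3f1c2bda326b")
encoded = encode_uuid_b64(u)

# Length of each encoding form, for comparison.
lengths = {"string": len(str(u)), "hex": len(u.hex), "base64": len(encoded)}
print(encoded, lengths)
```

``uuid.UUID(bytes=...)`` raises ``ValueError`` if the decoded input is not exactly 16 bytes long, so malformed identifiers are rejected rather than silently accepted.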

Where to use
------------

Generally, UUIDs must be used as the primary identifier (primary key) of each
object in the main application database, ``Brain``.

.. note::

    The only objects not using UUIDs at this time are ``Role``\s, some
    telemetry-related models, and some many-to-many intermediate objects.
    However, for roles there's `an idea
    <https://crate.slack.com/archives/CE4HP618S/p1552312370010500>`_ to remove
    that database model and rely on the Python ``enum.Enum`` only. Many-to-many
    objects define their primary key as a combination of the referenced
    objects. Similarly, telemetry-related objects define multi-column primary
    keys that include the UUIDs of related objects.

Which encoding to use
---------------------

Given the 3 encodings outlined above, here's how and when each should be used:

* The **primary key** of an object in ``Brain`` should always be stored in the
  *string* encoding. The same applies to **foreign keys**.

* Kubernetes **pod names**, **statefulset names**, **deployment names**, etc.
  should use the *base64* encoding. The ID ends up in the file names of the log
  files written on the Kubernetes hosts. From there, the pod's name, and with
  it the base64-encoded primary key, ends up in the telemetry database records
  for logs and metrics.

* The ``cratedb.cloud/resource-id`` label should always use the *hex*
  encoding.
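
The mapping above could be captured in a single helper so that callers never pick an encoding ad hoc. The following is a sketch under assumptions: the ``IdUsage`` enum and the ``encode_id`` function are hypothetical names introduced for illustration and do not exist in ``cloud-app``.

```python
import base64
import uuid
from enum import Enum


class IdUsage(Enum):
    """Hypothetical usage contexts, one per rule in the list above."""

    DATABASE_KEY = "database_key"            # primary and foreign keys
    KUBERNETES_NAME = "kubernetes_name"      # pod, statefulset, deployment names
    RESOURCE_ID_LABEL = "resource_id_label"  # cratedb.cloud/resource-id label


def encode_id(value: uuid.UUID, usage: IdUsage) -> str:
    """Return the encoding this CCEP proposes for the given usage."""
    if usage is IdUsage.DATABASE_KEY:
        return str(value)
    if usage is IdUsage.KUBERNETES_NAME:
        raw = base64.urlsafe_b64encode(value.bytes)
        return raw.decode("ascii").rstrip("=")
    if usage is IdUsage.RESOURCE_ID_LABEL:
        return value.hex
    raise ValueError(f"unsupported usage: {usage!r}")


u = uuid.UUID("7570842a-fbec-40d3-ae25-3f1c2bda326b")
encoded = {usage: encode_id(u, usage) for usage in IdUsage}
for usage, text in encoded.items():
    print(f"{usage.value}: {text}")
```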