What is the new ID format and how is it generated

New global ID format coming to GraphQL - The GitHub Blog states that GitHub is using a new format, but doesn’t provide any technical details as to what it is… anyone have insight?

@alystair there are some details described in the blog post.
The global-object-identification specification linked from the post also discusses this further.

If you are just performing simple ad-hoc GraphQL queries, you would not even notice. If, however, you are storing or processing these IDs, you will be impacted, as highlighted in the article.


I’m not actually utilizing the API in any way, just wanted to know the technical details behind the new ID system to see if it’s something one could implement in their own projects

Current format

Currently, node IDs are base64-encoded. For example, a Commit ID looks like this:

MDY6Q29tbWl0MzgxNjg3Mzg6YTBmOTUzZmIyNTZjMzNjMWQwYTdjZmY4ZDZjODg3MTFkNGRiYjhlZQ==

If you decode it, you will see something like this:

06:Commit38168738:a0f953fb256c33c1d0a7cff8d6c88711d4dbb8ee

It can be divided into two parts: the type and the primary key (PK) for the node:

06:Commit
38168738:a0f953fb256c33c1d0a7cff8d6c88711d4dbb8ee

The 06 is just the length of the type string that follows (Commit). This lets you correctly split the type from the primary key.

As for the PK, the exact format will depend on the node type. Sometimes it’s just an integer that matches databaseId of the node (e.g. for Repository nodes), and sometimes it’s more complex (Commit, Ref).

For Commit specifically the PK contains repository databaseId (38168738) and the commit hash (a0f953fb256c33c1d0a7cff8d6c88711d4dbb8ee).
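To make the steps above concrete, here is a minimal Python sketch that decodes the old-style ID from this post. The function name `decode_legacy_node_id` is made up for illustration, and the layout it assumes (a decimal length, a colon, the type name, then the PK) is only what can be observed from this one example, not a documented format.

```python
import base64
from typing import Tuple


def decode_legacy_node_id(node_id: str) -> Tuple[str, str]:
    """Split an old-style node ID into (type name, primary key).

    Assumes the observed layout "<length>:<type><pk>", where <length>
    is the number of characters in the type name.
    """
    decoded = base64.b64decode(node_id).decode("utf-8")
    length_str, rest = decoded.split(":", 1)
    type_len = int(length_str)
    return rest[:type_len], rest[type_len:]


commit_id = (
    "MDY6Q29tbWl0MzgxNjg3Mzg6YTBmOTUzZmIyNTZjMzNjMWQw"
    "YTdjZmY4ZDZjODg3MTFkNGRiYjhlZQ=="
)
node_type, pk = decode_legacy_node_id(commit_id)
print(node_type)  # Commit
print(pk)         # 38168738:a0f953fb256c33c1d0a7cff8d6c88711d4dbb8ee

# For a Commit, the PK itself splits into repository databaseId and commit hash.
repo_database_id, commit_sha = pk.split(":", 1)
```

Other node types may need different handling of the PK part, since its shape depends on the type.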

This encoding scheme is simple enough, but I guess it might be more expensive to encode/decode than base64(<binary encoding>).

New format

I don’t know much about the new format, since there have been no official posts after the one you’ve mentioned. But we accidentally encountered one of those new IDs when requesting Push nodes from the GraphQL API:

PSH_lAHOA3oJoM4NjdrwzwAAAAGo-ToA

I assume that the first part still defines the node type (PSH = Push), but as for the rest, time will tell. We need to see a few more of those to know how they are encoded.

But it’s clear that there are a few improvements here:

  • The node type can be decoded more easily, since you can just split the ID instead of decoding base64 first.
  • The other part of the ID looks like base64(<some binary representation>), which is likely more compact and cheaper to encode/decode.
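As a rough sketch of the first point, the new-style ID can be split without decoding anything. The helper name `split_new_node_id` is hypothetical, and the code assumes that the separator is the first underscore and that the payload uses the URL-safe base64 alphabet (a guess based on the "-" in the sample). It makes no attempt to interpret the decoded bytes.

```python
import base64
from typing import Tuple


def split_new_node_id(node_id: str) -> Tuple[str, bytes]:
    """Split a new-style node ID into (type prefix, raw payload bytes).

    Assumption: the ID is "<PREFIX>_<payload>", with the payload in
    URL-safe base64 and any trailing "=" padding stripped.
    """
    prefix, payload = node_id.split("_", 1)
    # Restore the padding that base64 decoders expect.
    padded = payload + "=" * (-len(payload) % 4)
    return prefix, base64.urlsafe_b64decode(padded)


prefix, raw = split_new_node_id("PSH_lAHOA3oJoM4NjdrwzwAAAAGo-ToA")
print(prefix)     # PSH
print(len(raw))   # 21
print(raw.hex())  # binary payload, format not yet known
```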