What is the new ID format and how is it generated?

The GitHub Blog post "New global ID format coming to GraphQL" states that GitHub is moving to a new format, but doesn’t provide any technical details as to what it is. Does anyone have insight?

@alystair there are some details described in the blog post.
The specification linked from the blog post (global-object-identification) also discusses this further.

If you are just performing simple ad-hoc GraphQL queries, you will not even notice. If, however, you are storing or processing these IDs, you will be impacted, as highlighted in the article.


I’m not actually utilizing the API in any way, just wanted to know the technical details behind the new ID system to see if it’s something one could implement in their own projects

Current format

Currently, node IDs are base64-encoded strings. For example, a Commit ID looks like this:

MDY6Q29tbWl0MzgxNjg3Mzg6YTBmOTUzZmIyNTZjMzNjMWQwYTdjZmY4ZDZjODg3MTFkNGRiYjhlZQ==

If you decode it, you will see something like this:

06:Commit38168738:a0f953fb256c33c1d0a7cff8d6c88711d4dbb8ee

It can be divided into two parts: the type and the primary key (PK) for the node:

06:Commit
38168738:a0f953fb256c33c1d0a7cff8d6c88711d4dbb8ee

The 06 is just the length of the type string that follows (Commit). This way you can correctly split the primary key from the type.

As for the PK, the exact format will depend on the node type. Sometimes it’s just an integer that matches databaseId of the node (e.g. for Repository nodes), and sometimes it’s more complex (Commit, Ref).

For Commit specifically the PK contains repository databaseId (38168738) and the commit hash (a0f953fb256c33c1d0a7cff8d6c88711d4dbb8ee).
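Here is a minimal Python sketch of the decoding steps described above. The function name parse_legacy_id is made up for illustration, and the sketch assumes the length prefix is always a run of decimal digits terminated by the first colon, which matches the example but is not documented anywhere I know of.

```python
import base64


def parse_legacy_id(node_id):
    """Split a legacy node ID into (type name, primary key).

    Assumes the decoded layout "<length>:<type><primary key>".
    """
    decoded = base64.b64decode(node_id).decode("utf-8")
    # Everything before the first colon is the length of the type name.
    length_text, _, rest = decoded.partition(":")
    type_length = int(length_text)
    return rest[:type_length], rest[type_length:]


commit_id = (
    "MDY6Q29tbWl0MzgxNjg3Mzg6YTBmOTUzZmIyNTZjMzNjMWQw"
    "YTdjZmY4ZDZjODg3MTFkNGRiYjhlZQ=="
)
node_type, primary_key = parse_legacy_id(commit_id)

# For Commit nodes the primary key is "<repository databaseId>:<commit hash>".
repository_id, commit_hash = primary_key.split(":", 1)
print(node_type, repository_id, commit_hash)
```

The last split only makes sense for Commit nodes; other node types would need their own handling of the primary key.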

This encoding scheme is simple enough, but I guess it might be more expensive to encode/decode than base64(<binary encoding>).

New format

There’s not a lot I know about the new format, since there were no official posts after the one you’ve mentioned. But we accidentally encountered one type of those new IDs when requesting Push nodes from the GraphQL API:

PSH_lAHOA3oJoM4NjdrwzwAAAAGo-ToA

I assume that the first part still defines the node type (PSH = Push), but as for the rest, time will tell. We need to see a few more of those to know how they are encoded.

But it’s clear that there are a few improvements here:

  • The node type can be decoded more easily, since you can just split the ID instead of decoding base64 first.
  • The other part of the ID looks like base64(<some binary representation>), which should be more compact and cheaper to encode/decode than the old text-based payload.

Since there’s still not a lot of information about the ID migration, despite the fact that a lot of types have been migrated already, I thought it would be useful to share what we have learned about these new IDs so far.

As I mentioned earlier, all IDs have a prefix that defines their type, followed by _ and then a Base64 URL-encoded payload.

We now know that the binary payload is encoded with MessagePack, and usually contains an array of values.

For example, for IssueComment nodes the array contains the following:

  1. Integer 0, whose purpose is unknown. Probably a version of the ID encoding?
  2. Internal GitHub ID of the repository which contains this comment (aka Repository.databaseId).
  3. Internal GitHub ID of the comment itself (aka IssueComment.databaseId).
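To poke at these IDs yourself, here is a rough Python sketch that splits off the prefix, decodes the Base64 URL payload, and unpacks it. To keep it standard-library only, it includes a tiny hand-written decoder for just the MessagePack types we have seen so far (small arrays, unsigned integers, short strings); in real code you would more likely use a full MessagePack library. The names decode_new_id and unpack are made up for illustration, and the sketch assumes the prefix itself never contains an underscore.

```python
import base64
import struct

# MessagePack tags for big-endian unsigned integers: uint8/16/32/64.
UINT_FORMATS = {0xCC: ">B", 0xCD: ">H", 0xCE: ">I", 0xCF: ">Q"}


def unpack(data, pos=0):
    """Decode one value from a small MessagePack subset.

    Returns (value, position of the next unread byte).
    """
    tag = data[pos]
    pos += 1
    if tag <= 0x7F:  # positive fixint: the tag byte is the value
        return tag, pos
    if 0x90 <= tag <= 0x9F:  # fixarray: low 4 bits hold the item count
        items = []
        for _ in range(tag & 0x0F):
            item, pos = unpack(data, pos)
            items.append(item)
        return items, pos
    if 0xA0 <= tag <= 0xBF:  # fixstr: low 5 bits hold the byte length
        end = pos + (tag & 0x1F)
        return data[pos:end].decode("utf-8"), end
    if tag in UINT_FORMATS:
        fmt = UINT_FORMATS[tag]
        (value,) = struct.unpack_from(fmt, data, pos)
        return value, pos + struct.calcsize(fmt)
    raise ValueError(f"unsupported MessagePack tag 0x{tag:02x}")


def decode_new_id(node_id):
    """Split "<PREFIX>_<payload>" and decode the payload."""
    prefix, _, payload = node_id.partition("_")
    # The payload is unpadded, so restore the "=" padding first.
    padded = payload + "=" * (-len(payload) % 4)
    raw = base64.urlsafe_b64decode(padded)
    value, end = unpack(raw)
    if end != len(raw):
        raise ValueError("trailing bytes after the first value")
    return prefix, value


print(decode_new_id("PSH_lAHOA3oJoM4NjdrwzwAAAAGo-ToA"))
```

Running it on the Push ID from earlier in the thread prints the prefix and the decoded array, so it is easy to compare against databaseId values you already know. What each element means for a given node type is still guesswork.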

We are still curious to see what more complex nodes will look like, e.g. Commit or Ref nodes.

As for the list of node ID prefixes that we discovered, please check this thread: