New global ID format coming to GraphQL - The GitHub Blog states that GitHub is using a new format, but doesn’t provide any technical details as to what it is… anyone have insight?
If you are just performing simple ad-hoc GraphQL queries, you would not even notice. If, however, you are storing or processing these IDs, you will be impacted, as highlighted in the article.
I’m not actually using the API in any way. I just wanted to know the technical details behind the new ID system, to see if it’s something one could implement in their own projects.
Currently, node IDs are base64 encoded. For example, a Commit ID looks like this:
If you decode it, you will see something like this:
It can be divided into two parts: the type and the primary key (PK) for the node:
06 is just the length of the type string that follows (Commit). This way you can correctly split the primary key from the type.
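A minimal Python sketch of that split, not GitHub's own code. It assumes the decoded text has the shape <length>:<Type><PK>, with a colon after the length (for example, MDQ6VXNlcjE= decodes to 04:User1). The helper names are hypothetical, and writing the length with a single leading zero in the encoder is an assumption based on IDs like that one.

```python
import base64


def decode_legacy_id(node_id: str) -> tuple[str, str]:
    """Split a legacy node ID into (type, primary key).

    Assumes the decoded text looks like "<length>:<Type><PK>", e.g. "04:User1".
    """
    text = base64.b64decode(node_id).decode("utf-8")
    length_str, sep, rest = text.partition(":")
    if not sep:
        raise ValueError(f"unexpected legacy ID layout: {text!r}")
    type_len = int(length_str)  # int("06") == 6, so the leading zero is harmless
    return rest[:type_len], rest[type_len:]


def encode_legacy_id(type_name: str, pk: str) -> str:
    """Inverse of decode_legacy_id.

    The single leading "0" on the length mimics IDs like "04:User1" (an assumption).
    """
    text = f"0{len(type_name)}:{type_name}{pk}"
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


print(decode_legacy_id("MDQ6VXNlcjE="))  # ('User', '1')
```

Because the length tells you exactly where the type name ends, the PK can contain any characters (including further colons) without confusing the split.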
As for the PK, the exact format depends on the node type. Sometimes it’s just an integer that matches the databaseId of the node (e.g. for Repository nodes), and sometimes it’s more complex: for Commit specifically, the PK contains the repository ID (38168738) and the commit hash.
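For a Commit PK, a further split is needed. This sketch assumes the repository ID and the commit hash are joined by a colon; that separator is an assumption, and split_commit_pk is a hypothetical helper.

```python
def split_commit_pk(pk: str) -> tuple[int, str]:
    """Split a legacy Commit PK into (repository ID, commit hash).

    Assumes the two parts are joined by ":"; that separator is an assumption.
    """
    repo_id, sep, commit_hash = pk.partition(":")
    if not sep:
        raise ValueError(f"unexpected Commit PK: {pk!r}")
    return int(repo_id), commit_hash


# "deadbeef" is a placeholder, not a real commit hash.
print(split_commit_pk("38168738:deadbeef"))  # (38168738, 'deadbeef')
```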
This encoding scheme is simple enough, but I guess it might be more expensive to encode/decode than
I don’t know much about the new format, since there have been no official posts since the one you mentioned. But we accidentally encountered one of these new IDs when requesting Push nodes from the GraphQL API:
I assume that the first part still defines the node type (Push), but as for the rest, time will tell. We need to see a few more of them to know how they are encoded.
But it’s clear that there are a few improvements here:
- The node type can be decoded more easily, since you can just split the ID instead of decoding base64 first.
- The other part of the ID looks like base64(<some binary representation>), which makes it way more efficient.
Since there’s still not a lot of information about the ID migration, even though many types have already been migrated, I thought it would be useful to share what we have learned about these new IDs so far.
As I mentioned earlier, all IDs have a prefix that defines their type, followed by an underscore (_) and then a Base64 URL-encoded payload.
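Splitting and decoding such an ID needs only the standard library. This is a minimal sketch, assuming the prefix itself never contains an underscore (the payload can, since _ is part of the URL-safe Base64 alphabet) and that the payload may or may not carry trailing = padding; split_node_id is a hypothetical name.

```python
import base64


def split_node_id(node_id: str) -> tuple[str, bytes]:
    """Split a new-format ID into (type prefix, raw payload bytes).

    Splits on the first "_" only: the prefix is assumed to contain no "_",
    while the URL-safe Base64 payload may itself contain "_" characters.
    """
    prefix, sep, encoded = node_id.partition("_")
    if not sep:
        raise ValueError(f"no '_' separator in {node_id!r}")
    # Restore any stripped "=" padding; the decoder rejects unpadded input.
    padded = encoded + "=" * (-len(encoded) % 4)
    return prefix, base64.urlsafe_b64decode(padded)
```

The type prefix is available as soon as partition returns, before any Base64 decoding happens, which is the first improvement listed above.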
We now know that the binary payload is encoded with MessagePack, and usually contains an array of values.
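Python has no MessagePack support in its standard library (msgpack.unpackb from the third-party msgpack package is the usual choice). To keep the example dependency-free, here is a minimal decoder for a subset of the MessagePack spec: integers, short strings, nil, booleans, bin 8 and arrays. It is a sketch: maps, floats and the larger str/bin/array types are not handled, and unpack is a hypothetical name.

```python
import struct

# Type byte -> struct format for the fixed-width integer families.
_FIXED_INTS = {
    0xCC: ">B", 0xCD: ">H", 0xCE: ">I", 0xCF: ">Q",  # uint 8/16/32/64
    0xD0: ">b", 0xD1: ">h", 0xD2: ">i", 0xD3: ">q",  # int 8/16/32/64
}


def unpack(data: bytes):
    """Decode exactly one MessagePack value (a small subset of the spec)."""
    value, offset = _unpack_at(data, 0)
    if offset != len(data):
        raise ValueError("trailing bytes after MessagePack value")
    return value


def _unpack_at(data: bytes, i: int):
    """Decode the value starting at offset i; return (value, next offset)."""
    tag = data[i]
    i += 1
    if tag <= 0x7F:                      # positive fixint
        return tag, i
    if tag >= 0xE0:                      # negative fixint
        return tag - 0x100, i
    if 0x90 <= tag <= 0x9F:              # fixarray (up to 15 items)
        return _unpack_array(data, i, tag & 0x0F)
    if 0xA0 <= tag <= 0xBF:              # fixstr (up to 31 bytes)
        n = tag & 0x1F
        return data[i:i + n].decode("utf-8"), i + n
    if tag == 0xC0:                      # nil
        return None, i
    if tag in (0xC2, 0xC3):              # false / true
        return tag == 0xC3, i
    if tag == 0xC4:                      # bin 8
        n = data[i]
        return bytes(data[i + 1:i + 1 + n]), i + 1 + n
    if tag == 0xD9:                      # str 8
        n = data[i]
        return data[i + 1:i + 1 + n].decode("utf-8"), i + 1 + n
    if tag == 0xDC:                      # array 16
        (n,) = struct.unpack_from(">H", data, i)
        return _unpack_array(data, i + 2, n)
    if tag in _FIXED_INTS:               # big-endian fixed-width ints
        fmt = _FIXED_INTS[tag]
        (value,) = struct.unpack_from(fmt, data, i)
        return value, i + struct.calcsize(fmt)
    raise ValueError(f"unsupported MessagePack type byte 0x{tag:02x}")


def _unpack_array(data: bytes, i: int, n: int):
    """Decode n consecutive values starting at offset i."""
    items = []
    for _ in range(n):
        item, i = _unpack_at(data, i)
        items.append(item)
    return items, i


# A 3-item array: fixint 0, uint32 38168738, fixint 5 (sample values).
print(unpack(bytes([0x93, 0x00, 0xCE, 0x02, 0x46, 0x68, 0xA2, 0x05])))  # [0, 38168738, 5]
```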
For example, for IssueComment nodes the array contains the following:
- 0, whose purpose is unknown. Probably a version of the ID encoding?
- Internal GitHub ID of the repository that contains this comment
- Internal GitHub ID of the comment itself
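Going the other way, here is a sketch of how an ID with this layout could be built, which is roughly what you would need to reuse the scheme in your own project. Assumptions: integers use the smallest MessagePack width (as the MessagePack spec recommends for serializers), trailing = padding is stripped, and the default "IC" prefix is only a hypothetical placeholder for whatever prefix IssueComment actually uses. The function names are hypothetical too.

```python
import base64
import struct


def pack_uint(n: int) -> bytes:
    """MessagePack-encode a non-negative int using the smallest width."""
    if n < 0:
        raise ValueError("only non-negative ints are handled in this sketch")
    if n <= 0x7F:
        return bytes([n])                        # positive fixint
    if n <= 0xFF:
        return b"\xcc" + struct.pack(">B", n)    # uint 8
    if n <= 0xFFFF:
        return b"\xcd" + struct.pack(">H", n)    # uint 16
    if n <= 0xFFFFFFFF:
        return b"\xce" + struct.pack(">I", n)    # uint 32
    return b"\xcf" + struct.pack(">Q", n)        # uint 64 (struct.error past 2**64 - 1)


def pack_uint_array(values: list[int]) -> bytes:
    """MessagePack-encode up to 15 non-negative ints as a fixarray."""
    if len(values) > 15:
        raise ValueError("a fixarray holds at most 15 items")
    return bytes([0x90 | len(values)]) + b"".join(pack_uint(v) for v in values)


def make_issue_comment_id(repo_id: int, comment_id: int, prefix: str = "IC") -> str:
    """Build <prefix>_<base64url(msgpack([0, repo_id, comment_id]))>.

    The "IC" default and the stripped "=" padding are both assumptions.
    """
    payload = pack_uint_array([0, repo_id, comment_id])
    encoded = base64.urlsafe_b64encode(payload).decode("ascii").rstrip("=")
    return f"{prefix}_{encoded}"


# Arbitrary sample values, not a real comment.
print(make_issue_comment_id(38168738, 5))  # IC_kwDOAkZoogU
```

Note how the payload starts with kwDO whenever the repository ID needs a uint32: those characters are just the Base64 form of the bytes 0x93 (3-item array), 0x00 (the leading 0) and 0xCE (uint32 marker).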
We are still curious to see what more complex nodes will look like, e.g.
As for the list of node ID prefixes that we discovered, please check this thread: