Is there a way to get which branch a commit belongs to via GraphQL?

Hello,

I’m trying to learn the GitHub v4 GraphQL API and saw this Stack Overflow answer that uses this query to get the branches of a repository:

{
  repository(name: "git-point", owner: "gitpoint") {
    refs(first: 10, refPrefix: "refs/heads/") {
      nodes {
        name
      }
    }
  }
}

The above query gets the branches that belong to a repository.

Instead of that, is there a way to get the branch that a commit belongs to?

Looking at the GitHub API v4 documentation, it doesn’t list refs in a Commit's connections or fields… I’m sure it can be done, because the GitHub Insights tab can clearly identify and visualise which branches commits belong to.

Thanks!

P.S. The refs field above requires the first: argument. But is there a way to retrieve all commits and their branches instead of having to specify just the first or last n items?

Instead of that, is there a way to get the branch that a commit belongs to?

:wave: @penyuan , thanks for reaching out and asking this question!

Given a commit in a repository, the GraphQL API doesn’t offer a way to list all of the branches that contain that commit.

However, one approach that you can take is fetching a list of branches (as the query you provided does) and then listing the commits for one or more of those branches:

query {
  repository(owner: "github", name: "linguist") {
    refs(first: 20, refPrefix: "refs/heads/") {
      totalCount
      edges {
        node {
          name
          target {
            ... on Commit {
              history(first: 5) {
                edges {
                  node {
                    oid
                    author {
                      name
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

You may need to paginate the relevant fields to get all of the desired results. Once you have them, you’ll need to write a program that checks whether the commit is in the list of commits for each branch. I give credit to one of my colleagues, who explained this approach using the REST API in this Stack Overflow reply:
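
As a sketch of that checking step (not taken from the linked reply), the Python below assumes the response has the same shape as the query above and that every page of history has already been fetched. The function name `branches_containing` and the sample data are hypothetical, not part of any GitHub library.

```python
from typing import Any, Dict, List


def branches_containing(response: Dict[str, Any], oid: str) -> List[str]:
    """Return the names of branches whose fetched history includes `oid`.

    Only commits present in the fetched pages are considered, so a
    truncated history can produce false negatives.
    """
    matches = []
    for edge in response["data"]["repository"]["refs"]["edges"]:
        branch = edge["node"]
        # `target` carries no `history` if the ref does not point at a commit.
        history = (branch.get("target") or {}).get("history", {})
        oids = {e["node"]["oid"] for e in history.get("edges", [])}
        if oid in oids:
            matches.append(branch["name"])
    return matches


def _branch(name: str, oids: List[str]) -> Dict[str, Any]:
    """Build one made-up `refs` edge in the shape the API returns."""
    edges = [{"node": {"oid": o}} for o in oids]
    return {"node": {"name": name, "target": {"history": {"edges": edges}}}}


# Hypothetical sample data, not a real repository.
sample = {"data": {"repository": {"refs": {"edges": [
    _branch("main", ["aaa111"]),
    _branch("feature", ["bbb222", "aaa111"]),
]}}}}

print(branches_containing(sample, "aaa111"))
```

The result is a list rather than a single name because the same commit can appear in the history of more than one branch.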

I hope this helps!

Thank you @francisfuzz that’s super helpful!

Sorry I’m very new to GraphQL, but I tried looking into pagination so that I can send multiple queries to eventually retrieve all commits on each branch, and using the following I can get the end cursors for a query:

{
  repository(owner: "github", name: "linguist") {
    refs(first: 20, refPrefix: "refs/heads/") {
      totalCount
      edges {
        node {
          name
          target {
            ... on Commit {
              history(first: 5) {
                edges {
                  node {
                    oid
                  }
                }
                pageInfo {
                  hasNextPage
                  endCursor
                }
              }
            }
          }
        }
      }
    }
  }
}

As you can see, I’m using the pageInfo field to get the endCursor of the query. However, the query result gives me an endCursor for each branch, and I can’t figure out how to make the next query begin after the endCursor for each branch. This is because the history field takes only one after: argument instead of one for each branch.

I hope this makes sense. Am I using pageInfo at the wrong level in my query? Is there a way to construct the query programmatically so that I can run consecutive queries, each one retrieving the commits for each branch after the last one in the previous query? Thanks!

EDIT: I think this thread articulates my question better than me. I.e. how do you do nested pagination?

@penyuan - Thanks for writing back and posting that follow-up question!

I think I understand what you mean by “nested” pagination, where you’re able to paginate through a nested field’s resources. However, running a second query would only allow you to specify a single cursor for after in the history’s commits, whereas there could be numerous after values that you want to paginate through.

So, while you could iterate on each of those branches’ commit histories and their respective next pages, you’ll still need to make a request for each set of commit histories for each of those branches. I’m sorry I didn’t explain this more thoroughly in my initial reply, but I hope this context helps!

Ah yes, you perfectly explained what I was trying to describe. Thank you for the crazy-quick response!

you’ll still need to make a request for each set of commit histories for each of those branches.

Understood. I guess I will make my queries in the following order then:

  1. Using a simplified version of the query from your original answer to build a list of all branches. (if there are a crazy number of branches, I might even need to paginate this!)
  2. Loop through each branch.
  3. On each iteration of the loop (i.e. while I am on each branch), send paginated queries to get all the commits belonging to that branch.
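
The loop in steps 2 and 3 can be sketched as follows. This is a minimal sketch, not a definitive implementation: it assumes one request per branch per page, using the `ref(qualifiedName:)` field so that a single `after` cursor is enough. The `run_query` callable is a hypothetical transport (anything that sends a query plus variables and returns the decoded JSON), and `fake_run_query` is a stand-in with made-up commit ids so the example runs without network access.

```python
from typing import Callable, Dict, Iterator, List, Optional

# One branch per request, so one cursor per request.
HISTORY_QUERY = """
query($owner: String!, $name: String!, $branch: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    ref(qualifiedName: $branch) {
      target {
        ... on Commit {
          history(first: 100, after: $cursor) {
            nodes { oid }
            pageInfo { hasNextPage endCursor }
          }
        }
      }
    }
  }
}
"""

# Hypothetical transport: (query, variables) -> decoded JSON response.
RunQuery = Callable[[str, dict], dict]


def iter_branch_commits(run_query: RunQuery, owner: str, name: str,
                        branch: str) -> Iterator[str]:
    """Yield every commit oid in one branch's history, page by page."""
    cursor: Optional[str] = None
    while True:
        variables = {"owner": owner, "name": name,
                     "branch": branch, "cursor": cursor}
        result = run_query(HISTORY_QUERY, variables)
        history = result["data"]["repository"]["ref"]["target"]["history"]
        for node in history["nodes"]:
            yield node["oid"]
        page = history["pageInfo"]
        if not page["hasNextPage"]:
            return
        cursor = page["endCursor"]  # resume after the last commit seen


def collect_all(run_query: RunQuery, owner: str, name: str,
                branches: List[str]) -> Dict[str, List[str]]:
    """Loop over the branches and gather each one's full history."""
    return {b: list(iter_branch_commits(run_query, owner, name, b))
            for b in branches}


def fake_run_query(query: str, variables: dict) -> dict:
    """Stand-in for a real HTTP call; serves two pages of made-up oids."""
    pages = {None: (["c3", "c2"], "cursor-1"), "cursor-1": (["c1"], None)}
    oids, next_cursor = pages[variables["cursor"]]
    return {"data": {"repository": {"ref": {"target": {"history": {
        "nodes": [{"oid": o} for o in oids],
        "pageInfo": {"hasNextPage": next_cursor is not None,
                     "endCursor": next_cursor},
    }}}}}}


print(collect_all(fake_run_query, "octocat", "Hello-World",
                  ["refs/heads/master"]))
```

Step 1 (listing the branches) would follow the same cursor pattern on the `refs` connection if a repository has more branches than fit in one page.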

Sorry, one more question to wrap up this topic: From what I can tell based on this thread, each commit will belong to exactly one branch. Is this correct? If so, then I don’t need to worry about checking if the same commit appears under multiple branches, which would save time and effort. Does this also mean all the commits belonging to each branch would collectively represent every commit to the repository, nothing more, nothing less?

Another edit: Oh wait, I just realized that sometimes branches (i.e. refs) get deleted, leaving their constituent commits “dangling” on their own. What “branches” would those commits fall under? Or would they not belong to any branch, and hence will be missed if I only look for the commits under each ref? Is there any way to identify commits like this where they are on a separate but unlabeled “branch”? For example, here’s one commit in the octocat hello world repo that is on a separate, but “unlabeled” (presumably deleted) branch (I point to it with a red arrow):

In context of the query that’s run, a commit that shows up in the branch’s history should only show up once for each respective branch. However, it’s possible for that same commit to be present in other branches.

Example

Let’s say that you created a repository on GitHub.com, initializing it with a README. At this point in time, the repository will have one commit at some SHA value (we’ll call it A).

The moment you create a new branch (without committing anything) from the repository’s default branch, your repository will have two branches both of which point to A. Running the aforementioned GraphQL query will show these two branches both of which have the same commit history.

I created an example repository illustrating this point. The default branch is main which points to 7115ad5. It has another branch, secondary, which also points to 7115ad5.

Running the following query will showcase the same commit in both branches’ commit history:

query {
  repository(owner: "francisfuzz", name: "main-and-secondary") {
    refs(first: 20, refPrefix: "refs/heads/") {
      totalCount
      edges {
        node {
          name
          target {
            ... on Commit {
              history(first: 5) {
                edges {
                  node {
                    oid
                    message
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Here’s the data returned:

Result Set
{
  "data": {
    "repository": {
      "refs": {
        "totalCount": 2,
        "edges": [
          {
            "node": {
              "name": "main",
              "target": {
                "history": {
                  "edges": [
                    {
                      "node": {
                        "oid": "7115ad5e6bab9f6dfac61c699198cf53a54eb52b",
                        "message": "Initial commit"
                      }
                    }
                  ]
                }
              }
            }
          },
          {
            "node": {
              "name": "secondary",
              "target": {
                "history": {
                  "edges": [
                    {
                      "node": {
                        "oid": "7115ad5e6bab9f6dfac61c699198cf53a54eb52b",
                        "message": "Initial commit"
                      }
                    }
                  ]
                }
              }
            }
          }
        ]
      }
    }
  }
}

If one of your primary concerns is time and effort (and perhaps performance), it may be worth cloning the repository to a machine and using the git-branch command with its --contains flag, which lists only the branches that contain a specified commit. The benefit is that you avoid spending any of your GitHub API rate limit, and you can format the results according to your use case.
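
As a rough illustration of the local-clone approach, the sketch below wraps `git branch -r --contains <commit>` with Python’s `subprocess` module. It assumes Git is installed and that `repo_path` points at an existing clone; the function names are hypothetical, and the sample text is made up to resemble typical `git branch` output.

```python
import subprocess
from typing import List


def parse_branch_list(output: str) -> List[str]:
    """Turn `git branch` output into plain branch names."""
    names = []
    for line in output.splitlines():
        if " -> " in line:
            continue  # skip symbolic refs such as "origin/HEAD -> origin/main"
        # The current branch is marked with "* "; others are indented.
        name = line.lstrip("*+ ").strip()
        if name and not name.startswith("("):  # skip "(HEAD detached at ...)"
            names.append(name)
    return names


def remote_branches_containing(repo_path: str, commit: str) -> List[str]:
    """List remote-tracking branches in a local clone that contain `commit`."""
    result = subprocess.run(
        ["git", "-C", repo_path, "branch", "-r", "--contains", commit],
        capture_output=True, text=True, check=True,
    )
    return parse_branch_list(result.stdout)


# Made-up output, parsed without touching a real repository.
sample_output = (
    "  origin/HEAD -> origin/main\n"
    "  origin/main\n"
    "  origin/secondary\n"
)
print(parse_branch_list(sample_output))
```

The `-r` flag limits the listing to remote-tracking branches, which is what a fresh clone mostly contains; dropping it lists local branches instead.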

In general, “dangling” commits wouldn’t be a part of any branch. One reason a “dangling” commit exists is that a change was committed to Git history but was later force-pushed over, so that a newer commit exists in its place.

Our systems periodically run garbage collection to remove these “dangling” commits; if you come across one, its presence isn’t guaranteed for any period of time.

I’m not aware of a method of querying those commits via our API (happy to have others reading this chime in if you do know :wink:). However, if you do stumble upon one by whatever means, you can make a request to our Get a commit endpoint to find out more about that particular commit.

I hope this helps!

Woah! Hold on. :thinking: Sorry this actually confuses me more. Here’s what I mean:

I’ve been looking at the octocat/Hello-World example (BTW, is this an official GitHub example repository???) since this repository only has a grand total of five commits. Here is my query to retrieve this repository’s branches and the commits for each branch:

query {
  repository(owner: "octocat", name: "Hello-World") {
    refs(first: 10, refPrefix: "refs/heads/") {
      totalCount
      edges {
        node {
          name
          target {
            ... on Commit {
              history(first: 10) {
                edges {
                  node {
                    oid
                    author {
                      name
                    }
                    messageHeadline
                    commitUrl
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Here is the query’s response:

API response
{
  "data": {
    "repository": {
      "refs": {
        "totalCount": 3,
        "edges": [
          {
            "node": {
              "name": "master",
              "target": {
                "history": {
                  "edges": [
                    {
                      "node": {
                        "oid": "7fd1a60b01f91b314f59955a4e4d4e80d8edf11d",
                        "author": {
                          "name": "The Octocat"
                        },
                        "messageHeadline": "Merge pull request #6 from Spaceghost/patch-1",
                        "commitUrl": "https://github.com/octocat/Hello-World/commit/7fd1a60b01f91b314f59955a4e4d4e80d8edf11d"
                      }
                    },
                    {
                      "node": {
                        "oid": "762941318ee16e59dabbacb1b4049eec22f0d303",
                        "author": {
                          "name": "Johnneylee Jack Rollins"
                        },
                        "messageHeadline": "New line at end of file. --Signed off by Spaceghost",
                        "commitUrl": "https://github.com/octocat/Hello-World/commit/762941318ee16e59dabbacb1b4049eec22f0d303"
                      }
                    },
                    {
                      "node": {
                        "oid": "553c2077f0edc3d5dc5d17262f6aa498e69d6f8e",
                        "author": {
                          "name": "cameronmcefee"
                        },
                        "messageHeadline": "first commit",
                        "commitUrl": "https://github.com/octocat/Hello-World/commit/553c2077f0edc3d5dc5d17262f6aa498e69d6f8e"
                      }
                    }
                  ]
                }
              }
            }
          },
          {
            "node": {
              "name": "octocat-patch-1",
              "target": {
                "history": {
                  "edges": [
                    {
                      "node": {
                        "oid": "b1b3f9723831141a31a1a7252a213e216ea76e56",
                        "author": {
                          "name": "The Octocat"
                        },
                        "messageHeadline": "sentence case",
                        "commitUrl": "https://github.com/octocat/Hello-World/commit/b1b3f9723831141a31a1a7252a213e216ea76e56"
                      }
                    },
                    {
                      "node": {
                        "oid": "7fd1a60b01f91b314f59955a4e4d4e80d8edf11d",
                        "author": {
                          "name": "The Octocat"
                        },
                        "messageHeadline": "Merge pull request #6 from Spaceghost/patch-1",
                        "commitUrl": "https://github.com/octocat/Hello-World/commit/7fd1a60b01f91b314f59955a4e4d4e80d8edf11d"
                      }
                    },
                    {
                      "node": {
                        "oid": "762941318ee16e59dabbacb1b4049eec22f0d303",
                        "author": {
                          "name": "Johnneylee Jack Rollins"
                        },
                        "messageHeadline": "New line at end of file. --Signed off by Spaceghost",
                        "commitUrl": "https://github.com/octocat/Hello-World/commit/762941318ee16e59dabbacb1b4049eec22f0d303"
                      }
                    },
                    {
                      "node": {
                        "oid": "553c2077f0edc3d5dc5d17262f6aa498e69d6f8e",
                        "author": {
                          "name": "cameronmcefee"
                        },
                        "messageHeadline": "first commit",
                        "commitUrl": "https://github.com/octocat/Hello-World/commit/553c2077f0edc3d5dc5d17262f6aa498e69d6f8e"
                      }
                    }
                  ]
                }
              }
            }
          },
          {
            "node": {
              "name": "test",
              "target": {
                "history": {
                  "edges": [
                    {
                      "node": {
                        "oid": "b3cbd5bbd7e81436d2eee04537ea2b4c0cad4cdf",
                        "author": {
                          "name": "The Octocat"
                        },
                        "messageHeadline": "Create CONTRIBUTING.md",
                        "commitUrl": "https://github.com/octocat/Hello-World/commit/b3cbd5bbd7e81436d2eee04537ea2b4c0cad4cdf"
                      }
                    },
                    {
                      "node": {
                        "oid": "7fd1a60b01f91b314f59955a4e4d4e80d8edf11d",
                        "author": {
                          "name": "The Octocat"
                        },
                        "messageHeadline": "Merge pull request #6 from Spaceghost/patch-1",
                        "commitUrl": "https://github.com/octocat/Hello-World/commit/7fd1a60b01f91b314f59955a4e4d4e80d8edf11d"
                      }
                    },
                    {
                      "node": {
                        "oid": "762941318ee16e59dabbacb1b4049eec22f0d303",
                        "author": {
                          "name": "Johnneylee Jack Rollins"
                        },
                        "messageHeadline": "New line at end of file. --Signed off by Spaceghost",
                        "commitUrl": "https://github.com/octocat/Hello-World/commit/762941318ee16e59dabbacb1b4049eec22f0d303"
                      }
                    },
                    {
                      "node": {
                        "oid": "553c2077f0edc3d5dc5d17262f6aa498e69d6f8e",
                        "author": {
                          "name": "cameronmcefee"
                        },
                        "messageHeadline": "first commit",
                        "commitUrl": "https://github.com/octocat/Hello-World/commit/553c2077f0edc3d5dc5d17262f6aa498e69d6f8e"
                      }
                    }
                  ]
                }
              }
            }
          }
        ]
      }
    }
  }
}

The first observation is that there are three commits shared by all branches. Yet if I follow the commitUrl for these three commits (7fd1a60, 7629413, and 553c207), their respective pages all tell me that they belong to the master branch as shown in this screenshot (circled in red):

I chose this screenshot because it leads to my second observation. The commit 7629413 - despite it claiming to be part of the master branch in the screenshot - is placed in its own, separate blue-colored branch in octocat/Hello-World's Insights Network graph (circled in red):

Commit 7629413 is the “dangling” commit I’m talking about. My guess is that it was part of a named branch, but that branch was deleted once its sole commit 7629413 was merged into master (as evidenced by the merge 7fd1a60 showing two parents). But:

  1. Do you mean commit 7629413 will eventually be deleted as part of GitHub’s garbage collection??? :scream:
  2. How does the network graph “decide” to place commit 7629413 into its own, unnamed (and blue) branch in the visualization?
  3. Why are three commits (7fd1a60, 7629413, and 553c207 as mentioned above) listed in all three branches in the query result when they themselves claim to be in the master branch?
  4. Given 2., how do I decide which branch a commit really belongs to since the same commit would show up in multiple branches in the query response? How does the network graph in 1. make this decision when visualizing the branches and coloring them?

Put another way, what is the logic behind the GitHub network graph and how do I re-create it? I also want to make sure that using the query I showed above with your help, I can exhaustively find all of the commits in a repository.

Whew, sorry for my highly verbose posts, but thank you for your patience you’ve helped me learn a great deal!! :pray: :joy:

@penyuan - Thanks for following up! Sorry for any confusion here––I’ll do my best to clarify, though you’re welcome to follow up with any new questions or observations.

Consider this a space for everyone to learn! :+1:

octocat/hello-world is one of the example repositories GitHub uses in its documentation (see Forming calls with GraphQL - example query as one such example).

Putting myself in your shoes, I see what you mean when you say “dangling” commit. After thinking a bit more about this, I think we used the same term to describe two different things (and that’s okay!). Here’s what I mean by “dangling” commit––I describe it as a commit that does not belong to any branch in the earlier referenced example repository.

Here’s an example:

… and here’s the commit that was force pushed over in its place:

Do you mean commit 7629413 will eventually be deleted as part of GitHub’s garbage collection??? :scream:

Coming back to your case, 7629413 was merged into master as indicated by the Network graph and in the commit view. This commit won’t be deleted or removed as a part of GitHub’s garbage collection since it’s a part of master branch’s history.

However, the earlier example of 654c702 as a “dangling” commit would be removed some time in the future as a part of garbage collection.

How does the network graph “decide” to place commit 7629413 into its own, unnamed (and blue) branch in the visualization?

While I can’t speak to the specifics of how that feature is implemented (its source is closed and a part of GitHub’s product), I think that the unnamed blue branch is either:

  • a branch on a fork that introduced the commit
  • a branch on this repository that existed, but was deleted right after its commits were merged into master

To determine which case this falls under, I took this approach:

Thus, this unnamed blue branch is actually Spaceghost:patch-1. This branch wasn’t deleted after the pull request was merged. For whatever reason, the Network graph doesn’t show branches that originate from forks. I can’t speak to timelines on upcoming features, though if anyone would be interested in having that indication rendered, the best next step is to express interest by submitting a new request through our official product feedback form so that our product team can track it.

Why are three commits (7fd1a60, 7629413, and 553c207 as mentioned above) listed in all three branches in the query result when they themselves claim to be in the master branch?

Great question! Those three commits are in each of the query’s result sets because each of those commits is part of that branch’s commit history. What’s rendered in the commit view UI is meant to be an indicator of which branches this commit is reachable from. Granted, this indicator isn’t a documented feature and is subject to change at any time. Checking the commit history using Git or our API is the best way to determine where a commit has been.

Given 2., how do I decide which branch a commit really belongs to since the same commit would show up in multiple branches in the query response? How does the network graph in 1. make this decision when visualizing the branches and coloring them?

A commit can belong to many branches. I think that this section on Git branching in the official Git documentation does a better job of explaining this than I can:

When you make a commit, Git stores a commit object that contains a pointer to the snapshot of the content you staged. This object also contains the author’s name and email address, the message that you typed, and pointers to the commit or commits that directly came before this commit (its parent or parents): zero parents for the initial commit, one parent for a normal commit, and multiple parents for a commit that results from a merge of two or more branches.

Building from that context, when you create some branch B from branch A, both branches’ commit histories are identical until new commits are created on either branch. This design is foundational to Git’s workflow: even with divergent histories, you’re able to use Git to compare the two histories and observe the changes before deciding to bring those changes over from one branch to another.
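
To make the idea concrete, here is a small Python sketch of commits as nodes with parent pointers and branches as names pointing at a tip commit. The commit ids and branch names are hypothetical, and this is a simplified model rather than how Git stores objects. A commit “belongs” to every branch whose tip can reach it by following parents.

```python
from typing import Dict, List, Set

# Hypothetical history: each commit id maps to the ids of its parents.
PARENTS: Dict[str, List[str]] = {
    "A": [],          # initial commit: zero parents
    "B": ["A"],       # normal commit: one parent
    "C": ["A"],       # commit made on a second branch
    "M": ["B", "C"],  # merge commit: two parents
    "D": ["M"],       # commit made after the merge
}

# A branch is only a name that points at one commit (its tip).
BRANCHES: Dict[str, str] = {"main": "M", "feature": "C", "next": "D"}


def reachable(tip: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every commit reachable from `tip` by following parents."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


def branches_of(commit: str) -> List[str]:
    """Return every branch whose history contains `commit`."""
    return sorted(name for name, tip in BRANCHES.items()
                  if commit in reachable(tip, PARENTS))


for commit in PARENTS:
    print(commit, branches_of(commit))
```

In this toy history, commit C was made on `feature`, yet it is also reachable from `main` and `next` once the merge M exists, which mirrors why the same commit shows up under several branches in the query results.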

Put another way, what is the logic behind the GitHub network graph and how do I re-create it? I also want to make sure that using the query I showed above with your help, I can exhaustively find all of the commits in a repository.

I briefly touched on this in an earlier answer––I’m not able to share how that Network graph is implemented. However, I think it might be worth checking out the git-log command’s documentation and leveraging its functionality to determine how Git stores commit histories and how to render them in the format you’re looking for. I hope this helps!

Amazing. Thank you @francisfuzz for your patience! :heart: It’s finally all coming together for me now.

Here’s what I mean by “dangling” commit––I describe it as a commit that does not belong to any branch in the earlier referenced example repository.

Understood. That makes sense now. Glad that the kind of commits I was thinking about won’t be deleted!

I think that the unnamed blue branch is either:

  • a branch on a fork that introduced the commit
  • a branch on this repository that existed, but was deleted right after its commits were merged into master

Understood. For the latter case (not from a fork), would a reasonable way to identify it be simply to see if it is one of multiple parents of a future commit (implying merging branches)? This way, I won’t have to rely on if the commit is associated with any Ref.

Great question! Those three commits are in each of the query’s result sets because each of those commits is part of that branch’s commit history.

I think that this section on Git branching in the official Git documentation does a better job of explaining this

Thanks! This really made it click for me. I clearly have much more to learn about Git!

While I can’t speak to the specifics of how that feature is implemented (its source is closed and a part of GitHub’s product)

Hehe, one could hope that one day GitHub will be fully open sourced like what most other platforms have already done (wink wink). :sweat_smile: Maybe I’ll submit this through the feedback form you mentioned.

Anyways, based on the wealth of knowledge I’ve learned in this thread, all commits in a repository will fall under at least one branch that my query finds (because commits form the diverging histories that branches represent). Is this correct? Put another way, all Refs will collectively include every commit ever made in a repository (with the exception of those “dangling” pushed-over commits that you mentioned), right?

I’m not sure that I understand the question––I’m wondering if you could share more context around what you’re looking to accomplish given some commit? Are you looking to determine if that commit was introduced in a previously existing branch, or something else?

On a cursory search, if you’re ultimately looking to determine which branch a Git commit came from, there’s an excellent conversation about this topic in this Stack Overflow thread that might be of interest.

Considering the unnamed blue branch is a branch that once existed but was later deleted after its commits were merged onto master, it’s possible that those commits’ changes could have been incorporated as the result of a merge commit, squashed down to a single commit, or rebased onto the master branch. These are the three different merge methods on GitHub; this may play into the way you’re looking to visualize merge history.

That’s correct: all commits in a repository hosted on GitHub.com should be accessible via the documented REST API endpoints or GraphQL API fields.

Assuming that it’s a repository hosted on GitHub.com, that’s correct. I make that distinction here because it’s possible that you or other colleagues may have a copy of the repository on your own machines and have created branches and commits on those branches as a part of your work. Those branches and commits, until they’re pushed to GitHub.com, are not reachable. I don’t know if that’s over-explaining things :sweat_smile: but think it’s worth mentioning just in case!

@penyuan @francisfuzz thank you so much for this discussion :dizzy:

As an aside, I have my own curiosity now: is there an estimate of when gc kicks in?

This should probably be asked in a separate topic, but is there a way, after a forced push, to “recover” a “dangling”/“orphaned” commit (and what is the likely time frame, depending on the gc schedule)?

:wave: @maxdevjs , hello there & thanks for asking about this! If a “dangling” (also known as “orphaned”) commit exists, we don’t guarantee any specific timeline. My colleague @lee-dohm shared some more context about this topic in another topic:

If you have any follow-up questions about this, I encourage you to open a new topic about this in the How to use Git and GitHub board. :v:

Good evening, everyone. I have gained a lot from this platform and hope to achieve more through it.

Actually, that does basically answer my question. Very informative!

That’s correct: all commits in a repository hosted on GitHub.com should be accessible via the documented REST API endpoints or GraphQL API fields… Assuming that it’s a repository hosted on GitHub.com, that’s correct. I make that distinction here because it’s possible that you or other colleagues may have a copy of the repository on your own machines and have created branches and commits on those branches as a part of your work. Those branches and commits, until they’re pushed to GitHub.com, are not reachable. I don’t know if that’s over-explaining things :sweat_smile: but think it’s worth mentioning just in case!

Really appreciate the thoroughness, and yes I was operating with the assumption of just fetching what’s represented on GitHub and not what’s locally on contributors’ computers.

Once again thank you for your patience with me! I think my original question has been comprehensively answered. I don’t know how to mark all your responses as “the solution” so I will mark your first response for now. :sweat_smile:

I’m sure I’ll have more questions as I continue to learn more about GitHub, so I’ll post them in other threads.

Many thanks!! :hugs: :pray:t5:
