Using email in commit message as identifier

Hi,

I have a use case where we need to keep track of the user who made a commit. I have three questions:

  1. Is the email in the author information a reliable way to track which user made a commit, or should we use something else?
  2. Is the no-reply email a github.com-specific feature, or is it also available on self-hosted GitHub Enterprise?
  3. What happens if a commit doesn’t contain an author email? Do users actually do that?

Hi there,

Thanks for the question.

You can use the GraphQL schema to look up the author (and GitHub user) associated with a commit, via the Commit object, which can be retrieved as a resource. Here’s one example query to return the GitHub user associated with this commit:

{
  resource(url: "https://github.com/octocat/Spoon-Knife/commit/d0dd1f61b33d64e29d8bc1372a94ef6a2fee76a9") {
    ... on Commit {
      author {
        user {
          databaseId
          login
        }
      }
    }
  }
}

The query results show that the commit was authored by the octocat user:

{
  "data": {
    "resource": {
      "author": {
        "user": {
          "databaseId": 583231,
          "login": "octocat"
        }
      }
    }
  }
}
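
If you want to run this lookup from a script, here is a minimal Python sketch of the two pure pieces: building the request body and pulling the user out of the response. The helper names `build_payload` and `extract_user` are my own (hypothetical), and the sketch assumes the response has the shape shown above. Sending the request (a POST to the GraphQL endpoint with your token) is left out.

```python
import json


def build_payload(commit_url: str) -> str:
    """Return a JSON request body for the commit-author lookup."""
    # json.dumps quotes and escapes the URL so it can be embedded
    # in the query as a string literal.
    query = (
        "{ resource(url: %s) { ... on Commit { "
        "author { user { databaseId login } } } } }" % json.dumps(commit_url)
    )
    return json.dumps({"query": query})


def extract_user(response: dict):
    """Return the author's user dict, or None if no GitHub user is linked."""
    resource = (response.get("data") or {}).get("resource") or {}
    author = resource.get("author") or {}
    return author.get("user")
```

Calling `extract_user` on the response above would give you the dictionary with `databaseId` and `login`; for a commit with no linked account it gives `None`.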

There are some edge cases you should consider in your implementation:

  1. Not every commit will have a GitHub user association (as not every author of Git commits has a GitHub account)
  2. The commit author can be easily faked, so consider whether you should only use this method for signed commits
  3. The user login can change over time, but the user’s databaseId will be stable over time
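
To make the first and third points concrete, here is a rough Python sketch that keys authors by `databaseId` rather than `login`, and tolerates commits with no linked user. The `known_logins` store and both function names are hypothetical, just for illustration.

```python
from typing import Optional

# Hypothetical in-memory store: databaseId -> most recently seen login.
known_logins = {}


def identity_key(user: Optional[dict]) -> Optional[int]:
    """Return a stable key for a commit author, or None if unknown."""
    if user is None:
        # No GitHub account is associated with this commit's author.
        return None
    return user["databaseId"]


def record_author(user: Optional[dict]) -> Optional[int]:
    """Remember the author under their stable id and return that id."""
    key = identity_key(user)
    if key is not None:
        # The login may have changed since last time; the id has not.
        known_logins[key] = user["login"]
    return key
```

With this approach a renamed account still maps to the same record, since only the stored login is updated.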

I hope that helps!


EDIT:

  2. The commit author can be easily faked, so consider whether you should only use this method for signed commits

To provide some more detail on this point…

Here’s an example of a “faked” commit: https://github.com/swinton/public/commit/3e2490e3a20d773a7c37819c27f8dec41fe9be62

Note that this commit appears to have been authored by @octocat, but it was actually authored by me. All it took was setting my user.name and user.email config settings locally:

git config user.name "The Octocat"
git config user.email "octocat@nowhere.com"

Looking up the commit / author / user via GraphQL also shows @octocat as the author:

{
  resource(url: "https://github.com/swinton/public/commit/3e2490e3a20d773a7c37819c27f8dec41fe9be62") {
    ... on Commit {
      author {
        user {
          databaseId
          login
        }
      }
    }
  }
}

Returns:

{
  "data": {
    "resource": {
      "author": {
        "user": {
          "databaseId": 583231,
          "login": "octocat"
        }
      }
    }
  }
}

This is why it’s important to “trust” commit authorship only for signed commits; otherwise, commits may be attributed to the wrong author. With these points in mind, I would suggest enabling this feature only for signed commits, or for repositories that require signed commits as part of their branch protection rules.
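
As a sketch of what “only trust signed commits” could look like in code, the Python below assumes the query has been extended to also request `signature { isValid signer { databaseId } }` on the Commit object. The function name `trusted_user` is hypothetical, and requiring the signer to match the author is a stricter policy of my own choosing, not something the API enforces for you.

```python
from typing import Optional


def trusted_user(commit: dict) -> Optional[dict]:
    """Return the author's user only if a valid signature backs it up."""
    signature = commit.get("signature")
    # Assumption: unsigned commits come back with a null signature.
    if not signature or not signature.get("isValid"):
        return None
    user = (commit.get("author") or {}).get("user")
    signer = signature.get("signer")
    if not user or not signer:
        return None
    # Stricter policy (my assumption): the signer must be the author.
    if user.get("databaseId") != signer.get("databaseId"):
        return None
    return user
```

Under this policy the “faked” commit above would be rejected, assuming it is unsigned or signed by someone other than @octocat.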

To look up whether a repository branch requires signed commits, you can use the branch protections API, passing the zzzax-preview custom media type, i.e.

curl -H "Accept: application/vnd.github.zzzax-preview+json" -H "Authorization: bearer $GITHUB_TOKEN" https://api.github.com/repos/$OWNER/$REPO/branches/$BRANCH/protection

This will yield a response that includes a required_signatures property, containing an enabled field that will correspond with that branch’s setting.
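
Reading that setting out of the parsed response could look something like the Python sketch below. It assumes the JSON body has already been decoded into a dictionary; the function name `requires_signed_commits` is hypothetical, and a missing property is treated as "not required".

```python
def requires_signed_commits(protection: dict) -> bool:
    """Return True if the branch protection response enables required signatures."""
    # Treat a missing or null required_signatures property as disabled.
    required = protection.get("required_signatures") or {}
    return bool(required.get("enabled"))
```

You could then gate the authorship lookup on this check, so that only branches with the setting enabled are treated as trustworthy.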

Yes, it’s possible to create a commit that doesn’t have an author email.

To test this, set up the user.name and user.email config like so:

git config user.name "Ghost"
git config user.email ""

And then create your commit.

Here’s an example of a commit that was created in this way: https://github.com/swinton/public/commit/59b4bd8b6acf19d97ed2b26859a29b81b04f960e

In this case, the GraphQL query listed above will return null for the user attribute:

{
  resource(url: "https://github.com/swinton/public/commit/59b4bd8b6acf19d97ed2b26859a29b81b04f960e") {
    ... on Commit {
      author {
        user {
          databaseId
          login
        }
      }
    }
  }
}

Returns:

{
  "data": {
    "resource": {
      "author": {
        "user": null
      }
    }
  }
}
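
Your implementation will need a fallback for this null case. One option, sketched in Python below, is to fall back to the raw Git author details. This assumes the query also requests `name` and `email` on `author`; the function name `describe_author` and the output format are hypothetical.

```python
def describe_author(author: dict) -> str:
    """Return a display string for a commit author, linked or not."""
    user = author.get("user")
    if user:
        return "@%s (id %d)" % (user["login"], user["databaseId"])
    # No linked GitHub user: fall back to whatever Git recorded.
    name = author.get("name") or "unknown"
    email = author.get("email") or "no email"
    return "%s <%s> (no linked GitHub user)" % (name, email)
```

Keep in mind that the fallback values are exactly the ones that can be set to anything locally, so they are only suitable for display, not for identification.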