Get the lines of code committed in each language by a user

In my Java application, I need to count the lines of code committed by each Git user in each language (e.g. Java: 1000 LOC, C++: 500 LOC, etc.). I am planning to follow the steps below.

  1. GET all the commits in each repo that the user authored:
    https://api.github.com/repos/{repoOwnerId}/{repoId}/commits?author={authorId}
  2. GET the details of each commit:
    https://api.github.com/repos/{repoOwnerId}/{repoId}/commits/{commitId}
    The response contains the total LOC and the file names. I am planning to identify the language of each committed file from its extension (e.g. if a committed file ends with .java, its LOC counts toward Java).
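The extension-based tally in step 2 could look roughly like the Python sketch below. It assumes the commit-details response has already been parsed into a dict whose `files` list carries `filename` and `additions` fields; the `EXTENSION_TO_LANGUAGE` table and both function names are illustrative and not part of any API.

```python
import os
from collections import Counter

# Hypothetical, deliberately small mapping; extend it for the languages you need.
EXTENSION_TO_LANGUAGE = {
    ".java": "Java",
    ".cpp": "C++",
    ".py": "Python",
    ".rb": "Ruby",
}


def language_of(filename):
    """Guess a language from the file extension; anything unknown becomes 'Other'."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSION_TO_LANGUAGE.get(ext, "Other")


def added_lines_by_language(commit):
    """Sum the added lines per language for one parsed commit-details response."""
    totals = Counter()
    for changed_file in commit.get("files", []):
        language = language_of(changed_file["filename"])
        totals[language] += changed_file.get("additions", 0)
    return totals
```

Counting only additions is a choice made for this sketch; deletions could be tracked the same way from the `deletions` field.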

Performance-wise, though, I feel this is a poor approach. Also, the GitHub API allows only 5000 requests per hour for an authenticated user. Could anyone please suggest a better option?
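For a sense of scale, here is a rough count of the requests the two-step plan needs for one repository and one author. It assumes the commit list is fetched 100 commits per page and that every commit costs one detail request; `requests_needed` is a made-up helper name.

```python
import math


def requests_needed(commit_count, per_page=100):
    """List-page requests plus one commit-details request per commit."""
    # At least one list request is made even when the author has no commits.
    list_pages = max(1, math.ceil(commit_count / per_page))
    return list_pages + commit_count
```

Under these assumptions, an author with 1000 commits in a single repository already uses 1010 of the 5000 hourly requests.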

👋 Welcome to the GitHub Support Community, @termjobTest! Thanks for asking your question here.

As far as I’m aware, there’s not one particular endpoint nor is there a set of endpoints beyond what you mentioned that would give you the lines of code committed (added or deleted) per user. While you could use the GitHub API to collect that information, you can obtain the same information by cloning the repository and using git's built-in commands to compute the data that you’re looking for. This approach avoids making any calls to the GitHub API.

Here’s an example (though I imagine there are more ways that others in our community can offer input on).

Clone and change directory into one of your repositories:

git clone git@github.com:termjobTest/test.git 
cd test

There’s a git-diff-tree command that compares the content and mode of blobs found via two tree objects. Run it:

# Shows the diff-tree for the latest HEAD.
git diff-tree --format=short --numstat -r HEAD 

Author: ajit-indiemaker <53966513+ajit-indiemaker@users.noreply.github.com>

    Create ajittest3.rb

105	0	ajittest3.rb
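Each --numstat line is tab-separated: lines added, lines deleted, then the path. A minimal Python sketch for parsing one such line follows. `parse_numstat_line` is a hypothetical helper name, and treating binary files (which git reports as `-` for both counts) as zero lines is an assumption made here.

```python
def parse_numstat_line(line):
    """Return (added, deleted, path) for one --numstat line, or None for any other line."""
    parts = line.rstrip("\n").split("\t", 2)
    if len(parts) != 3:
        return None  # header, message, or blank line
    added, deleted, path = parts
    if added == "-" and deleted == "-":
        return 0, 0, path  # binary file: no line counts available
    if not (added.isdigit() and deleted.isdigit()):
        return None
    return int(added), int(deleted), path
```

Lines such as the Author: header or the commit message do not contain two tabs, so they fall through to `None` and can be skipped by the caller.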

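It’s possible to format the output to show specific metadata, like the commit SHA, the author information, and even the files changed including how many lines were added and deleted. The flags used above are described in the git-diff-tree documentation: --format=short selects how the commit header is printed, --numstat prints the lines added and deleted for each file in a machine-readable form, and -r recurses into subtrees.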

You could then write a script that iterates over all of the commits on a single branch (and possibly iterate through all branches) performing this command. --format is configurable so you can choose what kind of output you’d like to see for your use case. I hope this helps!
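One possible shape for such a script is sketched below in Python. Note that it swaps in `git log --numstat` to walk the history in a single call rather than running git diff-tree once per commit. The `@` marker in front of the author email, the `LANGUAGES` table, and the function names are all assumptions made for this sketch.

```python
import os
import subprocess
from collections import defaultdict

# Hypothetical extension-to-language mapping; extend as needed.
LANGUAGES = {".java": "Java", ".cpp": "C++", ".rb": "Ruby", ".py": "Python"}


def tally(log_text):
    """Sum added lines per author and language from `git log --numstat --format=@%ae` output."""
    totals = defaultdict(lambda: defaultdict(int))
    author = None
    for line in log_text.splitlines():
        if line.startswith("@"):
            author = line[1:]  # header line: "@" followed by the author email
            continue
        parts = line.split("\t", 2)
        if author is None or len(parts) != 3 or not parts[0].isdigit():
            continue  # blank lines and binary files ("-" counts) are skipped
        ext = os.path.splitext(parts[2])[1].lower()
        totals[author][LANGUAGES.get(ext, "Other")] += int(parts[0])
    return {name: dict(langs) for name, langs in totals.items()}


def tally_repo(repo_dir, rev="HEAD"):
    """Run git log inside a cloned repository and tally its output."""
    result = subprocess.run(
        ["git", "log", "--numstat", "--no-merges", "--no-renames",
         "--format=@%ae", rev],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return tally(result.stdout)
```

Calling `tally_repo("test")` on the clone from earlier would return a nested dict keyed by author email and then by language; passing `rev="--all"` should cover every branch instead of only the current one.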
