How to compare forked repos for duplicates?

I forked a repo and noticed that 67 others did the same.

How can I find out which of these forks are duplicates without opening and browsing through every one of them?

It would be great to have a simple indicator next to each fork showing whether it is a duplicate, how many files are identical, or what percentage of the code is identical.

In order to do this, you would first have to define what you mean by “duplicate”. Would a repository that contains all of the same branches at exactly the same commit be a duplicate of another repository? Or would a repository that has a default branch at the same commit or any later commit be a duplicate of another repository? Or somewhere in between? Once you define what you mean by “duplicate”, you could potentially write a script that uses the GitHub API to examine all of the forks of a repo and check for duplicates.
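As a rough illustration of such a script, here is a minimal Python sketch that takes the simplest definition (two forks are duplicates if their default branches point at the same commit). It assumes the REST endpoints `GET /repos/{owner}/{repo}/forks` and `GET /repos/{owner}/{repo}/branches/{branch}`; the helper names `get_json`, `list_forks`, and `group_by_head` are made up for this example and are not part of any GitHub library. Unauthenticated requests are rate limited, so a real run over 67 forks would probably need an access token in an `Authorization` header.

```python
import json
import urllib.request
from collections import defaultdict

API = "https://api.github.com"

def get_json(url):
    """Fetch a GitHub REST API URL and decode the JSON body (no auth, no retries)."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def list_forks(owner, repo, fetch=get_json):
    """Return all forks of owner/repo, requesting pages until an empty one comes back."""
    forks, page = [], 1
    while True:
        batch = fetch(f"{API}/repos/{owner}/{repo}/forks?per_page=100&page={page}")
        if not batch:
            return forks
        forks.extend(batch)
        page += 1

def group_by_head(forks, fetch=get_json):
    """Map each default-branch head SHA to the names of the forks pointing at it."""
    groups = defaultdict(list)
    for fork in forks:
        name, branch = fork["full_name"], fork["default_branch"]
        head = fetch(f"{API}/repos/{name}/branches/{branch}")["commit"]["sha"]
        groups[head].append(name)
    return dict(groups)
```

Calling `group_by_head(list_forks("octocat", "Hello-World"))` (a placeholder repository) would return a dictionary in which any SHA listing more than one fork marks a set of duplicates under this definition. A stricter or looser definition would need a different comparison.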

Thank you for your answer. Sorry for the late response.

In this context, I would define a “duplicate” as a complete copy of the default branch (often the “master” branch) of a repo with open pull requests and/or unmerged commits. I would ignore any commits that were already merged through a pull request (and consider them “duplicate”).

So I would simply like to see, for each of the other forks, a count of its additional commits (those not merged through a pull request).

If you’re looking for a GitHub feature to declare whether a fork is a duplicate, I can take that feedback to the development team. I can’t make any promises as to whether or when such a feature would be added. So if you’re looking for something like this in the near term, you may want to investigate using the GitHub API to build a script that would calculate what you’re looking for.
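As a starting point for such a script, here is a minimal Python sketch that counts the commits each fork has on its default branch that the original repo does not. It assumes the compare endpoint `GET /repos/{owner}/{repo}/compare/{base}...{user}:{branch}` and its `ahead_by` field, and it expects `forks` to be the list of fork objects returned by `GET /repos/{owner}/{repo}/forks`. The names `get_json`, `commits_ahead`, and `rank_forks` are invented for this example.

```python
import json
import urllib.request

API = "https://api.github.com"

def get_json(url):
    """Fetch a GitHub REST API URL and decode the JSON body (no auth, no retries)."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def commits_ahead(owner, repo, base_branch, fork, fetch=get_json):
    """Count commits on the fork's default branch that are missing from upstream."""
    # Cross-fork comparisons use "USERNAME:BRANCH" as the head reference.
    head = f"{fork['owner']['login']}:{fork['default_branch']}"
    data = fetch(f"{API}/repos/{owner}/{repo}/compare/{base_branch}...{head}")
    return data["ahead_by"]

def rank_forks(owner, repo, base_branch, forks, fetch=get_json):
    """Return (fork name, commits ahead) pairs, most extra commits first."""
    counts = [
        (fork["full_name"], commits_ahead(owner, repo, base_branch, fork, fetch))
        for fork in forks
    ]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)
```

Forks that come back with a count of 0 have nothing the original lacks, so they would be duplicates in the sense described above. One caveat: commits that were merged upstream by squashing or rebasing get new commit IDs, so a fork may still show as ahead even though its changes were accepted.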

Yes, such a feature would be great.

Thank you for taking it to the development team.