GitHub trending has proven to be a failure

There have been countless discussions and questions about the trending mechanism, and numerous people have complained about not being able to see their popular repos on trending.

So far, the best official explanation I can find comes from this thread. Quote:

As I stated, there is logic to prevent gaming of the system. Gaming of the system isn’t always or even necessarily often committed by the owner of the repository, but can also be undertaken by overenthusiastic fans of the repository, or even …

Because of this, we don’t describe our exact system for determining repository trends. I can understand that this is frustrating. Unfortunately, it’s the best way we’ve found to maintain the integrity and usefulness of the trending system.

Though not happy with the answer, I accepted it, and persuaded myself that maybe the algorithm GitHub uses is indeed better and more scientific.

Until today. Because I saw this repo ranked 2nd on trending.

So what is this “Go-000” repo? The repo has been deleted (update: the repo is back, with all the forks removed), most likely by its owner, but a bit of investigation revealed what it was about:

This is the repo of an online Golang course offered by a company called geekbang. The course keeps all of its teaching materials in one repo, and the teacher has the students fork it. That’s why it has 659 forks and few stars.

Yet this is a legitimate and common use case. Using GitHub for teaching is becoming more and more popular, because it’s very effective. So what is the problem?

There is only one answer: the GitHub trending mechanism.

It is said to “prevent gaming of the system”, but this case makes it evident that the system is very easy to game: we just need enough forks.
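To make the point concrete, here is a toy sketch. It is not GitHub's algorithm, which is undisclosed; `Repo`, `naive_score`, and `fork_weight` are names I made up, and every number except the 659 forks is invented for illustration. It only shows that any score counting forks roughly like stars would let a course repo with mass forks outrank a project people actually star.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    name: str
    new_stars: int  # stars gained in the ranking window (illustrative)
    new_forks: int  # forks gained in the ranking window (illustrative)


def naive_score(repo: Repo, fork_weight: float = 1.0) -> float:
    """Hypothetical score: stars plus weighted forks. An assumption, not GitHub's formula."""
    return repo.new_stars + fork_weight * repo.new_forks


repos = [
    # 659 forks comes from the case above; the star count is made up.
    Repo("course-materials", new_stars=20, new_forks=659),
    # An entirely invented project that gains many stars and few forks.
    Repo("popular-library", new_stars=400, new_forks=30),
]

# With forks counted like stars, the course repo ranks first.
ranked = sorted(repos, key=naive_score, reverse=True)
print([r.name for r in ranked])

# Ignoring forks entirely flips the order.
ranked_no_forks = sorted(repos, key=lambda r: naive_score(r, fork_weight=0.0), reverse=True)
print([r.name for r in ranked_no_forks])
```

Whether the real ranking weighs forks this way is unknown, which is exactly the problem: nobody outside GitHub can check.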

Besides the algorithm itself being bad, there’s a bigger issue: its opacity. In cryptography, a good algorithm should be public and open; otherwise it can’t be verified and may contain serious yet hidden issues. The same goes for a trending ranking algorithm. GitHub chooses to hide everything behind the scenes so that nobody can game the system. To me, that’s like closing your eyes and pretending the problem does not exist. But the problems do exist. Lots of folks get frustrated at not seeing their projects, while repos like “Go-000”, “Learn algorithm”, “Learn Java”, and “Cracking leetcode” trend all over the place. As a result, users are not able to see what they want to see on trending, and fewer and fewer people take trending seriously or even look at it anymore.

I cannot help but ask GitHub: is this what you want to see? A place meant for promoting good content is filled up with low-quality stuff, and you still pretend there’s no problem.

I do believe in the good intent, but the fact that the system is abused does not mean transparency is bad; it only means the algorithm is not good enough. Hiding everything behind the scenes makes improving the algorithm harder, confuses people, and is definitely a bad idea, especially in the open source community.

The most important metric of GitHub’s success is developers’ happiness, and I see way more developers unhappy about trending than ever before. Therefore, it is very clear that GitHub trending has proven to be a failure.
