BUG: incorrect language detection

It appears that the language detection feature in GitHub is off by a large margin.

My search for recently updated C# projects (https://api.github.com/search/repositories?q=language:C%23+sort:updated) returned plenty of non-C# projects labeled as C#.

E.g. https://github.com/haru-shinobu/StickFigure is labeled as C# 95.4%, ShaderLab 4.6%, when it is just a bunch of Unity assets.

I estimate at least 10% of the results in the last few searches were completely off.

Hi @rimutaka, thanks for the great observation.

GitHub determines language statistics from the total size of the files that belong to each language, and we use the open source Linguist library to analyze source code files.
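To make the size-based idea concrete, here is a rough sketch in Python of how a percentage per language falls out of file sizes. It is not Linguist's actual implementation: the extension map, the function name `language_shares`, and the file list are all made up for illustration, and real detection looks at much more than file extensions.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical extension-to-language map, for illustration only.
# Linguist's real detection uses far more than file extensions.
EXTENSION_TO_LANGUAGE = {
    ".cs": "C#",
    ".shader": "ShaderLab",
}


def language_shares(files):
    """Return {language: percent of counted bytes} for (path, size) pairs.

    Files whose extension is not in the map are ignored in this sketch.
    """
    bytes_by_language = defaultdict(int)
    for path, size in files:
        suffix = PurePosixPath(path).suffix.lower()
        language = EXTENSION_TO_LANGUAGE.get(suffix)
        if language is not None:
            bytes_by_language[language] += size

    total = sum(bytes_by_language.values())
    if total == 0:
        return {}
    return {
        language: round(100 * count / total, 1)
        for language, count in bytes_by_language.items()
    }


# Made-up paths and sizes: one large file dominates the C# share.
example_files = [
    ("Assets/Scripts/Player.cs", 3_000),
    ("Assets/Plugins/BigLibrary.cs", 87_000),
    ("Assets/Shaders/Outline.shader", 10_000),
    ("Assets/Scenes/Main.unity", 500_000),  # ignored here: not in the map
]

print(language_shares(example_files))
```

The point of the sketch is that the shares are weighted by bytes, not by file count, so a single large file can decide which language a repository is labeled with.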

I suspect those Unity files are quite large compared to the other language files in that repo.
You can find more information here about why a repository's language may be detected incorrectly.
