Accurate language breakdowns

I have a project that GitHub says is 50% Ruby, 40% HTML and 10% CSS.


If you look at the code actually executed at runtime, it would be something more like:

  • 30% Ruby
  • 20% C
  • 20% JavaScript
  • 20% HTML
  • 10% CSS

This is because my project has oj as a dependency, which is half C.

It’s also because there is JavaScript embedded in <script> tags in my repo, which GitHub counts as just HTML.

It would be great if GitHub could count dependencies in its breakdown, and even better if it could count JavaScript embedded within <script> tags.

I think it would be so cool to see what’s really running in a project when I look at the breakdowns on GitHub.

The language statistics are supposed to show what code is in the repository, not what gets executed at runtime. Runtime statistics would depend a lot on the platform, e.g. which libraries are linked in (maybe optionally), and things like HTML and CSS wouldn’t show up at all, because they are rendered, not executed. :wink:

You’ll see the point of focusing on repository content when you click on a language name: You can search the repository by language. :slightly_smiling_face:
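To make the "repository content" idea concrete, here is a rough sketch of a per-file breakdown. It is not Linguist's actual code: the `EXT_TO_LANG` table and the `language_breakdown` function are made up for illustration, and it assumes each file is attributed to exactly one language by its extension, weighted by size in bytes.

```python
import os
from collections import Counter

# Hypothetical extension table; real detection uses far more signals.
EXT_TO_LANG = {
    ".rb": "Ruby",
    ".html": "HTML",
    ".css": "CSS",
    ".js": "JavaScript",
    ".c": "C",
}

def language_breakdown(files):
    """Map {path: size_in_bytes} to {language: percent of counted bytes}."""
    totals = Counter()
    for path, size in files.items():
        ext = os.path.splitext(path)[1].lower()
        lang = EXT_TO_LANG.get(ext)
        if lang is not None:  # unknown extensions are simply not counted
            totals[lang] += size
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {lang: round(100 * n / grand_total, 1) for lang, n in totals.items()}
```

With sizes like `{"app.rb": 500, "index.html": 400, "style.css": 100}` this gives the 50/40/10 split from the original post. Dependencies never enter the picture because they are not files in the repository, and a whole `.html` file counts as HTML no matter what is inside it.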

That’s an interesting point. The Linguist documentation explicitly states that language detection is per-file, but I don’t know the rationale for that. Possibly it’s simply that a per-line breakdown would add more complexity than it’s worth. :sweat_smile:
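For what it's worth, splitting out embedded scripts in a single file is not hard in isolation. Below is a minimal sketch using Python's standard `html.parser`, just to show what a sub-file breakdown could look like. The `ScriptSplitter` class and `split_html` function are hypothetical names, and it assumes every `<script>` body is JavaScript (it does not check the `type` attribute) and that everything else counts as HTML.

```python
from html.parser import HTMLParser

class ScriptSplitter(HTMLParser):
    """Counts the bytes of text found inside <script> elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.js_bytes = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # May be called several times per script body, so accumulate.
        if self.in_script:
            self.js_bytes += len(data.encode("utf-8"))

def split_html(source):
    """Return byte counts for one HTML document, split into HTML and JavaScript."""
    parser = ScriptSplitter()
    parser.feed(source)
    parser.close()
    total = len(source.encode("utf-8"))
    return {"HTML": total - parser.js_bytes, "JavaScript": parser.js_bytes}
```

The hard part is doing this consistently for every templating language and every kind of embedding across all repositories, which may be part of why detection stays per-file.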