I have a question/request regarding the Contributions page of a repository and the way renamed files appear there.
I've created a demo repository with 2 commits to demonstrate this:
Now, this may not seem significant at all, but the way this works can totally screw up the contribution numbers in large repositories, where larger changes of this type can happen.
For example, recently I've moved a lot of files from one place to another in a huge monolith repository, as part of a bigger refactoring process. Before that, my contributions were around 67k "++", and those were real additions. However, after these refactoring changes went to the master branch, my additions went up to 154k "++" (deletions increased by the same amount as well).
Is there any way for such changes not to appear like this?
All modern version control systems are ultimately based on detecting and tracking changes to files, and version control systems for source code are specifically designed to track changes to text files. The method used to do that is colloquially called a "diff" algorithm: an algorithm designed to solve the longest common subsequence (LCS) problem for the lines of a text file. Given two sequences of items, it finds a longest sequence of items that is present in both original sequences in the same order. Once you have such a sequence then, assuming that you want to determine how to change the first sequence into the second:

- every item in the first sequence that is not part of the common subsequence has to be deleted, and
- every item in the second sequence that is not part of the common subsequence has to be added.
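To make the add/delete bookkeeping concrete, here is a minimal Python sketch of an LCS-based line diff. It is the textbook dynamic-programming version, for illustration only; it is not what Git or GitHub actually run (Git defaults to Myers' algorithm), and the names `lcs_table`, `line_diff` and `count_changes` are made up for this example.

```python
from typing import List, Tuple

def lcs_table(a: List[str], b: List[str]) -> List[List[int]]:
    """t[i][j] holds the LCS length of the suffixes a[i:] and b[j:]."""
    n, m = len(a), len(b)
    t = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                t[i][j] = t[i + 1][j + 1] + 1
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    return t

def line_diff(a: List[str], b: List[str]) -> List[Tuple[str, str]]:
    """Walk the table and tag each line ' ' (kept), '-' (deleted) or '+' (added)."""
    t = lcs_table(a, b)
    i = j = 0
    ops: List[Tuple[str, str]] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:                  # line belongs to the common subsequence
            ops.append((" ", a[i]))
            i += 1
            j += 1
        elif t[i + 1][j] >= t[i][j + 1]:  # dropping a[i] keeps the LCS optimal
            ops.append(("-", a[i]))
            i += 1
        else:                             # otherwise b[j] must be an addition
            ops.append(("+", b[j]))
            j += 1
    ops.extend(("-", line) for line in a[i:])  # leftovers in a are deletions
    ops.extend(("+", line) for line in b[j:])  # leftovers in b are additions
    return ops

def count_changes(a: List[str], b: List[str]) -> Tuple[int, int]:
    """Return (additions, deletions), the kind of numbers a diffstat reports."""
    ops = line_diff(a, b)
    return (sum(op == "+" for op, _ in ops), sum(op == "-" for op, _ in ops))

for op, line in line_diff(["a", "b", "c"], ["c", "a", "b"]):
    print(op, line)
```

Moving the line `c` from the end to the front is reported as one addition plus one deletion, even though no line was actually written or removed.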
So all changes to any text file in source control are expressed as a set of additions and deletions. For example, changing this text:

    The quick brown
    fox jupppped
    over the lazy dog.

to this text:

    The quick brown
    fox jumped
    over the lazy dog.

requires one deletion and one addition and is described thusly by the `diff` program on my machine:

    2c2
    < fox jupppped
    ---
    > fox jumped
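The same counts can be reproduced with Python's standard `difflib` module. This sketch assumes the example text is laid out on three lines, as the `2c2` output implies. Note that `difflib.SequenceMatcher` uses Ratcliff/Obershelp-style matching rather than a strict LCS, but for this example it finds the same single replaced line; `diffstat` is a hypothetical helper name.

```python
import difflib

def diffstat(old, new):
    """Return (additions, deletions) between two lists of lines."""
    adds = dels = 0
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" counts as deletions of the old lines plus additions of the new ones.
        if tag in ("replace", "delete"):
            dels += i2 - i1
        if tag in ("replace", "insert"):
            adds += j2 - j1
    return adds, dels

old = ["The quick brown", "fox jupppped", "over the lazy dog."]
new = ["The quick brown", "fox jumped", "over the lazy dog."]

for line in difflib.unified_diff(old, new, lineterm=""):
    print(line)
print(diffstat(old, new))  # (1, 1)
```

A one-word typo fix is therefore already counted as one "++" and one "--".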
Even if one were to simply reorder functions within a file, it would generate many deletions and additions. Because of this, it is almost certain that your ~67k additions are not all "real additions".
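With an LCS-based diff, additions equal the new line count minus the LCS length, and deletions equal the old line count minus the LCS length. A minimal sketch, using two made-up three-line functions: swapping their order keeps only one of them in the common subsequence, so the diff reports three additions and three deletions for a change that adds nothing.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two line lists (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]

def churn(old, new):
    """(additions, deletions) implied by an LCS-based diff."""
    common = lcs_length(old, new)
    return len(new) - common, len(old) - common

# Hypothetical example functions, represented as lists of lines.
func_a = ["def a():", "    x = 1", "    return x"]
func_b = ["def b():", "    y = 2", "    return y"]

print(churn(func_a + func_b, func_b + func_a))  # (3, 3)
```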
Much like SLOC, these metrics are designed to give an estimate of code churn in a project or system, not to compare the contributions of two developers exactly. This is especially true because one developer may solve the same problem more or less efficiently than another, so the one who "added" more lines of code is not necessarily adding more value.
I realize that's probably not the answer you were wanting. But I hope it helps!
No, it's not the answer I was after, and it doesn't help either :) I already knew that (in more or less detail); my question was more along the lines of:
If you're able to suppress the diff when a file rename/move is detected in a PR/commit, then why isn't the Contributions page "aware" of that?
So there is logic to figure it out, obviously; it's not a case of "OK, we can figure out additions and deletions, but nothing more". GitHub is able to tell (with very high probability) when something has been moved elsewhere, but it doesn't apply that consistently.
You are a developer yourself, so if one of your sub-systems (Module A) can detect when Event A happens, you'd probably be more than capable of making another sub-system (Module B) aware of that event and using its occurrences and their properties to adjust Module B's output. Right? :)
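The "Module A / Module B" idea could look roughly like the following hypothetical sketch: before counting a commit's additions and deletions, pair each deleted file with the most similar added file and, when they are similar enough, count only the edits between them. This is not how GitHub computes the Contributions page; `commit_stat`, `line_stat`, `similarity`, the `before`/`after` path-to-content dictionaries and the use of `SequenceMatcher.ratio()` as the similarity score are all assumptions. The 0.5 threshold mirrors Git's default rename similarity for `-M`, although Git computes its similarity index differently.

```python
import difflib
from typing import Dict, Tuple

def line_stat(old: str, new: str) -> Tuple[int, int]:
    """(additions, deletions) between two file contents, compared line by line."""
    adds = dels = 0
    matcher = difflib.SequenceMatcher(a=old.splitlines(), b=new.splitlines(), autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            dels += i2 - i1
        if tag in ("replace", "insert"):
            adds += j2 - j1
    return adds, dels

def similarity(old: str, new: str) -> float:
    """Stand-in similarity score in [0, 1]; NOT Git's actual similarity index."""
    return difflib.SequenceMatcher(a=old.splitlines(), b=new.splitlines(), autojunk=False).ratio()

def commit_stat(before: Dict[str, str], after: Dict[str, str],
                threshold: float = 0.5) -> Tuple[int, int]:
    """Hypothetical rename-aware (additions, deletions) for one commit.

    `before` and `after` map file paths to file contents.
    """
    deleted = {p: c for p, c in before.items() if p not in after}
    added = {p: c for p, c in after.items() if p not in before}
    adds = dels = 0

    # Files that exist on both sides: ordinary line diff.
    for path in before.keys() & after.keys():
        a, d = line_stat(before[path], after[path])
        adds, dels = adds + a, dels + d

    # Greedily pair each deleted file with its most similar added file.
    for old_text in deleted.values():
        best_path, best_score = None, threshold
        for new_path, new_text in added.items():
            score = similarity(old_text, new_text)
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is None:
            dels += len(old_text.splitlines())               # a genuine deletion
        else:
            a, d = line_stat(old_text, added.pop(best_path))  # a move: count only the edits
            adds, dels = adds + a, dels + d

    # Added files with no rename source are genuine additions.
    for new_text in added.values():
        adds += len(new_text.splitlines())
    return adds, dels

moved = "def f():\n    return 1\n"
print(commit_stat({"src/util.py": moved}, {"lib/util.py": moved}))  # (0, 0)
```

A naive per-path count would report the same move as two additions and two deletions; a file that is moved and then lightly edited would only contribute its actual edits.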
Of course those 67k additions were not all "true additions", but they were closer to reality than the current figure; they used to make some sense, now they make none.
Anyway, just wanted to bring it up in case someone is watching. Probably the change would not be trivial to make - even if anyone "above" cared that much :D
Edit: God, these smileys are terrible!
Hi, I also noticed this issue. My additions and deletions on the Contributors page are way higher than the actual numbers I can count from the individual diffs, just because I moved some files. How should we escalate this issue to GitHub?