Great question! Unfortunately, the answer is “not really, no.”
Source control systems have been line-oriented beasts since the days of our forefathers, in that they detect differences between files using the diff utility or an equivalent algorithm. Since diff looks for the smallest set of lines that have to change to transform the “before” file into the “after” file, it often doesn’t deal super-well with structured data that isn’t strictly line-oriented, for example JSON or XML. This often creates a tension between what makes the most sense from a language-design perspective and what makes the most sense from a “simple tools are more robust” perspective.
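To make that concrete, here is a minimal Python sketch using the standard-library difflib module as a stand-in for diff. The sample object and the file names are made up for illustration. It serializes the same data two ways and shows that a line-oriented comparison reports every line as changed even though the parsed content is identical.

```python
import difflib
import json

# Hypothetical data: one object, serialized two different ways.
data = {"id": 1, "message": "Hello, world!"}
compact = json.dumps(data)           # everything on a single line
pretty = json.dumps(data, indent=2)  # one key per line

# A line-oriented diff only sees lines of text, not JSON.
line_diff = list(difflib.unified_diff(
    compact.splitlines(), pretty.splitlines(),
    fromfile="before.json", tofile="after.json", lineterm=""))

# Keep only the added/removed lines, skipping the two file headers.
changed = [line for line in line_diff
           if line.startswith(("+", "-"))
           and not line.startswith(("+++", "---"))]

print("\n".join(line_diff))
print(len(changed))                              # 1 removed line + 4 added lines
print(json.loads(compact) == json.loads(pretty)) # True: same data after parsing
```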
With that said, working in a line-oriented way works really, really well given how little the tool has to know about the contents of the file. Essentially, the file has to be a text file, preferably with multiple lines of text. And it’s best if the tool knows the exact sequence of characters that signifies a line separator (but since the Windows line separator includes the Unix line separator inside it, the tool can wing it in some cases). It doesn’t have to understand any more than that about the syntax of the information in the file, and it doesn’t need to know anything about the semantics of the file’s contents.
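Here is a small Python sketch of the line-separator point, using made-up sample text. Splitting on the Unix separator alone still finds the line boundaries in a Windows file because "\r\n" ends with "\n", but it leaves a stray carriage return on each line, which is why a tool that knows the real separator does better.

```python
# Hypothetical sample content with Unix (LF) and Windows (CRLF) line endings.
unix_text = b"alpha\nbeta\n"
windows_text = b"alpha\r\nbeta\r\n"

# "Winging it": split on LF only. The boundaries are found in both files,
# but the Windows lines keep a trailing carriage return.
naive_unix = unix_text.split(b"\n")
naive_windows = windows_text.split(b"\n")

# Knowing the separators: bytes.splitlines() treats LF, CR and CRLF as line ends.
aware_unix = unix_text.splitlines()
aware_windows = windows_text.splitlines()

print(naive_unix == naive_windows)   # False: b"alpha" vs b"alpha\r"
print(aware_unix == aware_windows)   # True: both are [b"alpha", b"beta"]
```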
In order to do what you’re requesting, the diff++ utility (or whatever the replacement for diff is called) would need:
- A completely foolproof way of detecting the file format of the text within the file
- A parser for every possible file format OR a parser for the most popular formats and an extension mechanism to allow people to define their own
- A meta-language that allows people to signify implicit rules about the text (such as ordinality) that aren’t explicit in the file format
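As a rough idea of what the first two requirements might look like, here is a hedged Python sketch. Everything in it (PARSERS, register, contents_equal) is a hypothetical name invented for illustration, not part of diff or git. It guesses the format from the file extension, which is nowhere near foolproof, and it doesn't attempt the third requirement at all.

```python
import json
from pathlib import Path
from typing import Callable, Dict

Parser = Callable[[str], object]

# Hypothetical registry: file suffix -> function that parses the text.
PARSERS: Dict[str, Parser] = {".json": json.loads}

def register(suffix: str, parser: Parser) -> None:
    """Extension mechanism: let users plug in a parser for their own format."""
    PARSERS[suffix.lower()] = parser

def contents_equal(name: str, before: str, after: str) -> bool:
    """Compare parsed content when the format is known, lines otherwise."""
    parser = PARSERS.get(Path(name).suffix.lower())
    if parser is None:
        # Unknown format: fall back to a plain line-by-line comparison.
        return before.splitlines() == after.splitlines()
    return parser(before) == parser(after)

old = '{"a": 1, "b": 2}'
new = '{"b": 2,\n "a": 1}'
print(contents_equal("config.json", old, new))  # True: same parsed object
print(contents_equal("config.txt", old, new))   # False: the lines differ
```

Note that the JSON comparison here silently decides that key order doesn't matter, which is exactly the kind of implicit rule the third item is about.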
The reason is that you’re asking the diff++ utility to understand the semantics of the contents of the file. For example, are these two JSON files different or the same?
"message": "Hello, world!"
"message": "Hello, world!",
Syntactically, they’re different because:
- The ID changed from an integer to a string
- The order of the keys changed
But semantically? Who knows! They could be very different if the order of keys is significant and there are restrictions on the data type of the ID key. Or they could be identical if neither of those matters.
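Here is a minimal Python sketch of that ambiguity. The two documents and their ID values are assumptions made up for illustration, and same() is a hypothetical helper that only handles flat objects. The point is that the answer flips depending on rules that live outside the files.

```python
import json

# Hypothetical file contents; the id values are assumed for illustration.
before = '{"id": 1, "message": "Hello, world!"}'
after = '{"message": "Hello, world!", "id": "1"}'

def same(a: str, b: str, *, order_matters: bool, strict_types: bool) -> bool:
    """Compare two flat JSON objects under caller-supplied rules."""
    # object_pairs_hook=list keeps the keys in the order they appear in the text.
    pairs_a = json.loads(a, object_pairs_hook=list)
    pairs_b = json.loads(b, object_pairs_hook=list)
    if not strict_types:
        # Treat 1 and "1" as the same value by comparing string forms.
        pairs_a = [(k, str(v)) for k, v in pairs_a]
        pairs_b = [(k, str(v)) for k, v in pairs_b]
    if not order_matters:
        pairs_a = sorted(pairs_a, key=lambda pair: pair[0])
        pairs_b = sorted(pairs_b, key=lambda pair: pair[0])
    return pairs_a == pairs_b

print(same(before, after, order_matters=True, strict_types=True))    # False
print(same(before, after, order_matters=False, strict_types=False))  # True
```

Nothing in either file tells the tool which of those two answers is the right one.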
So what’s the solution? Well, you could attempt to redesign the content of the files in a way that makes the semantics explicit and line-oriented. But this would also probably make editing them or parsing them with other tools unnecessarily complicated.
Honestly, the best solution that we, as an industry, have been able to come up with for this problem in the past 50+ years is exactly what git delivers: it goes as far as it can to figure things out and knows when to give up and make a human take over so it doesn’t mess anything up.
Then again, git and diff were optimized for source code and not textual data files. So perhaps if the project used some other tool for managing data such as an ETL system, that might work out better?
Let me know if you have more questions.