Avoid compressing .pdf for transfer

I’ve added a bunch of .pdf files to my repo. It’s slow to upload and, presumably, download. I assume this is in part due to git trying to compress the already compressed .pdf files.

  1. Is my understanding correct? Is git trying to compress .pdf files?
  2. Is there a simple way to avoid this since these files are already compressed?
  3. Why wouldn’t GitHub set itself up to avoid compressing classes of binary files that are generally already compressed?

Thanks,

Mike

Hi @mikegilchrist! :wave: Welcome to the Community!

Git and GitHub are optimized to provide version control and code collaboration predominantly on text files, which means each push of data to our servers triggers computation on our end to apply necessary metadata and structure things efficiently for that purpose.

That means there are many use cases, such as backups of non-text files or database dumps, that are unsuitable for Git and place an inefficient strain on our infrastructure.

You can read more about this here:

https://help.github.com/en/github/managing-large-files/what-is-my-disk-quota

To answer your questions:

  1. Git isn’t trying to compress your .pdf files in the same way as an archiving program would, exactly - it’s building pack files, so that rather than storing each revision of your files separately, it stores one version and then computes “deltas” of what has changed in each revision. There’s some interesting discussion about this in this reddit thread!

  2. I’m afraid you can’t prevent files from being processed, no.

  3. We don’t do this because, fundamentally, big binaries shouldn’t be committed to the Git history in the first place.
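If you want to see the packing described in point 1 for yourself, you can inspect a repository’s pack files directly. The sketch below (assuming `git` is installed; the repo name and file are made up for illustration) uses random bytes as a stand-in for an already-compressed file like a .pdf, then shows the packed size next to the original size:

```shell
# Sketch: observe how git packs an already-compressed binary
mkdir pack-demo && cd pack-demo
git init -q

# Random bytes don't compress, mimicking an already-compressed .pdf
head -c 100000 /dev/urandom > doc.pdf
git add doc.pdf
git -c user.name=demo -c user.email=demo@example.com commit -qm "add pdf"

# Pack loose objects, then list pack contents:
# columns include object size and size-in-pack
git gc -q
git verify-pack -v .git/objects/pack/pack-*.idx | head
```

For a file like this, the size-in-pack stays close to the original size, which is consistent with the answer above: git still processes the file, but there’s little it can squeeze out of data that’s already compressed.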

I hope that explains things!


Thanks; that’s helpful.

Note I am not storing any big binary files, I am storing many, many small binary files. Regardless, I will stop worrying about this issue.
