Avoid compressing .pdf for transfer

I’ve added a bunch of .pdf files to my repo. It’s slow to upload and, presumably, download. I assume this is in part due to git trying to compress the already compressed .pdf files.

  1. Is my understanding correct? Is git trying to compress .pdf files?
  2. Is there a simple way to avoid this since these files are already compressed?
  3. Why wouldn’t GitHub set itself up to avoid compressing classes of binary files that are generally already compressed?

Thanks,

Mike

Hi @mikegilchrist! :wave: Welcome to the Community!

Git and GitHub are optimized to provide version control and code collaboration predominantly on text files, which means each push of data to our servers triggers computation on our end to apply necessary metadata and structure things efficiently for that purpose.

That means there are many use cases, such as backups of non-text files or database dumps, that are unsuitable for Git and place an inefficient strain on our infrastructure.

You can read more about this here:

https://help.github.com/en/github/managing-large-files/what-is-my-disk-quota

To answer your questions:

  1. Git isn’t trying to compress your .pdf files in the same way as an archiving program would, exactly - it’s building pack files, so that rather than storing each revision of your files separately, it stores one version and then computes “deltas” of what has changed in each revision. There’s some interesting discussion about this in this reddit thread!

  2. I’m afraid you can’t prevent files from being processed, no.

  3. We don’t do this because, fundamentally, big binaries shouldn’t be committed to the Git history in the first place.
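If you want to see the packing described in point 1 for yourself, you can inspect a repository’s pack files directly. The sketch below (assuming `git` is installed; the repo name and file are made up for illustration) uses random bytes as a stand-in for an already-compressed file like a .pdf, then shows the packed size next to the original size:

```shell
# Sketch: observe how git packs an already-compressed binary
mkdir pack-demo && cd pack-demo
git init -q

# Random bytes don't compress, mimicking an already-compressed .pdf
head -c 100000 /dev/urandom > doc.pdf
git add doc.pdf
git -c user.name=demo -c user.email=demo@example.com commit -qm "add pdf"

# Pack loose objects, then list pack contents:
# columns include object size and size-in-pack
git gc -q
git verify-pack -v .git/objects/pack/pack-*.idx | head
```

For a file like this, the size-in-pack stays close to the original size, which is consistent with the answer above: git still processes the file, but there’s little it can squeeze out of data that’s already compressed.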

I hope that explains things!


Thanks; that’s helpful.

Note I am not storing any big binary files, I am storing many, many small binary files. Regardless, I will stop worrying about this issue.
