Diminishing disk space on Ubuntu images

Our Linux builds have started to fail because they are running out of disk space. While investigating the failures, I found that the amount of disk space on the Ubuntu images is declining week by week. Here is what df -h reports in our logs, run inside the typical Docker container we use for our builds, which takes about 4 GB of disk space once everything is installed:

https://github.com/tensorflow/java/runs/515298915 (Mar 18, 2020)

overlay 84G 72G 12G 86% /

https://github.com/tensorflow/java/runs/559103432 (Apr 4, 2020) 

overlay 84G 75G 8.8G 90% /

https://github.com/tensorflow/java/runs/581350731 (Apr 13, 2020)

overlay 84G 76G 7.8G 91% /

So it looks like we're losing about 1 GB of space every week. At that rate, we'll reach 0 GB in about 3 months, which would probably prevent the images from booting at all.
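As a rough check of that estimate, here is a minimal Python sketch that fits a straight line through the Avail values from the three runs above and extrapolates to zero. It assumes the values rounded by df -h are exact and that the decline stays linear, and it treats the ~4 GB taken by our container as space that is freed again after the job (an assumption, labeled in the code).

```python
from datetime import date

# Avail column of `df -h` for / in the three runs above, in GB as rounded by df -h.
samples = [
    (date(2020, 3, 18), 12.0),
    (date(2020, 4, 4), 8.8),
    (date(2020, 4, 13), 7.8),
]

# Approximate space taken by our build container (from the post). Assumption:
# it is freed once the job ends, so it counts as headroom for the bare image.
CONTAINER_GB = 4.0


def linear_fit(points):
    """Least-squares fit of free GB against days since the first sample."""
    t0 = points[0][0]
    xs = [(d - t0).days for d, _ in points]
    ys = [gb for _, gb in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # GB per day; negative while space is shrinking
    return slope, mean_y - slope * mean_x


slope, _ = linear_fit(samples)
loss_per_week = -slope * 7
last_day, last_gb = samples[-1]
# Days until 0 GB, counted from the last sample, at the fitted rate.
days_inside_container = last_gb / -slope
days_for_image = (last_gb + CONTAINER_GB) / -slope

print(f"loss per week: {loss_per_week:.2f} GB")
print(f"days until 0 GB inside the container, from {last_day}: {days_inside_container:.0f}")
print(f"days until 0 GB for the bare image, from {last_day}: {days_for_image:.0f}")
```

With these values the fit gives a loss of a little over 1 GB per week, in line with the estimate above. The 7.8 GB left inside the container would last about seven weeks, and with the container's ~4 GB added back the image itself would have roughly ten weeks, in the same range as the three-month figure.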


Hi @saudet,

I checked your workflow YAML file: you are running your job in the nvidia/cuda:10.1-cudnn7-devel-centos7 container.

Based on my test, the disk space you are missing is taken up by this image. Please see my job YAML and workflow logs:

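on: push
jobs:
  job1:
    runs-on: ubuntu-latest
    steps:
      - name: df
        run: |
          # Disk usage before the image is pulled
          df -h
          docker system df
          # Pulling and running the image writes its layers to the runner's disk
          docker run nvidia/cuda:10.1-cudnn7-devel-centos7
          # Disk usage after the image is on disk
          df -h
          docker system df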

The image is 3.5 GB in size.
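The before/after comparison in the job above can also be scripted. This is a minimal Python sketch; `disk_used_by` is a hypothetical helper, not part of any GitHub Actions tooling. It uses the standard library's `shutil.disk_usage`, whose `free` field is the space available to unprivileged users, the same quantity df shows as Avail.

```python
import shutil
from typing import Callable


def disk_used_by(action: Callable[[], None], path: str = "/") -> int:
    """Run `action` and return how many bytes of free space on `path` it consumed.

    Same idea as the df -h / docker run / df -h sequence: record free space,
    do the work, record it again. Anything else writing to the same filesystem
    meanwhile is included in the result; a negative value means space was freed.
    """
    before = shutil.disk_usage(path).free
    action()
    after = shutil.disk_usage(path).free
    return before - after
```

To measure the image, the action could be something like `lambda: subprocess.run(["docker", "pull", "nvidia/cuda:10.1-cudnn7-devel-centos7"], check=True)`. Measuring around a single step keeps the result attributable to that step rather than to the whole job.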

Yes, I know, and I did mention that the container takes about 4 GB of space. The point is that 13 GB of space is less than the 14 GB that is supposed to be guaranteed:

Each virtual machine has the same hardware resources available.

  • 2-core CPU
  • 7 GB of RAM memory
  • 14 GB of SSD disk space

https://help.github.com/en/actions/reference/virtual-environments-for-github-hosted-runners

If you only guarantee 13 GB, then the documentation should be updated. Do you agree?
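To catch a shortfall like this before a long build fails partway through, a job could check the available space up front. A minimal sketch, assuming Python is available on the runner; `has_free_space` is a hypothetical helper, and the 14 GB default comes from the documentation quoted above. Whether that guarantee refers to free space or to total disk size is exactly what is in question here, so the threshold is a parameter.

```python
import shutil

GIB = 1024 ** 3  # df -h prints sizes in powers of 1024


def has_free_space(path: str = "/", required_gib: float = 14.0) -> bool:
    """Return True if `path` has at least `required_gib` GiB available.

    The default of 14.0 is the SSD size from the runner documentation;
    treating that "GB" as GiB is an assumption made to match df -h output.
    """
    free = shutil.disk_usage(path).free
    print(f"{path}: {free / GIB:.1f} GiB available, {required_gib:.1f} GiB required")
    return free >= required_gib * GIB
```

A workflow step could call it and exit with a non-zero status when it returns False, so the failure points at disk space rather than at whichever command happened to run out of room first.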


As of at least yesterday, we’re now down to about 9 GB:

https://github.com/tensorflow/java/runs/599744540 (Apr 20, 2020)

overlay 84G 80G 3.7G 96% /

I doubt very much we’re supposed to have so little disk space to work with.


I saw the same thing on our builds, which had started to fail. The following values were taken after running docker system prune:

/dev/sda1    84G    74G    9.9G    89%    /

And today:

/dev/sda1    84G    70G    15G    84%    /

So they fixed whatever they temporarily broke (I hope). It would have been nice if the problem had been acknowledged, so that we knew a fix was underway.

As you can see in the logs in my screenshot above, /dev/sdb1 has a size of 14G with 41M used, yet Avail shows 13G, so these figures are not exact.
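One possible reason for the gap between Size minus Used and Avail is space reserved for root: df's Avail column counts only blocks available to unprivileged users. Assuming /dev/sdb1 is ext4 with the default 5% reservation (the thread does not say), roughly 0.7G of the 14G would be held back, which goes most of the way toward explaining why Avail shows 13G. The sketch below uses Python's `os.statvfs` to show how GNU df derives its columns; `df_columns` is a hypothetical helper.

```python
import os


def df_columns(path: str = "/") -> dict:
    """Reproduce GNU df's Size/Used/Avail columns (in bytes) for `path`.

    f_bfree counts every free block, while f_bavail counts only the blocks
    available to unprivileged users. The difference ("reserved") is space
    held back for root, so Size - Used can be larger than Avail.
    """
    st = os.statvfs(path)
    size = st.f_blocks * st.f_frsize
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    avail = st.f_bavail * st.f_frsize
    return {"size": size, "used": used, "avail": avail, "reserved": size - used - avail}
```

Comparing `reserved` with `size` for the disk in question would confirm or rule out the reservation as the cause.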

Sorry for the bad experience. There is an open issue in the virtual-environments repo, and the engineering team is working on a fix for it: https://github.com/actions/virtual-environments/issues/709#issuecomment-616767507

Please wait a while, and let me know your current status.
