[BUG] Strange "No space left on device" IOExceptions on GitHub runners?!?

Hi,

All of a sudden, one of my GitHub workflow runners is reporting the following strange error message:

Unhandled exception. System.IO.IOException: No space left on device
   at System.IO.FileStream.WriteNative(ReadOnlySpan`1 source)
   at System.IO.FileStream.FlushWriteBuffer()
   at System.IO.FileStream.Flush(Boolean flushToDisk)
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at System.Diagnostics.TraceSource.Flush()
   at GitHub.Runner.Common.TraceManager.Dispose(Boolean disposing)
   at GitHub.Runner.Common.TraceManager.Dispose()
   at GitHub.Runner.Common.HostContext.Dispose(Boolean disposing)
   at GitHub.Runner.Common.HostContext.Dispose()
   at GitHub.Runner.Worker.Program.Main(String[] args)
System.IO.IOException: No space left on device
   at System.IO.FileStream.WriteNative(ReadOnlySpan`1 source)
   at System.IO.FileStream.FlushWriteBuffer()
   at System.IO.FileStream.Flush(Boolean flushToDisk)
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
   at GitHub.Runner.Common.HostTraceListener.TraceEvent(TraceEventCache eventCache, String source, TraceEventType eventType, Int32 id, String message)
   at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
   at GitHub.Runner.Worker.Worker.RunAsync(String pipeIn, String pipeOut)
   at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
System.IO.IOException: No space left on device
   at System.IO.FileStream.WriteNative(ReadOnlySpan`1 source)
   at System.IO.FileStream.FlushWriteBuffer()
   at System.IO.FileStream.Flush(Boolean flushToDisk)
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
   at GitHub.Runner.Common.HostTraceListener.TraceEvent(TraceEventCache eventCache, String source, TraceEventType eventType, Int32 id, String message)
   at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
   at GitHub.Runner.Common.Tracing.Error(Exception exception)
   at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)

The strange thing here is that the other build jobs in the matrix I use are running completely fine and without any issues. See here, e.g.:

https://github.com/jens-maus/RaspberryMatic/actions/runs/38504999

As you can see, only the “ova” job is actually causing this strange “No space left on device” error, which then ends up as a CI error for the whole project. Of course, I also tried to discuss this in the issue tracker of the actions/upload-artifact action (https://github.com/actions/upload-artifact/issues/9#issuecomment-585110758) but haven’t received any response from the GitHub authors yet. In addition, I somehow have the feeling that this might point to a completely different failure within the GitHub Actions framework, as only some days ago the same upload-artifact action worked fine even for the “ova” matrix build. See here: https://github.com/jens-maus/RaspberryMatic/actions/runs/36918297

So, now I wonder if someone in here might have an idea what could be the reason for this strange “No space left on device” error?!? Please note that I of course made sure that old build artifacts are being deleted automatically, so I guess this error does not come from some project-wide space limitation within the artifact storage.

Any help is highly appreciated!

@jens-maus ,

I have reported this issue to the appropriate engineering team; they will evaluate and investigate it. If they have any update, I will notify you promptly. The engineers may also reply to you directly here.

Thanks for the reply. I just returned from my vacation and I noticed that my nightly builds still fail with these strange “No space left on device” errors. See for example here:

https://github.com/jens-maus/RaspberryMatic/actions/runs/43667982

And this time it seems to be unrelated to the actions/upload-artifact action?!?! So what is the current state of the GitHub engineers’ investigation into these “No space left on device” errors!??!

@brightran And see/compare the following two runs of GitHub Actions on the very same commit id: the first one ran without the “No space left on device” error and the second one failed!?!?

Succeeded without any errors:

https://github.com/jens-maus/RaspberryMatic/actions/runs/40438818

Failed on the same commit id (only one day later) where I can’t see any reason why it might have failed:

https://github.com/jens-maus/RaspberryMatic/actions/runs/40977055

Hi @jens-maus,

It looks like you might be running out of disk space on the hosted machine.  We guarantee 10 GB on our runners.  Does it seem likely that you could be exceeding this?  You could add a step to check on space between existing steps to get a sense of this (“df -h” would work for Linux).
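
As a rough sketch of such a check (the helper names `free_gib` and `require_free_gib` are made up for illustration, and the threshold is something you would pick for your own build), a step could print the `df -h` overview and fail early when too little space is left:

```shell
# Hypothetical helper: print the free space (whole GiB) on the filesystem
# that holds the given path. `df -P -k` gives one line per filesystem with
# sizes in 1024-byte blocks; column 4 is the available space.
free_gib() (
  path="${1:-/}"
  df -Pk "$path" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
)

# Hypothetical helper: return non-zero when fewer than the required number
# of GiB are free, so the job can stop early with a clear message.
require_free_gib() (
  need="$1"
  path="${2:-/}"
  have="$(free_gib "$path")"
  echo "free space on ${path}: ${have} GiB (need ${need} GiB)"
  [ "$have" -ge "$need" ]
)

df -h /               # human-readable overview, as suggested above
require_free_gib 0 /  # a threshold of 0 always passes; raise it for your build
```

Placed in a `run:` step before and after the heavy build step, this shows how much space the build actually consumes.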

Hi @elbrenn 

that’s actually a good hint. In fact, my build jobs could easily occupy ~20GB of temporary disk space until the build is finished. And when I look at the disk space before my build step I can see the following:

runner@fv-az33:~/work/RaspberryMatic/b$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            3.4G     0  3.4G   0% /dev
tmpfs           695M  692K  694M   1% /run
/dev/sda1        84G   64G   20G  77% /
tmpfs           3.4G  8.0K  3.4G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.4G     0  3.4G   0% /sys/fs/cgroup
/dev/sda15      105M  3.6M  101M   4% /boot/efi
/dev/sdb1        14G   41M   13G   1% /mnt

So there seems to be around 20GB of free disk space available, right? However, the question remains why this previously worked without any issues. Is this a somewhat new limitation?!? And is there a way to get more temporary disk space on the GitHub runners so that my build jobs can succeed?!?

In practice the runners have higher specs than we guarantee to make sure we are delivering on our promises.  It’s possible that a previous version had more than 20GB or that your other runs used less for some other reason.

Here is a doc about the resources on our hosted runners: https://help.github.com/en/actions/reference/virtual-environments-for-github-hosted-runners#supported-runners-and-hardware-resources.

Have you looked into using self-hosted runners for your workflow?

Thanks for the links. It’s a pity that the GitHub runners are obviously not supporting more than 20GB disk space because apart from that problem I am quite happy with GitHub Actions.

Hopefully the disk space limit will be increased at some point, because I would prefer to stay with GitHub-provided runners rather than setting up self-hosted runners. My projects are small-scale open source projects where I cannot afford to have a self-hosted runner ready at all times to build nightly build archives, etc.

@jens-maus ,

As you have seen in the docs, currently the space is less than 20 GB.

If your projects really need more space on the runner, you can also report a feature request here. That will allow you to interact directly with the appropriate engineering team, and make it more convenient for the engineering team to collect and categorize your suggestions.

@brightran ,

Thanks for the hint and link, I will try to create an appropriate ticket ASAP.

However, I was actually able to work around the disk space issue by ensuring that the runner OS environment is cleaned up as much as possible right before the actual build step is executed. After that cleanup the runners seem to have ~33GB of disk space available (compared to ~20GB) and thus my build jobs now proceed without any issues.

See here for the relevant workflow steps I added to clean up the whole environment:

https://github.com/jens-maus/RaspberryMatic/blob/d5044bef3307bc61166377c162569de1a61cf332/.github/workflows/ci.yml#L34-L40

This includes:

  1. removal of the /swapfile
  2. cleanup of the apt cache
  3. removal of all cached docker images

All these cleanup steps seem to free about 13GB of disk space, which is then enough for my build jobs to finish correctly.
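
For readers who do not want to open the link, here is a minimal sketch of what the three steps above could look like. It is not a copy of the linked ci.yml: the commands are common equivalents, `docker image prune` is swapped in for removing images, and `DRY_RUN`, `run` and `cleanup_runner` are hypothetical names. By default it only prints what it would do.

```shell
# Sketch only: DRY_RUN (hypothetical variable) defaults to 1, so nothing is
# deleted unless you opt in with DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

cleanup_runner() {
  run sudo swapoff -a                   # 1. stop using swap ...
  run sudo rm -f /swapfile              #    ... then delete the swap file
  run sudo apt-get clean                # 2. drop cached .deb packages
  run docker image prune --all --force  # 3. remove all unused Docker images
}

df -h /          # free space before
cleanup_runner
df -h /          # free space after (unchanged in dry-run mode)
```

In a workflow this would run as a step before the build, with `DRY_RUN=0` set once you are happy with the list of commands.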


You don’t have to remove docker

Without removing docker, maybe the following is a good solution

sudo swapoff /swapfile
sudo rm -rf /swapfile /usr/share/dotnet /usr/local/lib/android /opt/ghc
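
Before deleting anything, it may be worth checking how much each of these paths actually occupies on your runner image, since the contents change between image versions. A small sketch (the function name `report_sizes` is made up for illustration):

```shell
# Hypothetical helper: print the size of each path that exists and note the
# ones that are missing, without deleting anything.
report_sizes() (
  for p in "$@"; do
    if [ -e "$p" ]; then
      # Directories owned by root may need `sudo du` for exact numbers.
      du -sh "$p" 2>/dev/null || echo "unreadable: $p"
    else
      echo "missing: $p"
    fi
  done
)

report_sizes /swapfile /usr/share/dotnet /usr/local/lib/android /opt/ghc
```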
    - name: Free disk space
      run: |
        sudo apt-get remove aria2 ansible azure-cli shellcheck rpm xorriso zsync \
          clang-6.0 lldb-6.0 lld-6.0 clang-format-6.0 clang-8 lldb-8 lld-8 clang-format-8 \
          clang-9 lldb-9 lld-9 clangd-9 clang-format-9 dotnet-sdk-3.0 dotnet-sdk-3.1=3.1.101-1 \
          esl-erlang firefox g++-8 g++-9 gfortran-8 gfortran-9 google-chrome-stable \
          google-cloud-sdk ghc-8.0.2 ghc-8.2.2 ghc-8.4.4 ghc-8.6.2 ghc-8.6.3 ghc-8.6.4 \
          ghc-8.6.5 ghc-8.8.1 ghc-8.8.2 ghc-8.8.3 ghc-8.10.1 cabal-install-2.0 cabal-install-2.2 \
          cabal-install-2.4 cabal-install-3.0 cabal-install-3.2 heroku imagemagick \
          libmagickcore-dev libmagickwand-dev libmagic-dev ant ant-optional kubectl \
          mercurial apt-transport-https mono-complete mysql-client libmysqlclient-dev \
          mysql-server mssql-tools unixodbc-dev yarn bazel chrpath libssl-dev libxft-dev \
          libfreetype6 libfreetype6-dev libfontconfig1 libfontconfig1-dev php7.1 php7.1-bcmath \
          php7.1-bz2 php7.1-cgi php7.1-cli php7.1-common php7.1-curl php7.1-dba php7.1-dev \
          php7.1-enchant php7.1-fpm php7.1-gd php7.1-gmp php7.1-imap php7.1-interbase php7.1-intl \
          php7.1-json php7.1-ldap php7.1-mbstring php7.1-mcrypt php7.1-mysql php7.1-odbc \
          php7.1-opcache php7.1-pgsql php7.1-phpdbg php7.1-pspell php7.1-readline php7.1-recode \
          php7.1-snmp php7.1-soap php7.1-sqlite3 php7.1-sybase php7.1-tidy php7.1-xml \
          php7.1-xmlrpc php7.1-xsl php7.1-zip php7.2 php7.2-bcmath php7.2-bz2 php7.2-cgi \
          php7.2-cli php7.2-common php7.2-curl php7.2-dba php7.2-dev php7.2-enchant php7.2-fpm \
          php7.2-gd php7.2-gmp php7.2-imap php7.2-interbase php7.2-intl php7.2-json php7.2-ldap \
          php7.2-mbstring php7.2-mysql php7.2-odbc php7.2-opcache php7.2-pgsql php7.2-phpdbg \
          php7.2-pspell php7.2-readline php7.2-recode php7.2-snmp php7.2-soap php7.2-sqlite3 \
          php7.2-sybase php7.2-tidy php7.2-xml php7.2-xmlrpc php7.2-xsl php7.2-zip php7.3 \
          php7.3-bcmath php7.3-bz2 php7.3-cgi php7.3-cli php7.3-common php7.3-curl php7.3-dba \
          php7.3-dev php7.3-enchant php7.3-fpm php7.3-gd php7.3-gmp php7.3-imap php7.3-interbase \
          php7.3-intl php7.3-json php7.3-ldap php7.3-mbstring php7.3-mysql php7.3-odbc \
          php7.3-opcache php7.3-pgsql php7.3-phpdbg php7.3-pspell php7.3-readline php7.3-recode \
          php7.3-snmp php7.3-soap php7.3-sqlite3 php7.3-sybase php7.3-tidy php7.3-xml \
          php7.3-xmlrpc php7.3-xsl php7.3-zip php7.4 php7.4-bcmath php7.4-bz2 php7.4-cgi \
          php7.4-cli php7.4-common php7.4-curl php7.4-dba php7.4-dev php7.4-enchant php7.4-fpm \
          php7.4-gd php7.4-gmp php7.4-imap php7.4-interbase php7.4-intl php7.4-json php7.4-ldap \
          php7.4-mbstring php7.4-mysql php7.4-odbc php7.4-opcache php7.4-pgsql php7.4-phpdbg \
          php7.4-pspell php7.4-readline php7.4-snmp php7.4-soap php7.4-sqlite3 php7.4-sybase \
          php7.4-tidy php7.4-xml php7.4-xmlrpc php7.4-xsl php7.4-zip php-amqp php-apcu \
          php-igbinary php-memcache php-memcached php-mongodb php-redis php-xdebug \
          php-zmq snmp pollinate libpq-dev postgresql-client powershell ruby-full \
          sphinxsearch subversion mongodb-org -yq >/dev/null 2>&1
        sudo apt-get autoremove -y >/dev/null 2>&1
        sudo apt-get autoclean -y >/dev/null 2>&1
        sudo rm -rf /usr/local/lib/android >/dev/null 2>&1
        docker rmi $(docker image ls -aq) >/dev/null 2>&1

If you want to keep swap and only need Docker and some basics, you may want to remove a bunch of packages.

Thanks, in my case those steps saved ~30GB in ~4 mins.
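
To get numbers like these for your own workflow, one option is to wrap the cleanup in a small timing helper. This is only a sketch: `avail_kib` and `measure_freed` are hypothetical names, and `date +%s` is assumed to be available (it is on the Linux runners).

```shell
# Available KiB on the filesystem holding $1 (default /).
avail_kib() {
  df -Pk "${1:-/}" | awk 'NR == 2 { print $4 }'
}

# Hypothetical wrapper: run the given command, then report how much space it
# freed on / and how long it took. Returns the command's exit status.
measure_freed() (
  before="$(avail_kib /)"
  start="$(date +%s)"
  status=0
  "$@" || status=$?
  after="$(avail_kib /)"
  end="$(date +%s)"
  echo "freed $(( (after - before) / 1024 )) MiB in $(( end - start )) s"
  exit "$status"
)

measure_freed true   # replace `true` with your cleanup command or script
```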