GHA tests that depend on Postgres fail intermittently

Our GH workflow is configured to run tests automatically upon pushes.

Recently, I switched our test database backend on GitHub Actions from SQLite to Postgres. Since then we have been getting errors like this, often but not every time:

    psycopg2.OperationalError: could not connect to server: Connection refused
        Is the server running on host "db" (172.19.0.2) and accepting
        TCP/IP connections on port 5432?

Sometimes a job passes on the first attempt. Other times I can re-run the same failing job a few hours later and it succeeds, so the failures are clearly intermittent.

The YAML for the GH workflow is here: failng-pg-jobs - Pastebin.com

I found a discussion here: python 3.x - Django was unable to create a connection to the 'postgres' database and will use the default database instead - Stack Overflow, where someone had a similar issue and was advised to introduce a package called wait-for-it to ensure the database was fully initialized.

I can try this, and will if I don’t find any better ideas, but I’m already using --health-cmd pg_isready (see the YAML above) and I’m not crazy about throwing more delay tools into the mix.

Anybody have ideas or suggestions?

This seems to be resolved (knock on wood).

What ended up working was using depends_on together with condition, so that the dependent service actually waits for the Postgres health check to pass before it starts, e.g.:

    depends_on:
      db:
        condition: service_healthy

See SO post here: django - psql not found in script for trying to control startup and shutdown order in Docker Compose - Stack Overflow

Note that the condition form of depends_on isn’t officially supported in version 3 of the Compose file format, but it seems to be unofficially supported. My editor gives me a warning on the condition statement. It works fine when running on GitHub, so I’m content for now.
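
For anyone landing here later, this is a minimal sketch of how the health check and the depends_on condition fit together in one compose file. It is not my actual workflow: the service name web, the image tag, the credentials, and the timing values are assumptions for illustration. Only the db service name (it matches the host in the error above), pg_isready, and the depends_on block come from this thread. As far as I can tell, the long form of depends_on with condition is documented in the current Compose Specification, which is probably why it works even though the version 3 file format dropped it.

```yaml
# Sketch only: image tag, credentials, timings, and the "web" service are assumed.
services:
  db:
    image: postgres:15            # assumed tag
    environment:
      POSTGRES_PASSWORD: postgres # assumed test-only credential
    healthcheck:
      # pg_isready exits 0 once the server accepts connections
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 10

  web:                            # hypothetical service that runs the tests
    build: .
    depends_on:
      db:
        # wait until the db health check reports healthy, not just "started"
        condition: service_healthy
```

The point of the design is that a health check on its own only labels the container as healthy or unhealthy. Nothing waits on that status unless a dependent service declares condition: service_healthy, which would explain why my earlier setup with only the health check still hit the connection error.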