Can I process only changed files with GitHub Actions?

For example, I have texts which I want to wrap into static HTML templates.

So, for every pushed file in a source folder /in/ I want to process it, for example, with bash, writing the result with the same path/name in another folder of my repository, /out/. I imagine something like this:

name: process
on:
  push:
    paths:
    - 'in/**'
jobs:
  wrap:
    name: Wrapping a text in the template
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2 (???)
      - ??? FOR EVERY CHANGED .html FILE:
      - run: cat template/header.html >out/<PATH/NAME>.html
      - run: cat in/<PATH/NAME>.html >>out/<PATH/NAME>.html
      - run: cat template/footer.html >>out/<PATH/NAME>.html

I haven’t found how to do that in the docs or on Stack Overflow. I don’t know github.ref (or whatever should be used) well enough to tell whether this is viable at all.

Could someone enlighten me?

It seems to be a pretty basic and straightforward task, but maybe Actions is just not the right tool for it?

Thank you.

For a push event you should be able to use github.event.before and github.event.after to get the commit range that was pushed (haven’t tried, but the documentation indicates it).

From that you could use git diff --name-only to get a list of changed files. One catch: actions/checkout fetches only the latest commit by default, so the older end of the range would be missing. Setting fetch-depth: 0 fetches all commits.

    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0
      - run: |
          {
            echo "CHANGED_FILES<<EOF"   # multi-line values need the NAME<<DELIMITER form
            git diff --name-only ${{ github.event.before }}..${{ github.event.after }}
            echo "EOF"
          } >> "$GITHUB_ENV"

From there the environment variable CHANGED_FILES should contain the list of changed files.
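
To get from that list to the wrapping asked about in the question, a loop along these lines might work. It is only a sketch: the function name wrap_changed is made up for illustration, the in/, out/ and template/ paths are taken from the question, and it assumes the list arrives one path per line on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wrap every changed in/**/*.html file in the templates.
# Reads one path per line from stdin and writes the result to the same
# relative path under out/.
wrap_changed() {
  local src dest
  while IFS= read -r src; do
    case "$src" in
      in/*.html) ;;            # only HTML sources under in/ ('*' also matches '/')
      *) continue ;;           # anything else in the push is ignored
    esac
    [ -f "$src" ] || continue  # the file was deleted in this push
    dest="out/${src#in/}"
    mkdir -p "$(dirname "$dest")"
    cat template/header.html "$src" template/footer.html > "$dest"
  done
}
```

In a later step it could be called as printf '%s\n' "$CHANGED_FILES" | wrap_changed. Committing the generated out/ files back to the repository would be a separate step and is not shown here.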

I would remove duplicates from the file list, e.g. git log --name-only --pretty="format:" ${{ github.event.before }}..${{ github.event.after }} | sort -u

I’m pretty sure git diff already avoids duplicates, because it compares only the two endpoints of the range, whereas git log lists the files of every commit in between.

It does! That’s good to know and a much shorter command, which also works cross-platform (or rather cross-shell).

A little off-topic:
There is a corner case that you probably don’t have to take into account, but it’s interesting nonetheless. If the workflow is added in the initial commit, then ${{ github.event.before }} will be 0000000000000000000000000000000000000000. This all-zero hash is a placeholder for "no commit", not a real object, so it can’t stand in for the parent of the first (root) commit. There is, however, a well-known SHA-1 hash for the empty tree, which works as the base of a diff:

4b825dc642cb6eb9a060e54bf8d69288fbee4904

It can be calculated like so in bash:

git hash-object -t tree /dev/null

(Replace /dev/null with NUL to make it work in PowerShell and cmd)

There is another method:

git mktree </dev/null

The working directory needs to be inside a Git repository for this command to work. It might take the hashing algorithm of the repository into account, but I’m not sure (Git is transitioning from SHA-1 to SHA-256).

Here is what I came up with to list the changed files even if the workflow is added in the initial commit (should work until the hashing algorithm changes):

git diff --name-only ${{ github.event.before == '0000000000000000000000000000000000000000' && '4b825dc642cb6eb9a060e54bf8d69288fbee4904' || github.event.before }}..${{ github.event.after }}
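
The same fallback can also be done in the shell instead of in the expression, which may be easier to read. This is a sketch only: diff_base and the two variable names are invented for illustration, and the empty-tree hash is the SHA-1 one from above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the all-zero "before" hash to the empty tree.
EMPTY_TREE=4b825dc642cb6eb9a060e54bf8d69288fbee4904   # SHA-1 repositories only
ZERO_SHA=0000000000000000000000000000000000000000

diff_base() {
  if [ "$1" = "$ZERO_SHA" ]; then
    printf '%s\n' "$EMPTY_TREE"   # no previous commit: diff against the empty tree
  else
    printf '%s\n' "$1"            # normal push: use the hash unchanged
  fi
}
```

Assuming the step exports github.event.before and github.event.after as BEFORE and AFTER (names chosen here), the call would be git diff --name-only "$(diff_base "$BEFORE")" "$AFTER".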

//edit: If you get a fatal: Invalid revision range error, the checkout action probably did a shallow clone. Make sure you use the following, as airtower-luna showed:

      - uses: actions/checkout@v2
        with:
          fetch-depth: 0 # No shallow clone, we need all history!

It can also occur if you rewrite commits (and thus change the commit hashes) and force-push them. The before hash will then be the after hash of a previous run, which is no longer reachable from any branch. The old commit should still exist as a dangling commit, but it seems like Git is unable to find it. I guess dangling commits are not fetched, at least by the checkout action or perhaps by Git in general?
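
One way to guard against that case might be to check whether the before commit is actually present in the clone and fall back to something else if it is not. The sketch below assumes that a fallback base (for example the empty tree, which lists every file) is acceptable; commit_exists and pick_base are names made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: use the "before" commit only if the clone contains it.
commit_exists() {
  # git cat-file -e exits non-zero when the object is missing or invalid
  git cat-file -e "$1^{commit}" 2>/dev/null
}

pick_base() {
  local before="$1" fallback="$2"
  if commit_exists "$before"; then
    printf '%s\n' "$before"
  else
    printf '%s\n' "$fallback"     # e.g. after a force-push rewrote history
  fi
}
```

For example, pick_base "$BEFORE" 4b825dc642cb6eb9a060e54bf8d69288fbee4904 would print the before hash when it can be resolved and the empty-tree hash otherwise (BEFORE again being an assumed variable name).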
