Jobs being skipped while using both `needs` and `if`. Unclear why.

Alright, I’m going slightly mad here.

Background :

I have a monorepo with several different services that have their own deploy jobs. Right now, these are completely different workflows that trigger on different path globs, but we’ve had timing issues where a merge involves changes to both service_a and service_b (or _c, or _d). In our repo, service _b thru _d rely on service_a, but service_a’s workflow sometimes takes longer than the other services, so by the time we get to running deploy-phase tests against _b thru _d, they fail because they’re hitting the previous version of _a.

So I’m making changes to use a single workflow with each deploy as a separate job, and attempting to use the workflow to manage whether to run or skip each job, and hold _b thru _d until _a is done.

Workflow :

So essentially, I have this

name: deploy-dev

on:
  push:
    branches:
      - develop
    paths:
      - 'service_a/**'
      - 'service_b/**'
      - 'service_c/**'
      - 'service_d/**'
      - 'platform-client/**'
      - 'platform-shared/**'
      - 'platform-tests/**'

jobs:
  gather_diffs:
    runs-on: ubuntu-latest
    outputs:
      all: ${{ steps.shared.outputs.matches }}
      service_a: ${{ steps.service_a.outputs.matches }}
      service_b: ${{ steps.service_b.outputs.matches }}
      service_c: ${{ steps.service_c.outputs.matches }}
      service_d: ${{ steps.service_d.outputs.matches }}
    steps:
      - name: Checkout
        uses: actions/checkout@v2
        with:
          fetch-depth: 2
      - name: Check Shared
        id: shared
        run: |
          FILE_COUNT=$(git diff HEAD^ --name-only -- platform-shared/* | wc -l)
          if [[ "$FILE_COUNT" -ne 0 ]]; then echo "::set-output name=matches::true"; fi
      - name: Check service_a
        id: service_a
        run: |
          FILE_COUNT=$(git diff HEAD^ --name-only -- service_a/* platform-client/src/service_a/* platform-tests/test/service_a* | wc -l)
          if [[ "$FILE_COUNT" -ne 0 ]]; then echo "::set-output name=matches::true"; fi
      - name: Check service_b
        id: service_b
        run: |
          FILE_COUNT=$(git diff HEAD^ --name-only -- service_b/* platform-client/src/service_b/* platform-tests/test/service_b* | wc -l)
          if [[ "$FILE_COUNT" -ne 0 ]]; then echo "::set-output name=matches::true"; fi
      - name: Check service_c
        id: service_c
        run: |
          FILE_COUNT=$(git diff HEAD^ --name-only -- service_c/* platform-client/src/service_c/* platform-tests/test/service_c* | wc -l)
          if [[ "$FILE_COUNT" -ne 0 ]]; then echo "::set-output name=matches::true"; fi
      - name: Check service_d
        id: service_d
        run: |
          FILE_COUNT=$(git diff HEAD^ --name-only -- service_d/* platform-tests/test/service_d* | wc -l)
          if [[ "$FILE_COUNT" -ne 0 ]]; then echo "::set-output name=matches::true"; fi
  service_a:
    runs-on: ubuntu-latest
    needs: gather_diffs
    if: needs.gather_diffs.outputs.all || needs.gather_diffs.outputs.service_a
    steps:
      - run: echo "service_a will deploy"
  service_b:
    runs-on: ubuntu-latest
    needs: [gather_diffs, service_a]
    if: needs.gather_diffs.outputs.all || needs.gather_diffs.outputs.service_b
    steps:
      - run: echo "service_b will deploy"
  service_c:
    runs-on: ubuntu-latest
    needs: [gather_diffs, service_a]
    if: needs.gather_diffs.outputs.all || needs.gather_diffs.outputs.service_c
    steps:
      - run: echo "service_c will deploy"
  service_d:
    runs-on: ubuntu-latest
    needs: [gather_diffs, service_a]
    if: needs.gather_diffs.outputs.all || needs.gather_diffs.outputs.service_d
    steps:
      - run: echo "service_d will deploy"

The Problem :
The gather_diffs job works as intended. If diffs exist on the paths each service needs (or on the shared paths), the appropriate outputs get set and the rest don’t. The gather_diffs job always completes successfully. However, any time the service_a job is skipped, all the dependents are skipped as well.

If I tweak the service-specific jobs to remove service_a from the needs and echo the individual outputs from ${{needs.gather_diffs.outputs.<output>}}, it prints what I’m expecting.

If I change the jobs to use if: always() and then use the original if on each step, it works fine…

service_b:
    runs-on: ubuntu-latest
    needs: [gather_diffs, service_a]
    if: always()
    steps:
      - run: echo "service_b will deploy"
        if: needs.gather_diffs.outputs.all || needs.gather_diffs.outputs.service_b

… however this means I now need to add this if condition to every step in the job, which is sub-optimal.

Because this second form works and uses the exact same if condition (just in a different location), I’m stumped here. It appears that the job-level if does indeed override the behavior of skipping a job when a needed job is skipped… and it appears that the condition I’m using to check outputs is returning as expected and allowing the step to run when applied to it.

So I’m stumped. Is there something I’m missing that would allow me to have the output-based condition at the job level?

Hi @joequincy , 

I am a little confused about your problem and requirements.

Is the root problem that job service_a is always skipped, and you don’t know why?

Please try changing your if conditional to

if: " needs.gather_diffs.outputs.all == 'true' || needs.gather_diffs.outputs.service_a == 'true' "

This is because the set-output command creates a string variable, not a boolean.

You could also enable step debug logging to see the job-level output variables in the Complete job step of the gather_diffs job.

Thank you, I’ll try turning on the debug logs.

The issue is not that service_a is always skipped (it’s not), but rather that if service_a is skipped, then all other jobs that need service_a will also be skipped, regardless of how their if conditional evaluates.

The set-output command creates a string type, but if it isn’t used, the output will be null… so the conditional is functionally

(either string ‘true’ or null) || (either string ‘true’ or null)

To be clear, this works if applied as a step-level if (as shown at the bottom of the original question). If there are diffs in just service_b, the echo steps for service_c and service_d are skipped as expected using this conditional on those steps.

And when service_a is not skipped, this works at the job level for service _b thru _d. It’s only when service_a is skipped that suddenly I need the _b thru _d jobs to have if: always().

This seems like Github Actions is behaving incorrectly, as it doesn’t align with the documentation, which states that (emphasis added)

If a job fails, all jobs that need it are skipped unless the jobs use a conditional statement that causes the job to continue.

From what I’m observing of its actual behavior, it seems that somehow the skipped service_a job is affecting how the conditional on service _b thru _d is being evaluated, so even though it normally evaluates to true, if the service_a job is skipped it instead evaluates to false… which does not align with the documentation.

After enabling both ACTIONS_STEP_DEBUG and ACTIONS_RUNNER_DEBUG, there are no additional logs to help determine why the job is being skipped. When I click on the job, I simply get

Through further experimentation, I discovered that I can set things up like

service_b:
    runs-on: ubuntu-latest
    needs: [gather_diffs, service_a]
    if: always() && (needs.gather_diffs.outputs.all || needs.gather_diffs.outputs.service_b)
    steps:
      - run: echo "service_b will deploy"

… which takes away the need to declare the if on every step of the job. This is a sufficient workaround, but even more so makes it feel like it’s a flaw in how Github Actions is processing the if conditional when a prior job has been skipped. I’m going to go ahead and report this as a bug.
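One caveat with a bare always(): it also lets the job run when gather_diffs or service_a has failed, or when the run was cancelled. Below is a minimal, untested sketch of a stricter guard. It assumes the `needs.<job_id>.result` values from the needs context (`success`, `failure`, `cancelled`, `skipped`) and uses the explicit `== 'true'` comparison suggested earlier in the thread.

```yaml
service_b:
    runs-on: ubuntu-latest
    needs: [gather_diffs, service_a]
    # always() replaces the implicit success() check, so the rest of the
    # expression has to spell out which upstream results are acceptable.
    if: >-
      always() &&
      needs.gather_diffs.result == 'success' &&
      (needs.service_a.result == 'success' || needs.service_a.result == 'skipped') &&
      (needs.gather_diffs.outputs.all == 'true' || needs.gather_diffs.outputs.service_b == 'true')
    steps:
      - run: echo "service_b will deploy"
```

With this form, service_b still waits for service_a, runs when service_a succeeded or was skipped, and stays skipped when service_a failed.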

If you don’t include a status check function such as always() in the if conditional, a default success() check is applied automatically.

You could refer to this document : https://help.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#job-status-check-functions 

If service_a is skipped, then all jobs that need service_a will be skipped too. I am afraid I have to say this is expected behavior, not a bug.
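To make the default concrete: a job-level if with no status check function behaves as if success() were prepended to it. The sketch below writes out that implied form of the original condition, next to a variant that replaces the default. The job ids `service_b_default` and `service_b_guarded` are hypothetical, and the `!failure() && !cancelled()` combination is an assumption offered as a narrower alternative to always(), not something confirmed in this thread.

```yaml
jobs:
  # ... gather_diffs and service_a as in the original workflow ...

  # Hypothetical job id. The original condition with the implicit check
  # written out; as observed in this thread, it is skipped whenever
  # service_a is skipped.
  service_b_default:
    runs-on: ubuntu-latest
    needs: [gather_diffs, service_a]
    if: ${{ success() && (needs.gather_diffs.outputs.all == 'true' || needs.gather_diffs.outputs.service_b == 'true') }}
    steps:
      - run: echo "service_b will deploy"

  # Hypothetical job id. Including any status check function replaces the
  # default success(); this one only rules out failed or cancelled runs.
  service_b_guarded:
    runs-on: ubuntu-latest
    needs: [gather_diffs, service_a]
    if: ${{ !failure() && !cancelled() && (needs.gather_diffs.outputs.all == 'true' || needs.gather_diffs.outputs.service_b == 'true') }}
    steps:
      - run: echo "service_b will deploy"
```

The `${{ }}` wrapper is needed in the second job because a plain YAML value cannot start with `!`. Unlike always(), this variant should still skip the job when an upstream job fails or the run is cancelled.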
