Sanitizing Quotes in PR Titles

I want to script our release automation to handle PR titles and descriptions. However, when a PR’s title or description contains single or double quotes, or other shell metacharacters such as backticks, the shell interprets them and the action fails.

Here’s an example of the problem. If the PR title or description contains a quote, this action fails. The ${{ }} expressions are pasted into the script text before the shell runs, so by the time sed could sanitize the input, the shell has already hit a syntax error and the GitHub Action has terminated.

Is there any mechanism I can use to sanitize the PR body and title input before the step reaches a syntax error?

name: On PR Merged into Develop

on:
  pull_request:
    types: [closed]
    branches:
      - main

jobs:
  test:
    if: github.event.pull_request.merged
    runs-on: ubuntu-latest
    steps:
      - name: strip quotes from PR body and title
        run: |
          echo 'PR_BODY<<EOF' >> $GITHUB_ENV
          echo '${{github.event.pull_request.body}}' | sed -e 's/[\"\']//g' >> $GITHUB_ENV
          echo 'EOF' >> $GITHUB_ENV
          echo 'PR_TITLE<<EOF' >> $GITHUB_ENV
          echo '${{github.event.pull_request.title}}' | sed -e 's/[\"\']//g' >> $GITHUB_ENV
          echo 'EOF' >> $GITHUB_ENV
      - name: show variables
        run: cat $GITHUB_ENV

Before anyone asks, the built-in env setters suffer from the exact same problem, which was why I was trying to sanitize input in the first place.
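To see why sed never gets a chance to run, here is a minimal Python sketch that mimics the substitution step. It assumes expression expansion behaves like plain text replacement, and it uses shlex.split() only as a stand-in for the shell's tokenizer (it is not an exact bash parser). TEMPLATE, render() and parses() are hypothetical names for illustration.

```python
import shlex

# Hypothetical, simplified version of the failing step: the runner pastes the
# expression's value into the script text before any shell sees it.
TEMPLATE = "echo '${{github.event.pull_request.title}}'"


def render(template: str, title: str) -> str:
    """Plain text substitution, standing in for expression expansion."""
    return template.replace("${{github.event.pull_request.title}}", title)


def parses(script: str) -> bool:
    """True if the script tokenizes cleanly; shlex.split() approximates the shell."""
    try:
        shlex.split(script)
        return True
    except ValueError:  # raised for e.g. "No closing quotation"
        return False


if __name__ == "__main__":
    for title in ["Fix typo", "It's broken"]:
        script = render(TEMPLATE, title)
        print(f"{script!r}: parses={parses(script)}")
```

The apostrophe in the second title closes the single-quoted string early and leaves a dangling quote, which is the same kind of syntax error the shell reports before sed is ever reached.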

I don’t think there’s a built-in way to do that. What I’d do is read the event payload file that github.event_path points to with a JSON parser and sanitize the fields from there, so the strings never get spliced into shell (or any other) source code.

For example, to store a shell-escaped version of the pull request title in the PR_TITLE environment variable using Python:

      - name: escape pull request title
        run: |
          import json
          import os
          import shlex
          with open('${{ github.event_path }}') as fh:
              event = json.load(fh)
          escaped = shlex.quote(event['pull_request']['title'])
          with open(os.environ['GITHUB_ENV'], 'a') as fh:
              print(f'PR_TITLE={escaped}', file=fh)
        shell: python3 {0}
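Since PR descriptions are often multi-line, the same idea can be extended to the body with the multiline NAME<<DELIMITER form of GITHUB_ENV. This is a sketch assuming the step runs with shell: python3 {0}. It reads the payload through the GITHUB_EVENT_PATH default variable (the same file as github.event_path), and append_env() and export_pr_fields() are hypothetical helper names. The random delimiter is there so that no line of the body can close the block early.

```python
import json
import os
import secrets


def append_env(env_file: str, name: str, value: str) -> None:
    """Append name=value to a GITHUB_ENV-style file using the multiline
    NAME<<DELIMITER syntax with a random delimiter."""
    delimiter = f"EOF_{secrets.token_hex(16)}"
    # Practically impossible with a random delimiter, but refuse rather than
    # write a block the runner could split in the wrong place.
    if delimiter in value.splitlines():
        raise ValueError("delimiter collision; try again")
    with open(env_file, "a", encoding="utf-8") as fh:
        fh.write(f"{name}<<{delimiter}\n{value}\n{delimiter}\n")


def export_pr_fields(event_path: str, env_file: str) -> None:
    """Copy the PR title and body from the event payload into env vars
    without ever splicing them into script source."""
    with open(event_path, encoding="utf-8") as fh:
        event = json.load(fh)
    pr = event["pull_request"]
    append_env(env_file, "PR_TITLE", pr["title"])
    # The body can be null when the PR has no description.
    append_env(env_file, "PR_BODY", pr["body"] or "")


# Only runs inside an Actions step, where both variables are set by the runner.
if __name__ == "__main__" and "GITHUB_EVENT_PATH" in os.environ:
    export_pr_fields(os.environ["GITHUB_EVENT_PATH"], os.environ["GITHUB_ENV"])
```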

Maybe the documentation should actually warn about directly using user-supplied data in workflows…

Hey, this has got me very close to a solution, but I’m seeing an issue when I echo the contents of $PR_TITLE to a file.

echo $PR_TITLE >> file.md

Results in:

'Test '"'"'single'"'"' "double"'

Should I be using something other than echo to append the variable contents to a file?

Could you share a bit more about how your workflow and the inputs look?

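In most cases echo with a redirect is fine for writing files. Keep in mind, though, that shlex.quote() produces a string meant to be parsed as shell source: the quotes are part of the value stored in PR_TITLE, and expanding $PR_TITLE does not parse them again, so they end up in the file as-is. I used Python above to make sure the title reaches GITHUB_ENV without being interpreted by the shell on the way.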
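A quick way to see the difference is to round-trip the title through shlex in plain Python. This sketch uses the same title as above; shell_escaped() and shell_parsed() are hypothetical names, and shlex.split() again stands in for a shell parsing script source.

```python
import shlex


def shell_escaped(title: str) -> str:
    """What the Python step stored in PR_TITLE: a single shell-quoted word."""
    return shlex.quote(title)


def shell_parsed(word: str) -> list[str]:
    """What a shell does when the quoted text appears in script source."""
    return shlex.split(word)


if __name__ == "__main__":
    title = "Test 'single' \"double\""
    stored = shell_escaped(title)
    # Expanding $PR_TITLE hands back these characters unchanged, so this is
    # what gets appended to the file.
    print(stored)  # 'Test '"'"'single'"'"' "double"'
    # The quotes only disappear when the text is parsed as shell source.
    print(shell_parsed(stored))
```

The first print is exactly the text that showed up in the file; only a parse as shell source turns it back into the original title.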

Sure thing.

When a pull request is merged, I’m adding the pull request title to the end of a markdown file (changelog). I want to cover cases where the PR title contains single or double quotes, and also backticks.

These are the two steps doing the processing:

- name: Create Changelog Entry String
  run: |
    import json
    import os
    import shlex
    with open('${{ github.event_path }}') as fh:
        event = json.load(fh)
    escaped = shlex.quote(event['pull_request']['title'])
    with open(os.environ['GITHUB_ENV'], 'a') as fh:
        print(f'PR_TITLE={escaped}', file=fh)
  shell: python3 {0}

# Update file.
- run: |
    echo $PR_TITLE >> changelog.md

And this converts this title: Test 'single' "double"

into this in the file: 'Test '"'"'single'"'"' "double"'

I think removing the shlex.quote() call solves the issue for me!

      - name: Create Changelog Entry String
        run: |
          import json
          import os
          with open('${{ github.event_path }}') as fh:
              event = json.load(fh)
          escaped = event['pull_request']['title']
          with open(os.environ['GITHUB_ENV'], 'a') as fh:
              print(f'PR_TITLE={escaped}', file=fh)
        shell: python3 {0}
      # Update changelog.
      - run: |
          printf "%b" "$PR_TITLE  \n" >> changelog.md

This is producing the correct output. Does anything look dangerous?

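It depends on what kind of escaping you need: if you don’t need a shell-escaped version of the title, that should work. One thing to watch is that %b also interprets backslash escapes inside the title itself, so a title containing something like \n or \c would be altered; printf '%s  \n' "$PR_TITLE" prints the title verbatim. But if you just want to append to the changelog, why not do it directly from the Python fragment, without the detour via GITHUB_ENV? That’d eliminate any chance of interference.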
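A minimal sketch of that, assuming the step runs with shell: python3 {0} after the repository has been checked out, and that the changelog lives at changelog.md in the working directory. append_changelog_entry() is a hypothetical name. The title goes straight from the JSON payload into the file, so nothing in between can interpret quotes, backticks or backslashes.

```python
import json
import os


def append_changelog_entry(event_path: str, changelog_path: str) -> str:
    """Read the merged PR's title from the event payload and append it to the
    changelog as one Markdown line, exactly as typed."""
    with open(event_path, encoding="utf-8") as fh:
        event = json.load(fh)
    title = event["pull_request"]["title"]
    # Two trailing spaces form a Markdown hard line break, like the printf step.
    entry = f"{title}  \n"
    with open(changelog_path, "a", encoding="utf-8") as fh:
        fh.write(entry)
    return entry


# Only runs inside an Actions step, where GITHUB_EVENT_PATH is set.
if __name__ == "__main__" and "GITHUB_EVENT_PATH" in os.environ:
    append_changelog_entry(os.environ["GITHUB_EVENT_PATH"], "changelog.md")
```

This replaces both the GITHUB_ENV step and the printf step with a single step.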