I have a GitHub Pages site, https://opendata-mvcr.github.io/, generated from the GitHub repository opendata-mvcr/opendata-mvcr.github.io (Web data.gov.cz; this repository is maintained as part of project CZ.03.4.74/0.0/0.0/15_025/0013983). When Pages is set to build from the master branch, everything works fine. When I set it to build from the “feature/online-školení” branch, the site gets generated once, but when I push a commit to that branch, it is not regenerated. I have to go to the Pages settings, switch the source to another branch (e.g. master), wait for the site to be regenerated, and then switch back to the “feature/online-školení” branch to get it to build. This is quite frustrating. Could it be connected to the branch name containing non-ASCII Unicode characters, somehow breaking the hook that regenerates Pages on each commit?
I’m not 100% sure about this, but in my personal experience, paths with non-ASCII characters tend to create all sorts of unexpected problems, so my guess is that the same applies to branch names. The š and í are most likely the culprits (š is U+0161, which is not even part of Latin-1). Any GH Pages backend script might have trouble with such characters in paths or branch names, as they will probably be either escaped or lost somewhere along the way.
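To make the Latin-1 point concrete, here is a small Python sketch showing, for each non-ASCII character of the branch name, whether it fits in Latin-1 and which UTF-8 bytes it occupies, plus the percent-encoded form the name takes inside a URL. It only illustrates the representations tools might use; it says nothing about how the Pages backend actually processes branch names.

```python
from urllib.parse import quote

branch = "feature/online-školení"

def latin1_ok(ch: str) -> bool:
    """Return True if ch can be encoded in ISO-8859-1 (Latin-1)."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

# Only the characters outside 7-bit ASCII are interesting here.
non_ascii = [ch for ch in branch if ord(ch) > 127]
for ch in non_ascii:
    print(f"{ch} U+{ord(ch):04X} latin-1={latin1_ok(ch)} utf-8={ch.encode('utf-8').hex(' ')}")

# How the branch name looks once percent-encoded for a URL ("/" left as-is).
print(quote(branch))  # feature/online-%C5%A1kolen%C3%AD
```

So í (U+00ED) survives a Latin-1 round trip while š does not, and both become two bytes in UTF-8.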
I’ve run into many similar problems with filenames containing accented letters that are part of Latin-1, because different tools along the toolchain represent them differently.
PS. I love your GH org and have relied on it for data CSVs many times (e.g. language code tables). Keep up the good work!
This is odd.
I see a bunch of successful Pages deployments (the green checks next to the commits) for the feature/online-školení branch in the repository’s commit history. Propagation to the CDN can take up to 10 minutes. Is it possible that you tried to access the content before the CDN had caught up, and concluded that no deployment had happened?
Two were missed recently, but this could have happened while the Pages site was tracking a different branch.
The branch name should not matter, and Pages should support your encoding anyway.
I will follow up on Monday!
@tajmone That is my suspicion too, though I will have to try with another branch without Unicode characters. Nevertheless, this would be a bug, because outside the US and UK, Unicode is used quite frequently.
@yoannchaudet The successful deployments are the ones after a manual branch switch. Whenever I need the branch to build, I go to the Pages settings, switch to “master”, keep refreshing the page until the status changes from “is ready to be published on …” to “is published on”, and then switch back to “feature/online-školení” and wait again.
When I simply commit to “feature/online-školení” while it is set as the Pages source, the build never happens; those are the missed commits.
Yes, but when it comes to Git, it’s a known issue. For example, Git turned out to cause quite a few disasters on macOS due to Unicode normalization differences.
There is now a dedicated Git setting for macOS (core.precomposeUnicode) to avoid this, but many Git-related tools (native or otherwise) might suffer from similar issues when handling Unicode filenames.
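The normalization mismatch can be reproduced with Python’s standard unicodedata module: the same word in composed (NFC) and decomposed (NFD) form compares unequal, both as text and as UTF-8 bytes. On macOS, HFS+ historically stored file names in decomposed form, which is the kind of mismatch core.precomposeUnicode works around. This is only an illustration of the two forms, not of Git’s internals.

```python
import unicodedata

# Composed form (NFC): š and í are single code points, as typed on most systems.
nfc = unicodedata.normalize("NFC", "školení")
# Decomposed form (NFD): s + combining caron, i + combining acute accent.
nfd = unicodedata.normalize("NFD", nfc)

print(len(nfc), len(nfd))                          # 7 9
print(nfc == nfd)                                  # False
print(nfc.encode("utf-8") == nfd.encode("utf-8"))  # False

# Normalizing both sides to the same form before comparing removes the mismatch.
print(unicodedata.normalize("NFC", nfd) == nfc)    # True
```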
For example, I wrote a CI script to filter the files to be validated by EClint (a Node.js tool for validating EditorConfig style conventions), feeding it only the files that are actually tracked instead of every file in the repository (which includes ignored generated files). It seemed to work fine until I started using filenames with Latin-1 accents and discovered that EClint complained that the files didn’t exist. It turned out that non-ASCII characters (including ISO-8859-1/Latin-1 ones) were being escaped as ASCII codes, which the Node.js tool didn’t unescape.
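This escaping matches Git’s default core.quotePath=true behaviour: commands such as git ls-files print paths containing non-ASCII bytes inside double quotes, with each such byte written as an octal escape. The simplest fix is usually to avoid the quoting altogether with git -c core.quotePath=false ls-files, or git ls-files -z (NUL-separated, unquoted output). If quoted output has to be consumed anyway, here is a minimal Python sketch of a decoder; unquote_git_path is a hypothetical helper, and it assumes file names are UTF-8 encoded.

```python
import re

# Single-character escapes Git uses when quoting paths (C-style).
_C_ESCAPES = {"a": 7, "b": 8, "t": 9, "n": 10, "v": 11, "f": 12, "r": 13, '"': 34, "\\": 92}

def unquote_git_path(path: str) -> str:
    """Decode one path printed by Git with core.quotePath=true (assumes UTF-8 names)."""
    if len(path) < 2 or not (path.startswith('"') and path.endswith('"')):
        return path  # paths that need no escaping are printed verbatim, unquoted
    body = path[1:-1]
    raw = bytearray()
    i = 0
    while i < len(body):
        if body[i] != "\\":
            raw += body[i].encode("utf-8")
            i += 1
        elif re.fullmatch(r"[0-7]{3}", body[i + 1:i + 4]):
            raw.append(int(body[i + 1:i + 4], 8))  # octal escape = one raw byte
            i += 4
        else:
            raw.append(_C_ESCAPES[body[i + 1]])  # e.g. \t, \n, \", \\
            i += 2
    return raw.decode("utf-8")

print(unquote_git_path('"\\305\\241kolen\\303\\255.txt"'))  # školení.txt
```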
Even if Bash and other shells are Unicode-enabled by default, the WildMatch algorithm still handles only ASCII characters.
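The difference between character-oriented and byte-oriented matching can be shown with Python’s fnmatch module, used here purely as a stand-in (it is not Git’s wildmatch): a ? wildcard consumes one character when matching a str, but one byte when matching a bytes object, so the two-byte UTF-8 encodings of š and í no longer fit.

```python
from fnmatch import fnmatchcase

branch = "feature/online-školení"
pattern = "feature/online-?kolen?"

# On decoded text, each "?" consumes exactly one character, so š and í match.
print(fnmatchcase(branch, pattern))  # True

# On raw UTF-8 bytes, š and í are two bytes each, so a single "?" no longer fits.
print(fnmatchcase(branch.encode("utf-8"), pattern.encode("ascii")))  # False
```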
So, yes, I think non-ASCII characters are unsafe to use in file and branch names, especially when Git commands and third-party command-line tools are involved: there is no guarantee that escaped characters in file-name representations will be handled correctly as they are passed around, and in some cases there is also the problem of Unicode normalization (as in the macOS vs. Git case).
@jakubklimek I see you renamed the branch that your Pages site is tracking. I did not see a problem on my side when trying to reproduce the issue (including with the exact branch you were using previously). Preemptively, I cleared potential locks for your Pages site. If it was not building on new commits before, it should now.
How does that work? What are these potential locks?
As a side note, I recently noticed that when enabling GH Pages (served from the docs/ folder) for a repository that already had all its HTML contents ready in docs/ (a static website, with .nojekyll), the GH Pages website did not show up until I pushed a new commit, even one that didn’t touch the docs/ folder. In fact, just force-pushing the last commit unchanged was enough to make the website visible. Was that related to the “potential locks” you mentioned?
@tajmone I am leaking internal details here! Pages runs only one build per repo at a time. We have an internal lock mechanism to enforce this and have lately seen (very infrequently) deadlocks in the system. This is something we are addressing this quarter. If you find yourself stuck on a specific repo, by all means open a support ticket directly; that will get our attention and we will be able to mitigate it shortly.
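A minimal sketch of the general idea, assuming nothing about GitHub’s real implementation: a hypothetical per-repository lock that allows one build at a time and carries a lease, so that a build which dies without releasing the lock blocks its repository only until the lease expires rather than forever. The BuildLocks class, the TTL, and the injectable clock are all illustrative.

```python
import threading
import time

class BuildLocks:
    """Hypothetical one-build-per-repo lock with a lease (time-to-live)."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._mutex = threading.Lock()  # protects the table below
        self._held = {}  # repo -> (build_id, lease expiry time)

    def acquire(self, repo: str, build_id: str) -> bool:
        """Take the repo's lock unless another build holds an unexpired lease."""
        with self._mutex:
            now = self._clock()
            holder = self._held.get(repo)
            if holder is not None and holder[1] > now:
                return False
            self._held[repo] = (build_id, now + self._ttl)
            return True

    def release(self, repo: str, build_id: str) -> None:
        """Release only if build_id still owns the lock (a stale holder is ignored)."""
        with self._mutex:
            holder = self._held.get(repo)
            if holder is not None and holder[0] == build_id:
                del self._held[repo]
```

With a fake clock passed in, the lease expiry and the guard against a stale holder releasing someone else’s lock can be checked deterministically.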