Any way to distinguish the branch name from the file path within a URL?

Here is my case:

The input is a URL of the “Shows a file” kind (from https://github.com/gitbucket/gitbucket/wiki/List-of-URL-Formats).

Example:

https://github.com/BCN3D/BCN3D-Moveo/blob/master/STL%20files/Electronics%20box/Box.STL

I want to separate the branch name from the actual file path: in this case, ‘master’ from ‘STL%20files/Electronics%20box/Box.STL’.

The problem is that a branch name can contain slashes. Example:

https://github.com/soswow/react-circle/blob/custom/branch/demo.gif

In this case, the branch name is ‘custom/branch’ and the path is ‘demo.gif’.
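Because the URL alone doesn't say where the branch ends, every cut point after `blob/` is a candidate. A minimal Python sketch of that enumeration, assuming the URL has the `owner/repo/blob/...` shape shown above; `split_blob_url` and `candidate_splits` are hypothetical helper names:

```python
from urllib.parse import urlsplit


def split_blob_url(url: str) -> tuple[str, str, list[str]]:
    """Return (owner, repo, segments after 'blob') for a "Shows a file" URL."""
    # urlsplit leaves percent-encoding alone, so "STL%20files" stays as-is.
    parts = urlsplit(url).path.strip("/").split("/")
    # Expected shape: owner / repo / blob / <branch segments> / <path segments>
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a blob URL: {url}")
    return parts[0], parts[1], parts[3:]


def candidate_splits(segments: list[str]) -> list[tuple[str, str]]:
    """Every possible (branch, path) cut; the path always keeps at least the file name."""
    return [
        ("/".join(segments[:k]), "/".join(segments[k:]))
        for k in range(1, len(segments))
    ]


owner, repo, segments = split_blob_url(
    "https://github.com/soswow/react-circle/blob/custom/branch/demo.gif"
)
print(owner, repo)  # soswow react-circle
print(candidate_splits(segments))  # [('custom', 'branch/demo.gif'), ('custom/branch', 'demo.gif')]
```

A URL with n segments after `blob/` yields n - 1 candidates, and nothing in the URL itself says which one is right; picking one takes a request.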

The resulting path and branch will be used in an API call:

https://developer.github.com/v3/repos/contents/#get-contents

/repos/:owner/:repo/contents/:path

where ‘:path’ is only the file path, and the branch name is provided via the ‘ref’ parameter.
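For reference, a Python sketch of turning one (branch, path) candidate into that request URL. `API_ROOT` and `contents_api_url` are assumptions for illustration; the endpoint shape and the `ref` parameter come from the linked docs.

```python
from urllib.parse import quote, unquote, urlencode

# Assumption: public github.com; GitHub Enterprise uses a different API root.
API_ROOT = "https://api.github.com"


def contents_api_url(owner: str, repo: str, branch: str, path: str) -> str:
    """Build GET /repos/:owner/:repo/contents/:path?ref=:branch."""
    # Segments copied from a blob URL are already percent-encoded ("STL%20files"),
    # so decode once and re-encode once to avoid turning "%20" into "%2520".
    encoded_path = quote(unquote(path), safe="/")
    query = urlencode({"ref": unquote(branch)})  # "/" in the branch becomes %2F
    return f"{API_ROOT}/repos/{owner}/{repo}/contents/{encoded_path}?{query}"


print(contents_api_url("soswow", "react-circle", "custom/branch", "demo.gif"))
# https://api.github.com/repos/soswow/react-circle/contents/demo.gif?ref=custom%2Fbranch
```

A 404 from this URL is the signal the current approach uses to reject a split.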

The way we do it right now is by iteratively removing parts from everything after ‘blob/’ (in the incoming URL) until we get a non-404 response. This feels wasteful: if someone has a branch name with many slashes, we will make many useless calls.
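A sketch of that probing loop in Python. The post doesn't say which end the parts are removed from, so the hypothetical `longest_branch_first` flag covers both readings; `exists(branch, path)` is a hypothetical stand-in for the contents call, returning True on any non-404.

```python
from typing import Callable, Iterable, Optional

Split = tuple[str, str]  # (branch, path)


def resolve_by_probing(
    segments: list[str],
    exists: Callable[[str, str], bool],
    longest_branch_first: bool = True,
) -> tuple[Optional[Split], int]:
    """Try branch/path cuts one at a time until `exists` reports a non-404.

    Returns the first matching split (or None) plus the number of calls made.
    """
    # k = number of segments given to the branch; stop at len - 1 so the path is non-empty.
    ks: Iterable[int] = range(1, len(segments))
    if longest_branch_first:
        ks = reversed(range(1, len(segments)))
    calls = 0
    for k in ks:
        branch, path = "/".join(segments[:k]), "/".join(segments[k:])
        calls += 1  # one API request per candidate
        if exists(branch, path):
            return (branch, path), calls
    return None, calls


# Stand-in for the API: pretend only the branch "master" exists.
segs = ["master", "STL%20files", "Electronics%20box", "Box.STL"]
fake_exists = lambda branch, path: branch == "master"
print(resolve_by_probing(segs, fake_exists))         # 3 calls
print(resolve_by_probing(segs, fake_exists, False))  # 1 call
```

Which order is cheaper depends on where the slashes are: longest-first costs one call per path segment, shortest-first one call per branch segment.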

Is there a way?

It’s been a solid 3 months since I asked this. Not sure if bumping an old unanswered question is OK here, but here I go.

I can’t think of a way to know definitively, but it seems to me that you’re checking the candidates in the wrong order. The way I would do it is to check the first segment after blob/:

  1. Whether or not that returns a 404, also check the first two segments joined with a slash.
  2. If the first was non-404 and this one is 404, then the first one was your branch name.
  3. If the first was non-404 and this one is also non-404, check the first three segments, and so on.
  4. If the first was 404 and this one is non-404, check the first three segments and interpret the result the same way as in step 2.
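The steps above, sketched in Python and generalized past the third segment. `exists(branch, path)` is again a hypothetical stand-in for the contents call (True on non-404). The steps don't cover the case where both of the first two checks return 404, so this sketch simply moves on to the next, longer candidate; that part is an assumption.

```python
from typing import Callable, Optional

Split = tuple[str, str]  # (branch, path)


def resolve_front_to_back(
    segments: list[str], exists: Callable[[str, str], bool]
) -> tuple[Optional[Split], int]:
    """Grow the candidate branch one segment at a time.

    Stops as soon as a non-404 candidate is followed by a 404 one and returns
    the last non-404 split; returns None if no candidate matched.
    """
    calls = 0
    best: Optional[Split] = None
    # k = number of segments in the branch; the last segment is always the file.
    for k in range(1, len(segments)):
        split = ("/".join(segments[:k]), "/".join(segments[k:]))
        calls += 1
        if exists(*split):
            best = split  # non-404: remember it and keep going (step 3)
        elif best is not None:
            break  # non-404 then 404: the previous candidate wins (steps 2 and 4)
        # 404 with nothing found yet: just try the next, longer candidate
    return best, calls


segs = ["master", "STL%20files", "Electronics%20box", "Box.STL"]
print(resolve_front_to_back(segs, lambda b, p: b == "master"))  # 2 calls
```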

Start from the front, not the back. You’ll always have at least 2 checks, but you’ll rarely have 3, and even more rarely more than that.

You might even check the first two segments first, and only if that returns a 404, THEN fall back to checking the first segment alone.

Also, if you know you’re always getting a file, then the branch name is never going to include the last segment, right? So you never have to check that one.

So, for an example path 1/2/3/4/5.gif where /blob/ precedes 1, I would consider checking 1/2 first. If it’s 404, then go to 1. If that’s also 404, then go to 1/2/3. If 1/2 was non-404, check 1/2/3 for good measure.
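One reading of that ordering as a Python sketch, with the same hypothetical `exists(branch, path)` stand-in (True on non-404). What happens after 1/2/3 isn't spelled out, so continuing forward until a non-404 is followed by a 404 is an assumption.

```python
from typing import Callable, Optional

Split = tuple[str, str]  # (branch, path)


def resolve_two_segments_first(
    segments: list[str], exists: Callable[[str, str], bool]
) -> tuple[Optional[Split], int]:
    """Probe 1/2 first, fall back to 1, then continue with 1/2/3, 1/2/3/4, ..."""
    calls = 0

    def probe(k: int) -> Optional[Split]:
        # One (hypothetical) API request: does the cut after k segments resolve?
        nonlocal calls
        calls += 1
        split = ("/".join(segments[:k]), "/".join(segments[k:]))
        return split if exists(*split) else None

    last_k = len(segments) - 1  # the branch never includes the file name
    if last_k < 2:
        # Zero or one possible cut: nothing to reorder.
        hit = probe(1) if last_k == 1 else None
        return hit, calls
    best = probe(2)
    if best is None:
        best = probe(1)  # 1/2 was a 404, so try 1
        if best is not None:
            return best, calls
    # Either 1/2 matched (check 1/2/3 "for good measure") or 1 and 1/2 both
    # returned 404 (go to 1/2/3); keep growing until a hit is followed by a 404.
    for k in range(3, last_k + 1):
        hit = probe(k)
        if hit is not None:
            best = hit
        elif best is not None:
            break
    return best, calls
```

With no overlapping refs, a branch with zero or one slash is settled in at most two calls.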

There are some edge cases that would break this, e.g. two branches named master and master/custom/demo. But if this edge case is possible or worrisome for you, then the way you’re doing it is probably the only way.
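A minimal Python sketch of at least detecting that edge case, assuming you're willing to pay one call per cut; `exists(branch, path)` is the same hypothetical stand-in for the contents call. Worth knowing: Git itself refuses to create a branch `master/custom/demo` while a branch `master` exists, because refs are stored hierarchically, but a branch and a tag (say, branch `master` and tag `master/custom/demo`) can coexist, and a blob URL can name either.

```python
from typing import Callable

Split = tuple[str, str]  # (branch, path)


def all_matching_splits(
    segments: list[str], exists: Callable[[str, str], bool]
) -> list[Split]:
    """Probe every cut (len(segments) - 1 calls) and return each one that resolved."""
    splits = [
        ("/".join(segments[:k]), "/".join(segments[k:]))
        for k in range(1, len(segments))
    ]
    return [s for s in splits if exists(*s)]


# Stand-in for the API: both interpretations of this URL resolve to a file.
files = {("master", "custom/demo/x.gif"), ("master/custom/demo", "x.gif")}
matches = all_matching_splits(
    ["master", "custom", "demo", "x.gif"], lambda b, p: (b, p) in files
)
if len(matches) > 1:
    print("ambiguous:", matches)
```

This always costs one call per cut, so it only makes sense if the edge case genuinely matters to you.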