[Ruby] I checked URL encoding

I can’t see the PDF file on the web

For some reason, the links to an uploaded PDF file were broken, so I am recording what I found while investigating.

User environment

  • MacBook
  • Chrome

Application environment

  • AWS EC2
  • AWS S3
  • ruby 2.6.5
  • Rails 5.0.7.2
    • refile (0.6.2)
    • refile-s3 (0.2.0)

Which files cannot be displayed

Links to files with a tilde “~” in the file name are broken.


li
  = link_to attachment_url(file, :file), target:'_balnk'
    b-icon(icon="attachment")
    span.file__name = file.file_filename

The HTML that is actually generated

<a target="_balnk" href="/attachments/xxxxxxx/store/xxxxxx/2020.3%7E2020.4%E3%83%86%E3%82%B9%E3%83%88.pdf">
  <span class="file__name">2020.3~2020.4テスト.pdf</span>
</a>

The Japanese characters are also URL-encoded correctly, so they do not seem to be the cause.
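One way to double-check this is to decode the file-name part of the `href` in Ruby. The sketch below uses `URI.decode_www_form_component` from the standard library; the constant and the helper name `decode_file_name` are hypothetical and exist only for this illustration.

```ruby
require 'uri'

# File-name part of the href generated by link_to (copied from the HTML above).
ENCODED_FILE_NAME = '2020.3%7E2020.4%E3%83%86%E3%82%B9%E3%83%88.pdf'

# Decode the percent-encoded bytes and interpret the result as UTF-8.
# (decode_file_name is a hypothetical helper name used only in this sketch.)
def decode_file_name(encoded)
  URI.decode_www_form_component(encoded, Encoding::UTF_8)
end

decoded = decode_file_name(ENCODED_FILE_NAME)
puts decoded                 # the original file name
puts decoded.include?('~')   # true when %7E was decoded to a tilde
puts decoded.valid_encoding? # true when the Japanese bytes are valid UTF-8
```

Note that `URI.decode_www_form_component` also turns “+” into a space, which does not matter here because the file name contains no “+”.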

I checked the URL rules

Several specifications define which characters may be used in a URL. I examined the following two.

  • RFC3986
  • RFC1738

RFC1738

https://tools.ietf.org/html/rfc1738

Unsafe: Characters can be unsafe for a number of reasons. The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs. The characters “<” and “>” are unsafe because they are used as the delimiters around URLs in free text; the quote mark (“"”) is used to delimit URLs in some systems. The character “#” is unsafe and should always be encoded because it is used in World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it. The character “%” is unsafe because it is used for encodings of other characters. Other characters are unsafe because gateways and other transport agents are known to sometimes modify such characters. These characters are “{”, “}”, “|”, “\”, “^”, “~”, “[”, “]”, and “`”. All unsafe characters must always be encoded within a URL.

In other words, RFC 1738 specifies that the tilde must be encoded before use.

RFC3986

https://tools.ietf.org/html/rfc3986

2.3. Unreserved Characters

Characters that are allowed in a URI but do not have a reserved purpose are called unreserved. These include uppercase and lowercase letters, decimal digits, hyphen, period, underscore, and tilde.

unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"

Here the tilde is classified as an “unreserved character”, so it can apparently be used without encoding.
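The difference between the two RFCs can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: `percent_encode` and both constants are hypothetical names, and the RFC 1738 side is simplified to “the RFC 3986 unreserved set without the tilde” (RFC 1738 also allows a few other characters that are not modelled here).

```ruby
# Byte values of the characters RFC 3986 section 2.3 lists as unreserved.
RFC3986_UNRESERVED = [*'A'..'Z', *'a'..'z', *'0'..'9', '-', '.', '_', '~'].map(&:ord).freeze

# Same set without the tilde, which RFC 1738 classifies as unsafe.
# Simplification for this example: other characters RFC 1738 permits
# are deliberately left out.
RFC1738_STYLE = (RFC3986_UNRESERVED - ['~'.ord]).freeze

# Percent-encode every byte of +str+ whose value is not listed in +keep+.
def percent_encode(str, keep)
  str.each_byte.map { |b| keep.include?(b) ? b.chr : format('%%%02X', b) }.join
end

name = '2020.3~2020.4.pdf'
puts percent_encode(name, RFC3986_UNRESERVED) # tilde is kept as-is
puts percent_encode(name, RFC1738_STYLE)      # tilde becomes %7E
```

With the second set the tilde is encoded as `%7E`, which matches the link that `link_to` generated.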

Which rules does Amazon CloudFront comply with?

After looking up the URL specifications, I checked the AWS environment in which the web application runs.

The documentation for Amazon CloudFront, which receives the requests from users, contains the following statement: https://docs.aws.amazon.com/ja_jp/AmazonCloudFront/latest/DeveloperGuide/Invalidation.html

If the path contains non-ASCII characters, or unsafe characters as specified in RFC 1738, URL-encode those characters.

So CloudFront appears to follow RFC 1738, under which the tilde must be encoded.

Why couldn’t the file be displayed?

The investigation above suggests that something is wrong with the encoding of the tilde, yet the HTML link created with Rails link_to is encoded correctly.

<a target="_balnk" href="/attachments/xxxxxxx/store/xxxxxx/2020.3%7E2020.4%E3%83%86%E3%82%B9%E3%83%88.pdf">
  <span class="file__name">2020.3~2020.4テスト.pdf</span>
</a>

Browser problem?

When I tried both Chrome and Safari, the PDF file opened in Safari!

Chrome address bar display

https://xxxxxx/2020.3~2020.4テスト.pdf

Safari address bar display

https://xxxxxx/2020.3%7E2020.4テスト.pdf

When an encoded link is opened, the browser apparently decodes it for display, for example back into Japanese. Chrome also shows %7E decoded as “~”, while Safari keeps %7E, so my guess is that the problem is caused by this decoding.
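RFC 3986 treats “%7E” and “~” as equivalent, and its section 6.2.2.2 allows a client to decode percent-encoded unreserved characters when normalizing a URI. The Ruby sketch below shows what such a normalization does to the link. Whether Chrome performs exactly this step before sending the request is an assumption, not something verified here, and `normalize_unreserved` is a hypothetical helper name.

```ruby
# Byte values of the RFC 3986 unreserved characters.
UNRESERVED_BYTES = [*'A'..'Z', *'a'..'z', *'0'..'9', '-', '.', '_', '~'].map(&:ord).freeze

# Decode only those percent-encoded octets that stand for unreserved
# characters (RFC 3986 section 6.2.2.2). Every other escape is kept,
# with its hex digits upper-cased.
def normalize_unreserved(path)
  path.gsub(/%([0-9A-Fa-f]{2})/) do
    hex = Regexp.last_match(1)
    byte = hex.hex
    UNRESERVED_BYTES.include?(byte) ? byte.chr : "%#{hex.upcase}"
  end
end

puts normalize_unreserved('/2020.3%7E2020.4%E3%83%86%E3%82%B9%E3%83%88.pdf')
```

If a client normalizes the path like this, the server side receives “~” instead of “%7E”, which would conflict with a component that expects the RFC 1738 style encoding.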

Reference materials

  • https://tools.ietf.org/html/rfc1738
  • https://tools.ietf.org/html/rfc3986
  • https://www.asahi-net.or.jp/~ax2s-kmtn/ref/uric.html
  • https://www.w3schools.com/tags/ref_urlencode.ASP
