Tracking bug: URL rendering code is being used for converting between URLs and filenames |
||||||||||
Issue descriptionRoot cause: Lots of code in Chrome is using net::escape logic designed *exclusively* for safely displaying URLs to users, for an entirely incorrect purpose, which is converting between URLs (mostly file: URLs, but also ftp:, externalfile:, etc) and file system paths/filenames. The problem is that certain characters are *blacklisted* (in net/base/escape.cc) from being decoded because of spoofing concerns (e.g., U+1F512 '🔒' because it looks like a padlock, U+3000 ' ' because it's whitespace). However, that's a purely visual problem. They shouldn't be prevented from being decoded when converting from a URL to a filename. This has been a problem for a long time, but it is slowly getting worse as we add new characters to the blacklist for visual spoofing reasons (e.g., U+3000 was recently added in r551260), and this in turn breaks code that has nothing to do with visual URL display. That means as long as this practice is common, it's difficult to add new spoof characters to the blacklist, because doing so increases the breakage. An example of this is Issue 585422 , where U+1F512 '🔒' was added to the blacklist, and suddenly "file:///tmp/Lock🔒" could not be opened in Chrome because FileURLToFilePath refused to decode 🔒 when looking up the file in the user's local file system. Other relevant issues are marked as blockers on this issue. There are two UnescapeRules (in net/base/escape.h) that are indicative of this problem: - URL_SPECIAL_CHARS_EXCEPT_PATH_SEPARATORS, which unescapes a whole lot of reserved URL characters including '%', '&' and '#', and almost all Unicode characters, but does not unescape blacklisted Unicode characters. Use of this is quite widespread (including, notably, in FileURLToFilePath, but I am removing that soon). - NONASCII_SPACES (added as a temporary work-around for Issue 847039 ), which unescapes blacklisted space characters, but not any of the other blacklisted characters. (Essentially, this is targeted at reversing the regression introduced by r551260, but not fixing the wider issue.) I think *every single usage* of these unescape rules indicates a bug. Neither of these are safe to use in visual URL displays, they both seem to be used for non-visual purposes. Thus, it is nonsensical to decode all Unicode characters *but* certain visually spoofable characters. If the output of the decode is *not a URL* (e.g., it is a file path, filename, bare string, etc), then it isn't meaningful to leave any characters percent-encoded. The correct behaviour in most of these cases is likely to use UnescapeBinaryURLComponent, which decodes literally everything. However, it can result in non-UTF-8 strings. It's possible we want a decoder which replaces invalid UTF-8 sequences with U+FFFD ('�') or returns an error. ⛆ |
|
|
,
Jun 7 2018
I've written a detailed doc about this that's a bit easier to keep up-to-date than a bug report. https://docs.google.com/document/d/18LkRR5ZMDDYvhg4VTU9If5O-U5JLGc51UAHYwiPvumk/edit (Note: This is on the chromium.org domain, not Google.)
,
Jun 7 2018
,
Jul 11
,
Jul 17
,
Jul 27
,
Jul 27
I've put up a WIP CL: https://chromium-review.googlesource.com/c/chromium/src/+/1152629 This does a whole lot of work (and probably should be split up), mostly updating all the non-display usages of UnescapeURLComponent to decode everything. I've individually considered each case and confirmed that they are not being used for display (and therefore should not be preserving spoofable characters). The rough plan is now: 1. Wait for https://crrev.com/c/1142854 to land (reverting the addition of NONASCII_SPACES rule). 2. Add new rule ALL_EXCEPT_SPACES_AND_PATH_SEPARATORS and mark URL_SPECIAL_CHARS_EXCEPT_PATH_SEPARATORS as deprecated. 3. Convert all usages of URL_SPECIAL_CHARS_EXCEPT_PATH_SEPARATORS to use ALL_EXCEPT_SPACES_AND_PATH_SEPARATORS instead (and in several cases, also add SPACES and/or PATH_SEPARATORS, since most of these cases have no reason to exclude those, even if they're unexpected in the input). 4. Convert most usages of the above to UnescapeBinaryURLComponent. I could perhaps combine 3 and 4 into the same CL since otherwise there will be 2 large CLs touching dozens of files. I also think it's prudent to rename UnescapeURLComponent to UnescapeURLForDisplay at some point (either before or after). Doing it before would highlight the absurdity of all this code using "UnescapeURLForDisplay" for non-display purposes, so maybe I should do that first to get it out of the way. That way, the CL which changes all of these to use UnescapeBinaryURLComponent is more obvious why it's needed. The main thing I'm concerned about is that by converting to use ALL_EXCEPT_SPACES_AND_PATH_SEPARATORS / UnescapeBinaryURLComponent, we are allowing "%00" to be decoded into a NUL byte, which may or may not be correct depending on context, so we need to consider these more carefully.
,
Jul 31
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/37ec335ea923b4004d10593f13ce009c96850607 commit 37ec335ea923b4004d10593f13ce009c96850607 Author: Tatsuhisa Yamaguchi <yamaguchi@chromium.org> Date: Tue Jul 31 20:07:40 2018 Unescape all escaped characters in externalfile: URL. Resolve error opening files whose name contain blacklisted characters for URL spoofing, such as "🔒" (U+1F512). This change also removes the tentative hack at crrev.com/c/1085207 for crbug.com/847039 . Bug: 847039 ,849998 Test: unit_tests --gtest_filter=ExternalFileURLUtilTest.*:FileManagerPath.* Test: net_unittests --gtest_filter=EscapeTest.* Change-Id: Icd8ba4c73f09b97e05c9af182d338e709d5876f6 Reviewed-on: https://chromium-review.googlesource.com/1142854 Commit-Queue: Asanka Herath <asanka@chromium.org> Reviewed-by: Asanka Herath <asanka@chromium.org> Reviewed-by: Matt Giuca <mgiuca@chromium.org> Cr-Commit-Position: refs/heads/master@{#579543} [modify] https://crrev.com/37ec335ea923b4004d10593f13ce009c96850607/chrome/browser/chromeos/fileapi/external_file_url_util.cc [modify] https://crrev.com/37ec335ea923b4004d10593f13ce009c96850607/chrome/browser/chromeos/fileapi/external_file_url_util_unittest.cc [modify] https://crrev.com/37ec335ea923b4004d10593f13ce009c96850607/net/base/escape.cc [modify] https://crrev.com/37ec335ea923b4004d10593f13ce009c96850607/net/base/escape.h [modify] https://crrev.com/37ec335ea923b4004d10593f13ce009c96850607/net/base/escape_unittest.cc
,
Aug 10
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/2b721e07201aecd7d0eb4f5596a4d8ca1cf5d3d7 commit 2b721e07201aecd7d0eb4f5596a4d8ca1cf5d3d7 Author: Tatsuhisa Yamaguchi <yamaguchi@chromium.org> Date: Fri Aug 10 01:43:09 2018 Unescape all escaped characters in externalfile: URL. Resolve error opening files whose name contain blacklisted characters for URL spoofing, such as "🔒" (U+1F512). This change also removes the tentative hack at crrev.com/c/1085207 for crbug.com/847039 . Bug: 847039 ,849998 Test: unit_tests --gtest_filter=ExternalFileURLUtilTest.*:FileManagerPath.* Test: net_unittests --gtest_filter=EscapeTest.* Change-Id: Icd8ba4c73f09b97e05c9af182d338e709d5876f6 Reviewed-on: https://chromium-review.googlesource.com/1142854 Commit-Queue: Asanka Herath <asanka@chromium.org> Reviewed-by: Asanka Herath <asanka@chromium.org> Reviewed-by: Matt Giuca <mgiuca@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#579543}(cherry picked from commit 37ec335ea923b4004d10593f13ce009c96850607) Reviewed-on: https://chromium-review.googlesource.com/1170362 Reviewed-by: Tatsuhisa Yamaguchi <yamaguchi@chromium.org> Cr-Commit-Position: refs/branch-heads/3497@{#532} Cr-Branched-From: 271eaf50594eb818c9295dc78d364aea18c82ea8-refs/heads/master@{#576753} [modify] https://crrev.com/2b721e07201aecd7d0eb4f5596a4d8ca1cf5d3d7/chrome/browser/chromeos/fileapi/external_file_url_util.cc [modify] https://crrev.com/2b721e07201aecd7d0eb4f5596a4d8ca1cf5d3d7/chrome/browser/chromeos/fileapi/external_file_url_util_unittest.cc [modify] https://crrev.com/2b721e07201aecd7d0eb4f5596a4d8ca1cf5d3d7/net/base/escape.cc [modify] https://crrev.com/2b721e07201aecd7d0eb4f5596a4d8ca1cf5d3d7/net/base/escape.h [modify] https://crrev.com/2b721e07201aecd7d0eb4f5596a4d8ca1cf5d3d7/net/base/escape_unittest.cc
,
Sep 7
The general plan of record is now: 1. Add new UnescapeRule ALL_EXCEPT_CONTROL_SPACES_AND_PATH_SEPARATORS. https://chromium-review.googlesource.com/c/chromium/src/+/1152629 This will not unescape control chars (it's too risky, as discussed on that CL). 2. Update all the code to use ALL_EXCEPT_CONTROL_SPACES_AND_PATH_SEPARATORS. https://chromium-review.googlesource.com/c/chromium/src/+/1212430 Thus fixing the most critical of these issues (not decoding those spoofing characters) across the entire codebase. But still keeping control characters. 3. Delete URL_SPECIAL_CHARS_EXCEPT_PATH_SEPARATORS. 4. Carefully go through and update as many usages as possible to use UnescapeBinaryURLComponent. |
|||||||
►
Sign in to add a comment |
||||||||||
Comment 1 by mgiuca@chromium.org
, Jun 6 2018