Bookmark importer over-decodes percent-escaped sequences in URLs used as search engines |
|
Issue descriptionChrome Version: 68 OS: All What steps will reproduce the problem? (1) Go to chrome://bookmarks (2) Top-right menu -> Import bookmarks (3) Choose bookmarks.html (attached to this bug report) (4) Inspect the imported search engines in chrome://settings/searchEngines (5) Test the search engines with Omnibox "testone query", "testtwo query", etc. This imports seven search engine bookmarks with the following URLs: 1. http://www.example.com?q=%s&foo=bar 2. http://www.example.com/search%3Fpage?q=%s&v=foo%26bar 3. http://www.example.com/search?q=%s&v=foo%25bar 4. http://www.example.com/search?q=%s&v=pepper%25salt 5. http://www.example.com/search?q=%s&v=foo%E2%80%94bar 6. http://www.example.com/search?q=%s&v=foo—bar 7. http://www.example.com/search?q=%s&v=foo{bar} What is the expected result? All of the above cases import correctly, with %s being replaced with the search query, and all of the other characters in the URL remaining undisturbed (or, in the case of 6, "—" being harmlessly encoded as "%E2%80%94"). What happens instead? The first one is a simple sanity test that works correctly. All of the others suffer from double-decoding issues, due to the way CanImportURLAsSearchEngine works [1]. - Case 2: "search%3Fpage" is erroneously decoded as "search?page", "foo%26bar" is erroneously decoded as "foo&bar", changing the meaning of the URL. - Case 3: "foo%25bar" is erroneously decoded as "foo%bar", creating an illegal unescaped percent sign. - Case 4: "pepper%25salt" is erroneously decoded as "pepper%salt", causing the "%s" in "%salt" to be replaced with the search engine query! Try: "testfour query" and the browser will navigate to http://www.example.com/search?q=query&v=pepperqueryalt - Case 5: "foo%E2%80%94bar" is decoded as "foo—bar" (this is pretty harmless). - Case 6: "foo—bar" is left as-is instead of being encoded as "foo%E2%80%94bar" (this is pretty harmless). - Case 7: "foo{bar}" is left as-is instead of being encoded as "foo%7Bbar%7D". Why does this happen? Because the import algorithm (bookmark_html_reader.cc's CanImportURLAsSearchEngine) works like this: 1. Convert the search engine string to a GURL (this happens before calling the function, which accepts a GURL). This converts "%s" into "%25s" for some reason. 2. Decode percent-escaped characters in the URL, other than control codes, spaces, path separators, and spoofable characters for some reason (it uses URL_SPECIAL_CHARS_EXCEPT_PATH_SEPARATORS). It seems the only reason why this happens is to reverse the accidental encoding of "%s" above, converting "%25s" back to "%s". 3. See if there is any instance of "%s" and replace it with "{searchTerms}" which is used internally for search term replacement. So, a quick fix would be to stop using net::UnescapeURLComponent, and instead just replace "%25s" with "%s". That would fix all of the above test cases, except Case 4 which is probably rare, but totally broken. It really shouldn't be converted into a GURL at all, because "%s" is not a valid thing in a URL. However, then it wouldn't have the side effect of encoding illegal characters like '{' and '}', so Case 6 and 7 would remain broken. I'm happy to live with that though. Note: The test suite for this function tests success/failure but doesn't actually check the return value! I'm fixing that... Note: This is a pretty obscure bug but I am trying to fix it because I'm deleting URL_SPECIAL_CHARS_EXCEPT_PATH_SEPARATORS unescape rule in net::UnescapeURLComponent (because it is completely non-sensical); see Issue 849998. [1] Added in https://codereview.chromium.org/616763002
,
Jul 30
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/91d296929f83952908ab549ef7c9e9e227b8447d commit 91d296929f83952908ab549ef7c9e9e227b8447d Author: Matt Giuca <mgiuca@chromium.org> Date: Mon Jul 30 01:40:00 2018 Fixed (mostly) bookmark importer's over-decoding of URL percent-escapes. Previously, all percent-encoded sequences in a search engine URL (i.e., containing a "%s") would be decoded, in some cases erroneously changing the meaning of the URL or resulting in an invalid URL. This is fixed by removing the use of URL decoding, and instead directly replacing "%25s" with "%s". There is still one broken case: if the source URL literally contains "%25s" (representing a literal percentage symbol followed by an s), this will erroneously be treated as a query substitution. (This is not a regression; it was always the case.) Fixing this particular issue requires a major refactor. Added proper test cases for this function. Bug: 868214 Change-Id: I88971dc170fdc45db6633ad61b64859a19268751 Reviewed-on: https://chromium-review.googlesource.com/1152723 Reviewed-by: Gabriel Charette <gab@chromium.org> Commit-Queue: Matt Giuca <mgiuca@chromium.org> Cr-Commit-Position: refs/heads/master@{#578960} [modify] https://crrev.com/91d296929f83952908ab549ef7c9e9e227b8447d/chrome/utility/importer/bookmark_html_reader.cc [modify] https://crrev.com/91d296929f83952908ab549ef7c9e9e227b8447d/chrome/utility/importer/bookmark_html_reader_unittest.cc
,
Jul 30
That mostly fixes it (and removes the dependency on net::UnescapeURLComponent). The aforementioned Case 4 is fixed in an optional refactor follow-up: https://chromium-review.googlesource.com/c/chromium/src/+/1154776 (Though this has a number of other side effects which we may or may not want. We could also just close this as Fixed now if we decide not to go ahead with this change.) |
|
►
Sign in to add a comment |
|
Comment 1 by mgiuca@chromium.org
, Jul 27