New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 868214 link

Starred by 1 user

Issue metadata

Status: Started
Owner:
Long OOO (go/where-is-mgiuca)
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Android , Windows , iOS , Chrome , Mac , Fuchsia
Pri: 3
Type: Bug

Blocking:
issue 849998



Sign in to add a comment

Bookmark importer over-decodes percent-escaped sequences in URLs used as search engines

Project Member Reported by mgiuca@chromium.org, Jul 27

Issue description

Chrome Version: 68
OS: All

What steps will reproduce the problem?
(1) Go to chrome://bookmarks 
(2) Top-right menu -> Import bookmarks
(3) Choose bookmarks.html (attached to this bug report)
(4) Inspect the imported search engines in chrome://settings/searchEngines
(5) Test the search engines with Omnibox "testone query", "testtwo query", etc.

This imports seven search engine bookmarks with the following URLs:

1. http://www.example.com?q=%s&foo=bar
2. http://www.example.com/search%3Fpage?q=%s&v=foo%26bar
3. http://www.example.com/search?q=%s&v=foo%25bar
4. http://www.example.com/search?q=%s&v=pepper%25salt
5. http://www.example.com/search?q=%s&v=foo%E2%80%94bar
6. http://www.example.com/search?q=%s&v=foo—bar
7. http://www.example.com/search?q=%s&v=foo{bar}

What is the expected result?
All of the above cases import correctly, with %s being replaced with the search query, and all of the other characters in the URL remaining undisturbed (or, in the case of 6, "—" being harmlessly encoded as "%E2%80%94").

What happens instead?
The first one is a simple sanity test that works correctly.

All of the others suffer from double-decoding issues, due to the way CanImportURLAsSearchEngine works [1].

- Case 2: "search%3Fpage" is erroneously decoded as "search?page", "foo%26bar" is erroneously decoded as "foo&bar", changing the meaning of the URL.
- Case 3: "foo%25bar" is erroneously decoded as "foo%bar", creating an illegal unescaped percent sign.
- Case 4: "pepper%25salt" is erroneously decoded as "pepper%salt", causing the "%s" in "%salt" to be replaced with the search engine query! Try: "testfour query" and the browser will navigate to http://www.example.com/search?q=query&v=pepperqueryalt
- Case 5: "foo%E2%80%94bar" is decoded as "foo—bar" (this is pretty harmless).
- Case 6: "foo—bar" is left as-is instead of being encoded as "foo%E2%80%94bar" (this is pretty harmless).
- Case 7: "foo{bar}" is left as-is instead of being encoded as "foo%7Bbar%7D".

Why does this happen?
Because the import algorithm (bookmark_html_reader.cc's CanImportURLAsSearchEngine) works like this:
1. Convert the search engine string to a GURL (this happens before calling the function, which accepts a GURL). This converts "%s" into "%25s" for some reason.
2. Decode percent-escaped characters in the URL, other than control codes, spaces, path separators, and spoofable characters for some reason (it uses URL_SPECIAL_CHARS_EXCEPT_PATH_SEPARATORS). It seems the only reason why this happens is to reverse the accidental encoding of "%s" above, converting "%25s" back to "%s".
3. See if there is any instance of "%s" and replace it with "{searchTerms}" which is used internally for search term replacement.

So, a quick fix would be to stop using net::UnescapeURLComponent, and instead just replace "%25s" with "%s". That would fix all of the above test cases, except Case 4 which is probably rare, but totally broken. It really shouldn't be converted into a GURL at all, because "%s" is not a valid thing in a URL. However, then it wouldn't have the side effect of encoding illegal characters like '{' and '}', so Case 6 and 7 would remain broken. I'm happy to live with that though.

Note: The test suite for this function tests success/failure but doesn't actually check the return value! I'm fixing that...

Note: This is a pretty obscure bug but I am trying to fix it because I'm deleting URL_SPECIAL_CHARS_EXCEPT_PATH_SEPARATORS unescape rule in net::UnescapeURLComponent (because it is completely non-sensical); see Issue 849998.

[1] Added in https://codereview.chromium.org/616763002
 
bookmarks.html
1.2 KB View Download
CL to "mostly" fix this:
https://chromium-review.googlesource.com/c/chromium/src/+/1152723

(Doesn't fix Case 4, which turns out to be really hard.)
Project Member

Comment 2 by bugdroid1@chromium.org, Jul 30

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/91d296929f83952908ab549ef7c9e9e227b8447d

commit 91d296929f83952908ab549ef7c9e9e227b8447d
Author: Matt Giuca <mgiuca@chromium.org>
Date: Mon Jul 30 01:40:00 2018

Fixed (mostly) bookmark importer's over-decoding of URL percent-escapes.

Previously, all percent-encoded sequences in a search engine URL (i.e.,
containing a "%s") would be decoded, in some cases erroneously changing
the meaning of the URL or resulting in an invalid URL.

This is fixed by removing the use of URL decoding, and instead directly
replacing "%25s" with "%s". There is still one broken case: if the
source URL literally contains "%25s" (representing a literal percentage
symbol followed by an s), this will erroneously be treated as a query
substitution. (This is not a regression; it was always the case.) Fixing
this particular issue requires a major refactor.

Added proper test cases for this function.

Bug: 868214
Change-Id: I88971dc170fdc45db6633ad61b64859a19268751
Reviewed-on: https://chromium-review.googlesource.com/1152723
Reviewed-by: Gabriel Charette <gab@chromium.org>
Commit-Queue: Matt Giuca <mgiuca@chromium.org>
Cr-Commit-Position: refs/heads/master@{#578960}
[modify] https://crrev.com/91d296929f83952908ab549ef7c9e9e227b8447d/chrome/utility/importer/bookmark_html_reader.cc
[modify] https://crrev.com/91d296929f83952908ab549ef7c9e9e227b8447d/chrome/utility/importer/bookmark_html_reader_unittest.cc

That mostly fixes it (and removes the dependency on net::UnescapeURLComponent). The aforementioned Case 4 is fixed in an optional refactor follow-up:

https://chromium-review.googlesource.com/c/chromium/src/+/1154776

(Though this has a number of other side effects which we may or may not want. We could also just close this as Fixed now if we decide not to go ahead with this change.)

Sign in to add a comment