New issue
Advanced search Search tips

Issue 878138 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Aug 31
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocking:
issue 696822



Sign in to add a comment

Declarative Net Request: Clarify/fix behavior for patterns/urls with non-ascii characters.

Project Member Reported by karandeepb@chromium.org, Aug 27

Issue description

We should document how urls with non-ascii characters in them will be matched and whether non-ascii characters in filter lists are supported.
 
url_pattern_index compares |urlFilter| with url.possibly_invalid_spec() (which consists of only ascii characters, with the host in punycode format if necessary and other non-ascii characters are percent escaped. See UrlPattern::MatchesURL. Similarly the url_pattern_index code compares the request origin, assuming the rule domain is in a canonical form.

We can't automatically convert some arbitrary urlFilter with non-ascii chars to a suitable format since we can't determine whether a particular character is part of the host, path etc. Hence I'd restrict the urlFilter to only be composed of ascii characters (documenting how non-ascii bits should be encoded).

To maintain consistency, I'd do the same for "domains" and "excludedDomains".
+Charlie, since I think this "may" be a bug for subresource filter. Basically any non-ascii character in a UrlPatternIndex rule won't be compared correctly by url_pattern_index unless we do suitable preprocessing. See the above comment. Not sure how prevalent these are though. 
Cc: csharrison@chromium.org
This sounds good to me. Would you mind including me on a review where you make this change?

I don't think there are many of these rules in e.g. EasyList, but it's good to lock this down.
Project Member

Comment 5 by bugdroid1@chromium.org, Aug 30

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/32ff845fd2d07b4bcac5d4f090658dc1c16dcb19

commit 32ff845fd2d07b4bcac5d4f090658dc1c16dcb19
Author: Karan Bhatia <karandeepb@chromium.org>
Date: Thu Aug 30 01:23:17 2018

Declarative Net Request: Disallow non-ascii chars in url patterns and domains.

While comparing a rule against a url, url_pattern_index assumes that the
provided pattern and domains are in canonical form (internationalized domains in
punycode and non-ascii characters in percent-escaped utf8). Document this
expectation and disallow rules where the "urlFilter", "domains" and
"excludedDomains" properties consist of non-ascii characters.

BUG= 878138 

Change-Id: I0b23200349a1aee6ed13adb09f61621ab235917d
Reviewed-on: https://chromium-review.googlesource.com/1195095
Commit-Queue: Karan Bhatia <karandeepb@chromium.org>
Reviewed-by: Istiaque Ahmed <lazyboy@chromium.org>
Cr-Commit-Position: refs/heads/master@{#587363}
[modify] https://crrev.com/32ff845fd2d07b4bcac5d4f090658dc1c16dcb19/chrome/browser/extensions/api/declarative_net_request/declarative_net_request_browsertest.cc
[modify] https://crrev.com/32ff845fd2d07b4bcac5d4f090658dc1c16dcb19/extensions/browser/api/declarative_net_request/constants.cc
[modify] https://crrev.com/32ff845fd2d07b4bcac5d4f090658dc1c16dcb19/extensions/browser/api/declarative_net_request/constants.h
[modify] https://crrev.com/32ff845fd2d07b4bcac5d4f090658dc1c16dcb19/extensions/browser/api/declarative_net_request/indexed_rule.cc
[modify] https://crrev.com/32ff845fd2d07b4bcac5d4f090658dc1c16dcb19/extensions/browser/api/declarative_net_request/indexed_rule_unittest.cc
[modify] https://crrev.com/32ff845fd2d07b4bcac5d4f090658dc1c16dcb19/extensions/browser/api/declarative_net_request/parse_info.cc
[modify] https://crrev.com/32ff845fd2d07b4bcac5d4f090658dc1c16dcb19/extensions/common/api/declarative_net_request.idl

Status: Fixed (was: Assigned)

Sign in to add a comment