
Issue 913647

Starred by 1 user

Issue metadata

Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Android , Windows , Chrome , Mac
Pri: 2
Type: Feature
Team-Security-UX




Show lookalike URL suggestions for approximate matches

Project Member Reported by mea...@chromium.org, Dec 10

Issue description

For lookalike URL navigation suggestions, we currently determine if two domains are similar using the skeleton comparison. This matches domains like googlé.com to google.com, but misses gooogle.com.

We should add another heuristic with approximate string matching. For starters, we can use edit distance.
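
To make the proposal concrete, here is a minimal sketch of edit distance (Levenshtein distance) in C++. The function name `EditDistance` is hypothetical and is not taken from the Chromium source. The sketch compares raw bytes, so it assumes ASCII or already-normalized input such as skeletons.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance: the minimum number of single-character insertions,
// deletions, and substitutions needed to turn `a` into `b`.
// Hypothetical helper for illustration; compares bytes, not code points.
std::size_t EditDistance(const std::string& a, const std::string& b) {
  // prev[j] is the distance between the first i-1 characters of `a` and the
  // first j characters of `b`; cur[j] is the same for the first i characters.
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;

  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;  // Deleting all i characters of `a` yields the empty string.
    for (std::size_t j = 1; j <= b.size(); ++j) {
      const std::size_t substitution =
          prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, substitution});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}
```

Under this sketch, `EditDistance("gooogle.com", "google.com")` is 1 (one extra `o`), which is the kind of near miss that skeleton comparison alone does not catch.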
 
Project Member

Comment 1 by bugdroid1@chromium.org, Dec 18

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5207f23706d9f56a8f796579e011a89305d3eec5

commit 5207f23706d9f56a8f796579e011a89305d3eec5
Author: Mustafa Emre Acer <meacer@chromium.org>
Date: Tue Dec 18 02:07:14 2018

Add a binary to url_formatter to generate domain list for edit distance matching

Design document: https://docs.google.com/document/d/1IlHN996Fd5yW2vWVV1uEAGI4RZsKlNlQRDt9dxxdIpQ

A new heuristic for the Lookalike URL Navigation Suggestions feature requires edit distance computations against top domains. This CL adds a binary to generate a .cc file containing an array of skeletons of top 500 domains.

The binary excludes hostnames that are too short: if the hostname excluding the registry is shorter than 5 characters, it is not included in the list. For example, abc.com would be dropped because the hostname excluding the registry (abc) is only 3 characters long.

Bug: 913647
Change-Id: Ie66d710969a8c14f169651ae5254a249c8adc666
Reviewed-on: https://chromium-review.googlesource.com/c/1379195
Commit-Queue: Mustafa Emre Acer <meacer@chromium.org>
Reviewed-by: Tommy Li <tommycli@chromium.org>
Cr-Commit-Position: refs/heads/master@{#617343}
[modify] https://crrev.com/5207f23706d9f56a8f796579e011a89305d3eec5/components/url_formatter/idn_spoof_checker.cc
[modify] https://crrev.com/5207f23706d9f56a8f796579e011a89305d3eec5/components/url_formatter/idn_spoof_checker.h
[modify] https://crrev.com/5207f23706d9f56a8f796579e011a89305d3eec5/components/url_formatter/top_domains/BUILD.gn
[add] https://crrev.com/5207f23706d9f56a8f796579e011a89305d3eec5/components/url_formatter/top_domains/make_top_domain_list_for_edit_distance.cc
[modify] https://crrev.com/5207f23706d9f56a8f796579e011a89305d3eec5/components/url_formatter/url_formatter.cc
[modify] https://crrev.com/5207f23706d9f56a8f796579e011a89305d3eec5/components/url_formatter/url_formatter.h
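
The length filter described in the commit above can be sketched as follows. This is only an illustration: the names `HostnameWithoutRegistry`, `IsLongEnoughForEditDistance`, and `kMinLengthWithoutRegistry` are hypothetical, and the registry is assumed to be supplied by the caller rather than looked up the way Chromium does it.

```cpp
#include <cstddef>
#include <string>

// Threshold from the commit message: hostnames whose registry-less part is
// shorter than 5 characters are dropped. The constant name is hypothetical.
constexpr std::size_t kMinLengthWithoutRegistry = 5;

// Returns `hostname` with a trailing ".<registry>" removed, or `hostname`
// unchanged if it does not end with that suffix.
std::string HostnameWithoutRegistry(const std::string& hostname,
                                    const std::string& registry) {
  const std::string suffix = "." + registry;
  if (!registry.empty() && hostname.size() > suffix.size() &&
      hostname.compare(hostname.size() - suffix.size(), suffix.size(),
                       suffix) == 0) {
    return hostname.substr(0, hostname.size() - suffix.size());
  }
  return hostname;
}

// True if the hostname is long enough to be kept in the generated list.
bool IsLongEnoughForEditDistance(const std::string& hostname,
                                 const std::string& registry) {
  return HostnameWithoutRegistry(hostname, registry).size() >=
         kMinLengthWithoutRegistry;
}
```

With these definitions, `IsLongEnoughForEditDistance("abc.com", "com")` is false because `abc` has only 3 characters, while `google.com` passes.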

Cc: santhoshkumar@chromium.org
Labels: Needs-Feedback

@meacer: Could you please provide manual steps to reproduce the issue? That would help us verify it.

Thanks.
Labels: -Type-Bug Type-Feature
santhoshkumar@: There isn't a particular bug here; I think this is better suited as a feature request. Sorry for the confusion.
Project Member

Comment 4 by bugdroid1@chromium.org, Dec 19

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5b516e05f9a3b3d0a4535c2a4ad15aa0e630be86

commit 5b516e05f9a3b3d0a4535c2a4ad15aa0e630be86
Author: Mustafa Emre Acer <meacer@chromium.org>
Date: Wed Dec 19 00:57:38 2018

Add edit distance matching for lookalike URLs using top 500 domains

This is a follow-up to crrev/1379195. It uses the .cc file generated in the previous CL to check whether any top domain is within an edit distance of 1 from the navigated domain. The check is done as a linear search, as described in the design doc linked from the previous CL.

Bug: 913647
Change-Id: Ia78a079e786703678ef93c6f341138e15d074a6f
Reviewed-on: https://chromium-review.googlesource.com/c/1378973
Commit-Queue: Mustafa Emre Acer <meacer@chromium.org>
Reviewed-by: Tommy Li <tommycli@chromium.org>
Cr-Commit-Position: refs/heads/master@{#617698}
[modify] https://crrev.com/5b516e05f9a3b3d0a4535c2a4ad15aa0e630be86/chrome/browser/ui/BUILD.gn
[modify] https://crrev.com/5b516e05f9a3b3d0a4535c2a4ad15aa0e630be86/chrome/browser/ui/omnibox/lookalike_url_navigation_observer.cc
[modify] https://crrev.com/5b516e05f9a3b3d0a4535c2a4ad15aa0e630be86/chrome/browser/ui/omnibox/lookalike_url_navigation_observer.h
[modify] https://crrev.com/5b516e05f9a3b3d0a4535c2a4ad15aa0e630be86/chrome/browser/ui/omnibox/lookalike_url_navigation_observer_browsertest.cc
[add] https://crrev.com/5b516e05f9a3b3d0a4535c2a4ad15aa0e630be86/chrome/browser/ui/omnibox/lookalike_url_navigation_observer_unittest.cc
[modify] https://crrev.com/5b516e05f9a3b3d0a4535c2a4ad15aa0e630be86/chrome/test/BUILD.gn
[modify] https://crrev.com/5b516e05f9a3b3d0a4535c2a4ad15aa0e630be86/components/url_formatter/BUILD.gn
[modify] https://crrev.com/5b516e05f9a3b3d0a4535c2a4ad15aa0e630be86/components/url_formatter/top_domains/BUILD.gn
[modify] https://crrev.com/5b516e05f9a3b3d0a4535c2a4ad15aa0e630be86/components/url_formatter/top_domains/make_top_domain_list_for_edit_distance.cc
[add] https://crrev.com/5b516e05f9a3b3d0a4535c2a4ad15aa0e630be86/components/url_formatter/top_domains/top_domain_util.cc
[add] https://crrev.com/5b516e05f9a3b3d0a4535c2a4ad15aa0e630be86/components/url_formatter/top_domains/top_domain_util.h
[modify] https://crrev.com/5b516e05f9a3b3d0a4535c2a4ad15aa0e630be86/tools/metrics/histograms/enums.xml
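
The linear search described in the commit above might look roughly like the sketch below. The function names `IsWithinOneEdit` and `FindTopDomainWithinOneEdit` are hypothetical, and the top-domain list is passed in as a plain vector instead of the generated array. Because only a distance of at most 1 matters, the sketch uses a single pass over both strings rather than a full distance computation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns true if `a` and `b` differ by at most one insertion, deletion, or
// substitution. Identical strings also return true. Hypothetical helper.
bool IsWithinOneEdit(const std::string& a, const std::string& b) {
  const std::string& shorter = a.size() <= b.size() ? a : b;
  const std::string& longer = a.size() <= b.size() ? b : a;
  if (longer.size() - shorter.size() > 1) return false;

  std::size_t i = 0, j = 0;
  bool edited = false;
  while (i < shorter.size() && j < longer.size()) {
    if (shorter[i] == longer[j]) {
      ++i;
      ++j;
      continue;
    }
    if (edited) return false;  // A second mismatch means distance > 1.
    edited = true;
    // Equal lengths: treat the mismatch as a substitution and advance both.
    // Otherwise: skip the extra character in the longer string only.
    if (shorter.size() == longer.size()) ++i;
    ++j;
  }
  return true;
}

// Linear scan over the top-domain list. Returns the first top domain within
// one edit of `navigated`, or an empty string if there is none.
std::string FindTopDomainWithinOneEdit(
    const std::string& navigated,
    const std::vector<std::string>& top_domains) {
  for (const std::string& top : top_domains) {
    if (IsWithinOneEdit(navigated, top)) return top;
  }
  return std::string();
}
```

A linear scan is reasonable here because the list is small (500 entries) and each comparison is linear in the hostname length.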

Project Member

Comment 5 by bugdroid1@chromium.org, Jan 8

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/bfa44d228f4f4afc873ee796dbba2c72fc958582

commit bfa44d228f4f4afc873ee796dbba2c72fc958582
Author: Mustafa Emre Acer <meacer@chromium.org>
Date: Tue Jan 08 21:13:52 2019

Lookalike URLs: Exclude registry when doing edit distance comparison

The edit distance heuristic in Lookalike URLs feature records a match
if the navigated domain is one edit distance away from one of the top
500 domains. However, it takes the registry of the domains into account
as well, causing bogus matches.

As an example, it matches google.com.tw to google.com.tr (top domain),
even though the former is unlikely to be a spoofing attempt. This CL
correctly handles this case.

Bug: 913647
Change-Id: Ifa04a3f6eeccd0b97dde364d3cd2ef3d415f6ef1
Reviewed-on: https://chromium-review.googlesource.com/c/1396304
Reviewed-by: Tommy Li <tommycli@chromium.org>
Reviewed-by: Cait Phillips <caitkp@chromium.org>
Commit-Queue: Mustafa Emre Acer <meacer@chromium.org>
Cr-Commit-Position: refs/heads/master@{#620877}
[modify] https://crrev.com/bfa44d228f4f4afc873ee796dbba2c72fc958582/chrome/browser/ui/omnibox/lookalike_url_navigation_observer.cc
[modify] https://crrev.com/bfa44d228f4f4afc873ee796dbba2c72fc958582/chrome/browser/ui/omnibox/lookalike_url_navigation_observer_browsertest.cc
[modify] https://crrev.com/bfa44d228f4f4afc873ee796dbba2c72fc958582/components/BUILD.gn
[modify] https://crrev.com/bfa44d228f4f4afc873ee796dbba2c72fc958582/components/url_formatter/top_domains/BUILD.gn
[modify] https://crrev.com/bfa44d228f4f4afc873ee796dbba2c72fc958582/components/url_formatter/top_domains/top_domain_util.cc
[modify] https://crrev.com/bfa44d228f4f4afc873ee796dbba2c72fc958582/components/url_formatter/top_domains/top_domain_util.h
[add] https://crrev.com/bfa44d228f4f4afc873ee796dbba2c72fc958582/components/url_formatter/top_domains/top_domain_util_unittest.cc
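
The registry problem fixed by the commit above can be illustrated with a small sketch. The helpers `StripRegistry` and `DifferOnlyInRegistry` are hypothetical, and the caller-supplied `registries` list is a stand-in for a real registry lookup; this is not how the CL itself is implemented.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Removes the longest suffix in `registries` (e.g. "com.tr") that `hostname`
// ends with, together with the dot before it. Returns `hostname` unchanged if
// no registry matches. `registries` is an assumed, caller-provided list.
std::string StripRegistry(const std::string& hostname,
                          const std::vector<std::string>& registries) {
  std::size_t best = 0;  // Length of the longest matching ".registry" suffix.
  for (const std::string& registry : registries) {
    const std::string suffix = "." + registry;
    if (hostname.size() > suffix.size() && suffix.size() > best &&
        hostname.compare(hostname.size() - suffix.size(), suffix.size(),
                         suffix) == 0) {
      best = suffix.size();
    }
  }
  return hostname.substr(0, hostname.size() - best);
}

// True if two distinct hostnames are identical once the registry is removed,
// as with google.com.tw and google.com.tr.
bool DifferOnlyInRegistry(const std::string& a, const std::string& b,
                          const std::vector<std::string>& registries) {
  return a != b && StripRegistry(a, registries) == StripRegistry(b, registries);
}
```

Compared as whole strings, google.com.tw and google.com.tr differ by a single character, which is what produced the bogus match. After stripping the registry both reduce to `google`, so a comparison on the registry-less parts no longer sees them as one edit apart.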

Project Member

Comment 6 by bugdroid1@chromium.org, Jan 11

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f1674b1c476771b87be680f1f8e095018cdcbbf9

commit f1674b1c476771b87be680f1f8e095018cdcbbf9
Author: Mustafa Emre Acer <meacer@chromium.org>
Date: Fri Jan 11 19:56:14 2019

Lookalike URLs: Add a per-profile service to fetch engaged sites

We currently fetch the list of engaged sites on every navigation on the UI
thread. This is slow. We also do this separately for each tab.

This CL introduces a profile keyed service called LookalikeUrlService. This
service fetches the list of engaged sites every 5 minutes in a background thread
and stores the results until the next update. It also gets rid of the need to do
a fetch for each tab separately.

Bug: 913647
Change-Id: I9f7080c45834de576eb081243778f2f17c3e4ccd
Reviewed-on: https://chromium-review.googlesource.com/c/1389167
Commit-Queue: Mustafa Emre Acer <meacer@chromium.org>
Reviewed-by: Dominick Ng <dominickn@chromium.org>
Reviewed-by: Tommy Li <tommycli@chromium.org>
Reviewed-by: Stefan Kuhne <skuhne@chromium.org>
Cr-Commit-Position: refs/heads/master@{#622111}
[modify] https://crrev.com/f1674b1c476771b87be680f1f8e095018cdcbbf9/chrome/browser/engagement/site_engagement_service.cc
[modify] https://crrev.com/f1674b1c476771b87be680f1f8e095018cdcbbf9/chrome/browser/engagement/site_engagement_service.h
[modify] https://crrev.com/f1674b1c476771b87be680f1f8e095018cdcbbf9/chrome/browser/ui/BUILD.gn
[modify] https://crrev.com/f1674b1c476771b87be680f1f8e095018cdcbbf9/chrome/browser/ui/omnibox/lookalike_url_navigation_observer.cc
[modify] https://crrev.com/f1674b1c476771b87be680f1f8e095018cdcbbf9/chrome/browser/ui/omnibox/lookalike_url_navigation_observer.h
[modify] https://crrev.com/f1674b1c476771b87be680f1f8e095018cdcbbf9/chrome/browser/ui/omnibox/lookalike_url_navigation_observer_browsertest.cc
[add] https://crrev.com/f1674b1c476771b87be680f1f8e095018cdcbbf9/chrome/browser/ui/omnibox/lookalike_url_service.cc
[add] https://crrev.com/f1674b1c476771b87be680f1f8e095018cdcbbf9/chrome/browser/ui/omnibox/lookalike_url_service.h
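
The caching behavior described in the commit above can be sketched as a small class that refetches only when its data is older than 5 minutes. The class name `EngagedSitesCache` and its interface are hypothetical. Threading and profile keying are deliberately left out, and the current time is passed in explicitly so the staleness logic is easy to follow.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Refresh interval from the commit message. The constant name is hypothetical.
constexpr std::chrono::minutes kUpdateInterval{5};

// Caches a list of engaged sites and refetches it at most once per interval.
// One instance would be shared by all tabs of a profile.
class EngagedSitesCache {
 public:
  using Clock = std::chrono::steady_clock;
  using Fetcher = std::function<std::vector<std::string>()>;

  explicit EngagedSitesCache(Fetcher fetcher) : fetcher_(std::move(fetcher)) {}

  // Returns the cached list, calling the fetcher only if nothing has been
  // fetched yet or the last fetch is at least kUpdateInterval old.
  const std::vector<std::string>& Get(Clock::time_point now) {
    if (!has_value_ || now - last_update_ >= kUpdateInterval) {
      sites_ = fetcher_();
      last_update_ = now;
      has_value_ = true;
    }
    return sites_;
  }

 private:
  Fetcher fetcher_;
  std::vector<std::string> sites_;
  Clock::time_point last_update_{};
  bool has_value_ = false;
};
```

In this sketch, navigations that arrive within 5 minutes of the last fetch reuse the stored list, so the expensive fetch no longer runs on every navigation.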

Project Member

Comment 7 by bugdroid1@chromium.org, Jan 16

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/685459774027767f9bfe748c6d494f1b82d82b88

commit 685459774027767f9bfe748c6d494f1b82d82b88
Author: Mustafa Emre Acer <meacer@chromium.org>
Date: Wed Jan 16 00:33:13 2019

Lookalike Urls: Fix edit distance when navigated domain is a top domain

The edit distance heuristic incorrectly triggers for top domains that are one
edit distance away from another top 500 domain. As a result, we show a "Did you
mean to go to" infobar for a top domain.

This CL fixes that and refactors the code so that most of the information such
as IDN conversion result and skeletons is only computed once.

Bug: 913647
Change-Id: I0efbadf3b9417ff7a122fb686397e74c0e35cf6b
Reviewed-on: https://chromium-review.googlesource.com/c/1407253
Reviewed-by: Tommy Li <tommycli@chromium.org>
Commit-Queue: Mustafa Emre Acer <meacer@chromium.org>
Cr-Commit-Position: refs/heads/master@{#622928}
[modify] https://crrev.com/685459774027767f9bfe748c6d494f1b82d82b88/chrome/browser/ui/omnibox/lookalike_url_navigation_observer.cc
[modify] https://crrev.com/685459774027767f9bfe748c6d494f1b82d82b88/chrome/browser/ui/omnibox/lookalike_url_navigation_observer.h
[modify] https://crrev.com/685459774027767f9bfe748c6d494f1b82d82b88/chrome/browser/ui/omnibox/lookalike_url_navigation_observer_browsertest.cc
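
The fix described in the commit above amounts to an early exit before the edit distance search. A minimal sketch, assuming a hypothetical `SuggestTopDomain` function and a caller-supplied similarity predicate (neither is taken from the Chromium source):

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Predicate that reports whether two domains are within one edit of each
// other. Passed in so this sketch does not depend on a particular algorithm.
using SimilarityFn =
    std::function<bool(const std::string&, const std::string&)>;

// Returns a top domain to suggest for `navigated`, or an empty string if no
// suggestion should be shown.
std::string SuggestTopDomain(const std::string& navigated,
                             const std::vector<std::string>& top_domains,
                             const SimilarityFn& is_within_one_edit) {
  // If the navigated domain is itself a top domain, never suggest another
  // top domain, even if one happens to be a single edit away.
  if (std::find(top_domains.begin(), top_domains.end(), navigated) !=
      top_domains.end()) {
    return std::string();
  }
  for (const std::string& top : top_domains) {
    if (is_within_one_edit(navigated, top)) return top;
  }
  return std::string();
}
```

Checking membership first means two top domains that are one edit apart never trigger a suggestion for each other.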

Project Member

Comment 8 by bugdroid1@chromium.org, Jan 18

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6d1fb85d01882236ade0cf0d2029608c3acae174

commit 6d1fb85d01882236ade0cf0d2029608c3acae174
Author: Mustafa Emre Acer <meacer@chromium.org>
Date: Fri Jan 18 19:55:12 2019

Lookalike URLs: Ignore navigations that end up as net errors.

It's possible that many of the navigations with net errors are caused
by typos instead of spoofs. These add noise to the metrics and the UI
isn't particularly useful when we already have a "Did you mean to" link
in the page (via LinkDoctor). This CL ignores such navigations.

Bug: 913647
Change-Id: Ie54011a21b78103d5827b772fb23366d28b7dc3c
Reviewed-on: https://chromium-review.googlesource.com/c/1413810
Commit-Queue: Mustafa Emre Acer <meacer@chromium.org>
Reviewed-by: Tommy Li <tommycli@chromium.org>
Cr-Commit-Position: refs/heads/master@{#624248}
[modify] https://crrev.com/6d1fb85d01882236ade0cf0d2029608c3acae174/chrome/browser/ui/omnibox/lookalike_url_navigation_observer.cc
[modify] https://crrev.com/6d1fb85d01882236ade0cf0d2029608c3acae174/chrome/browser/ui/omnibox/lookalike_url_navigation_observer_browsertest.cc
