New issue
Advanced search Search tips

Issue 886988 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Dec 17
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows , Chrome , Mac
Pri: 2
Type: Bug

Blocking:
issue 806995



Sign in to add a comment

Entity Suggestions Can Be Dup-Eliminated Erroneously

Project Member Reported by mpear...@chromium.org, Sep 19

Issue description

search_provider uses the |additional_query_params| provided by the suggest server identify entity matches.  It makes sure not to mark matches with different |additional_query_params| as duplicates of each.
https://cs.chromium.org/chromium/src/components/omnibox/browser/base_search_provider.cc?sq=package:chromium&g=0&l=439-441

This smartness doesn't happen in AutocompleteResult.  AutocompleteResult only looks at the underlying search terms, not the extra parameters, when combining duplicates.
https://cs.chromium.org/chromium/src/components/omnibox/browser/autocomplete_match.cc?type=cs&q=GURLToStrippedGURL&g=0&l=494-563

This means that we can't distinguish between entity suggestions whose query terms are the same as a non-entity search suggestion with the same query terms. The latter can appear alongside entity suggestions in the case of a verbatim suggestion or a search suggestion from the user's local history.

The effect of this issue as experienced by the user is that they will see an entity suggestion in the list as they type but as they continue to type the entity suggestion will disappear if and when it gets deduped with a non-entity search suggestion.

As an example, enable chrome://flags/#omnibox-rich-entity-suggestions and start typing "jefferson". After a few characters, the *Thomas Jefferson* entity will appear. Once the 'n' is added, the entity suggestion is deduped with the [jefferson] verbatim suggestion and the entity suggestion disappears.

Theoretically, this also could cause two entities such as *Thomas Merton* and *Merton Hanks* to be deduped against each other. But in practice, this currently doesn't happen because the backend appears to never have the same query terms associated with two different entity suggestions. In other words, the query terms will be [merton] for *Thomas Merton* and [merton hanks] for *Merton Hanks*. Never [merton] for both, even though they could be distinguished by the gs_ssp param in additional_query_params. Nevertheless, the fix described above would cover this potential case as well.
 
Summary: Entity Suggestions Can Be Dup-Eliminated Against Each Other (was: Entity Suggestions Will Be Dup-Eliminated Against Each Other)
At least the code makes it appear like that happens.  Yet, I see multiple suggestions sometimes.  Let me try to find an example where this is a problem.
Blocking: 806995
Labels: -Pri-3 Pri-2
Status: Available (was: Untriaged)
Summary: Entity Suggestions Can Be Dup-Eliminated Erroneously (was: Entity Suggestions Can Be Dup-Eliminated Against Each Other)
Here's an example [merton].  chrome://omnibox output below.

The suggest server and the Search provider are returning the verbatim suggestion for [merton] and an entity suggestion labeled as "Thomas Merton", though with the search being for "merton" itself and additional_query_params (gs_ssp=) used to distinguish between these two items.  Yet when AutocompleteResult comes along, it removes the "Thomas Merton" suggestion, as it decides that it's equivalent to the "merton" suggestion".

Given how easy this was to find--it was only my third try asking an ambiguous query--and that it removes a high-quality query suggestion, replacing it with a much lower one, I think this probably ought to be fixed before attempting to launch rich entities.

----
chrome://omnibox output:

Combined results.
Provider	Type	Relevance	Contents	URL
Search	search-what-you-typed	1300	merton	https://www.google.com/search?q=merton&oq=merton&aqs=chrome..69i57j46j0l3&sourceid=chrome&ie=UTF-8
Search	search-suggest-entity	601	Merton Hanks	https://www.google.com/search?gs_ssp=eJzj4tTP1TcwLixJtzRgBAATYAL4&q=merton+hanks&oq=merton&aqs=chrome.1.69i57j46j0l3&sourceid=chrome&ie=UTF-8
Search	search-suggest	567	merton's strain theory	https://www.google.com/search?q=merton%27s+strain+theory&oq=merton&aqs=chrome.2.69i57j46j0l3&sourceid=chrome&ie=UTF-8
Search	search-suggest	566	merton college	https://www.google.com/search?q=merton+college&oq=merton&aqs=chrome.3.69i57j46j0l3&sourceid=chrome&ie=UTF-8
Search	search-suggest	565	merton hanks dance	https://www.google.com/search?q=merton+hanks+dance&oq=merton&aqs=chrome.4.69i57j46j0l3&sourceid=chrome&ie=UTF-8

Results for individual providers.
Provider	Type	Relevance	Contents	URL
Search	search-what-you-typed	1300	merton	https://www.google.com/search?q=merton&sourceid=chrome&ie=UTF-8
Search	search-suggest-entity	1300	Thomas Merton	https://www.google.com/search?gs_ssp=eJzj4tTP1TcwNDGJTzJgBAAQ_wKa&q=merton&oq=merton&sourceid=chrome&ie=UTF-8
Search	search-suggest-entity	601	Merton Hanks	https://www.google.com/search?gs_ssp=eJzj4tTP1TcwLixJtzRgBAATYAL4&q=merton+hanks&oq=merton&sourceid=chrome&ie=UTF-8
Search	search-suggest	567	merton's strain theory	https://www.google.com/search?q=merton%27s+strain+theory&oq=merton&sourceid=chrome&ie=UTF-8
Search	search-suggest	566	merton college	https://www.google.com/search?q=merton+college&oq=merton&sourceid=chrome&ie=UTF-8
Search	search-suggest	565	merton hanks dance	https://www.google.com/search?q=merton+hanks+dance&oq=merton&sourceid=chrome&ie=UTF-8
So the issue really doesn't have anything to do with the fact that there are two "merton" entities, right? It's just that Thomas Merton gets deduped with [merton] and the latter takes precedence when really the entity suggestion should take precedence. Is that correct?
And part of the problem is that the backend is returning both with the same relevance score. You suggest in https://crbug.com/860054 that they should choose difference relevance scores, which makes sense. But in the meantime, how about we just add "is entity suggestion" as a tiebreaker?
mpearson: can you comment on my questions here? It seems to me that there's not actually an issue with deduping multiple entity suggestions.
> So the issue really doesn't have anything to do with the fact that
> there are two "merton" entities, right?
Correct.  (I didn't even notice there were two merton entities.)

> It's just that Thomas Merton gets deduped with [merton] and the
> latter takes precedence when really the entity suggestion should take
> precedence. Is that correct?
Not really.  The issue is, as you say, that Thomas Merton gets deduped
with [merton].  That's simply the issue.  As discussed on the other bug,
we shouldn't be marking them as duplicates.  (Precedence has nothing to
do with it.)

> And part of the problem is that the backend is returning both with the
> same relevance score. You suggest in https://crbug.com/860054 that
> they should choose difference relevance scores, which makes sense.
> But in the meantime, how about we just add "is entity suggestion" as
> a tiebreaker?
I agree, having them at the same relevance score is part of the problem.
I think in the meantime adding "is entity suggestion" as a tiebreaker
sounds reasonable (though definitely a bit ugly in the code).
I'm inclined to think the non-entity should win in the case of ties
(making the default behavior more conservative, so a blind "enter" is
more likely to get the current/general results), though am willing
to be convinced otherwise.

Owner: jdonnelly@chromium.org
Status: Assigned (was: Available)
I've run into a problem here, I'd be interested to know if you have any ideas on how to approach it.

The issue is that while I can add the gs_ssp parameter to the stripped URL for comparison by passing additional_query_params from ComputeStrippedDestinationURL via a new arg to GURLToStrippedGURL, it would be preferable to have GURLToStrippedGURL be able to produce the correct stripped URL for its other callers. In particular, it would be nice if tab switch could tell the two cases apart.

But to include gs_ssp in the stripped URL produced by GURLToStrippedGURL would require actually pulling out the gs_ssp parameter by name out of the URL, since there's no match object available, thus encoding this detail of the search system in our code.

I think we probably are just limited to only having the autocomplete system be aware of the differences between [merton] and [Thomas Merton]. But if you have any other ideas, let me know.
I don't have any other ideas.  In my opinion, I think having tab switch occasionally show when it would be slightly inappropriate is not a big deal. 
(e.g.., showing a "switch to tab" when the suggestion is an entity suggestion and the tab that would be switched to has a different entity or no entity at all).  I think without proof this would not be common; I think the common tab switch situation is switching to existing regular web page URLs, not search result pages.

Description: Show this description
Agreed, the tab switch case is not really an issue in practice. I just wish we could know that the stripped URL for a given URL was always going to be the same throughout the system. Not containing gs_ssp in some cases and not in others.
Anyway, fix available at https://crrev.com/c/1273620.
Project Member

Comment 13 by bugdroid1@chromium.org, Oct 11

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/97c78943836ffdb322cfb12f3e6e9818bee2c4a7

commit 97c78943836ffdb322cfb12f3e6e9818bee2c4a7
Author: Justin Donnelly <jdonnelly@google.com>
Date: Thu Oct 11 21:15:40 2018

[omnibox] Include additional_query_params when deduping in AcMatch.

The prevents deduping entity suggestions whose query terms are the same
as a non-entity search suggestion (e.g. verbatim suggestion or search
suggestion from the user's local history).

Bug:  886988 
Change-Id: I1b489bd938d8a2ad4dd7405bfd51679d62f8edc8
Reviewed-on: https://chromium-review.googlesource.com/c/1273620
Reviewed-by: Mark Pearson <mpearson@chromium.org>
Commit-Queue: Justin Donnelly <jdonnelly@chromium.org>
Cr-Commit-Position: refs/heads/master@{#598941}
[modify] https://crrev.com/97c78943836ffdb322cfb12f3e6e9818bee2c4a7/components/omnibox/browser/autocomplete_match.cc
[modify] https://crrev.com/97c78943836ffdb322cfb12f3e6e9818bee2c4a7/components/omnibox/browser/autocomplete_match.h

I'm going to revert the previous CL as we're going to try a different approach to entity deduping. I'll provide more details once I've validated the new approach.
Project Member

Comment 15 by bugdroid1@chromium.org, Nov 19

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ed72893b8391047ae34af1eb21883bd2a486aaba

commit ed72893b8391047ae34af1eb21883bd2a486aaba
Author: Justin Donnelly <jdonnelly@chromium.org>
Date: Mon Nov 19 15:38:30 2018

[omnibox] Don't include additional_query_params when deduping in AcMatch

This is basically just a revert of https://crrev.com/c/1273620 but I
wanted to keep the comment improvements I made there.

Bug:  886988 
Change-Id: Ie08bb820f9e2b6eac4b1eeb672332e2f9306fe0d
Reviewed-on: https://chromium-review.googlesource.com/c/1340787
Reviewed-by: Tommy Li <tommycli@chromium.org>
Commit-Queue: Justin Donnelly <jdonnelly@chromium.org>
Cr-Commit-Position: refs/heads/master@{#609306}
[modify] https://crrev.com/ed72893b8391047ae34af1eb21883bd2a486aaba/components/omnibox/browser/autocomplete_match.cc
[modify] https://crrev.com/ed72893b8391047ae34af1eb21883bd2a486aaba/components/omnibox/browser/autocomplete_match.h

The new approach, which is to *always* dedupe entity and equivalent non-entity suggestions appears to be working successfully. I'll be submitting it on a new issue ( bug 913299 ) in order to request merge.
Status: Fixed (was: Assigned)
Fixed, hopefully. I'll reopen this issue if we discover any problems with the new approach.

Sign in to add a comment