
Issue 591981

Starred by 3 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: 2020-01-07
OS: All
Pri: 3
Type: Feature




HQP should (?) return URL in which all inputs terms are mid-word matches

Reported by teo8...@gmail.com, Mar 4 2016

Issue description

UserAgent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36

Steps to reproduce the problem:
1. I have a URL in my history like:
 http://www.example.com/aword-whatever-foobar-etc.html (the relevant parts being "aword" and "foobar")
2. Type into the omnibox: "example.com aword foo bar"

What is the expected behavior?
I understand that partial-word matches score a lot lower than full-word matches, so if there were enough better results to fill the list of suggestions, I could understand the matching URLs being left out. But that is not the case here, and the URL in my history IS a match, so it should show up.

What went wrong?
Absolutely no result shows up.

On the other hand, if I type "example.com aword foobar" or even just "example.com aword foo", the matching URL does show up.
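
The three inputs above behave consistently with a matcher that requires every input term to start at a word boundary in the URL. The sketch below is a simplified model of that rule, not Chromium's HistoryQuickProvider code; the function names and the assumption that any non-alphanumeric character marks a word boundary are hypothetical.

```python
def match_kinds(url: str, term: str) -> set[str]:
    """Classify every occurrence of `term` in `url` as 'word-start' or 'mid-word'."""
    url, term = url.lower(), term.lower()
    kinds = set()
    start = url.find(term)
    while start != -1:
        # Assumption: a word starts at position 0 or right after a
        # non-alphanumeric character such as '/', '-' or '.'.
        at_boundary = start == 0 or not url[start - 1].isalnum()
        kinds.add("word-start" if at_boundary else "mid-word")
        start = url.find(term, start + 1)
    return kinds


def strict_match(url: str, omnibox_input: str) -> bool:
    """Return True only if every input term has a word-start match in the URL."""
    return all("word-start" in match_kinds(url, t) for t in omnibox_input.split())


URL = "http://www.example.com/aword-whatever-foobar-etc.html"
for text in ("example.com aword foo bar",
             "example.com aword foobar",
             "example.com aword foo"):
    print(f"{text!r}: {strict_match(URL, text)}")
```

Under this model only the first input is rejected, because "bar" occurs solely inside "foobar", while "foo" and "foobar" both begin right after a hyphen.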

Did this work before? N/A 

Chrome version: 48.0.2564.116  Channel: n/a
OS Version: 
Flash Version: Shockwave Flash 20.0 r0
 

Comment 1 by b...@chromium.org, Mar 4 2016

Components: -UI UI>Browser>Omnibox
Labels: -OS-Linux OS-All
Owner: mpear...@chromium.org
This should perhaps be duped against either bug 591979 or bug 592006.
Status: WontFix (was: Unconfirmed)
Thanks for mentioning this.  We used to have mid-word matches in later parts of the URL.  We turned them off because they often looked stupid.  I know you say you want these matches to appear when there are no better suggestions.  However, we heard enough complaints from other users saying the opposite and we turned them off.  I'll take your suggestion into account and listen to hear if the general sentiment on this issue is changing.

Comment 4 by teo8...@gmail.com, Mar 4 2016

What is really irritating is when you have a URL that contains, say, "irishcoffee", and you type "irish coffee" and the result doesn't show up.

From a brand like Google I would expect a somewhat more "intelligent", natural-language-oriented way of determining whether a URL contains given words than one based only on splitting on certain characters and begins-with/ends-with/contains logic.

Anyway, if the problem is that mid-word matches give garbage results, shouldn't the solution be to make them score lower rather than to ignore them completely?

It is one thing for a result with only a mid-word match not to show up. It is another for a result with a match at the start of a word PLUS a mid-word match to be dropped: the mid-word match PREVENTS the result from showing up when in fact it makes the result a better match, not a worse one.
Cc: pkasting@chromium.org
Labels: -Type-Bug -Via-Wizard Type-Feature
Owner: ----
Status: Available (was: WontFix)
Summary: HQP should find mid-word matches outside hostname (was: Omnibox: urls in history matching partial words may have low scores, but they should be found)
I'm reopening this and morphing slightly.  This is now about re-enabling the HQP's ability to do mid-word matches outside hostnames.

I believe removing these was a mistake.  However, when re-adding them, we'll need to experiment with the scoring to find something beneficial, to make sure we don't accidentally make things worse.

Mark has said he's willing to offer advice and help interpret the results of experiments, but we still need someone willing to actually come up with a potential algorithm (maybe spelunk to find out what we used to do here, and start with that?) and write a patch to do it.  I am unlikely to get to this myself, so not taking.
Project Member

Comment 6 by sheriffbot@chromium.org, Mar 9 2017

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been available for more than 365 days, and should be re-evaluated. Please re-triage this issue.
The Hotlist-Recharge-Cold label is applied for tracking purposes, and should not be removed after re-triaging the issue.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Owner: mpear...@chromium.org
Mark can decide how to dupe/leave open
Status: Assigned (was: Untriaged)
I'm experimenting with this as part of my general HQP experiments.  There is no appropriate bug to dupe it into, though.
Labels: -Pri-2 Pri-3
Still experimenting.  It's looking promising.  The experiment won't return URLs that have only mid-word matches, but it will be forgiving of mid-word matches.  (Roughly, at least half the matches will have to be matches at the beginning of a word.)
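
The rule described for the experiment could be sketched roughly as follows. This is an illustration only: the "at least half" threshold comes from the comment above, while counting one match per input term, treating any non-alphanumeric character as a word boundary, and the function names are all assumptions.

```python
def has_word_start_match(url: str, term: str) -> bool:
    """True if `term` occurs in `url` at index 0 or right after a non-alphanumeric char."""
    start = url.find(term)
    while start != -1:
        if start == 0 or not url[start - 1].isalnum():
            return True
        start = url.find(term, start + 1)
    return False


def forgiving_match(url: str, omnibox_input: str) -> bool:
    """Accept a URL when every term occurs and at least half of them start a word."""
    url = url.lower()
    terms = omnibox_input.lower().split()
    if not terms or any(t not in url for t in terms):
        return False  # every term must occur somewhere in the URL
    word_starts = sum(has_word_start_match(url, t) for t in terms)
    return 2 * word_starts >= len(terms)
```

With this rule, `forgiving_match("http://example.com/irishcoffee", "irish coffee")` is accepted because one of the two terms starts a word, while an input such as "rish offee", whose terms all match mid-word, is rejected.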

Labels: Hotlist-OmniboxRanking
Project Member

Comment 11 by bugdroid1@chromium.org, Aug 1 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/fbc743ee8228f50c12e466a440f2920f3c86d5b8

commit fbc743ee8228f50c12e466a440f2920f3c86d5b8
Author: Mark Pearson <mpearson@chromium.org>
Date: Tue Aug 01 20:08:33 2017

Omnibox - Launch Aggressively Suggest Infrequently Visited URLs

Sets default values for all relevant parameters to the ones we decided
to launch.  In particular, this change:
* makes it so that URLs that match the input--even if only visited
  once or twice--are likely to score above low quality query suggestions
  that come from the server.
* boosts a URL suggestion if it appears the URL suggestion is clearly
  seeking that URL.  In particular, if the omnibox input only matches that
  single URL from history, it gets a 3x boost (in effect we count it as
  having three times as many visits).  This boost decreases as the number of
  matching URLs increases, so that if the user input matches five or more
  items from history, nothing gets a boost.
* lowers the threshold for how well a URL must match the input in order
  to be displayed.  Previously, for example, we wouldn't return URLs that
  match a word in the input if the word matches in the ?query or #hash
  section of the URL.  Now we do.
* reduces the relative weight of a "typed visit" (a time the URL is selected
  from the omnibox) compared with a regular visit (click on a link).
  It used to be that the former was worth 20x the latter.  Now it's only
  1.5x.
* changes to a scoring model in which additional visits to a URL are
  guaranteed to increase its score.  Previously we used a model based on
  the average quality of a visit, which means that if a URL has many
  typed visits and then gets a new untyped visit, its score (the average)
  will go down.  Now we use simply a sum, which means the score will
  definitely increase.

Precisely, in terms of code / config, we're launching the following settings:
  "HQPExperimentalScoringBuckets": "0.0:550,1:625,9.0:1300,90.0:1399",
  "HQPTypedValue": "1.5",
  "HQPFreqencyUsesSum": "true",
  "HQPNumMatchesScores": "1:3,2:2.5,3:2,4:1.5",
  "HQPExperimentalScoringTopicalityThreshold": "0.5"

In the process, removes some of the flags for frequency scoring that
I don't think are useful (not the right model for scoring) and aren't
worth going back to.

Bug: 695560, 327085, 369989, 508262, 580688, 591981, 598184
Change-Id: Id349c5aaa2e09e6b5284c55fc5790f4b14b8fa7b
Reviewed-on: https://chromium-review.googlesource.com/585377
Commit-Queue: Mark Pearson <mpearson@chromium.org>
Reviewed-by: Peter Kasting <pkasting@chromium.org>
Cr-Commit-Position: refs/heads/master@{#491089}
[modify] https://crrev.com/fbc743ee8228f50c12e466a440f2920f3c86d5b8/components/omnibox/browser/history_quick_provider_unittest.cc
[modify] https://crrev.com/fbc743ee8228f50c12e466a440f2920f3c86d5b8/components/omnibox/browser/in_memory_url_index_unittest.cc
[modify] https://crrev.com/fbc743ee8228f50c12e466a440f2920f3c86d5b8/components/omnibox/browser/omnibox_field_trial.cc
[modify] https://crrev.com/fbc743ee8228f50c12e466a440f2920f3c86d5b8/components/omnibox/browser/omnibox_field_trial.h
[modify] https://crrev.com/fbc743ee8228f50c12e466a440f2920f3c86d5b8/components/omnibox/browser/scored_history_match.cc
[modify] https://crrev.com/fbc743ee8228f50c12e466a440f2920f3c86d5b8/components/omnibox/browser/scored_history_match.h
[modify] https://crrev.com/fbc743ee8228f50c12e466a440f2920f3c86d5b8/components/omnibox/browser/scored_history_match_unittest.cc
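
To make the launched settings in the commit message above more concrete, here is a rough sketch of how the "HQPNumMatchesScores" string and the "HQPTypedValue" weight might be interpreted. The parsing helper, the function names, and the representation of visits as booleans are assumptions for illustration; the real logic lives in scored_history_match.cc and may differ.

```python
def parse_num_matches_scores(spec: str) -> dict[int, float]:
    """Parse 'count:multiplier' pairs such as '1:3,2:2.5,3:2,4:1.5'."""
    table = {}
    for pair in spec.split(","):
        count, multiplier = pair.split(":")
        table[int(count)] = float(multiplier)
    return table


def match_count_boost(table: dict[int, float], num_matching_urls: int) -> float:
    """Multiplier for the visit count; 1.0 (no boost) when the count is not listed."""
    return table.get(num_matching_urls, 1.0)


def frequency(visits: list[bool], typed_value: float = 1.5, use_sum: bool = True) -> float:
    """Score visits, where each entry is True for a typed visit and False otherwise."""
    if not visits:
        return 0.0
    total = sum(typed_value if typed else 1.0 for typed in visits)
    return total if use_sum else total / len(visits)


boosts = parse_num_matches_scores("1:3,2:2.5,3:2,4:1.5")
print(match_count_boost(boosts, 1), match_count_boost(boosts, 5))

typed_only = [True] * 4
with_untyped = typed_only + [False]
# With a sum, the extra untyped visit raises the score; with an average it lowers it.
print(frequency(typed_only), frequency(with_untyped))
print(frequency(typed_only, use_sum=False), frequency(with_untyped, use_sum=False))
```

The default of 1.0 for unlisted counts mirrors the statement that inputs matching five or more history items get no boost.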

NextAction: 2018-01-09
Owner: ----
Status: Available (was: Assigned)
This change helped but does not fix the issue.  The omnibox still doesn't allow mid-word matches, so if one of the omnibox input words matches a URL only in the middle of a word, the URL is not returned.  The change listed above made the omnibox more willing to return suggestions that only match in the /path part of the URL, but those matches have to be at the beginning of a word.

I'm going to leave this bug open, though I will likely not work on it for a while unless it appears to be a widespread problem for users.
Cc: a-...@yandex-team.ru mpear...@chromium.org
 Issue 789506  has been merged into this issue.
Summary: HQP should (?) return URL in which all inputs terms are mid-word matches (was: HQP should find mid-word matches outside hostname)
Rephrasing summary, as the behavior changed recently.

I *think* the current behavior is that if you have a multi-word omnibox input and at least one term matches at a word boundary, then the URL will be returned.  If all terms match only in the middle of words, then the URL will not be returned.
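
Assuming that reading of the current behavior is right, it amounts to something like the following simplified predicate. The boundary definition (any non-alphanumeric character) and the function names are assumptions, not Chromium code.

```python
import re


def occurrences(url: str, term: str) -> list[int]:
    """Start indices of every (possibly overlapping) occurrence of `term` in `url`."""
    return [m.start() for m in re.finditer(f"(?={re.escape(term)})", url)]


def is_returned(url: str, omnibox_input: str) -> bool:
    """All terms must occur in the URL, and at least one must start a word."""
    url = url.lower()
    terms = omnibox_input.lower().split()
    hits = [occurrences(url, t) for t in terms]
    if not terms or not all(hits):
        return False
    # A hit starts a word at index 0 or right after a non-alphanumeric character.
    return any(i == 0 or not url[i - 1].isalnum()
               for term_hits in hits for i in term_hits)
```

For the URL from the original report, `is_returned(url, "aword bar")` holds because "aword" starts a word, whereas `is_returned(url, "word bar")` does not, since both terms occur only mid-word.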
The NextAction date has arrived: 2018-01-09
Cc: jdonnelly@chromium.org
NextAction: 2019-01-08
I think the current state is reasonable.  CC jdonnelly@, because he's thinking about fuzzy matching and this falls into that vein.
The NextAction date has arrived: 2019-01-08
Cc: skare@chromium.org
NextAction: 2020-01-07
CC skare@ as an FYI.  He's thinking about fuzzy matching these days.
