Support inexact path matching when extracting search terms from URL
Reported by
vit...@yandex-team.ru,
May 12 2016
Issue description

UserAgent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0

Steps to reproduce the problem:
I tried to add Google.Maps as a separate search engine. Examples of URLs used by Google.Maps:

https://www.google.com/maps/search/moscow/@55.7498598,37.352322,10z?hl=en
https://www.google.com/maps/search/paris/@48.8589507,2.2775175,12z?hl=en
https://www.google.com/maps/place/Moscow,+Russia/@55.7498598,37.352322,10z/data=!4m5!3m4!1s0x46b54afc73d4b0c9:0x3d44d6cc5757cf4c!8m2!3d55.755826!4d37.6173?hl=en
https://www.google.com/maps/place/Paris,+France/@48.8589507,2.2775175,12z/data=!3m1!4b1!4m5!3m4!1s0x47e66e1f06e2b70f:0x40b82c3688c9460!8m2!3d48.856614!4d2.3522219?hl=en

Now I want to define the search engine in prepopulated_engines.json in a way that makes search term extraction possible. The currently implemented algorithm requires the paths to be exactly equal (see https://code.google.com/p/chromium/codesearch#chromium/src/components/search_engines/template_url.cc&sq=package:chromium&type=cs&l=498). That is unsuitable for Google.Maps, where the path is used to pass additional parameters (geographic coordinates, I guess).

What is the expected behavior?
If we had a way to ignore the end of the path, the definition of Google.Maps could look like this:

    "google_maps": {
      "name": "Google.Maps",
      "keyword": "maps.google.com",
      "favicon_url": "http://maps.google.com/favicon.ico",
      "search_url": "https://www.google.com/maps/search/{searchTerms}",
      "alternate_urls": [
        "https://www.google.com/maps/search/{searchTerms}/{google:ignorePathEnding}",
        "https://www.google.com/maps/place/{searchTerms}/{google:ignorePathEnding}"
      ],
      "id": 1000
    }

What went wrong?
Unable to use inexact path matching when extracting search terms from a URL.

Did this work before? N/A

Chrome version: <Copy from: 'about:version'>
Channel: n/a
OS Version: 6.1 (Windows 7, Windows Server 2008 R2)
Flash Version: Shockwave Flash 21.0 r0
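For illustration only, here is a minimal Python sketch of the behavior requested above. It is not the Chromium implementation (which lives in template_url.cc, in C++). The function name `extract_search_terms` is hypothetical, and `{google:ignorePathEnding}` is the placeholder proposed in this report, not an existing one. The sketch compares path segments one by one and, when the template ends with the proposed placeholder, ignores any extra trailing segments in the URL.

```python
from urllib.parse import urlsplit, unquote_plus

SEARCH_TERMS = "{searchTerms}"
# Placeholder proposed in this report; assumed here, not an existing Chromium token.
IGNORE_PATH_ENDING = "{google:ignorePathEnding}"


def extract_search_terms(template, url):
    """Return the search terms found in url's path, or None if it does not match."""
    t, u = urlsplit(template), urlsplit(url)
    if (t.scheme, t.netloc) != (u.scheme, u.netloc):
        return None

    t_segs = t.path.split("/")
    u_segs = u.path.split("/")

    if t_segs[-1] == IGNORE_PATH_ENDING:
        # Inexact mode: drop the placeholder and ignore extra trailing URL segments.
        t_segs = t_segs[:-1]
        if len(u_segs) < len(t_segs):
            return None
        u_segs = u_segs[:len(t_segs)]
    elif len(t_segs) != len(u_segs):
        # Exact mode: the number of segments must be the same.
        return None

    terms = None
    for t_seg, u_seg in zip(t_segs, u_segs):
        if t_seg == SEARCH_TERMS:
            terms = unquote_plus(u_seg)  # decode "+" and %XX escapes
        elif t_seg != u_seg:
            return None
    return terms
```

With the exact template `https://www.google.com/maps/search/{searchTerms}`, the Google.Maps URLs listed above do not match because of the trailing coordinate segment; with the `{google:ignorePathEnding}` variant they do.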
,
May 12 2016
Google Maps is not a general-purpose search engine, so we would not ship it in prepopulated_engines.json. I know the Maps folks have also added an OSDD for Maps (finally!) though I don't know whether it's live on the web yet or not. Frankly, I consider URLs like in comment 0 broken; the Maps folks should be using query params and not path elements to do this sort of thing. To me, this is "file a bug against Google Maps" territory. Given all the above, I don't think we should implement this. I can reopen if there are good reasons I've missed :)
,
May 12 2016
Yes, there is an OSDD for Google Maps: https://www.google.com/maps/preview/opensearch.xml

And I know other examples where "{google:ignorePathEnding}" could be useful:

1) Amazon:
Search "tetris"
http://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=tetris&rh=i%3Aaps%2Ck%3Atetris
Search "winnie the pooh"
http://www.amazon.com/s/ref=nb_sb_ss_c_0_4?url=search-alias%3Daps&field-keywords=winnie+the+pooh&sprefix=winnie+the+pooh%2Caps%2C316&rh=i%3Aaps%2Ck%3Awinnie+the+pooh
Search "vacuum cleaner repair"
http://www.amazon.com/s/ref=nb_sb_ss_i_2_10?url=search-alias%3Dstripbooks&field-keywords=vacuum+cleaner+repair&sprefix=vacuum+cle%2Caps%2C352&rh=n%3A283155%2Ck%3Avacuum+cleaner+repair

2) Yandex.Maps:
Search "moscow"
https://yandex.com/maps/213/moscow/?text=moscow&sll=44.537816%2C48.726606&sspn=132.363281%2C48.118616&ol=geo
Search "paris"
https://yandex.com/maps/10502/paris/?text=paris&sll=37.646961%2C55.725045&sspn=2.068176%2C0.637864&ol=geo

3) Yandex.Market:
Search "iphone"
https://market.yandex.ru/catalog/54726/list?text=iphone
Search "shure"
https://market.yandex.ru/catalog/56179/list?text=shure
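In the three examples above the search terms sit in a query parameter and only the path varies. As a rough sketch of what tolerant matching would mean for such URLs (the helper `terms_from_query` and its parameters are hypothetical, not part of Chromium), one could check the host and a fixed path prefix, then read the terms from the named parameter:

```python
from urllib.parse import urlsplit, parse_qs


def terms_from_query(url, host, path_prefix, param):
    """Return the value of `param` if url is on `host` under `path_prefix`, else None."""
    parts = urlsplit(url)
    if parts.netloc != host or not parts.path.startswith(path_prefix):
        return None
    # parse_qs decodes "+" and %XX escapes and maps each name to a list of values.
    values = parse_qs(parts.query).get(param)
    return values[0] if values else None
```

For the Yandex.Market URLs, for instance, this would be called with host `market.yandex.ru`, prefix `/catalog/`, and parameter `text`; the varying catalog number in the path is simply not compared.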
,
May 13 2016
I would be willing to consider something like this if we could find a way to make it both more generic and more specific in terms of what we ignore. What I mean by that is that "ignore path ending" can't handle ignoring parts of the path other than the ending, and it can't ignore specific pieces of the URL known to be useless. If someone does this, for example:

http://search.com/uselesscrap/searchterm

...then we're hosed, because the search term is part of the path, but the section before it is stuff we want to ignore.

Of course, the ultimate rocket launcher to bring to bear on this stuff would be regexes, but OSDDs (with which we're trying to remain compatible) can't support those at all (and I'm not terribly keen on allowing people to put in arbitrary regexes as specifiers for "search URLs" we're matching against).

One big factor here is that our system is primarily designed for _creating_ search URLs, and only secondarily designed for parsing terms out of existing URLs. Almost all search engines have some simpler form of search URL we can create, meaning this is largely just a problem for cases where we want to parse pre-existing URLs. The benefits of being able to do the latter are pretty low, especially for non-general-purpose search engines. So it's not obvious to me what sort of compelling application this functionality would enable.
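To make the limitation concrete, here is a small Python sketch of one possible "more generic" shape: a per-segment wildcard. The `{ignoreSegment}` token and the `match_path` function are invented for this illustration; neither exists in Chromium or in the OpenSearch description format, and nobody in this thread proposed this exact design.

```python
from urllib.parse import unquote_plus

SEARCH_TERMS = "{searchTerms}"
# Hypothetical token for this sketch only; not a Chromium or OpenSearch placeholder.
IGNORE_SEGMENT = "{ignoreSegment}"


def match_path(template_path, url_path):
    """Match paths segment by segment; return the search terms or None."""
    t_segs = template_path.strip("/").split("/")
    u_segs = url_path.strip("/").split("/")
    if len(t_segs) != len(u_segs):
        return None

    terms = None
    for t_seg, u_seg in zip(t_segs, u_segs):
        if t_seg == IGNORE_SEGMENT:
            continue  # any single segment is accepted here
        if t_seg == SEARCH_TERMS:
            terms = unquote_plus(u_seg)
        elif t_seg != u_seg:
            return None
    return terms
```

A template path of `/{ignoreSegment}/{searchTerms}` matches `/uselesscrap/searchterm`, which a suffix-only rule cannot express. It still handles only a fixed number of segments, so it is less powerful than regexes, which the comment above rules out.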
,
May 16 2016
Thanks for the detailed answer.
Comment 1 by vit...@yandex-team.ru
, May 12 2016