Omnibox autocomplete incorrectly handles words with combining diacritics
Reported by
dani...@gmail.com,
Apr 15 2016
Issue description
UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36

Steps to reproduce the problem:
1. Search for a Hebrew word whose first letter carries diacritics (Nikud), for example עֲבוֹדָה
2. Start a new search by typing just the first letter and let it autocomplete; in the above example, type the Hebrew letter ע
3. The autocomplete will mark the rest of the letters as expected, e.g. בודה will be marked
4. Now type another letter, say ג. Unlike English or regular Hebrew behavior, the entire word is deleted and only ג remains in the bar. The expected behavior is that עג is shown in the bar. This probably happens because the diacritic is counted as a letter and not as a mark.

What is the expected behavior?

What went wrong?
Autocomplete behavior is inconsistent with diacritics; see steps to reproduce.

Did this work before? N/A
Chrome version: 50.0.2661.75 Channel: stable
OS Version: OS X 10.11.4
Flash Version: Shockwave Flash 21.0 r0
,
Apr 16 2016
This wouldn't be Mac-specific. From Unicode's perspective, the diacritics are distinct code points that combine with the prior characters. They also work this way when using the backspace key: one backspace will delete the diacritic, the next the letter. Inline autocompleting in the case where the input string has proceeded past where one of these diacritics would appear, without including it, is problematic, because we have no way to represent an autocompletion that contains the diacritic (we can only add on to the end of the input) and we don't know that the input without the diacritic has the same meaning. (I don't speak Hebrew, but AFAICT from searching, the two strings aren't the same, or at least aren't treated the same by Google.) So I think what the omnibox is doing is correct: it's not autocompleting you to a different string that you never typed, because that different string does not necessarily mean the same thing to e.g. a search engine. Reporter, if I'm mistaken, then I'll need a fuller explanation of how Hebrew diacritics affect semantics.
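To make the "distinct code points" remark concrete, here is a minimal Python sketch that lists the code points of the word from the report. The escaped string is my assumption about how that word is encoded, and `describe` is a hypothetical helper, not anything from Chromium.

```python
import unicodedata

# Assumed encoding of the word from the report: ayin, hataf patah, bet,
# vav, holam, dalet, qamats, he.
WORD = "\u05e2\u05b2\u05d1\u05d5\u05b9\u05d3\u05b8\u05d4"


def describe(text):
    """Return (code point, Unicode name, is_combining_mark) per code point."""
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.name(ch),
            # General categories Mn, Mc and Me are the combining marks.
            unicodedata.category(ch).startswith("M"),
        )
        for ch in text
    ]


for code, name, is_mark in describe(WORD):
    print(code, name, "(combining mark)" if is_mark else "")
```

Under that assumption the word is eight code points, three of which are marks attached to the preceding letter.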
,
Apr 16 2016
Well, you are right that diacritics can change a word's meaning. But modern Hebrew usually doesn't use them. The main uses are:
1. Poetry
2. Bible
3. Content aimed at children who have just learned to read
They are supposed to represent vowels, but you can write the same word without them and get the same meaning and pronunciation; they are implied when read. Moreover, Chrome actually supports this definition: if you use ctrl-f to search a page for the non-diacritical version of a Hebrew word that appears on the page with diacritics, Chrome will find it.
You are right that there can be two distinct Hebrew words with the same spelling and different diacritical marks, and therefore different meanings. But because of how Hebrew uses diacritics, a different mark on a letter does not make it a different letter. That's why the behavior is inconsistent: in the eyes of a user, it is the same letter no matter how it's marked.
Now, I think I haven't explained the problem correctly. The thing is that once you've searched for a Hebrew word with a diacritical sign on the first letter, your search history is contaminated, and you have to type the first letter twice to write a different search term. This also prevents you from using any autocomplete on the word, as any autocomplete suggestion is deleted.
I'll give you an example in the Latin alphabet, which will probably be easier to follow than one in a language you don't speak. I'm using "y" and " ̆" (U+306) to construct the word "y̆our".
Steps to recreate:
1. Clear your search history
2. Use the omnibox to search for y̆our
3. Start a new search by typing y; notice that the entire word is selected by autocomplete, and not only the diacritic and "our" as expected
4. Start typing "ep", as you would if you wanted to search for the word "yep". When e is pressed, the entire word is deleted, which leaves you with the string "ep" in the omnibox, and not "yep" as expected
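As a rough illustration of the "same letter no matter how it's marked" view, a mark-insensitive comparison can be sketched in Python as follows. This is only an assumption about how such matching could work; it is not how Chrome's find-in-page or omnibox is implemented, and `strip_marks` and `completion_suffix` are hypothetical names.

```python
import unicodedata


def strip_marks(text):
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )


def completion_suffix(suggestion, typed):
    """Return what an inline completion would append after the typed prefix."""
    if not suggestion.startswith(typed):
        return None
    return suggestion[len(typed):]


marked = "y\u0306our"  # 'y' followed by U+0306 COMBINING BREVE, then "our"
print(strip_marks(marked) == "your")
print([f"U+{ord(ch):04X}" for ch in completion_suffix(marked, "y")])
```

The second call shows the awkward part of the Latin example: the suggestion does start with the typed "y", but the appended suffix begins with the combining mark itself.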
,
Apr 16 2016
Thank you for providing more feedback. Adding requester "pkasting@chromium.org" for another review and adding "Needs-Review" label for tracking. For more details visit https://sites.google.com/a/chromium.org/dev/issue-tracking/autotriage - Your friendly Sheriffbot
,
Apr 18 2016
@3: I can't reproduce the behavior you describe, at least on trunk. Using your steps, while the selection in comment 3 does look as if it covers the whole word (probably because we can't always sanely draw a selection for "just the diacritic on a letter", but CCing msw in case he wants to comment here), typing in step 4 acts as it should -- "ep" results in "yep", not "ep". For some reason I can't get the Hebrew input in comment 0 to autocomplete at all, despite debug info telling me it ought to, so I can't test that case :/ I wonder if this really is Mac-specific after all. +CC shrike -- can you test the steps in comment 3 on Mac? If they are broken, then I think this is Mac-only (and maybe msw can once again comment, this time on whether we use all our own Textfield machinery there as we do elsewhere).
,
Apr 18 2016
I've tried this on my work computer (Windows 10) and it didn't reproduce, so it probably is Mac-only. Btw, when trying to autocomplete the Hebrew input, keep in mind it's a right-to-left language, so if you try copy-pasting from the left, it won't work. Also, the non-diacritic version of the first letter is "ע". Anyway, thanks for the quick response on this rather obscure issue :)
,
Apr 18 2016
Yeah, I did the correct repro steps for the Hebrew autocompletion. I don't know why it's not happening. Anyway, I bet this is basically "Mac omnibox does not use views::Textfield" and whatever system it does use is handling this incorrectly. ->Jayson to triage, I don't know who formally owns the Mac omnibox now.
,
Apr 18 2016
Hello danilan@, Would you please provide more detailed steps on how to reproduce the problem? I am not super familiar with Hebrew and so I'm not sure I'm typing the right characters. Maybe let me know the exact keys to press on my US keyboard. It may be easier to make a movie of the whole thing with the onscreen keyboard visible - that way I can see exactly what to tap on that keyboard, and see the results you're getting in the browser.
,
Apr 18 2016
,
Apr 18 2016
@8: Some info on inputting Unicode in Mac OS: https://en.wikipedia.org/wiki/Unicode_input#In_Mac_OS You can search for particular Unicode characters at http://www.fileformat.info/ to obtain their code values and such.
,
Apr 18 2016
I know about entering Unicode, and how to invoke the Hebrew input manager to enter Hebrew. I just would like exact instructions from the reporter so that I'm sure I'm entering the exact characters he is to reproduce the problem.
,
Apr 18 2016
OK. I used the ones he gave in comment 0 :) (Incidentally, I think the reason I wasn't getting autocompletion on Win is because I was pasting rather than keying in the characters, which disables inline autocompletion. Unfortunately the Windows omnibox doesn't support a couple of the Windows standard methods for entering Unicode; I've filed a separate bug about that.)
,
Apr 19 2016
@11: I won't have time right now to map the string in comment 1 to a Latin keyboard (it's morning :), but I did recreate it with the Latin alphabet at the end of comment 3. Can you try that first?
,
Apr 19 2016
Thank you for the additional info. Here are precise steps to reproduce:
1. Launch Chrome with an empty user-data-dir
2. Create a new tab
3. Paste y̆our into the Omnibox and press return
4. Click in the Omnibox and type y
At this point the Omnibox contains y̆our but the entire word is selected. If you perform the same steps with "your", only the last three characters are selected at this point. On the Mac the text system is treating the two-character sequence as a single glyph (which I think is the right way to do it), which I suspect is throwing off where the Omnibox thinks the selection should fall. I haven't looked at any code but I think it'll be difficult to correct for this behavior on the Mac side.
,
Apr 19 2016
@14: The critical bit is not how the selection appears, but what happens when you type the letter "o" after the letter "y". If this deletes the letter "y", then this basically means typing anything with a diacritic can break typing of that letter in the future. Even if that's hard to solve, that should be a P1 bug and it needs to have an owner; we can't live with that. Can you please find an appropriate owner? If, OTOH, you get "yo" (with or without an autocompletion), then it doesn't seem like we can repro the primary bug reported here.
,
Apr 19 2016
I understand what the critical issue is, thank you, and my point about selection is that the Omnibox code appears to be directing the Mac text system to select the wrong amount of text, which is why the string gets deleted when you type the second character. It is incorrect to say that this is a problem typing anything with a diacritic - for example, I cannot reproduce the problem when searching for a string like über.
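The link between the oversized selection and the deleted text can be sketched with a tiny Python model of "typing replaces the selection". The two ranges are assumptions taken from the description in this thread (the omnibox asks for everything after the typed "y"; the text system ends up selecting the whole word), and `type_over` is a hypothetical helper.

```python
def type_over(text, selection, typed):
    """Replace the selected (start, length) range with the typed text."""
    start, length = selection
    return text[:start] + typed + text[start + length:]


TEXT = "y\u0306our"   # 'y', U+0306 COMBINING BREVE, 'o', 'u', 'r'
INTENDED = (1, 4)     # assumed: what the omnibox asks for (after the 'y')
EXPANDED = (0, 5)     # assumed: the range after snapping back to the 'y'

print(type_over(TEXT, INTENDED, "e"))  # keeps the typed 'y'
print(type_over(TEXT, EXPANDED, "e"))  # loses the typed 'y'
```

Indices here are Python code points; for these characters they coincide with the UTF-16 offsets an NSString range would use.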
,
Apr 19 2016
That's a good distinction to make. This only applies to combining diacritics; ü is a single Unicode codepoint. I'm still concerned, though, since it seems like some languages may have a lot of combining diacritics or other combining/joining characters. I'm not familiar enough to know for sure, but I'd worry about Vietnamese and maybe Arabic? Perhaps the solution here is to switch the Mac omnibox to use the views::Textfield, even outside the normal path of other Mac views work? I don't know if that's possible and I don't know how complete the Mac port of Textfield is, but in theory maybe that would leave us in the same place as other platforms. Or maybe we're using the Mac text editing APIs incorrectly and there's a fix to how we say to do selection, where we can select in between the diacritic codepoint and the previous (combined) character codepoint, but we're incorrectly not doing so? Maybe there's some cross-platform omnibox code that needs to be per-codepoint that isn't? Is there anyone at all available to look into all this more deeply on the mac side? Maybe we'll end up punting this bug but ideally we could answer the above first to ensure we completely understand the scope of the problems + fixes.
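The single-code-point versus combining-sequence distinction can be checked with Python's `unicodedata` module. This is only an illustration of the Unicode side; it says nothing about what the omnibox code does with these strings.

```python
import unicodedata


def nfc_length(text):
    """Number of code points after canonical composition (NFC)."""
    return len(unicodedata.normalize("NFC", text))


# 'u' + COMBINING DIAERESIS composes into the single code point U+00FC.
print(nfc_length("u\u0308"))
# 'y' + COMBINING BREVE has no precomposed form, so it stays two code points.
print(nfc_length("y\u0306"))
# Hebrew ayin + hataf patah likewise stays a two-code-point sequence.
print(nfc_length("\u05e2\u05b2"))
```

So normalizing to NFC would hide the problem for a letter like ü but not for sequences that have no precomposed equivalent.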
,
Apr 19 2016
I will try to find a little time to look at this. I was thinking we might have to resort to some kind of intermediate string massaging - perhaps a views::Textfield is the answer.
,
Oct 1 2016
,
Oct 1 2016
Issue 158070 has been merged into this issue.
,
Oct 1 2016
Note from duped-in issue: this affects Thai as well.
,
Oct 17 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/8a8bae9fbc2e72afc208a0376c0f4a49388a6da3

commit 8a8bae9fbc2e72afc208a0376c0f4a49388a6da3
Author: lgrey <lgrey@chromium.org>
Date: Mon Oct 17 13:56:41 2016

Preserve original selection when suggesting completions with diacritics

When NSTextView is asked to select a range which does not begin on a grapheme boundary, it expands the selection to the previous boundary. This change preserves the original selection, then contracts the range sent to NSTextView to fall on the *next* boundary.

Text editing operations that operate on the selection use the original selection instead of the visual selection.

Since the omnibox view uses the text view's selected range, the view's |selectedRange| returns the original range and not the visual range.

BUG=603883
Review-Url: https://codereview.chromium.org/2395233005
Cr-Commit-Position: refs/heads/master@{#425671}

[modify] https://crrev.com/8a8bae9fbc2e72afc208a0376c0f4a49388a6da3/chrome/browser/ui/cocoa/location_bar/autocomplete_text_field_editor.h
[modify] https://crrev.com/8a8bae9fbc2e72afc208a0376c0f4a49388a6da3/chrome/browser/ui/cocoa/location_bar/autocomplete_text_field_editor.mm
[modify] https://crrev.com/8a8bae9fbc2e72afc208a0376c0f4a49388a6da3/chrome/browser/ui/cocoa/location_bar/autocomplete_text_field_editor_unittest.mm
[modify] https://crrev.com/8a8bae9fbc2e72afc208a0376c0f4a49388a6da3/chrome/browser/ui/cocoa/omnibox/omnibox_view_mac.mm
[modify] https://crrev.com/8a8bae9fbc2e72afc208a0376c0f4a49388a6da3/chrome/browser/ui/cocoa/omnibox/omnibox_view_mac_browsertest.mm
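The expand-versus-contract behavior described in the commit message can be modeled in a few lines of Python. This is a simplified sketch, not the Objective-C code from the CL: it approximates a grapheme boundary as "any position not followed by a combining mark", which is much cruder than the full Unicode segmentation rules, and all function names are hypothetical.

```python
import unicodedata


def is_boundary(text, index):
    """Approximate grapheme boundary: not directly before a combining mark."""
    if index <= 0 or index >= len(text):
        return True
    return not unicodedata.category(text[index]).startswith("M")


def previous_boundary(text, index):
    while not is_boundary(text, index):
        index -= 1
    return index


def next_boundary(text, index):
    while not is_boundary(text, index):
        index += 1
    return index


def expand_selection(text, selection):
    """Model of the described NSTextView behavior: pull the start back."""
    start, length = selection
    end = start + length
    new_start = previous_boundary(text, start)
    return (new_start, end - new_start)


def contract_selection(text, selection):
    """Model of the CL's workaround: push the start forward instead."""
    start, length = selection
    end = start + length
    new_start = min(next_boundary(text, start), end)
    return (new_start, end - new_start)


TEXT = "y\u0306our"
ORIGINAL = (1, 4)  # starts between 'y' and its combining breve
print(expand_selection(TEXT, ORIGINAL))
print(contract_selection(TEXT, ORIGINAL))
```

In this model the expanded range swallows the typed "y", while the contracted range covers only "our"; the original range would still have to be kept separately for editing operations, as the commit message describes.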
,
Oct 18 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/67434dc6fcbb9c1f734b147e75183d864b33f174

commit 67434dc6fcbb9c1f734b147e75183d864b33f174
Author: lgrey <lgrey@chromium.org>
Date: Tue Oct 18 15:55:59 2016

Revert of [Mac] Preserve original selection when suggesting completions with diacritics (patchset #13 id:240001 of https://codereview.chromium.org/2395233005/ )

Reason for revert:
This almost definitely caused crbug.com/656972, but I can't repro the issue. Reverting while I investigate.

Original issue's description:
> Preserve original selection when suggesting completions with diacritics
>
> When NSTextView is asked to select a range which does not begin on a grapheme
> boundary, it expands the selection to the previous boundary. This change
> preserves the original selection, then contracts the range sent to
> NSTextView to fall on the *next* boundary.
>
> Text editing operations that operate on the selection use the original
> selection instead of the visual selection.
>
> Since the omnibox view uses the text view's selected range, the
> view's |selectedRange| returns the original range and not the visual
> range.
> BUG=603883
>
> Committed: https://crrev.com/8a8bae9fbc2e72afc208a0376c0f4a49388a6da3
> Cr-Commit-Position: refs/heads/master@{#425671}

TBR=asvitkine@chromium.org,erikchen@chromium.org
# Not skipping CQ checks because original CL landed more than 1 days ago.
BUG=603883

Review-Url: https://codereview.chromium.org/2426983002
Cr-Commit-Position: refs/heads/master@{#425975}

[modify] https://crrev.com/67434dc6fcbb9c1f734b147e75183d864b33f174/chrome/browser/ui/cocoa/location_bar/autocomplete_text_field_editor.h
[modify] https://crrev.com/67434dc6fcbb9c1f734b147e75183d864b33f174/chrome/browser/ui/cocoa/location_bar/autocomplete_text_field_editor.mm
[modify] https://crrev.com/67434dc6fcbb9c1f734b147e75183d864b33f174/chrome/browser/ui/cocoa/location_bar/autocomplete_text_field_editor_unittest.mm
[modify] https://crrev.com/67434dc6fcbb9c1f734b147e75183d864b33f174/chrome/browser/ui/cocoa/omnibox/omnibox_view_mac.mm
[modify] https://crrev.com/67434dc6fcbb9c1f734b147e75183d864b33f174/chrome/browser/ui/cocoa/omnibox/omnibox_view_mac_browsertest.mm
,
Oct 18 2016
Looking at the Summary tab in the crash reporter, this is the exception that was thrown:
Crashing on exception: *** -[NSBigMutableString substringWithRange:]: Range {0, 32000} out of bounds; string length 88
So something about the current selection looks to be screwed up. This might be another spot where the current selection gets set (separate from setSelectedRange:).
,
Oct 18 2016
From a few things ellyjones@ discovered, it looks like 32000 is coming from the max size of a text span in the TextEdit framework (https://github.com/steventroughtonsmith/MPWTestSuite/blob/master/MacC/TESample.h#L179 for example.) That makes us think this isn't so much the selection being updated without us seeing it, but some error condition in the Carbon code that's giving up and setting the "max" selection size as a fallback. There's also this: https://crash.corp.google.com/browse?stbtiq=57efc43b00000000 If it's the same issue (and I'm moderately confident it is), then the right-click menu is a red herring. Would love to figure out how to repro.
,
May 23 2017
lgrey@, Do we have any latest update on this? Also we can mark it as 'Fixed' if no other CL is pending. Thank you!
,
May 24 2017
Hi manoranjanr@ I haven't had a chance to look at this again yet. I definitely wouldn't mark it fixed, since the CL was rolled back. I think a fix will be quite involved and/or may require action from Apple.
,
Jun 15 2017
lgrey@: should this be downgraded to P-3? The issue sounds serious (affects core omnibox functionality in RTL languages it seems), which makes me think P-2, yet it doesn't sound like we can / plan to fix it in the short term, hence P-3?
,
Jun 15 2017
shrike@, what's your take?
,
Jun 16 2017
The change in c#22 was made to accommodate the omnibox code, basically maintaining the selection range that the omnibox expects while adjusting that range to conform to how NSTextView works. This was kind of an ugly hack because we had to override a private AppKit method; it ultimately also did not work. What if, instead, we change the omnibox to send Chrome Mac the kind of selection range that NSTextView expects? That way both the omnibox and NSTextView have the same selection range (so there's no trickery needed to keep the two in sync). Short of trying this approach, I'm not sure we will be able to fix this bug.
,
Jun 16 2017
I think we'd just be trading one sharp corner for another
(using ü to illustrate the point, I know as per above it's a single code point)
Let's say typing u autocompletes to über for me. Since the omnibox knows NSTextView can't select inside the boundary, it sends the selection range as {1, 3}.
ü[ber]
If I press delete at this point, I get:
ü
If I press delete again, I get the empty string. Basically impossible to type just "u". IIRC this is similar to one of the current failure modes.
,
Oct 11 2017
lgrey@ and shrike@ (scan comments #14 and onward): is there any way out of this wilderness? This is a problem for Hebrew, Thai, and several other languages with combining characters.
Should we try to re-land the hack that overrides a private AppKit method? Should we leave this as-is until someday we get regular TextFields on Mac? Is there another approach? Or, as a stopgap, do we have to decide between the two bad behaviors:
- when inline autocompletion happens, you cannot type beyond the first combining character in the completion (e.g., to type a different query).
- when inline autocompletion happens, you cannot type a non-combined character if it's part of the inline autocompletion. (This is lgrey@'s comment #32.)
Ugh.
,
Oct 11 2017
I wouldn't mind trying to reland the hack. This was my first Chromium CL, so I think I'm better equipped now to debug the crashes this time assuming they still occur.
,
Oct 18 2017
,
Feb 21 2018
For the record, this appears to be a problem on Views too. See bug 702716.
,
Feb 21 2018
The actual corresponding Views bug is Issue 813534
,
Aug 28
Is there any solution for this yet? Or a way to completely disable autocomplete in the omnibox? This issue is becoming more and more frustrating.
,
Aug 28
Issue 813534 would be the one to follow now, since the Cocoa Omnibox is 99.99% likely to be retired starting from M69. The good news is, it's a much more tractable problem in Views since we're in control of the whole stack there.
,
Aug 28
Comment 1 by ccameron@chromium.org, Apr 15 2016
Labels: Needs-TestConfirmation