Tamil language content editing does not work well.
Reported by
shankark...@gmail.com,
Dec 18 2016
Issue description
If I type in a text box, especially one that supports rich-text editing, the characters are jumbled.
Chrome version: Google Chrome 52.0.2743.116 (64-bit), Linux Mint 18.1
URLs (if applicable): https://www.facebook.com/
Other browsers tested: Firefox 49.0.2: OK; Safari: OK; IE11: OK; Chromium: Fail
What steps will reproduce the problem?
(1) Interpretation of pulli (Tamil: புள்ளி, puḷḷi, the character ் described at http://graphemica.com/0BCD), the Tamil virama sign.
(2) Open Facebook (as an example).
(3) Type அச்சம் using a Tamil keyboard driver.
What is the expected result?
i) The word அச்சம் should appear.
What happens instead?
i) It shows up as அச்ம்சம்.
Please provide any additional information below. Attach a screenshot if possible.
Another example: typing தன்னிலை shows up as தன்ல்னிலை.
Did this work before? Yes, but I am unsure of the version. It must have been at least 4 months ago.
Pattern identification: when a full or half consonant is typed after a virama (pulli) and a full consonant, the last typed letter is inserted right after the virama.
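To make the reported pattern concrete, the sketch below lists the code points of the intended and displayed words. It uses only Python's standard `unicodedata` module. The helper names `code_points` and `splice_after_first_virama` are hypothetical, and the code does not model how Chrome produced the output. In both examples, the displayed text equals the intended word with an extra consonant+virama pair (ம் and ல் respectively) spliced in right after the first virama, which matches the reporter's pattern description.

```python
import unicodedata

VIRAMA = "\u0BCD"  # TAMIL SIGN VIRAMA (pulli)

def code_points(text):
    """Return 'U+XXXX NAME' strings for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

def splice_after_first_virama(text, extra):
    """Hypothetical helper: insert `extra` immediately after the first virama in `text`."""
    i = text.index(VIRAMA) + 1
    return text[:i] + extra + text[i:]

# Example 1: அச்சம் was typed, அச்ம்சம் was displayed
expected_1 = "\u0B85\u0B9A\u0BCD\u0B9A\u0BAE\u0BCD"
observed_1 = "\u0B85\u0B9A\u0BCD\u0BAE\u0BCD\u0B9A\u0BAE\u0BCD"

# Example 2: தன்னிலை was typed, தன்ல்னிலை was displayed
expected_2 = "\u0BA4\u0BA9\u0BCD\u0BA9\u0BBF\u0BB2\u0BC8"
observed_2 = "\u0BA4\u0BA9\u0BCD\u0BB2\u0BCD\u0BA9\u0BBF\u0BB2\u0BC8"

for expected, observed in [(expected_1, observed_1), (expected_2, observed_2)]:
    print("expected:", code_points(expected))
    print("observed:", code_points(observed))
```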
,
Dec 18 2016
,
Dec 18 2016
Adding the Needs-Milestone label for further triage by the respective milestone team.
,
Dec 19 2016
The issue happens on Windows as well; I verified it on a few recent Windows versions. I believe issues 675478 and 675477 have a common root cause. The steps-to-reproduce section needs editing because I entered it incorrectly. This is the correct version:
Summary of the bug: Interpretation of pulli (Tamil: புள்ளி, puḷḷi, the character ் described at http://graphemica.com/0BCD), the Tamil virama sign.
What steps will reproduce the problem?
(1) Open Facebook (as an example).
(2) Type அச்சம் using a Tamil keyboard driver.
,
Dec 19 2016
Verified in Google Chrome on Mac OS X as well; the issue could be reproduced there too. Screencast: https://l.facebook.com/l.php?u=https%3A%2F%2F1drv.ms%2Fv%2Fs!AuK0JWpZhGTUhMhKR5eTNJIIsREAVw&h=1AQHlQg5s
,
Dec 19 2016
Based on the bisect result from issue 675478, cc'ing nona@ to confirm whether both issues are the same.
,
Dec 20 2016
shankarkrupa@: Unlike issue 675478, I was unable to reproduce this issue on the reported version on Mac OS 10.11.6. Attaching the screencast for reference. I typed accam (for அச்சம்) on the keyboard and did not observe it changing to அச்ம்சம்.
,
Dec 22 2016
I am unable to reach the person with the Mac, so I cannot verify it on Mac. I do use Linux, where this is a big problem for me and many other users in day-to-day use.
,
Jan 9 2017
Unable to reproduce the issue on Linux (Ubuntu 14.04) with Chrome stable version 55.0.2883.87. The text (அச்சம்) displayed correctly using the Tamil keyboard driver. Please see the attached screencast and let us know if we missed anything needed to reproduce the issue. Thank you.
,
Jan 9 2017
I believe the keyboard driver you used is Google Input Tools. I am not able to reproduce the issue with this driver either. However, with a system-level input driver such as IBus on Linux, the issue still persists. I have not seen the source code, but could this be because Google Input Tools handles input differently from system-level input drivers? For example, does Google Input Tools replace the characters sequentially instead of sending the converted key code as it is typed? Issue 675478 can still be observed even when typing with Google Input Tools: when I press the left arrow key twice, the caret moves to ச் instead of ச. I therefore believe this issue persists with system-level keyboard drivers on different operating systems.
,
Jan 11 2017
shankarkrupa@: Could you please confirm which IBus system-level driver you are using on Linux?
,
Jan 12 2017
I use IBus 1.5.11 on Linux Mint; it is also available in the latest version of Ubuntu. The input method I use is Tamil Phonetic (m17n).
,
Jan 16 2017
Is there any way to input Tamil with an alphabetical keyboard? It would be very helpful if you could tell us the alphabetical key sequence (for example, "type 'foobar' to input அச்சம்").
,
Jan 17 2017
Sure. With phonetic, type the letters a-s-s-a-m for அச்சம். For typing தன்னிலை, press t-h-a-n-n-i-l-a-i.
,
Jan 17 2017
...without the hyphens, of course. So it would be: 'assam' for அச்சம்; 'thannilai' for தன்னிலை.
,
Jan 19 2017
,
Jan 20 2017
When I type 'assam' on Windows, I only get ோேேோஸ. By the way, which text area on Facebook did you type into?
,
Jan 25 2017
It appears you might be using a keyboard layout other than phonetic. I tried it in the comment box. The issue can easily be reproduced in the Gmail compose area as well.
,
Jan 27 2017
Unable to reproduce the issue on Ubuntu 14.04, Mac 10.12.2, and Windows 10 using 55.0.2883.87/95 (after changing the keyboard input to Tamil). On Mac and Linux, typing assam produced அச்சம்; on Windows, typing assam produced ோேேோஸ. Asking the MTV team whether they have the machine setup described in comment #18.
,
Jan 27 2017
@Durga (Comment 19): Could you post a screencast showing it appearing correctly? Which keyboard driver (other than Google Input Tools) are you using on Linux and on Windows? I presume they are different, since IBus is not used on Windows.
,
Feb 3 2017
Thank you for providing more feedback. Adding requester "ajha@chromium.org" for another review and adding "Needs-Review" label for tracking. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Feb 7 2017
,
Feb 14 2017
,
Feb 14 2017
Re comment 20: It looks like I reported the Ubuntu 14.04 result incorrectly in the comment above. Using the system-provided Tamil input on Ubuntu 14.04 with the latest stable 56.0.2924.87 and with 55.0.2883.87, typing assam produces ோேேோஸ. Please refer to the screencast.
,
Feb 16 2017
@durga (Comment 24): This looks like a different keyboard layout, so the keys are different. Instead of "Tamil", please choose "Tamil phonetic (m17n)" as the input method layout.
,
Jul 31 2017
shankarkrupa@ are you still seeing the issue? Team, please try a repro.
,
Aug 1 2017
Unable to reproduce the issue using 60.0.3112.78 on Mac 10.12.5 with the steps in comment #0. Typing அச்சம் did not change it to அச்ம்சம். The behavior has been the same since M45. Please find the attached screencast for reference. Removing the Bisect label because this issue doesn't reproduce consistently on our end; please add it back if required. Could someone from the Blink>Editing team please look into this issue? Thanks!!
,
Aug 4 2017
@Sandeep, it works fine in the address bar, as always. The issue occurs only in text inputs, specifically textarea-like inputs, e.g. Facebook comment and post areas. Re comment 26: yes, the issue is still there.
,
Dec 22 2017
It looks like WebKit has a specific rule "that prevent a caret from moving after virama signs of Indic languages except Tamil (Bug 15790)". Is this related?
,
Dec 22 2017
,
Dec 22 2017
,
Dec 22 2017
We have the same code in Chromium as well (code search is acting up right now, so I can't find a link). The WebKit bug in question (https://bugs.webkit.org/show_bug.cgi?id=15790) seems to have been about moving the cursor correctly when pressing the left/right arrow keys. I haven't looked into exactly what that fix does. The root cause of the IME bug, though, is grapheme cluster normalization. Regardless of what we want to happen when a user selects text or moves between characters with the arrow keys, it appears that we must allow IMEs to set a composition range that starts and/or ends in the middle of a grapheme cluster. Otherwise we will keep running into issues like this one and the Android handwriting bug (crbug.com/792713).
One path forward is to update InsertTextCommand (and probably also DeleteTextCommand, since InsertTextCommand calls it in some cases) so that it can operate on non-normalized positions. It remains to be seen how much else this would break or require us to change. If we do this, we may not want to change the behavior of the JavaScript APIs (execCommand('insertText') and execCommand('delete')), at least not until we know the consequences.
The other path forward is to extend VisiblePosition so that we can choose when to normalize to grapheme cluster boundaries and when not to. For example, we might only snap the position to a grapheme cluster boundary when setting a selection from user input. xiaochengh@ says it seems better to reduce the use of VisiblePosition normalization than to increase it, which makes the first option look better. I'll try to write up a doc today, along with some proof-of-concept CLs showing what the different approaches might look like.
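The normalization problem can be illustrated with a toy model. Assume, hypothetically, that normalization moves an offset back to the nearest preceding grapheme cluster boundary. The model uses the boundaries அச்சம் would have before and after allowing a break after the Tamil virama; the pre-fix boundaries follow the later commit message, which says clusters could not be split after an Indic virama. Neither the snapping direction nor the `snap_to_boundary` helper is taken from Blink's VisiblePosition code.

```python
import bisect

# அச்சம் = அ ச ் ச ம ் (six BMP code points, so code-point offsets
# equal UTF-16 offsets here).
TEXT = "\u0B85\u0B9A\u0BCD\u0B9A\u0BAE\u0BCD"

# Assumed boundaries when a cluster may not end after a virama: [அ][ச்ச][ம்]
BOUNDARIES_BEFORE = [0, 1, 4, 6]
# Assumed boundaries once a break is allowed after the Tamil virama: [அ][ச்][ச][ம்]
BOUNDARIES_AFTER = [0, 1, 3, 4, 6]

def snap_to_boundary(offset, boundaries):
    """Hypothetical normalization: move offset back to the nearest boundary <= offset."""
    return boundaries[bisect.bisect_right(boundaries, offset) - 1]

# An IME that wants its composition to start right after ச் (offset 3):
requested = 3
for label, bounds in [("before", BOUNDARIES_BEFORE), ("after", BOUNDARIES_AFTER)]:
    start = snap_to_boundary(requested, bounds)
    print(label, start, repr(TEXT[:start]))
```

Under the pre-fix boundaries, the requested start is moved back to offset 1, in front of the whole ச்ச cluster. This is plausibly the kind of misplacement that scrambles the committed text. With the Tamil exception, offset 3 is already a boundary and is left alone.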
,
Dec 22 2017
Note: I was able to reproduce this issue using the "Tamil - phonetic (m17n)" IME as described. The behavior in the current version of Chrome seems different from what the reporter described, but I get the correct behavior after applying this in-progress CL: https://chromium-review.googlesource.com/c/chromium/src/+/823613
,
Dec 22 2017
Or, since we've gotten feedback before (693687) that our behavior for VisiblePosition canonicalization is apparently just wrong in general for Tamil, maybe the correct fix is to just globally change our implementation of the grapheme cluster boundary algorithm? https://chromium.googlesource.com/chromium/src/+/7d8d866c5fbd4b9c5fe0e0ce39a215d8a731dff4/third_party/WebKit/Source/core/editing/state_machines/StateMachineUtil.cpp#46
,
Dec 22 2017
OK, I think there are really two separate issues here. The issue with the phonetic IME comes from special handling for virama characters that we added to our grapheme cluster boundary algorithm. According to crbug.com/693687, we need to treat the Tamil virama differently from other languages' viramas. I have a CL up that does this, which fixes both that bug and the Linux IME issue reported here: https://chromium-review.googlesource.com/c/chromium/src/+/843461
The issue with the Android handwriting IME comes up because we're hitting rule GB9a ("Do not break before SpacingMarks") in the Unicode grapheme cluster boundary algorithm: http://unicode.org/reports/tr29/#GB9a The handwriting IME seems to trigger known pathological behavior by sending these spacing mark characters (e.g. U+0BC7: ே) without a preceding no-break space for them to attach to. See http://www.unicode.org/versions/Unicode10.0.0/ch07.pdf (Section 7.9, Combining Marks, subheading "Marks as Spacing Characters"). I left a comment on the Google-internal bug (b/70016473) to see if we can get the handwriting IME updated to stop doing this. If we *really* want to match the native Android EditText widget behavior, we'll probably have to modify our editing code to support opening a composition that doesn't start at a grapheme cluster boundary, which I'm not sure really makes sense. I'll unmerge the handwriting IME bug (crbug.com/792713) and follow up with more specific comments there.
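Rule GB9a, and why a lone spacing mark misbehaves, can be illustrated with a deliberately reduced segmenter. This is a sketch only. It assumes Grapheme_Cluster_Break classes can be approximated from the general category (Mn/Me as Extend, Mc as SpacingMark), although the real UAX #29 data has exceptions to this mapping. It also collapses every rule other than GB1, GB9, and GB9a into "break". The names `gcb_class` and `clusters` are hypothetical.

```python
import unicodedata

def gcb_class(ch):
    """Approximate Grapheme_Cluster_Break class from the general category.

    Assumption: Mn/Me -> Extend, Mc -> SpacingMark, everything else -> Other.
    The real UAX #29 property data has exceptions to this mapping.
    """
    cat = unicodedata.category(ch)
    if cat in ("Mn", "Me"):
        return "Extend"
    if cat == "Mc":
        return "SpacingMark"
    return "Other"

def clusters(text):
    """Segment text using GB1 (break at start), GB9, GB9a, and 'break otherwise'."""
    out = []
    for ch in text:
        if out and gcb_class(ch) in ("Extend", "SpacingMark"):
            out[-1] += ch  # GB9 / GB9a: do not break before Extend or SpacingMark
        else:
            out.append(ch)
    return out

EE = "\u0BC7"    # TAMIL VOWEL SIGN EE, general category Mc
NBSP = "\u00A0"  # NO-BREAK SPACE

print(clusters("\u0B85" + EE))         # the lone ே joins the preceding அ
print(clusters("\u0B85" + NBSP + EE))  # with a NBSP base, it forms its own cluster
```

Without a base, ே attaches to whatever precedes it, so a composition that starts at the ே starts mid-cluster. With a no-break space in front, as the Unicode text cited above describes, the mark gets its own cluster.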
,
Jan 2 2018
Will be fixed in Chrome 65: https://chromium-review.googlesource.com/c/chromium/src/+/843461
,
Jan 3 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/60a12c281f9c177383b3bcad8c6459dac6e07f25

commit 60a12c281f9c177383b3bcad8c6459dac6e07f25
Author: Ryan Landay <rlanday@chromium.org>
Date: Wed Jan 03 03:37:08 2018

Allow splitting grapheme clusters after Tamil virama

We currently don't allow grapheme clusters to be split after Indic virama characters (this seems to be a custom deviation from the grapheme cluster rules in UAX #29).

According to a Googler familiar with Indic languages, this behavior is correct for other Indic languages, but not for Tamil ("Tamil is an exception because it doesn't compound glyphs to form a single glyph when combining a pure consonant with a vowel+consonant combo.")

This is causing at least two problems:
1. It's not possible to put the insertion point between certain pairs of Tamil characters when typing (crbug.com/693687).
2. Inputting Tamil with phonetic keyboard IMEs is super broken (crbug.com/675477).

This CL fixes both of these problems by treating the Tamil virama character differently from other Indic viramas.

Note: this does *not* address the issue in crbug.com/792713 where the Tamil handwriting IME on Android inserting lone SpacingMark characters triggers odd behavior.

Bug: 675477, 693687
Change-Id: Iae95e70418aeadcbab5296245ad7253cf3c31cde
Reviewed-on: https://chromium-review.googlesource.com/843461
Reviewed-by: Emil A Eklund <eae@chromium.org>
Reviewed-by: Xiaocheng Hu <xiaochengh@chromium.org>
Commit-Queue: Ryan Landay <rlanday@chromium.org>
Cr-Commit-Position: refs/heads/master@{#526611}
[modify] https://crrev.com/60a12c281f9c177383b3bcad8c6459dac6e07f25/third_party/WebKit/Source/core/editing/ime/InputMethodControllerTest.cpp
[modify] https://crrev.com/60a12c281f9c177383b3bcad8c6459dac6e07f25/third_party/WebKit/Source/core/editing/state_machines/StateMachineUtil.cpp
[modify] https://crrev.com/60a12c281f9c177383b3bcad8c6459dac6e07f25/third_party/WebKit/Source/core/editing/state_machines/StateMachineUtilTest.cpp
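The commit's core idea keeps the custom no-break-after-virama rule for Indic scripts but exempts the Tamil virama. It can be sketched in Python as follows. This is a simplified, hypothetical model and not the code in StateMachineUtil.cpp:

- The names `is_boundary`, `clusters`, and `tamil_exception` are made up for illustration.
- The custom rule is assumed to apply when a virama (canonical combining class 9) is followed by a letter (general category Lo).
- Rules GB9/GB9a are approximated from the general category.

```python
import unicodedata

TAMIL_VIRAMA = "\u0BCD"  # TAMIL SIGN VIRAMA (pulli)
VIRAMA_CCC = 9           # canonical combining class of Indic virama signs

def is_boundary(prev, nxt, tamil_exception):
    """Return True if a grapheme cluster boundary may fall between prev and nxt.

    Simplified model: GB9/GB9a are approximated from the general category,
    and every other UAX #29 rule is reduced to "break".
    """
    if unicodedata.category(nxt) in ("Mn", "Me", "Mc"):
        return False  # GB9/GB9a: never break before a combining mark
    if unicodedata.combining(prev) == VIRAMA_CCC and unicodedata.category(nxt) == "Lo":
        # Assumed form of the custom Indic rule: keep virama + letter together,
        # except after the Tamil virama when the exception is enabled.
        return tamil_exception and prev == TAMIL_VIRAMA
    return True

def clusters(text, tamil_exception):
    """Split text into grapheme clusters using is_boundary()."""
    out = []
    for i, ch in enumerate(text):
        if i == 0 or is_boundary(text[i - 1], ch, tamil_exception):
            out.append(ch)
        else:
            out[-1] += ch
    return out

accam = "\u0B85\u0B9A\u0BCD\u0B9A\u0BAE\u0BCD"  # அச்சம்
kssa = "\u0915\u094D\u0937"                      # Devanagari क्ष

print(clusters(accam, tamil_exception=False))  # before the fix
print(clusters(accam, tamil_exception=True))   # after the fix
print(clusters(kssa, tamil_exception=True))    # other viramas are unchanged
```

With the exception disabled, ச்ச forms a single cluster, so no caret or composition boundary can sit between ச் and ச. With it enabled, that boundary becomes available, while the Devanagari conjunct क्ष stays a single cluster.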
,
Jan 3 2018
Thank you, Ryan, for taking this on and fixing it!!
Comment 1 by tkent@chromium.org, Dec 18 2016
Labels: bisc