
Issue 675477

Starred by 6 users

Issue metadata

Status: Fixed
Owner:
Closed: Jan 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Android , Mac
Pri: 2
Type: Bug




Tamil language content editing does not work well.

Reported by shankark...@gmail.com, Dec 18 2016

Issue description

When I type Tamil in a text box, especially one that supports rich-text editing, the characters come out jumbled.

Chrome Version       : Google Chrome Version 52.0.2743.116 (64-bit), Linux Mint v18.1
URLs (if applicable) : https://www.facebook.com/
Other browsers tested:
Firefox 49.0.2 - OK
Safari - OK
IE11 - OK
Chromium: Fail

What steps will reproduce the problem?
(1) Interpretation of pulli (Tamil: புள்ளி, puḷḷi, character ் mentioned at http://graphemica.com/0BCD) Tamil virama sign.
What steps will reproduce the problem?
(2) Open Facebook (as an example)
(3) Type அச்சம் using a Tamil keyboard driver.

What is the expected result?
i) The word அச்சம் should appear


What happens instead?
i) Shows up as அச்ம்சம்.

Please provide any additional information below. Attach a screenshot if
possible.
Another example text: Typing தன்னிலை shows up as தன்ல்னிலை.

Did this work before? Yes, but unsure of the version. Must at least have been 4 months ago.

Pattern identification: When a full or half consonant letter is typed after a virama (pulli) and a full consonant, the last typed letter is inserted behind the virama.
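To make the pattern easier to discuss, the expected word can be listed code point by code point. This is only an illustrative sketch: the helper name `describe` is made up for this example, and the word is written with explicit escapes on the assumption that அச்சம் is encoded as the six code points shown.

```python
import unicodedata

# Expected word from the report, written as escapes so the sequence is
# unambiguous regardless of how a font renders it (assumed encoding of அச்சம்).
EXPECTED = "\u0b85\u0b9a\u0bcd\u0b9a\u0bae\u0bcd"


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for code_point, name in describe(EXPECTED):
    print(code_point, name)
```

The pulli appears as U+0BCD (TAMIL SIGN VIRAMA) directly after each consonant it modifies, so any reordering around it is visible in a listing like this.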

 

Comment 1 by tkent@chromium.org, Dec 18 2016

Components: Blink>Editing
Labels: bisc

Comment 2 by tkent@chromium.org, Dec 18 2016

Labels: -bisc Needs-Bisect

Comment 3 by ajha@chromium.org, Dec 18 2016

Labels: -Type-Bug Needs-Milestone OS-Linux Type-Bug-Regression
Adding Needs-Milestone label for further triaging by respective milestone team.
The issue happens in Windows as well, verified in a few recent Windows versions. I feel both 675478 and 675477 have a common root cause.

The steps to reproduce section requires editing as I added it incorrectly. This is the correct version:

Summary of the bug: Interpretation of pulli (Tamil: புள்ளி, puḷḷi, character ் mentioned at http://graphemica.com/0BCD) Tamil virama sign.

What steps will reproduce the problem?
(1) Open Facebook (as an example)
(2) Type அச்சம் using a Tamil keyboard driver.
Verified in Google Chrome in Mac OS X as well and the issue could be reproduced. Screencast is at https://l.facebook.com/l.php?u=https%3A%2F%2F1drv.ms%2Fv%2Fs!AuK0JWpZhGTUhMhKR5eTNJIIsREAVw&h=1AQHlQg5s .

Comment 6 by ajha@chromium.org, Dec 19 2016

Cc: yosin@chromium.org ajha@chromium.org nona@chromium.org
Labels: -Type-Bug-Regression M-57 Type-Bug
Status: Untriaged (was: Unconfirmed)
Based on the bisect result from Issue 675478, cc'ing nona@ to confirm whether both issues are the same.

Comment 7 by ajha@chromium.org, Dec 20 2016

Labels: Needs-Feedback OS-Mac
shankarkrupa@: Unlike  Issue 675478 , I was unable to reproduce the issue on the reported version on Mac OS 10.11.6. Attaching the screen-cast for reference. Typed (accam for அச்சம்) using keyboard and didn't observe this changing to அச்ம்சம்.




675477.mp4 (1.0 MB)
I can no longer reach the person with the Mac, so I am unable to verify it on Mac. I do use Linux, and this is a big problem for me and for many other users in day-to-day use.
Cc: jmukthavaram@chromium.org
Unable to reproduce the issue on Ubuntu 14.04 with Chrome stable version 55.0.2883.87.

The text (அச்சம்) was displayed correctly using the Tamil keyboard driver.

Please find the attached screencast and let us know if we missed anything needed to reproduce the issue.

Thank you.


675477.mp4 (1.0 MB)
I believe the keyboard driver mentioned is Google Input Tools. I am not able to reproduce the issue with that driver either. However, with a system-level input driver such as IBus on Linux, the issue persists. I have not seen the source code, but could this be because Google Input Tools handles input differently from system-level input drivers? For example, does it replace the characters sequentially instead of sending each converted key code as it is typed?

 Issue 675478  could still be observed even when typing with Google Input: When I move the cursor with the left arrow twice, the caret position moves to ச் instead of ச. I therefore believe this issue persists with system-level keyboard drivers in different operating systems.

Comment 11 by ajha@chromium.org, Jan 11 2017

shankarkrupa@: Could you please confirm which IBus system-level driver you are using on Linux?
I use IBus 1.5.11 on Linux Mint. It is also available in the latest version of Ubuntu. The input method I use is Tamil Phonetic (m17n).
Status: Unconfirmed (was: Untriaged)
Is there any way to input Tamil with an alphabetic keyboard?
It would be very helpful if you could give the alphabetic key sequence.
(For example, type 'foobar' to input அச்சம்)
Sure. With phonetic, type the letters a-s-s-a-m for அச்சம். For typing தன்னிலை, press t-h-a-n-n-i-l-a-i.
...without the hyphens, of course. So it would be:
'assam' for அச்சம்
'thannilai' for தன்னிலை
Labels: -Needs-Milestone
On Windows, typing 'assam' only gives me ோேேோஸ.
BTW, which text area on Facebook did you type into?
It appears you might be using a keyboard other than the phonetic one.

I tried it in the comment box. The issue can also be easily reproduced in the Gmail compose area.
Labels: TE-NeedsTriageFromMTV
Unable to reproduce the issue on Ubuntu 14.04, Mac 10.12.2 and Win 10 using 55.0.2883.87/95 (after changing the keyboard input to Tamil).
On Mac and Linux, typing assam produced அச்சம்; on Windows, typing assam produced ோேேோஸ.

Requesting the MTV team to check whether they have the machine setup described in comment #18.
@Durga (Comment 19): Could you post a screencast of the case where it appeared correctly? Which keyboard driver, other than Google Input, are you using on Linux and on Windows? I presume they are different, since IBus is not used on Windows.

Comment 21 by sheriffbot@chromium.org, Feb 3 2017

Labels: -Needs-Feedback Needs-Review
Owner: ajha@chromium.org
Thank you for providing more feedback. Adding requester "ajha@chromium.org" for another review and adding "Needs-Review" label for tracking.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Status: Available (was: Unconfirmed)

Comment 23 by ajha@chromium.org, Feb 14 2017

Owner: ----
Labels: -Needs-Review
re@20: It looks like I reported the Ubuntu 14.04 result incorrectly in the comment above.
Using the system-provided Tamil language input on Ubuntu 14.04 with the latest stable 56.0.2924.87 and with 55.0.2883.87, typing assam produces ோேேோஸ.
Please refer to the screencast.

675477_Feb_14.ogv (2.9 MB)
@durga (Comment 24): This looks like a different keyboard layout and the keys are different. Instead of "Tamil", please choose "Tamil phonetic (m17n)" for the layout of the input method type.
Labels: -TE-NeedsTriageFromMTV -M-57 Needs-Triage-M62 M-62
shankarkrupa@ are you still seeing the issue?

Team, please try a repro.
Labels: -Needs-Bisect
Unable to reproduce the issue using 60.0.3112.78 on Mac 10.12.5 following the steps in comment #0. Typing அச்சம் did not change it to அச்ம்சம்.

Observing the same behavior since M45. Please find the attached screencast for reference.

Removing the Bisect label, as this issue doesn't reproduce consistently on our end. Please add it back if required.

Can someone from the Blink>Editing team please look into this issue?

Thanks!!
Aug 1 2017 2-24 PM.webm (5.5 MB)
@Sandeep, it works fine, as always, in the address bar. The issue occurs only in text inputs, specifically textarea-like inputs, e.g. Facebook comment and post areas.

@re-comment26: yes, the issue is still there.

Comment 29 by kojii@chromium.org, Dec 22 2017

Looks like WebKit has specific rule "that prevent a caret from moving after virama signs of Indic languages except Tamil (Bug 15790)", is this related?
Cc: rlanday@chromium.org kojii@chromium.org
Issue 792713 has been merged into this issue.
Labels: -Pri-3 OS-Android Pri-2
Owner: rlanday@chromium.org
Status: Started (was: Available)
We have the same code in Chromium as well (code search is acting up right now so I can't find a link). It seems that the WebKit bug in question:
https://bugs.webkit.org/show_bug.cgi?id=15790

had to do with moving the cursor properly on pressing left/right arrow keys. I haven't looked into the details of exactly what's going on with that fix.

The root cause of the IME bug, though, is grapheme cluster normalization. Independent of what we want the behavior to be when a user tries to select text or navigate between characters with the arrow keys, it appears that we must allow IMEs to set a composition range that starts and/or ends in the middle of a grapheme cluster. Otherwise we're going to run into issues like this and the Android handwriting bug (crbug.com/792713).
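As a rough illustration of why normalization matters here, the sketch below uses a toy boundary rule (an assumption for this example, not Blink's algorithm) under which no boundary exists directly before or after a virama. All function names are hypothetical. It shows how an offset requested by an IME can be moved when it is snapped to a cluster boundary; it does not attempt to reproduce the exact jumbled output from the report.

```python
VIRAMA = "\u0bcd"  # TAMIL SIGN VIRAMA (pulli)


def is_cluster_boundary(text, offset):
    """Toy boundary rule, assumed for illustration only."""
    if offset <= 0 or offset >= len(text):
        return True   # start and end of text always count as boundaries
    if text[offset] == VIRAMA:
        return False  # a virama attaches to the consonant before it
    if text[offset - 1] == VIRAMA:
        return False  # toy version of the "no break after virama" rule
    return True


def snap_backward(text, offset):
    """Move offset back to the nearest boundary (the normalization step)."""
    while not is_cluster_boundary(text, offset):
        offset -= 1
    return offset


def insert_text(text, offset, new_text, normalize):
    """Insert new_text at offset, optionally normalizing the offset first."""
    if normalize:
        offset = snap_backward(text, offset)
    return text[:offset] + new_text + text[offset:]


existing = "\u0b85\u0b9a\u0bcd\u0b9a"  # four code points: A, CA, VIRAMA, CA
requested = 3                          # IME asks for the spot after the virama

print("requested offset:", requested)
print("normalized offset:", snap_backward(existing, requested))
```

With this toy rule the requested offset is pulled back to an earlier position, so text the IME meant to place after the virama would land somewhere else. Passing `normalize=False` models the idea of letting the insertion operate on the non-normalized position.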

One path forward is to update InsertTextCommand (and probably also DeleteTextCommand, since it's called by InsertTextCommand in some cases) to be able to operate on non-normalized positions. It remains to be seen how much other stuff this would break and/or require to be changed. If we do this, we may not want to change the behavior of the JavaScript APIs (execCommand('insertText') and execCommand('delete')), at least not before we know what the consequences are.

The other path forward would be to extend VisiblePosition so that we can choose when we want to normalize to grapheme cluster boundaries and when we don't. E.g. we could maybe say that we only want to snap the position to a GCB when setting a selection from user input.

xiaochengh@ says it seems better to try to reduce the usage of VisiblePosition normalization rather than to increase it, which makes the first option seem better. I'll try to write up a doc today and some proof of concept CLs to show what different approaches might look like.
Note: I was able to reproduce this issue using the "Tamil - phonetic (m17n)" IME as described. The behavior in the current version of Chrome seems different from what the reporter described, but I get the correct behavior after applying this in-progress CL:
https://chromium-review.googlesource.com/c/chromium/src/+/823613
Or, since we've gotten feedback before (693687) that our behavior for VisiblePosition canonicalization is apparently just wrong in general for Tamil, maybe the correct fix is to just globally change our implementation of the grapheme cluster boundary algorithm?

https://chromium.googlesource.com/chromium/src/+/7d8d866c5fbd4b9c5fe0e0ce39a215d8a731dff4/third_party/WebKit/Source/core/editing/state_machines/StateMachineUtil.cpp#46
Labels: -M-62 -Needs-Triage-M62
Ok, I think there are really two separate issues here.

The issue with the phonetic IME here is coming from special handling for virama characters we added to our grapheme cluster boundary algorithm. According to crbug.com/693687, we need to treat the Tamil virama differently from other languages' viramas. I have a CL up to do this which fixes that bug and the Linux IME issue reported here:
https://chromium-review.googlesource.com/c/chromium/src/+/843461

The issue with the Android handwriting IME comes up because we're hitting rule GB9a ("Do not break before SpacingMarks") in the Unicode grapheme cluster boundary algorithm:
http://unicode.org/reports/tr29/#GB9a

The handwriting IME seems to be invoking known pathological behavior by sending these spacing mark characters (e.g. U+0BC7: ே) without a preceding no-break space for them to attach to. See:

http://www.unicode.org/versions/Unicode10.0.0/ch07.pdf
(Section 7.9 Combining Marks, subheading "Marks as Spacing Characters")
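The character classes involved can be checked with Python's `unicodedata` module. This is a small sketch, not part of the fix: it uses General_Category `Mc` as an approximation of the SpacingMark class that GB9a refers to (the two are close but not identical), and the helper name is hypothetical.

```python
import unicodedata

VOWEL_SIGN_EE = "\u0bc7"  # the spacing mark mentioned above
VIRAMA = "\u0bcd"         # Tamil pulli, for comparison
LETTER_CA = "\u0b9a"      # a base consonant


def starts_with_lone_spacing_mark(text):
    """True if text begins with a spacing combining mark (category Mc),
    meaning the mark has no base character in front of it."""
    return bool(text) and unicodedata.category(text[0]) == "Mc"


for ch in (VOWEL_SIGN_EE, VIRAMA, LETTER_CA):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# A mark sent on its own versus a mark that follows a consonant.
print(starts_with_lone_spacing_mark(VOWEL_SIGN_EE))
print(starts_with_lone_spacing_mark(LETTER_CA + VOWEL_SIGN_EE))
```

A check like this could flag input that begins with a combining mark and no base, which is the situation described for the handwriting IME.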

I left a comment on the Google-internal bug (b/70016473) to see if we can get the handwriting IME updated to stop doing this. If we *really* want to match the native Android EditText widget behavior, we'll probably have to modify our editing code so we support opening a composition that doesn't start at a grapheme cluster boundary, which I'm not sure really makes sense.

I'll unmerge the handwriting IME bug (crbug.com/792713) and follow up with some more specific comments there.

Labels: M-65
Status: Fixed (was: Started)
Will be fixed in Chrome 65:
https://chromium-review.googlesource.com/c/chromium/src/+/843461

Comment 37 by bugdroid1@chromium.org, Jan 3 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/60a12c281f9c177383b3bcad8c6459dac6e07f25

commit 60a12c281f9c177383b3bcad8c6459dac6e07f25
Author: Ryan Landay <rlanday@chromium.org>
Date: Wed Jan 03 03:37:08 2018

Allow splitting grapheme clusters after Tamil virama

We currently don't allow grapheme clusters to be split after Indic virama
characters (this seems to be a custom deviation from the grapheme cluster rules
in UAX #29). According to a Googler familiar with Indic languages, this behavior
is correct for other Indic languages, but not for Tamil ("Tamil is an exception
because it doesn't compound glyphs to form a single glyph when combining a pure
consonant with a vowel+consonant combo.")

This is causing at least two problems:

1. It's not possible to put the insertion point between certain pairs of Tamil
   characters when typing ( crbug.com/693687 ).

2. Inputting Tamil with phonetic keyboard IMEs is super broken
   ( crbug.com/675477 ).

This CL fixes both of these problems by treating the Tamil virama character
differently from other Indic viramas.

Note: this does *not* address the issue in crbug.com/792713 where the Tamil
handwriting IME on Android inserting lone SpacingMark characters triggers odd
behavior.

Bug:  675477 ,  693687 
Change-Id: Iae95e70418aeadcbab5296245ad7253cf3c31cde
Reviewed-on: https://chromium-review.googlesource.com/843461
Reviewed-by: Emil A Eklund <eae@chromium.org>
Reviewed-by: Xiaocheng Hu <xiaochengh@chromium.org>
Commit-Queue: Ryan Landay <rlanday@chromium.org>
Cr-Commit-Position: refs/heads/master@{#526611}
[modify] https://crrev.com/60a12c281f9c177383b3bcad8c6459dac6e07f25/third_party/WebKit/Source/core/editing/ime/InputMethodControllerTest.cpp
[modify] https://crrev.com/60a12c281f9c177383b3bcad8c6459dac6e07f25/third_party/WebKit/Source/core/editing/state_machines/StateMachineUtil.cpp
[modify] https://crrev.com/60a12c281f9c177383b3bcad8c6459dac6e07f25/third_party/WebKit/Source/core/editing/state_machines/StateMachineUtilTest.cpp
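The rule change described in the commit message can be sketched as follows. This is an approximation written for illustration, not the code in StateMachineUtil.cpp: the function name, the `tamil_exception` flag, and the short list of viramas are assumptions, and every other UAX #29 rule is left out.

```python
import unicodedata

TAMIL_VIRAMA = "\u0bcd"
# A few Indic viramas, listed by hand for this example (not exhaustive).
VIRAMAS = {
    "\u094d",      # DEVANAGARI SIGN VIRAMA
    "\u09cd",      # BENGALI SIGN VIRAMA
    "\u0c4d",      # TELUGU SIGN VIRAMA
    TAMIL_VIRAMA,  # TAMIL SIGN VIRAMA
}


def is_break_allowed(prev, nxt, tamil_exception):
    """Model only the custom virama rule: no break between a virama and a
    following letter, unless the Tamil exception is on and prev is the
    Tamil virama. All other grapheme cluster rules are ignored."""
    if prev in VIRAMAS and unicodedata.category(nxt) == "Lo":
        return tamil_exception and prev == TAMIL_VIRAMA
    return True


tamil_ca = "\u0b9a"       # TAMIL LETTER CA
devanagari_ka = "\u0915"  # DEVANAGARI LETTER KA

# Old behavior (no exception) versus new behavior (Tamil exception).
print(is_break_allowed(TAMIL_VIRAMA, tamil_ca, tamil_exception=False))
print(is_break_allowed(TAMIL_VIRAMA, tamil_ca, tamil_exception=True))
print(is_break_allowed("\u094d", devanagari_ka, tamil_exception=True))
```

In this model only the Tamil case changes when the exception is enabled; the other viramas keep the no-break behavior that the commit message describes as correct for those languages.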

Thank you Ryan for taking and fixing this!!
