With the new fancy CharacterRange computation, surrogate pairs seem to get half the advance compared to previously:
https://jsfiddle.net/0o4hbqsh/
(We "skip" the trail, so it'll never be counted.)
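To make the suspected failure mode concrete, here is a rough sketch. It is a toy model, not the actual Blink code: it assumes every code point maps to exactly one glyph with the same advance, and that the glyph's advance is divided evenly among the UTF-16 code units it covers. The names `utf16_units` and `measure` are made up for illustration.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_lead(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_trail(unit):
    return 0xDC00 <= unit <= 0xDFFF


def measure(text, glyph_advance, buggy):
    """Sum advances over text in a toy model (assumption: one glyph per code point).

    Each glyph's advance is split evenly across the code units it covers.
    With buggy=True only the lead unit's share is counted, because the trail
    is skipped. With buggy=False the whole code point is counted at once.
    """
    units = utf16_units(text)
    total = 0.0
    i = 0
    while i < len(units):
        is_pair = (is_lead(units[i]) and i + 1 < len(units)
                   and is_trail(units[i + 1]))
        span = 2 if is_pair else 1
        per_unit = glyph_advance / span
        total += per_unit if buggy else per_unit * span
        i += span  # the trail unit is skipped in both variants
    return total
```

With a 20px advance per glyph, "a" followed by one non-BMP character measures 30px in the buggy variant and 40px otherwise, which matches the "half the advance" symptom.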
Let's see if a bisect will show which patch caused this (likely one of my patches).
@Testers, this testcase may be a little easier to use: http://jsbin.com/jamire
Some movement in the direction of issue 593570 would likely fix this (i.e. iterate glyphs and create SVGTextMetrics from them rather than doing it per character [or even "per code point", as this issue shows]. We will still need to "synthesize" per-character/grapheme metrics for the query interfaces, even though I don't think the spec strictly requires that granularity.)
@fs, wdyt about removing the synthesis of positions for ligatures in SVGTextMetricsBuilder? This would result in 'ffi' having character widths "20px", "0px", and "0px", which would look worse in some scenarios but better for cases like this (20px is a made-up font size here). Another option is to switch to the approach Gecko uses, where 'ffi' has character widths "20px", "20px", "20px" and the ranges overlap.
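For comparison, the options can be written down as a small sketch. This is illustrative only: `ligature_widths` and the strategy names are hypothetical, and "split" is an assumed even division standing in for whatever the current synthesis does.

```python
def ligature_widths(advance, char_count, strategy):
    """Per-character widths for one ligature glyph (hypothetical helper).

    strategy:
      "first"   - the first character gets the whole advance, the rest get 0
      "overlap" - every character reports the full advance (ranges overlap)
      "split"   - the advance is divided evenly (assumed form of synthesis)
    """
    if char_count <= 0:
        raise ValueError("char_count must be positive")
    if strategy == "first":
        return [advance] + [0.0] * (char_count - 1)
    if strategy == "overlap":
        return [advance] * char_count
    if strategy == "split":
        return [advance / char_count] * char_count
    raise ValueError("unknown strategy: " + strategy)
```

For a 20px 'ffi' ligature, "first" gives 20/0/0 and "overlap" gives 20/20/20. Only "first" and "split" keep the sum of the widths equal to the glyph advance.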
Would you be up for adding your thoughts to https://github.com/w3c/svgwg/issues/65?
Tav kind of beat me to it to some degree...
Removing the synthesis (in SVGTextMetricsBuilder) sgtm. I'm not sure if keeping zero-width entities is the right thing to do though. We could keep the TCU and then synthesize based on graphemes[1] when needed? So store only "20px" (and maybe some metadata, we could make room for a few bits there) and then split based on grapheme count. (I think there's supposed to be data for better "split points" in fonts, but no idea how common that is.)
[1] https://svgwg.org/svg2-draft/text.html#TermFindGraphemeClusterForCharacter
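A minimal sketch of the "store one advance, split on demand" idea follows. Both function names are hypothetical, and the grapheme counting is a crude approximation that only treats characters with a non-zero canonical combining class as extending the previous grapheme. Real grapheme cluster segmentation handles many more cases.

```python
import unicodedata


def count_graphemes(text):
    """Approximate grapheme count (assumption: only combining marks extend)."""
    count = 0
    for ch in text:
        # A character starts a new grapheme unless it is a combining mark.
        # A leading combining mark still counts as one grapheme.
        if count == 0 or unicodedata.combining(ch) == 0:
            count += 1
    return count


def grapheme_advances(cluster_text, cluster_advance):
    """Split one stored cluster advance evenly across its graphemes."""
    n = count_graphemes(cluster_text)
    if n == 0:
        return []
    return [cluster_advance / n] * n
```

Under this model a ligature such as 'ffi' is split three ways, while a base character plus a combining diacritic stays a single entry with the full advance.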
I walked in the office today and found Dominik at my desk! Dominik mentioned that the font data for cursor positions is available but nothing currently uses it so it may be risky to depend on.
@drott, you mentioned that we have an iterator that can disambiguate character + combining diacritic from ligatures but I wasn't able to find it in the codebase. Can you point me to that?
> I walked in the office today and found Dominik at my desk! Dominik mentioned that the font data for cursor positions is available but nothing currently uses it so it may be risky to depend on.
Yes, we should experiment with caret positioning information from the fonts, but the next best thing atm is linear interpolation.
> @drott, you mentioned that we have an iterator that can disambiguate character + combining diacritic from ligatures but I wasn't able to find it in the codebase. Can you point me to that?
Well, its purpose is to find grapheme cluster boundaries, a concept that can be determined without the font & style information. Here it is: cursorMovementIterator
https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/platform/text/TextBreakIterator.h&q=cursorMovem&sq=package:chromium&l=41
See also the inline countGraphemesInCluster() in ShapeResultBuffer, where this is used in a similar way to count the number of graphemes in a HarfBuzz cluster in order to determine where to place emphasis marks.
Comment 1 by f...@opera.com, Mar 23 2016