XML - random parsing problem: Input is not proper UTF-8, indicate encoding!
Reported by
exande...@gmail.com,
May 17 2016
Issue description

UserAgent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36

Example URL: http://t4d.cz/scrap/m-hunt.sk.xml

Steps to reproduce the problem:
I am creating several XML feeds, for example:
1. Go to: http://t4d.cz/scrap/m-hunt.sk.xml
2. You get error: error on line 591 at column 554: Input is not proper UTF-8, indicate encoding ! Bytes: 0xC3 0xA1 0x62 0x61
3. Refresh: error on line 6098 at column 896: Input is not proper UTF-8, indicate encoding ! Bytes: 0xC3 0xBD 0x6C 0x65
4. Refresh: error on line 3927 at column 533: Input is not proper UTF-8, indicate encoding ! Bytes: 0xC3 0xBD 0x6D 0x69
5. Refresh: error on line 6098 at column 896: Input is not proper UTF-8, indicate encoding ! Bytes: 0xC3 0xBD 0x6C 0x65
6. Refresh: error on line 591 at column 554: Input is not proper UTF-8, indicate encoding ! Bytes: 0xC3 0xA1 0x62 0x61
7. Save the XML.
8. Open saved XML - no error at all.
9. Sometimes the feed shows no error at all if you refresh it.
10. Go to: http://t4d.cz/scrap/vo.pyra.eu.xml
11. Refresh - shows the error very rarely.

You get randomly an error, usually there are a few places where you get the error.

What is the expected behavior?
XML is OK and so there should be no errors at all.

What went wrong?
XML shows not proper UTF-8 error on several places randomly or no error at all.

Does it occur on multiple sites: Yes
Is it a problem with a plugin? No
Did this work before? N/A
Does this work in other browsers? Yes
Chrome version: 50.0.2661.94 Channel: n/a
OS Version: Ubuntu 16.04
Flash Version: Shockwave Flash 21.0 r0
,
May 17 2016
I can reproduce this on Windows 51.0.2704.47 beta-m (64-bit) as well. The server sends
Content-Type: text/xml; charset=utf-8
And the XML begins <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
If I use Fiddler to completely buffer the response (instead of streaming it to the client), Chrome shows no error in the parsing and treats the document as valid XML.
Maybe this is a case where the streaming libxml parser reads an incomplete UTF-8 sequence and throws a spurious XML_ERR_INVALID_CHAR?
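To illustrate the hypothesis: the bytes from the first reported error, 0xC3 0xA1 0x62 0x61, are valid UTF-8 for "ába", so the data itself is fine. The Python sketch below is only an analogy, not libxml or Chrome code, and the function names are made up for illustration. It shows how a read boundary that lands between 0xC3 and 0xA1 makes a decoder that treats each chunk independently fail, while an incremental decoder that carries the partial sequence over to the next chunk succeeds.

```python
import codecs

# Bytes from the first reported error: "á" (0xC3 0xA1) followed by "ba".
data = bytes([0xC3, 0xA1, 0x62, 0x61])

# Assumed scenario: a network read boundary falls inside the two-byte sequence.
chunks = [data[:1], data[1:]]


def decode_each_chunk(chunks):
    """Naive approach: decode every chunk on its own."""
    return "".join(chunk.decode("utf-8") for chunk in chunks)


def decode_incrementally(chunks):
    """Keep an incomplete trailing sequence buffered until the next chunk."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))  # fails only if truly truncated
    return "".join(parts)


try:
    decode_each_chunk(chunks)
    naive_failed = False
except UnicodeDecodeError:
    naive_failed = True  # the lone 0xC3 is an incomplete sequence

text = decode_incrementally(chunks)
```

This would also be consistent with the saved file and the fully buffered response parsing cleanly, since in those cases no sequence is split across reads.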
,
May 17 2016
Yes, it seems to me that it is something like that. I tested fairly thoroughly that the problem is not in the XML file or on the server. It seems that the XML parser gets an incomplete UTF-8 sequence while the XML file is streamed from the server.
,
May 17 2016
I wonder how no one noticed this before; I did not find a single reference to this.
,
Jun 10 2016
,
Jun 13 2016
Tested the same on Win 8.1 and Linux 14.04 with Chrome version 51.0.2704.84. Observed an error displayed on page load, as shown in the screenshot. Could not reproduce the error when refreshing the page multiple times. This error is not seen on the latest beta 52.0.2743.33, dev 53.0.2763.0 and canary 53.0.2766.0. exander77@, could you please recheck the same on the latest builds and update the behavior?
,
Jul 29 2016
dominicc@, do you know if we updated libxml for M50?
,
Jul 29 2016
There were two rolls of libxml in May:
https://codereview.chromium.org/1994003003
https://codereview.chromium.org/2010803004
These should have made it into M53.
,
Jul 30 2016
,
Aug 1 2016
I'd need to spelunk logs to see exactly what changed in M50. There has been a spate of patches around these versions fixing security bugs. I could readily believe one of those broke decoding. Long term, it would be good if XML parsing shared more infrastructure with Blink. Blink knows how to handle a stream of whatever encoding. Short term, it would be handy to bisect this. It sounds like it depends on network packet boundaries; maybe someone could write a Go server or Python server that flushes at the right time to make it reproduce reliably.
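A Python server of the kind suggested here could look roughly like the sketch below. Everything in it is an assumption for illustration: the payload, the port, the half-second pause and the names `find_split_offset`, `SplitHandler` and `serve` are hypothetical, and TCP does not guarantee that the client sees the same read boundaries the server writes. Whether such a small document would have triggered the original bug is also untested.

```python
import http.server
import time

# Hypothetical payload; "\u00e1" encodes to the bytes 0xC3 0xA1 seen in the report.
PAYLOAD = '<?xml version="1.0" encoding="UTF-8"?>\n<feed>\u00e1ba</feed>\n'.encode("utf-8")


def find_split_offset(payload):
    """Return the offset of the first UTF-8 continuation byte, or -1 if none.

    Splitting at this offset leaves the lead byte at the end of the first part.
    """
    for offset, byte in enumerate(payload):
        if byte & 0xC0 == 0x80:  # continuation bytes look like 10xxxxxx
            return offset
    return -1


class SplitHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        split = find_split_offset(PAYLOAD)
        if split < 0:
            split = len(PAYLOAD)  # nothing to split; send in one piece
        self.send_response(200)
        self.send_header("Content-Type", "text/xml; charset=utf-8")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD[:split])  # ends in the middle of a sequence
        self.wfile.flush()
        time.sleep(0.5)  # give the first part time to reach the client alone
        self.wfile.write(PAYLOAD[split:])


def serve(port=8000):
    """Serve the split payload on localhost until interrupted."""
    http.server.HTTPServer(("127.0.0.1", port), SplitHandler).serve_forever()
```

Calling `serve()` and loading http://127.0.0.1:8000/ in the browser under test would then exercise the split on every request, which could make a bisect less dependent on network timing.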
,
Aug 2 2016
No, it's the opposite: it used to be broken, but no one can reproduce it any longer on ToT. So unless someone can repro, we can safely say you fixed this ;-) exander77@, it'd be great if you can confirm.
,
Aug 2 2016
Err, OK. Let me wontfix this as obsolete for now then.
,
Aug 2 2016
I had reliable repro in 51.0.2704.63 beta-m (64-bit). I upgraded to 53.0.2785.34 beta-m (64-bit) and am not able to repro any longer. I captured the original network read packet-size data and could build a Go app to replay the data if it would be valuable, but it looks like the bug in Chrome is gone. Either https://codereview.chromium.org/1994003003/diff/60001/third_party/libxml/src/parser.c or https://codereview.chromium.org/2010803004/diff/20001/third_party/libxml/src/parserInternals.c seems like the most likely candidate for the fix.
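As a rough idea of what replaying captured read sizes could tell us, the Python sketch below walks a payload using a list of read sizes and reports every read boundary that falls inside a multi-byte UTF-8 sequence. The function name and the sample sizes are made up for illustration; they are not the captured data mentioned above.

```python
def boundaries_inside_sequences(payload, read_sizes):
    """Return offsets where a read boundary splits a multi-byte UTF-8 sequence."""
    offsets = []
    position = 0
    for size in read_sizes:
        position += size
        if position >= len(payload):
            break  # the end of the payload is not an interior boundary
        # If the next read starts with a continuation byte (10xxxxxx),
        # the previous read ended in the middle of a sequence.
        if payload[position] & 0xC0 == 0x80:
            offsets.append(position)
    return offsets


# Hypothetical payload built from two of the reported byte runs.
payload = "\u00e1ba \u00fdle".encode("utf-8")  # C3 A1 62 61 20 C3 BD 6C 65

split_reads = boundaries_inside_sequences(payload, [1, 5, 3])
clean_reads = boundaries_inside_sequences(payload, [2, 3, 4])
```

With the made-up sizes above, the first call reports two split sequences and the second reports none, so the same payload can be harmless or problematic depending only on where the reads end.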
,
Aug 2 2016
Fixed https://bugzilla.gnome.org/show_bug.cgi?id=760183 matches the symptoms and timeline.
,
Aug 2 2016
I have Version 52.0.2743.82 (64-bit) and it seems OK. Great work.
Comment 1 by exande...@gmail.com
, May 17 2016