Issue 612430 link

Starred by 5 users

Issue metadata

Status: WontFix
Owner:
Closed: Aug 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows
Pri: 2
Type: Bug



XML - random parsing problem: Input is not proper UTF-8, indicate encoding!

Reported by exande...@gmail.com, May 17 2016

Issue description

UserAgent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36

Example URL:
http://t4d.cz/scrap/m-hunt.sk.xml

Steps to reproduce the problem:
I am creating several XML feeds, for example:

1. Go to: http://t4d.cz/scrap/m-hunt.sk.xml
2. You get error: 
error on line 591 at column 554: Input is not proper UTF-8, indicate encoding !
Bytes: 0xC3 0xA1 0x62 0x61
3. Refresh:
error on line 6098 at column 896: Input is not proper UTF-8, indicate encoding !
Bytes: 0xC3 0xBD 0x6C 0x65
4. Refresh:
error on line 3927 at column 533: Input is not proper UTF-8, indicate encoding !
Bytes: 0xC3 0xBD 0x6D 0x69
5. Refresh:
error on line 6098 at column 896: Input is not proper UTF-8, indicate encoding !
Bytes: 0xC3 0xBD 0x6C 0x65
6. Refresh:
error on line 591 at column 554: Input is not proper UTF-8, indicate encoding !
Bytes: 0xC3 0xA1 0x62 0x61
7. Save the XML.
8. Open saved XML - no error at all.
9. Sometimes the feed shows no error at all if you refresh it.
10. Go to: http://t4d.cz/scrap/vo.pyra.eu.xml
11. Refresh - shows the error very rarely.

The error appears at random, usually at one of a few recurring places in the file.
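
As a quick sanity check on the byte dumps quoted in the steps, the reported sequences can be decoded directly. This is a minimal Python sketch (the name `reported` is only illustrative, not part of any tool). It suggests that each four-byte window is well-formed UTF-8 on its own, which is consistent with the saved file opening without errors:

```python
# Byte windows copied from the error messages in the report.
reported = {
    "line 591, column 554": bytes([0xC3, 0xA1, 0x62, 0x61]),
    "line 6098, column 896": bytes([0xC3, 0xBD, 0x6C, 0x65]),
    "line 3927, column 533": bytes([0xC3, 0xBD, 0x6D, 0x69]),
}

# Strict decoding raises UnicodeDecodeError on any malformed sequence.
decoded = {
    where: raw.decode("utf-8", errors="strict")
    for where, raw in reported.items()
}

for where, text in decoded.items():
    # ascii() keeps the output printable on consoles without UTF-8 support.
    print(f"{where}: {ascii(text)}")
```

If any of the windows were actually malformed, the strict decode would raise instead of returning text.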

What is the expected behavior?
XML is OK and so there should be no errors at all.

What went wrong?
Chrome reports the "not proper UTF-8" error at varying places in the XML, or no error at all.

Does it occur on multiple sites: Yes

Is it a problem with a plugin? No 

Did this work before? N/A 

Does this work in other browsers? Yes 

Chrome version: 50.0.2661.94  Channel: n/a
OS Version: Ubuntu 16.04
Flash Version: Shockwave Flash 21.0 r0
 

Comment 1 by exande...@gmail.com, May 17 2016

It seems that the problem also occurs on Windows. The larger the XML is, the more likely the error is. I think I have already eliminated the possibility that the problem is in the XML itself: the saved document works fine, even if it showed the error when loaded.
Components: -Blink Blink>XML
Labels: OS-Windows
Status: Untriaged (was: Unconfirmed)
Summary: XML - random parsing problem: Input is not proper UTF-8, indicate encoding! (was: XML - random parsing problem: Input is not proper UTF-8, indicate encoding !)
I can reproduce this on Windows 51.0.2704.47 beta-m (64-bit) as well. The server sends 

    Content-Type: text/xml; charset=utf-8

And the XML begins <?xml version="1.0" encoding="UTF-8" standalone="yes"?>

If I use Fiddler to completely buffer the response (instead of streaming it to the client), Chrome shows no error in the parsing and treats the document as valid XML.

Maybe this is a case where the streaming libxml parser reads an incomplete UTF-8 sequence and throws a spurious XML_ERR_INVALID_CHAR?
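
The hypothesis above can be illustrated outside the browser. The following Python sketch is not libxml code. It only assumes that a network read happened to end between the two bytes of `0xC3 0xA1`, and it contrasts decoding each chunk in isolation with a stateful incremental decoder. The sample text and the split position are invented for the illustration.

```python
import codecs

# Sample payload; the split position is an assumption for illustration.
data = "<name>kr\u00e1ba</name>".encode("utf-8")
split = data.index(b"\xc3") + 1  # cut between 0xC3 and 0xA1
chunks = [data[:split], data[split:]]


def decode_each_chunk(parts):
    """Decode every chunk on its own, as a stateless consumer would."""
    return "".join(p.decode("utf-8") for p in parts)


def decode_incrementally(parts):
    """Carry an incomplete trailing sequence over to the next chunk."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    text = "".join(decoder.decode(p) for p in parts)
    return text + decoder.decode(b"", final=True)


try:
    decode_each_chunk(chunks)
    stateless_failed = False
except UnicodeDecodeError:
    # The first chunk ends with a lone lead byte, so strict decoding fails.
    stateless_failed = True

incremental_text = decode_incrementally(chunks)
```

The same bytes fail or succeed depending only on where the chunk boundary falls, which would explain why a fully buffered response parses cleanly.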

Comment 3 by exande...@gmail.com, May 17 2016

Yes, it seems to me that it is something like that. I tested fairly thoroughly that the problem is not in the XML file or on the server. It seems that the XML parser receives an incomplete UTF-8 sequence while the XML file is streamed from the server.

Comment 4 by exande...@gmail.com, May 17 2016

I wonder how no one noticed this before; I did not find a single reference to this.

Comment 5 by tkent@chromium.org, Jun 10 2016

Labels: Needs-Bisect
Cc: tkonch...@chromium.org
Labels: Needs-Feedback
Tested the same on Windows 8.1 and Linux 14.04 with Chrome version 51.0.2704.84: observed an error displayed on page load, as shown in the screenshot. Could not reproduce the error when refreshing the page multiple times.

This error is not seen on the latest beta 52.0.2743.33, dev 53.0.2763.0, and canary 53.0.2766.0.

exander77@, could you please recheck on the latest builds and report the behavior?
Attachment: Error.png (470 KB)

Comment 7 by tkent@chromium.org, Jul 29 2016

Cc: dominicc@chromium.org
dominicc@, do you know if we updated libxml for M50?

Comment 8 by kojii@chromium.org, Jul 29 2016

Cc: kojii@chromium.org
There were two rolls of libxml in May
https://codereview.chromium.org/1994003003
https://codereview.chromium.org/2010803004

These should have made it into M53.

Comment 9 by kojii@chromium.org, Jul 30 2016

Cc: ranjitkan@chromium.org
Issue 614677 has been merged into this issue.
Owner: dominicc@chromium.org
I'd need to spelunk logs to see exactly what changed in M50. There have been a spate of patches around these versions fixing security bugs. I could readily believe one of those broke decoding.

Long term it would be good if XML parsing shared more infrastructure with Blink. Blink knows how to handle a stream of whatever encoding.

Short term it would be handy to bisect this. It sounds like it depends on network packet boundaries; maybe someone could write a Go server or Python server that flushes at the right time to make it reproduce reliably.
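
A rough sketch of such a Python server follows. It assumes the bug is triggered by reads that end inside a multi-byte UTF-8 sequence. The payload, the port, the delay, and the names `split_inside_sequences`, `SplitHandler`, and `serve` are all hypothetical, and the sketch cannot guarantee that the client sees the same boundaries, since the network stack may still coalesce writes.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical payload; a real test would serve the affected feed instead.
BODY = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    "<items><item>kr\u00e1ba</item><item>mal\u00fd</item></items>\n"
).encode("utf-8")


def split_inside_sequences(data):
    """Cut right after every UTF-8 lead byte, so each chunk boundary
    falls inside a multi-byte sequence."""
    chunks, start = [], 0
    for i in range(1, len(data)):
        if data[i - 1] >= 0xC0:  # previous byte is a UTF-8 lead byte
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks


class SplitHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/xml; charset=utf-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        for chunk in split_inside_sequences(BODY):
            self.wfile.write(chunk)
            self.wfile.flush()
            time.sleep(0.2)  # give the client time to read the partial data


def serve(port=8000):
    """Blocks forever; call manually, then load http://127.0.0.1:8000/."""
    HTTPServer(("127.0.0.1", port), SplitHandler).serve_forever()
```

Calling `serve()` and loading the page in an affected build would show whether mid-sequence flushes are enough to trigger the error. Replaying captured packet sizes instead would only require swapping out the splitting function.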
No, it's the opposite -- it used to be broken, but no one can reproduce it any longer on ToT.

So unless someone can repro, we can safely say you fixed this ;-)

exander77@, it'd be great if you can confirm.
Status: WontFix (was: Untriaged)
Err, OK. Let me wontfix this as obsolete for now then.
I had a reliable repro in 51.0.2704.63 beta-m (64-bit). I upgraded to 53.0.2785.34 beta-m (64-bit) and am no longer able to repro. I captured the original network read packet-size data and could build a Go app to replay the data if it would be valuable, but it looks like the bug in Chrome is gone.

Either https://codereview.chromium.org/1994003003/diff/60001/third_party/libxml/src/parser.c or https://codereview.chromium.org/2010803004/diff/20001/third_party/libxml/src/parserInternals.c seems like the most likely candidate for the fix.
The fixed bug https://bugzilla.gnome.org/show_bug.cgi?id=760183 matches the symptoms and timeline.
I have Version 52.0.2743.82 (64-bit) and it seems OK. Great work.
