Issue metadata

Status: Fixed
Closed: Dec 14
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Issue 914963: DevTools backend sends binary WebSocketFrames as UTF-8 strings

Reported Dec 13 by a Project Member

Issue description

The backend currently decodes binary WebSocket messages as UTF-8 and falls back to Latin1 for invalid UTF-8 sequences. This makes two different payloads indistinguishable: a valid UTF-8 sequence that encodes a Latin1 character, and an invalid UTF-8 sequence that is interpreted as that same Latin1 character.
For example, the bytes 0xC3 0xB1 are UTF-8 for ñ, while the single byte 0xF1 is invalid UTF-8 but is the same ñ character in Latin1.
When a binary WebSocket message containing either sequence is received, the corresponding protocol message is just "ñ", which doesn't tell us whether it came from 0xF1 or 0xC3 0xB1.
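
The ambiguity can be reproduced outside the browser. The following is only a rough Python model of the behavior described above, not the actual backend code: `decode_like_backend` is a hypothetical name, and falling back to Latin1 for the whole payload (rather than per invalid sequence) is a simplifying assumption.

```python
def decode_like_backend(payload: bytes) -> str:
    """Hypothetical model: try UTF-8 first, fall back to Latin1 if that fails.

    Assumption: the fallback applies to the whole payload. The real backend
    may fall back per invalid sequence instead.
    """
    try:
        return payload.decode("utf-8")
    except UnicodeDecodeError:
        return payload.decode("latin-1")


utf8_payload = bytes([0xC3, 0xB1])  # valid UTF-8 encoding of ñ
latin1_payload = bytes([0xF1])      # invalid UTF-8, but ñ in Latin1

# Two different binary payloads collapse into the same string.
assert utf8_payload != latin1_payload
assert decode_like_backend(utf8_payload) == decode_like_backend(latin1_payload) == "ñ"
```

Once both payloads have become the same string, no consumer of the protocol message can recover which bytes were actually on the wire.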

Base64-encoding the payload on the backend for binary messages will fix this problem. Text messages can stay as they are, because the WebSocket RFC (RFC 6455) requires the contents of text messages to be valid UTF-8; if they aren't, the connection should be closed.

johannes@ suggested adding a bool to the WebSocketFrame type indicating whether the payloadData is base64-encoded. Now that I know text messages are always valid UTF-8, I think a simpler policy is enough: text messages are Unicode strings, and binary messages are strings of base64-encoded data.
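
A minimal sketch of that policy, in Python rather than the backend's own language, might look like the following. The function names `encode_payload_data` and `decode_payload_data` are hypothetical. The opcode values 0x1 (text) and 0x2 (binary) come from RFC 6455; treating every non-text opcode as binary is an assumption made to keep the example short.

```python
import base64

OPCODE_TEXT = 0x1    # RFC 6455 text frame
OPCODE_BINARY = 0x2  # RFC 6455 binary frame


def encode_payload_data(opcode: int, payload: bytes) -> str:
    """Build the payloadData string under the proposed policy (hypothetical helper)."""
    if opcode == OPCODE_TEXT:
        # Text frames must already be valid UTF-8, so decoding is lossless.
        return payload.decode("utf-8")
    # Assumption: anything that is not a text frame is treated as binary.
    return base64.b64encode(payload).decode("ascii")


def decode_payload_data(opcode: int, payload_data: str) -> bytes:
    """Recover the original bytes on the consumer side (hypothetical helper)."""
    if opcode == OPCODE_TEXT:
        return payload_data.encode("utf-8")
    return base64.b64decode(payload_data)


# The two payloads from the example now stay distinct and round-trip exactly.
a = encode_payload_data(OPCODE_BINARY, bytes([0xC3, 0xB1]))
b = encode_payload_data(OPCODE_BINARY, bytes([0xF1]))
assert a != b
assert decode_payload_data(OPCODE_BINARY, a) == bytes([0xC3, 0xB1])
assert decode_payload_data(OPCODE_BINARY, b) == bytes([0xF1])
```

Because the frame's opcode already tells the consumer how to read payloadData, no extra flag on the frame is needed in this sketch.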

Comment 2, Dec 14

Status: Fixed (was: Assigned)
