Issue 699319

Starred by 2 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug


mojo strings should require valid utf-8/utf-16

Project Member Reported by dcheng@chromium.org, Mar 8 2017

Issue description

We don't really do this today, but things like WTF::String do require valid UTF-8. We should try enabling this check universally, and encourage all binary data to be passed using something like array<uint8> instead.

The current alternative is random struct traits checking for utf-8 themselves, which seems suboptimal.

Alternatively, there's been a longstanding proposal to make it possible to enforce simple length constraints in mojom. This seems like it would be a natural fit there as well.
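
For illustration, the kind of check being proposed could look roughly like the sketch below. This is not Mojo's or base's implementation; `IsValidUtf8` is a hypothetical helper name, and the rules follow RFC 3629 (no overlong encodings, no surrogates, nothing above U+10FFFF). A real deserializer would presumably call an existing helper rather than a hand-rolled loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a Mojo or base API): returns true if `s` is
// well-formed UTF-8 per RFC 3629.
bool IsValidUtf8(const std::string& s) {
  const auto* p = reinterpret_cast<const unsigned char*>(s.data());
  const std::size_t n = s.size();
  std::size_t i = 0;
  while (i < n) {
    const unsigned char b0 = p[i];
    if (b0 < 0x80) {  // ASCII: one byte, always valid.
      ++i;
      continue;
    }
    std::size_t len = 0;    // Total bytes in this sequence.
    std::uint32_t cp = 0;   // Decoded code point.
    std::uint32_t min = 0;  // Smallest code point allowed for this length.
    if ((b0 & 0xE0) == 0xC0) {
      len = 2; cp = b0 & 0x1F; min = 0x80;
    } else if ((b0 & 0xF0) == 0xE0) {
      len = 3; cp = b0 & 0x0F; min = 0x800;
    } else if ((b0 & 0xF8) == 0xF0) {
      len = 4; cp = b0 & 0x07; min = 0x10000;
    } else {
      return false;  // Stray continuation byte or invalid lead byte.
    }
    if (n - i < len) return false;  // Sequence truncated at end of input.
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char b = p[i + k];
      if ((b & 0xC0) != 0x80) return false;  // Not a continuation byte.
      cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < min) return false;                      // Overlong encoding.
    if (cp > 0x10FFFF) return false;                 // Beyond Unicode range.
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate.
    i += len;
  }
  return true;
}
```

The check has to look at every byte of the string, so its cost grows with string length, which is one more reason to keep binary payloads in array<uint8> where no such check applies.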
 
Components: -Internals>Mojo Internals>Mojo>Bindings
Generally I like the idea of enabling the UTF-8 check.
I will run some quick tests to measure the perf impact and update this thread.



I wrote a perf test to compare string deserialization with/without utf8 check: https://codereview.chromium.org/2738643004

The numbers below were obtained with the following settings:
Z620; Linux; non-component release build.
Command line: mojo_public_bindings_perftests --gtest_filter=*String*

DeserializeString_NoUtf8Check/8	2.03549e+07	times/second
DeserializeString_Utf8Check/8	1.55403e+07	times/second
DeserializeString_NoUtf8Check/128	2.0278e+07	times/second
DeserializeString_Utf8Check/128	3.01196e+06	times/second
DeserializeString_NoUtf8Check/1024	1.64714e+07	times/second
DeserializeString_Utf8Check/1024	446277	times/second

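The check has quite a big impact as the length increases: with the check, deserialization is roughly 1.3x slower at length 8, 6.7x slower at length 128, and 37x slower at length 1024.
But I think long strings are typically used to transfer binary data, and those should be converted to array<uint8>.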

I will send a mail to chromium-mojo@.
Issue 900747 has been merged into this issue.
Status: Available (was: Untriaged)
Triage refresh. Still seems like a good idea.

We can also add a DCHECK-guarded validation step on send to catch mistakes earlier.
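
A minimal sketch of what a debug-only check on send might look like, assuming standard `assert()` as a stand-in for DCHECK (both compile out of release builds, `assert()` when `NDEBUG` is defined). `FakePipe`, `Utf8Validator`, and `SendStringChecked` are hypothetical names for illustration, not Mojo APIs; the validator is passed in so the sketch stays independent of any particular UTF-8 checker.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a message pipe; just records what was sent.
struct FakePipe {
  std::vector<std::string> sent;
};

// Hypothetical signature for whatever UTF-8 checker the bindings would use.
using Utf8Validator = bool (*)(const std::string&);

// Sends `s`, validating it first in debug builds only. With NDEBUG defined
// the assert() expands to nothing, so release builds pay no cost here and
// rely on the receiver-side check instead.
void SendStringChecked(FakePipe& pipe,
                       const std::string& s,
                       Utf8Validator is_valid_utf8) {
  assert(is_valid_utf8(s) && "string field is not valid UTF-8");
  (void)is_valid_utf8;  // Avoid an unused-parameter warning under NDEBUG.
  pipe.sent.push_back(s);
}
```

The point of checking on send is that the failure shows up in the sender's stack, next to the code that built the bad string, rather than as a rejected message on the receiving side.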

Comment 5 by pwnall@chromium.org, Jan 19 (4 days ago)

Cc: pwnall@chromium.org
Parallel that might be helpful -- protobuf has a "string" type for UTF-8, and a "bytes" type for binary data. https://developers.google.com/protocol-buffers/docs/proto?csw=1#scalar
