New issue
Advanced search Search tips

Issue 857442 link

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Jul 13
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug

Blocked on:
issue 860622

Blocking:
issue webrtc:9442
issue 856761



Sign in to add a comment

On Windows, the compiler sometimes fills trailing char array elements with garbage instead of zeros

Project Member Reported by kwiberg@chromium.org, Jun 28 2018

Issue description

What steps will reproduce the problem?

  (1) Build and run the tiny executable from this CL: https://chromium-review.googlesource.com/c/chromium/src/+/1118170

What is the expected result?

  67 78 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

  This is what I get on Linux, and when I make supposed no-op modifications to the code as indicated by the comments in that CL.

What happens instead?

  67 78 0 37 115 10 0 37 100 32 0 10 0 0 0 0 64 80 0 64 1 0 0 0 -32 80 0 64 1 0 0 0 

  This is what I get when I run the CL unmodified on Windows.

This problem was initially reported here: https://bugs.chromium.org/p/chromium/issues/detail?id=856761. That's an ASan build, and the effect there is that ASan complains when the code tries to read 32 bytes from a 3-byte constant.

I've used the original bug's high priority and "Bug-Security" label in this bug, because not reliably zeroing out array elements that are supposed to be zeroed out doesn't strike me as safe. Please downgrade if appropriate.


Speculation
===========

This is my theory of what's happening:

1. The compiler generates code to copy 32 bytes from the address where the 32-byte constant {'C', 'N', '\0', '\0', ..., '\0'} is stored. It also emits that constant when compiling kwiberg_test.cc.

2. When compiling kwiberg_test_2.cc, the compiler emits the 3-byte constant {'C', 'N', '\0'}.

3. The linker decides to deduplicate these two constants, not realizing that the extra 29 trailing zeros are significant. Depending on the order the files are given to the linker, either the 3-byte or the 32-byte constant ends up in the final executable.

I don't actually know anything about compilers, though, so this may be wrong.
 
Actually, the bug was independently discovered here too, a bit earlier: https://bugs.chromium.org/p/webrtc/issues/detail?id=9442

Comment 2 by oprypin@webrtc.org, Jun 28 2018

Blocking: webrtc:9442
Blocking: 856761

Comment 4 by thakis@chromium.org, Jun 28 2018

Cc: h...@chromium.org p...@chromium.org
Thanks for the excellent repro! This does look fishy.

Comment 5 by h...@chromium.org, Jun 29 2018

Cc: thakis@chromium.org
Owner: h...@chromium.org
Status: Assigned (was: Untriaged)
Interesting.

Stand-alone repro based on kwiberg's code:

a.cc

#include <stdio.h>

int main(int argc, char **) {
  struct {int x; char s[32]; } s = {argc, "foo"};

  for (int i = 0; i < 32; ++i)
    printf("%d ", s.s[i]);
  printf("\n");

  return 0;
}


b.cc

const char *foo() { return "foo"; }


clang-cl \src\tmp\b.cc \src\tmp\a.cc /Fefoo.exe && foo.exe
102 111 111 0 37 100 32 0 10 0 0 0 0 0 0 0 80 -55 -22 56 -9 127 0 0 -16 -55 -22 56 -9 127 0 0

Note that this uses link.exe, so it's not an lld thing.

Comment 6 by thakis@chromium.org, Jun 29 2018

Does it happen with lld as linker?

Does it happen with cl as compiler? (Probably not?)

Comment 7 by h...@chromium.org, Jun 29 2018

Status: Started (was: Assigned)
Yes, happens with lld, no I don't see it with cl.


I think it's a mangling thing!

From the IR for b.cc:

$"??_C@_03GBBIHDEJ@foo?$AA@" = comdat any

@"??_C@_03GBBIHDEJ@foo?$AA@" = linkonce_odr dso_local unnamed_addr constant [4 x i8] c"foo\00", comdat, align 1

And for a.cc:

$"??_C@_03GBBIHDEJ@foo?$AA@" = comdat any
@"??_C@_03GBBIHDEJ@foo?$AA@" = linkonce_odr dso_local unnamed_addr constant [32 x i8] c"foo\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00", comdat, align 1

They have the same mangled name, oops!

Comment 8 by h...@chromium.org, Jun 29 2018

The mangler (MicrosoftMangleContextImpl::mangleStringLiteral) is using StringLiteral::getByteLength(), but StringLiteral is sneaky, because the actual string can be both shorter and longer than the literal, as mentioned in this comment:

 /// Strings in C can also be truncated and extended by assigning into arrays,
 /// e.g. with constructs like:
 ///   char X[2] = "foobar";
 /// In this case, getByteLength() will return 6, but the string literal will
 /// have type "char[2]".

I think we have to look at the StringLiteral's getType() to figure out the real length.

Comment 9 by thakis@chromium.org, Jun 29 2018

This might be the codegen code that does this:

http://llvm-cs.pcc.me.uk/tools/clang/lib/CodeGen/CodeGenModule.cpp#4041

    // Resize the string to the right size, which is indicated by its type.
    const ConstantArrayType *CAT = Context.getAsConstantArrayType(E->getType());
    Str.resize(CAT->getSize().getZExtValue());


Looks like this is only done for string literals with elt size 1, so maybe codegen gets this wrong for wide / u / U string literals too?
Cc: zstein@chromium.org
> Looks like this is only done for string literals with elt size 1, so maybe codegen gets this wrong for wide / u / U string literals too?

No, I think codegen for those is okay, they each have an Elements.resize() to the length based on the ArrayType) (this is in CodeGenModule::GetConstantArrayFromStringLiteral).
Blockedon: 860622
The mangling fix landed in Clang r336415. We need a roll to bring that into Chromium.
Status: Fixed (was: Started)

Sign in to add a comment