On Windows, the compiler sometimes fills trailing char array elements with garbage instead of zeros |
||||||||
Issue descriptionWhat steps will reproduce the problem? (1) Build and run the tiny executable from this CL: https://chromium-review.googlesource.com/c/chromium/src/+/1118170 What is the expected result? 67 78 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 This is what I get on Linux, and when I make supposed no-op modifications to the code as indicated by the comments in that CL. What happens instead? 67 78 0 37 115 10 0 37 100 32 0 10 0 0 0 0 64 80 0 64 1 0 0 0 -32 80 0 64 1 0 0 0 This is what I get when I run the CL unmodified on Windows. This problem was initially reported here: https://bugs.chromium.org/p/chromium/issues/detail?id=856761. That's an ASan build, and the effect there is that ASan complains when the code tries to read 32 bytes from a 3-byte constant. I've used the original bug's high priority and "Bug-Security" label in this bug, because not reliably zeroing out array elements that are supposed to be zeroed out doesn't strike me as safe. Please downgrade if appropriate. Speculation =========== This is my theory of what's happening: 1. The compiler generates code to copy 32 bytes from the address where the 32-byte constant {'C', 'N', '\0', '\0', ..., '\0'} is stored. It also emits that constant when compiling kwiberg_test.cc. 2. When compiling kwiberg_test_2.cc, the compiler emits the 3-byte constant {'C', 'N', '\0'}. 3. The linker decides to deduplicate these two constants, not realizing that the extra 29 trailing zeros are significant. Depending on the order the files are given to the linker, either the 3-byte or the 32-byte constant ends up in the final executable. I don't actually know anything about compilers, though, so this may be wrong.
,
Jun 28 2018
,
Jun 28 2018
,
Jun 28 2018
Thanks for the excellent repro! This does look fishy.
,
Jun 29 2018
Interesting.
Stand-alone repro based on kwiberg's code:
a.cc
#include <stdio.h>
int main(int argc, char **) {
struct {int x; char s[32]; } s = {argc, "foo"};
for (int i = 0; i < 32; ++i)
printf("%d ", s.s[i]);
printf("\n");
return 0;
}
b.cc
const char *foo() { return "foo"; }
clang-cl \src\tmp\b.cc \src\tmp\a.cc /Fefoo.exe && foo.exe
102 111 111 0 37 100 32 0 10 0 0 0 0 0 0 0 80 -55 -22 56 -9 127 0 0 -16 -55 -22 56 -9 127 0 0
Note that this uses link.exe, so it's not an lld thing.
,
Jun 29 2018
Does it happen with lld as linker? Does it happen with cl as compiler? (Probably not?)
,
Jun 29 2018
Yes, happens with lld, no I don't see it with cl. I think it's a mangling thing! From the IR for b.cc: $"??_C@_03GBBIHDEJ@foo?$AA@" = comdat any @"??_C@_03GBBIHDEJ@foo?$AA@" = linkonce_odr dso_local unnamed_addr constant [4 x i8] c"foo\00", comdat, align 1 And for a.cc: $"??_C@_03GBBIHDEJ@foo?$AA@" = comdat any @"??_C@_03GBBIHDEJ@foo?$AA@" = linkonce_odr dso_local unnamed_addr constant [32 x i8] c"foo\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00", comdat, align 1 They have the same mangled name, oops!
,
Jun 29 2018
The mangler (MicrosoftMangleContextImpl::mangleStringLiteral) is using StringLiteral::getByteLength(), but StringLiteral is sneaky, because the actual string can be both shorter and longer than the literal, as mentioned in this comment: /// Strings in C can also be truncated and extended by assigning into arrays, /// e.g. with constructs like: /// char X[2] = "foobar"; /// In this case, getByteLength() will return 6, but the string literal will /// have type "char[2]". I think we have to look at the StringLiteral's getType() to figure out the real length.
,
Jun 29 2018
This might be the codegen code that does this: http://llvm-cs.pcc.me.uk/tools/clang/lib/CodeGen/CodeGenModule.cpp#4041 // Resize the string to the right size, which is indicated by its type. const ConstantArrayType *CAT = Context.getAsConstantArrayType(E->getType()); Str.resize(CAT->getSize().getZExtValue()); Looks like this is only done for string literals with elt size 1, so maybe codegen gets this wrong for wide / u / U string literals too?
,
Jul 2
,
Jul 4
,
Jul 4
> Looks like this is only done for string literals with elt size 1, so maybe codegen gets this wrong for wide / u / U string literals too? No, I think codegen for those is okay, they each have an Elements.resize() to the length based on the ArrayType) (this is in CodeGenModule::GetConstantArrayFromStringLiteral).
,
Jul 13
|
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by kwiberg@chromium.org
, Jun 28 2018