|
|||||
Issue descriptionExample builds: https://ci.chromium.org/buildbot/chromium.win/Win7%20%2832%29%20Tests/32881 https://ci.chromium.org/buildbot/chromium.win/Win7%20%2832%29%20Tests/32835 [ RUN ] PPAPINaClNewlibTest.MessageHandler [1276:3884:0425/161655.734:ERROR:install_util.cc(589)] Unable to create registry key HKLM\SOFTWARE\Policies\Chromium for reading result=2 [1276:4668:0425/161655.785:WARNING:discovery_network_list_win.cc(195)] Failed to open Wlan client handle: 1062 [1276:3884:0425/161655.791:WARNING:chrome_browser_main_win.cc(630)] Command line too long for RegisterApplicationRestart: --brave-new-test-launcher --cfi-diag=0 --gtest_also_run_disabled_tests --gtest_filter=PPAPINaClNewlibTest.MessageHandler --single_process --test-launcher-bot-mode --test-launcher-output="C:\Users\CHROME~2\AppData\Local\Temp\scoped_dir3620_1573\results3620_13587\test_results.xml" --test-launcher-summary-output="e:\b\swarm_slave\w\ioknqhec\output.json" --user-data-dir="C:\Users\CHROME~2\AppData\Local\Temp\scoped_dir3620_1573\d3620_13709" --disable-offline-auto-reload --enable-pepper-testing --disable-smooth-scrolling --enable-nacl --allow-nacl-socket-api=127.0.0.1 --use-fake-device-for-media-stream --use-fake-ui-for-media-stream --no-first-run --no-default-browser-check --enable-logging=stderr --disable-default-apps --wm-window-animations-disabled --disable-component-update --test-type=browser --force-color-profile=srgb --disable-zero-browsers-open-for-tests --ipc-connection-timeout=30 --allow-file-access-from-files --dom-automation --log-gpu-control-list-decisions --disable-backgrounding-occluded-windows --override-use-software-gl-for-tests --force-color-profile=srgb --disable-compositor-ukm-for-tests --enable-features=TestFeatureForBrowserTest1 --disable-features=NetworkPrediction,TestFeatureForBrowserTest2 --flag-switches-begin --flag-switches-end --restore-last-session about:blank [1276:4428:0425/161656.120:WARNING:embedded_test_server.cc(229)] Request not handled. Returning 404: /favicon.ico [4488,3684:23:16:56.312000] Native Client module will be loaded at base address 0x000000002c5c0000 Received fatal exception EXCEPTION_ACCESS_VIOLATION Backtrace: NaClSwitchAVX [0x025F5FE1+33] [1276:3544:0425/161656.542:ERROR:nacl_process_host.cc(263)] NaCl process exited with status -1073741819 (0xc0000005) [1276:3884:0425/161656.554:INFO:CONSOLE(0)] "NativeClient: Nexe crashed during startup", source: http://127.0.0.1:50525/test_case.html?mode=nacl_newlib&testcase=MessageHandler (0) ../../chrome/test/ppapi/ppapi_test.cc(251): error: Expected equality of these values: "PASS" handler.message().c_str() Which is: "Plugin did not load. 'NaCl module load failed: Nexe crashed during startup'" Stack trace: Backtrace: StackTraceGetter::CurrentStackTrace [0x01911DC8+40] testing::internal::UnitTestImpl::CurrentOsStackTraceExceptTop [0x0191A78B+69] testing::internal::AssertHelper::operator= [0x0191A3F0+48] PPAPITestBase::RunTestURL [0x02EAE4AF+415] PPAPITestBase::RunTestViaHTTP [0x02EAE5CA+202] PPAPINaClPNaClTest_DISABLED_MessageHandler_Test::RunTestOnMainThread [0x013B1940+76] content::BrowserTestBase::ProxyRunTestOnMainThreadLoop [0x02ED934B+357] ChromeBrowserMainParts::PreMainMessageLoopRunImpl [0x048D79CE+3196] ChromeBrowserMainParts::PreMainMessageLoopRun [0x048D6C95+145] content::BrowserMainLoop::PreMainMessageLoopRun [0x020B7DC4+68] content::StartupTaskRunner::RunAllTasksNow [0x0232EE69+23] content::BrowserMainLoop::CreateStartupTasks [0x020B6EA9+615] content::BrowserMainRunnerImpl::Initialize [0x020B9B76+100] content::BrowserMain [0x020B545E+138] content::RunNamedProcessTypeMain [0x02DE63BF+123] content::ContentMainRunnerImpl::Run [0x02DE67EA+142] service_manager::Main [0x03325DB2+622] content::ContentMain [0x02DE62EB+51] content::BrowserTestBase::SetUp [0x02ED9124+1498] Apr 27 2018,Why didn't the CQ catch this? win7_chromium_rel_ng tests 32-bit builds on win7: https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/155310 and the browser_tests step ran these tests successfully. And the build config is very similar: dcheck_always_on = true ffmpeg_branding = "Chrome" goma_dir = "C:\\b\\c\\goma_client" is_component_build = false is_debug = false proprietary_codecs = true strip_absolute_paths_from_debug_symbols = true symbol_level = 1 target_cpu = "x86" use_goma = true This is mysterious. May 1 2018,
Got it to reproduce on swarming. Progress! (pach in the lld switch) use gn args from #1 python tools/mb/mb.py isolate //out/release browser_tests python tools/swarming_client/isolate.py archive -I https://isolateserver.appspot.com -i out/release/browser_tests.isolate -s out/release/browser_tests.isolated set hash=b11a927c64c5220dc3baa7134d5a19ab00f045c7 python -u tools\swarming_client\swarming.py trigger --swarming https://chromium-swarm.appspot.com --isolate-server https://isolateserver.appspot.com --priority 25 --dimension cpu x86-32 --dimension gpu none --dimension os Windows-7-SP1 --dimension pool Chrome --named-cache swarming_module_cache_vpython .swarming_module_cache/vpython --idempotent --cipd-package ".swarming_module:infra/python/cpython/${platform}:version:2.7.14.chromium14" --cipd-package ".swarming_module:infra/tools/luci/logdog/butler/${platform}:git_revision:e1abc57be62d198b5c2f487bfb2fa2d2eb0e867c" --cipd-package ".swarming_module:infra/tools/luci/vpython-native/${platform}:git_revision:e1abc57be62d198b5c2f487bfb2fa2d2eb0e867c" --cipd-package ".swarming_module:infra/tools/luci/vpython/${platform}:git_revision:e1abc57be62d198b5c2f487bfb2fa2d2eb0e867c" --env-prefix PATH .swarming_module --env-prefix PATH .swarming_module/bin --env-prefix VPYTHON_VIRTUALENV_ROOT .swarming_module_cache/vpython --isolated %hash% -- --test-launcher-summary-output=${ISOLATED_OUTDIR}/output.json --gtest_filter=PPAPINaClNewlibTest.MessageHandler Example job: https://chromium-swarm.appspot.com/task?id=3d3611510dc3e210&refresh=10&show_raw=1 May 1 2018,I think (but I'm not very sure) the .nexe that's failing to load is ppapi_nacl_tests_newlib_x86_32.nexe Maybe that now gets built with lld and breaks somehow? No, it's still built with nacl own toolchain: c:/src/depot_tools/win_tools-2_7_6_bin/python/bin/python.exe "../../build/toolchain/gcc_link_wrapper.py" --output="clang_newlib_x86/ppapi_nacl_tests_newlib_x86_32.nexe" --strip="../../native_client/toolchain/win_x86/pnacl_newlib/bin/x86_64-nacl-strip.exe" --unstripped-file="clang_newlib_x86/exe.unstripped/ppapi_nacl_tests_newlib_x86_32.nexe" -- ../../native_client/toolchain/win_x86/pnacl_newlib/bin/x86_64-nacl-clang++.exe -pthread -Wl,--fatal-warnings -m32 -Werror -Wl,-O2 -Wl,--gc-sections -o "clang_newlib_x86/exe.unstripped/ppapi_nacl_tests_newlib_x86_32.nexe" -Wl,--start-group @"clang_newlib_x86/ppapi_nacl_tests_newlib_x86_32.nexe.rsp" -Wl,--end-group May 1 2018,Aha! Closing in... The crash seems to happen in NaClSwitchAVX. If I comment out the call to __x86.get_pc_thunk.ax, the test passes: --- a/src/trusted/service_runtime/arch/x86_32/nacl_switch_32.S +++ b/src/trusted/service_runtime/arch/x86_32/nacl_switch_32.S @@ -65,9 +65,9 @@ DEFINE_GLOBAL_HIDDEN_LOCATION(\arg1): * and the performance cost of these two instructions after fxrstor64 * seems to be immeasurably small. */ - call __x86.get_pc_thunk.ax /* PIC support */ + /*call __x86.get_pc_thunk.ax*/ /* PIC support */ .if \arg2 == DO_AVX || \arg2 == DO_SSE - fxrstor fxrstor_default_state-.(%eax) + /*fxrstor fxrstor_default_state-.(%eax)*/ ldmxcsr NACL_THREAD_CONTEXT_OFFSET_MXCSR(%ecx) .else frstor frstor_default_state-.(%eax) A couple of lines below, __x86.get_pc_thunk.ax is defined. But looking at the somewhat convoluted preprocessor directives, it seems the thunk is never generated for Windows? But how does this even link, and how does it work when using link.exe? May 1 2018,Oh wait, I fail at reading preprocessor macros.. for Windows we fail into the #else case for NACL_LINUX and end up seeing .globl __x86.get_pc_thunk.ax __x86.get_pc_thunk.ax: movl (%esp), %eax ret That seems reasonable. Does lld end up putting it in a non-executable section or something? May 1 2018,From the /lldmap file of chrome.dll 00a20a8d 000000a0 1 nacl_win_as_x86/obj/native_client/src/trusted/service_runtime/sel_asm/nacl_switch_32.obj:(.text) 00a20a8d 00000000 0 _NaClSwitchNoSSE 00a20a8d 00000000 0 _NaClSwitchNoSSE 00a20aba 00000000 0 _NaClSwitchSSE 00a20aba 00000000 0 _NaClSwitchSSE 00a20aec 00000000 0 _NaClSwitchAVX 00a20aec 00000000 0 _NaClSwitchAVX 00a20b21 00000000 0 __x86.get_pc_thunk.ax 00a20b21 00000000 0 __x86.get_pc_thunk.ax It seems like it goes into .text all right.. May 1 2018,No! Shorter diff to make the test pass: --- a/src/trusted/service_runtime/arch/x86_32/nacl_switch_32.S +++ b/src/trusted/service_runtime/arch/x86_32/nacl_switch_32.S @@ -67,7 +67,7 @@ DEFINE_GLOBAL_HIDDEN_LOCATION(\arg1): */ call __x86.get_pc_thunk.ax /* PIC support */ .if \arg2 == DO_AVX || \arg2 == DO_SSE - fxrstor fxrstor_default_state-.(%eax) + /*fxrstor fxrstor_default_state-.(%eax)*/ __x86.get_pc_thunk.ax is fine, it's the store into fxrstor_default_state that fails. May 1 2018,The dumpbin looks like this: _NaClSwitchAVX: 0000005F: 58 pop eax 00000060: 31 C9 xor ecx,ecx 00000062: 59 pop ecx 00000063: 8B 51 64 mov edx,dword ptr [ecx+64h] 00000066: 8B 69 44 mov ebp,dword ptr [ecx+44h] 00000069: 8B 79 3C mov edi,dword ptr [ecx+3Ch] 0000006C: 8B 71 38 mov esi,dword ptr [ecx+38h] 0000006F: 8B 59 34 mov ebx,dword ptr [ecx+34h] 00000072: 8E 69 62 mov gs,word ptr [ecx+62h] 00000075: 8E 61 60 mov fs,word ptr [ecx+60h] 00000078: 8E 41 5E mov es,word ptr [ecx+5Eh] 0000007B: E8 14 00 00 00 call __x86.get_pc_thunk.ax 00000080: 0F AE 88 07 00 00 fxrstor .rdata[eax+7] <----- 00 00000087: 0F AE 51 54 ldmxcsr dword ptr [ecx+54h] 0000008B: D9 69 4E fldcw word ptr [ecx+4Eh] 0000008E: C5 F8 77 vzeroupper 00000091: FF 69 6C jmp fword ptr [ecx+6Ch] The relocation: Symbol Symbol Offset Type Applied To Index Name -------- ---------------- ----------------- -------- ----- 00000083 REL32 00000007 1D .rdata And some symbol table: 014 00000001 ABS notype Static | @feat.00 015 00000200 SECT4 notype Static | frstor_default_state 016 00000000 SECT4 notype Static | fxrstor_default_state 017 00000000 SECT1 notype Static | .text Section length 98, #relocs 3, #linenums 0, checksum 0 019 00000000 SECT2 notype Static | .data Section length 0, #relocs 0, #linenums 0, checksum 0 01B 00000000 SECT3 notype Static | .bss Section length 0, #relocs 0, #linenums 0, checksum 0 01D 00000000 SECT4 notype Static | .rdata Section length 26C, #relocs 0, #linenums 0, checksum 0 01F 00000000 SECT1 notype External | _NaClSwitchNoSSE 020 00000094 SECT1 notype External | __x86.get_pc_thunk.ax 021 0000002D SECT1 notype External | _NaClSwitchSSE 022 0000005F SECT1 notype External | _NaClSwitchAVX I'm not entirely sure how this is supposed to work. May 1 2018,> it's the store into fxrstor_default_state that fails. It's reading from fxrstor_default_state, not writing to it of course. May 2 2018,Nico pointed out that fxrstor expects the memory address to be 16-byte aligned, and maybe that's where we're failing... from the lldmap file it doesn't look like it's getting aligned: 0236bf7c 00000280 1 nacl_win_as_x86/obj/native_client/src/trusted/service_runtime/sel_asm/nacl_switch_32.obj:(.rdata) 0236bf7c 00000000 0 fxrstor_default_state 0236c17c 00000000 0 frstor_default_state 0236bf7c is not a multiple of 16. May 2 2018,Attaching the object file for convenience. fxrstor_default_state is prefixed with ".balign 64" in the source, but from looking at the object file it's not clear to me how the alignment is tracked there. May 2 2018,From the link.exe /map file: Address Publics by Value Rva+Base Lib:Object 0002:000e1554 ?intervals@?1??OnPaintLayer@OverlayAgentAura@ui_devtools@@EAEXABVPaintContext@ui@@@Z@3QBMB 123cc554 overlay_agent_aura.obj 0002:000e1920 ??_7DebugExceptionHandler@?A@@6B@ 123cc920 nacl_debug_exception_handler_win.obj 0002:000e1b20 fxrstor_default_state 123ccb20 nacl_switch_32.obj 0002:000e1d20 frstor_default_state 123ccd20 nacl_switch_32.obj That doesn't look 64-byte aligned either? Now I'm even more confused. May 2 2018,> That doesn't look 64-byte aligned either? Now I'm even more confused. D'oh! The requirement is for 16-byte alignment, and 000e1b20 is certainly that. But the code does ".balign 64"... Anyway, 16-byte alignment is enough for FXRSTOR May 2 2018,Theory: link.exe always 16-byte aligns subsections? May 2 2018,> Theory: link.exe always 16-byte aligns subsections? No it doesn't do that. But perhaps it 16-byte aligns when no alignment is specified, as is the case with .rdata in this file... May 2 2018,From include/llvm/Object/COFF.h, coff_section::getAlignment() // Bit [20:24] contains section alignment. Both 0 and 1 mean alignment 1. uint32_t Shift = (Characteristics >> 20) & 0xF; if (Shift > 0) return 1U << (Shift - 1); return 1; I'm not sure "0 and 1 mean alignment 1" is true. Rather I suspect 0 means "alignment not specified" and that link.exe 16-byte aligns in that case. I'll try to come up with a test for this.. May 2 2018,No, that doesn't seem to be it either. My example test case: C:\src\tmp>cat a.c char foo = 1; extern char bar; int main() { return &foo - &bar; }; C:\src\tmp>cat b.c char bar = 2; cl -c a.c b.c && editbin /section:.data,ax b.obj && link a.obj b.obj /map:map.txt Looking in map.txt, bar really ends up 1-byte aligned :-( Still missing something I guess. May 2 2018,From the dumpbin of SECTION HEADER #4 .rdata name 0 physical address 0 virtual address 280 size of raw data 154 file pointer to raw data (00000154 to 000003D3) 0 file pointer to relocation table 0 file pointer to line numbers 0 number of relocations 0 number of line numbers 40000040 flags Initialized Data (no align specified) Read Only 015 00000200 SECT4 notype Static | frstor_default_state 016 00000000 SECT4 notype Static | fxrstor_default_state I don't see anything here that would cause fxrstor_default_state to become 16-byte aligned :-/ May 4 2018,
I think the default alignment is based on the section size. I wrote a script that tried every section size from 1 to 256, and it told me that the rules are: Size Alignment 1 1 2..7 4 8..63 8 64.. 16 The .rdata in your file is 640 bytes, which would explain how it gets alignment 16. May 4 2018,Wait, no, that's cl.exe's rule for setting section alignment based on object size. If I strip alignments from all sections, I see that the default alignment is always 16. Here's the script that I used (with your a.obj): echo '@echo off' for i in `seq 1 256`; do echo 'echo '$i echo 'echo char bar['$i'] = {2}; > b.c' echo 'cl /nologo /c b.c' echo '..\l\\ra\\bin\obj2yaml b.obj | wsl grep -v Alignment: | ..\l\\ra\\bin\yaml2obj > b2.obj' echo 'wsl mv b2.obj b.obj' echo 'link /nologo a.obj b.obj /map:a.map' echo 'wsl grep -A1 foo a.map' done Should be an easy fix then. May 4 2018,Not as easy as I thought, as I needed to fix a couple of bugs first. https://reviews.llvm.org/D46418 https://reviews.llvm.org/D46419 https://reviews.llvm.org/D46420 May 4 2018,> If I strip alignments from all sections, I see that the default alignment is always 16. I'm still missing something. When in my test I strip the alignment flag from b.obj's .data section, why is "bar" showing up 1-byte aligned then? cl -c a.c b.c && editbin /section:.data,ax b.obj && link a.obj b.obj /map:map.txt && grep -A1 foo map.txt 0003:00000000 foo 0000000140016000 a.obj 0003:00000001 bar 0000000140016001 b.obj I also tried stripping alignment from all sections in b.obj, which should match what your script is doing: cl -c a.c b.c && editbin /section:.data,ax b.obj && editbin /section:.debug$S,ax b.obj && editbin /section:.drectve,ax b.obj && editbin /section:.chks64,ax b.obj && link a.obj b.obj /map:map.txt && grep -A1 foo map.txt 0003:00000000 foo 0000000140016000 a.obj 0003:00000001 bar 0000000140016001 b.obj My a.c: char foo = 1; extern char bar[]; int main() { return &foo - bar; }; b.c: char bar[640] = {1,2,3,4,5}; May 4 2018,A reduced test case based on nacl_switch_32.S. (Linking with /dll /noentry reduces the output, but it's the same when linking a .exe.) a.c: static const char foo = 1; int main() { return 0; } b.s: .section .rdata, "r" _bar: .byte 42 \src\chromium\src\third_party\gnu_binutils\files\as.exe --32 -o b.obj b.s && cl /nologo -c a.c && link /nologo a.obj b.obj /map:map.txt /dll /noentry && grep -A1 _foo map.txt a.c 0002:00000000 _foo 10002000 a.obj 0002:00000010 _bar 10002010 b.obj May 4 2018,So, what's the difference between b.obj when built from b.s and with b.c as earlier? One difference is that the C version gets the IMAGE_SCN_TYPE_NO_PAD bit set: - Name: .rdata Characteristics: [ IMAGE_SCN_TYPE_NO_PAD, IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ ] The asm version doesn't get that: - Name: .rdata Characteristics: [ IMAGE_SCN_CNT_INITIALIZED_DATA, IMAGE_SCN_MEM_READ ] Maybe that explains it.. maybe 16-byte alignment is the default, but the file I'm testing with has IMAGE_SCN_TYPE_NO_PAD which implies 1-byte alignment? May 4 2018,Yes, I think that's it: a.c: static const char foo = 1; b.c: static const char bar = 42; cl -c a.c b.c editbin /section:.rdata,ax b.obj (manually hexedit out IMAGE_SCN_TYPE_NO_PAD from b.obj, replacing "4800 0040" with "4000 0040") link a.obj b.obj /map:map.txt /dll /noentry /nologo && grep -A1 _foo map.txt 0001:00000000 _foo 10001000 a.obj 0001:00000010 _bar 10001010 b.obj Okay, I'm convinced; 16-byte alignment appears to be the default :-) May 4 2018,Does llvm always set IMAGE_SCN_TYPE_NO_PAD on its outputs, or is that dependent on anything? Same question for gas -- if there's a way to accidentally grow that bit with gas, that'd be bad for that file. It'd be nice if we could somehow make that .s file produce an .obj that doesn't rely on the default alignment (but fixing that too is important as well). Since the asan start symbol thing was from a c file, it's a different root cause after all, right? May 4 2018,Neither cl nor llvm sets the NO_PAD bit, it seems it was editbin that put it there. So much for "ax" meaning "no alignment" :-( It seems unlikely that gas would start setting the bit, and it would probably break all kinds of things as alignment would be off everywhere. But it would certainly be nice if we could fix the .s file to request an explicit alignment somehow. I'll give it a try. Yes, different root cause from the asan bug ( crbug.com/837090 ) but at least the topic of alignment was common :-) May 4 2018,> But it would certainly be nice if we could fix the .s file to request an explicit alignment somehow. I'll give it a try. I think this is what we want: https://sourceware.org/ml/binutils/2010-01/msg00546.html Unfortunately that's too new for the gas version we're using: \src\chromium\src\third_party\gnu_binutils\files\as.exe -v GNU assembler version 2.17.50 (i686-pc-cygwin) using BFD version 2.17.50 20060817 I suppose we can update the comment in nacl_switch_32.S to mention this relies on the default section alignment being >= 16 bytes, but I'm not sure it's worth doing more than that. May 4 2018,Yup, sounds good. So we need a clang roll, and then we're all set from a functionality PoV, and the comment in nacl land for cleanliness. May 4 2018,The alignment fix landed in r331538. May 4 2018,
May 4 2018,Verified the test passes with ToT lld & clang here: https://chromium-swarm.appspot.com/task?id=3d4563c259822110&refresh=10&show_raw=1 May 7 2018,Adding the comment to NaCl here: https://chromium-review.googlesource.com/c/native_client/src/native_client/+/1046652 May 10 2018,
May 11 2018, Project MemberThe following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/59bcf50b03d8292f0155917f7908777c2dcdcdd7 commit 59bcf50b03d8292f0155917f7908777c2dcdcdd7 Author: nacl-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <nacl-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Date: Fri May 11 01:49:45 2018 Roll src/native_client/ ab8b219c8..f9564e9d1 (1 commit) https://chromium.googlesource.com/native_client/src/native_client.git/+log/ab8b219c8b55..f9564e9d1b15 $ git log ab8b219c8..f9564e9d1 --date=short --no-merges --format='%ad %ae %s' 2018-05-09 hans nacl_switch_32.S: Add comment about alignment on Windows Created with: roll-dep src/native_client BUG= chromium:837667 The AutoRoll server is located here: https://nacl-roll.skia.org Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary. TBR=mseaborn@chromium.org Change-Id: I9696bb7f656b4b299a22d36c00952c6665cd4ba5 Reviewed-on: https://chromium-review.googlesource.com/1054373 Reviewed-by: nacl-chromium-autoroll <nacl-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Commit-Queue: nacl-chromium-autoroll <nacl-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Cr-Commit-Position: refs/heads/master@{#557761} [modify] https://crrev.com/59bcf50b03d8292f0155917f7908777c2dcdcdd7/DEPS |
|||||
►
Sign in to add a comment |
Comment 1 by h...@chromium.org, Apr 27 2018