Flaky crash in v8::internal::RememberedSetUpdatingItem::CheckAndUpdateOldToNewSlot during webgl_conformance_tests on Win10 Debug (NVIDIA)
Issue description

In https://build.chromium.org/p/chromium.gpu.fyi/builders/Win10%20Debug%20%28NVIDIA%29/builds/1846

WebglConformance_conformance_ogles_GL_swizzlers_swizzlers_057_to_064 failed during webgl_conformance_d3d11_passthrough_tests
WebglConformance_conformance_ogles_GL_swizzlers_swizzlers_017_to_024 failed during webgl_conformance_tests

Same crash in both:

v8::internal::RememberedSetUpdatingItem<v8::internal::MajorNonAtomicMarkingState>::CheckAndUpdateOldToNewSlot<1> [0x189DBC67+39]
v8::internal::SlotSet::Iterate<<lambda_ad1b48bfd15399d01548b8de55983cc2> > [0x189DE8B9+121]
v8::internal::RememberedSetUpdatingItem<v8::internal::MajorNonAtomicMarkingState>::UpdateUntypedPointers [0x189FE92A+58]
v8::internal::RememberedSetUpdatingItem<v8::internal::MajorNonAtomicMarkingState>::Process [0x189F9179+25]
v8::internal::PointersUpatingTask::RunInParallel [0x189FC51A+26]
v8::internal::ItemParallelJob::Task::RunInternal [0x189B7E88+8]
v8::internal::ItemParallelJob::Run [0x189B7C60+256]
v8::internal::MarkCompactCollector::UpdatePointersAfterEvacuation [0x189FDB1F+1119]
v8::internal::MarkCompactCollector::Evacuate [0x189F080B+603]
v8::internal::MarkCompactCollector::CollectGarbage [0x189EDD3B+219]
v8::internal::Heap::MarkCompact [0x189AFA75+133]
v8::internal::Heap::PerformGarbageCollection [0x189B1794+644]
v8::internal::Heap::CollectGarbage [0x189A02EE+478]
v8::internal::Heap::FinalizeIncrementalMarkingIfComplete [0x189A904E+350]
v8::internal::IncrementalMarkingJob::Task::RunInternal [0x189CADB5+261]
v8::internal::CancelableTask::Run [0x1846A6F3+51]
??$Invoke@PAVTask@v8@@$$V@?$FunctorTraits@P8Task@v8@@AEXXZX@internal@base@@SAXP8Task@v8@@AEXXZ$$QAPAV34@@Z [0x183D8E0B+11]
base::internal::InvokeHelper<0,void>::MakeItSo<void (__thiscall v8::Task::*const &)(void),v8::Task *> [0x183D8EE4+36]
base::internal::Invoker<base::internal::BindState<void (__thiscall v8::Task::*)(void),base::internal::OwnedWrapper<v8::Task> >,void __cdecl(void)>::RunImpl<void (__thiscall v8::Task::*const &)(void),std::tuple<base::internal::OwnedWrapper<v8::Task> > cons [0x183D9099+137]
base::internal::Invoker<base::internal::BindState<void (__thiscall v8::Task::*)(void),base::internal::OwnedWrapper<v8::Task> >,void __cdecl(void)>::Run [0x183DDB04+36]
base::OnceCallback<void __cdecl(void)>::Run [0x1004BB25+53]
base::debug::TaskAnnotator::RunTask [0x100B6AA7+519]
blink::scheduler::TaskQueueManager::ProcessTaskFromWorkQueue [0x1C674008+1400]
blink::scheduler::TaskQueueManager::DoWork [0x1C672109+1049]
base::internal::FunctorTraits<void (__thiscall blink::scheduler::TaskQueueManager::*)(bool),void>::Invoke<base::WeakPtr<blink::scheduler::TaskQueueManager> const &,bool const &> [0x1C663FF5+37]
base::internal::InvokeHelper<1,void>::MakeItSo<void (__thiscall blink::scheduler::TaskQueueManager::*const &)(bool),base::WeakPtr<blink::scheduler::TaskQueueManager> const &,bool const &> [0x1C664256+70]
base::internal::Invoker<base::internal::BindState<void (__thiscall blink::scheduler::TaskQueueManager::*)(bool),base::WeakPtr<blink::scheduler::TaskQueueManager>,bool>,void __cdecl(void)>::RunImpl<void (__thiscall blink::scheduler::TaskQueueManager::*cons [0x1C6643C0+160]
base::internal::Invoker<base::internal::BindState<void (__thiscall blink::scheduler::TaskQueueManager::*)(bool),base::WeakPtr<blink::scheduler::TaskQueueManager>,bool>,void __cdecl(void)>::Run [0x1C674A94+36]
base::OnceCallback<void __cdecl(void)>::Run [0x1004BB25+53]
base::debug::TaskAnnotator::RunTask [0x100B6AA7+519]
base::internal::IncomingTaskQueue::RunTask [0x1012E825+37]
base::MessageLoop::RunTask [0x10138320+512]
base::MessageLoop::DeferOrRunPendingTask [0x10136972+50]
base::MessageLoop::DoWork [0x10136FE6+278]
base::MessagePumpDefault::Run [0x1013D148+40]
base::MessageLoop::Run [0x1013801F+191]
base::RunLoop::Run [0x101FC76A+186]
content::RendererMain [0x136085EA+730]
content::RunNamedProcessTypeMain [0x13B290F7+135]
content::ContentMainRunnerImpl::Run [0x13B28FCE+414]
content::ContentServiceManagerMainDelegate::RunEmbedderProcess [0x13B26B74+36]
service_manager::Main [0x0C3C9157+823]
content::ContentMain [0x13B270C9+41]
ChromeMain [0x04592CB5+277]
MainDllLoader::Launch [0x00432544+836]
wWinMain [0x0042D2AB+747]
invoke_main [0x004F278E+30] (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:118)
__scrt_common_main_seh [0x004F25F0+336] (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:253)
__scrt_common_main [0x004F248D+13] (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:296)
wWinMainCRTStartup [0x004F27A8+8] (f:\dd\vctools\crt\vcstartup\src\startup\exe_wwinmain.cpp:17)
BaseThreadInitThunk [0x763238F4+36]
RtlUnicodeStringToInteger [0x77975DE3+595]
RtlUnicodeStringToInteger [0x77975DAE+542]

Two v8 rolls in blamelist:
https://chromium-review.googlesource.com/651491
https://chromium-review.googlesource.com/652106

Will continue observing to see if all WebglConformance_conformance_ogles_GL_swizzlers_swizzlers need to be disabled.
,
Sep 6 2017
V8 folks: who can take this? It's an urgent flaky regression. Thanks.
,
Sep 7 2017
+ mem sheriff, primary and secondary
,
Sep 11 2017
Looks like the result of heap corruption somewhere, e.g. a missing write barrier or mishandling of a raw pointer into the heap. Crashes look similar to these, although they may or may not have the same root cause: https://crash.corp.google.com/browse?q=custom_data.ChromeCrashProto.magic_signature_1.name%20LIKE%20%27v8%3A%3Ainternal%3A%3ARememberedSetUpdatingItem%3A%3ACheckAndUpdateOldToNewSlot%25%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D&unnest=#-property-selector,samplereports:5,+productversion,cpuarchitecture,country
,
Sep 11 2017
Guessing at a possible candidate:

[heap] Avoid fences during pointer updating
Reviewed-on: https://chromium-review.googlesource.com/539642

It touches the related area and was committed shortly before the first crashes with this signature.
,
Sep 11 2017
Reassigning to ulan@ as per request. PTAL, any ideas?
,
Sep 14 2017
Back from vacation. I agree with #4. Looks like heap corruption, hard to say more without local repro. The crash did not occur in the last ~40 builds after 1846, so there is a chance that the regression was fixed in the meantime.
,
Sep 14 2017
OK. Feel free to close as WontFix if not reproducible. Thanks.
,
Sep 14 2017
,
Sep 15 2017
Thanks, ynovikov@. Log contains "Found Minidump: True". Does this mean that the minidump is available somewhere? If so, it would be very useful.
,
Sep 15 2017
The log says:

Minidump found: c:\b\s\w\itinbguh\tmpxs_0to\reports\e6b81fc2-947a-4570-aebb-50dbc27d0cf3.dmp
Uploading c:\b\s\w\itinbguh\tmpxs_0to\reports\e6b81fc2-947a-4570-aebb-50dbc27d0cf3.dmp to gs://chrome-telemetry-output/minidump-2017-09-13_20-49-11-484977.dmp

Please try to find it there. Thanks.
,
Sep 18 2017
Thanks a lot, Ken. Based on the minidump I have a theory of what happened before the crash.

The minidump shows that the crash happens on the page flag check for a dead heap object 0x2ec84101. Memory region around the slot that contains the bogus pointer:

3fffdee0: 271cc669 <= map?
3fffdee4: 2428412d <= empty fixed array? (based on the 0x412d offset)
3fffdee8: 2ec84101 <= dead object
3fffdeec: 00024000 <= tagged integer with value 0x12000.
3fffdef0: 271cc669 <= next object map?
3fffdef4: 2428412d <= empty fixed array?
3fffdef8: 29f84101
3fffdefc: 00018000 <= tagged integer with value 0x0C000.

So the object containing the bogus pointer is 4 words large and contains a pointer to the empty fixed array and an integer. I ran the webgl test with logging of all objects that match these criteria. This found a JsArray object (my build is 64-bit and the minidump is 32-bit, so offsets do not match exactly):

0x00001368eff02f11 <= JsArray map
0x0000129551a02251 <= empty fixed array
0x000017621b182201 <= pointer to the array backing store.
0x0001200000000000 <= length of the JsArray.

0x1368eff02f11: [Map]
 - type: JS_ARRAY_TYPE
 - instance size: 32
 - inobject properties: 0
 - elements kind: PACKED_DOUBLE_ELEMENTS

The backing store contains doubles, so its size is 0x12000 * 8 = 589824 bytes, which is larger than 512K (the old space page size). So the backing store is in the large object space.

The theory:
1) JsArray is created and then promoted to the old space.
2) The array grows and a new backing store is allocated in the new space. This records an old-to-new slot in the remembered set.
3) The array grows again. This time the backing store is allocated in the large object space.
4) Mark-compact GC runs:
   a) The array backing store is dead (this means that the array itself must be dead).
   b) The large page containing the backing store is unmapped.
   c) During evacuation, iteration of the old-to-new remembered set crashes while trying to check page flags of the unmapped page.
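The arithmetic in the analysis above can be double-checked with a few lines of Python. This is only a sketch, not V8 code: the helper names are hypothetical, and it assumes the small-integer (Smi) tagging that matches the values quoted in the dumps (value shifted left by one bit on 32-bit builds, value stored in the upper 32 bits on 64-bit builds, low bit set for heap pointers). It also ignores the backing store's header, so the real allocation would be slightly larger.

```python
# Illustrative helpers only; names are hypothetical and this is not V8 code.
PAGE_SIZE = 512 * 1024  # old space page size quoted in the analysis
DOUBLE_SIZE = 8         # bytes per element for PACKED_DOUBLE_ELEMENTS


def is_heap_pointer(word: int) -> bool:
    """Assumed tagging: heap object pointers have the low bit set."""
    return word & 1 == 1


def untag_smi32(word: int) -> int:
    """Assumed 32-bit Smi layout: the value is shifted left by one bit."""
    assert word & 1 == 0, "not a tagged integer"
    return word >> 1


def untag_smi64(word: int) -> int:
    """Assumed 64-bit Smi layout: the value sits in the upper 32 bits."""
    assert word & 0xFFFFFFFF == 0, "not a tagged integer"
    return word >> 32


def backing_store_bytes(length: int) -> int:
    """Payload size of a double-element backing store, header excluded."""
    return length * DOUBLE_SIZE


length32 = untag_smi32(0x00024000)          # from the 32-bit minidump
length64 = untag_smi64(0x0001200000000000)  # from the 64-bit local build
size = backing_store_bytes(length32)

print(hex(length32), hex(length64), size, size > PAGE_SIZE)
# prints: 0x12000 0x12000 589824 True
```

Both dumps decode to the same array length, and the resulting payload exceeds the 512K page size, which is consistent with the backing store living in the large object space.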
Now the question is how 4.b could happen, as we are careful to unmap pages only after evacuation is done. For some reason dead large object pages are enqueued for unmapping before evacuation (https://cs.chromium.org/chromium/src/v8/src/heap/mark-compact.cc?rcl=ec37390b2ba2b4051f46f153a8cc179ed4656f5d&l=4594). That should be OK because the unmapper task starts after evacuation (https://cs.chromium.org/chromium/src/v8/src/heap/mark-compact.cc?rcl=ec37390b2ba2b4051f46f153a8cc179ed4656f5d&l=3889). However, MarkCompactCollector::EnsureSweepingCompleted can also start the unmapper task (https://cs.chromium.org/chromium/src/v8/src/heap/mark-compact.cc?rcl=ec37390b2ba2b4051f46f153a8cc179ed4656f5d&l=764). EnsureSweepingCompleted can be called during slow allocation, which can happen during evacuation.

I think the bug is in enqueuing large object pages for unmapping before evacuation. We should do that after evacuation.
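The ordering problem described above can be illustrated with a toy model. This is a hedged sketch and not how V8 is implemented: `Page`, `Unmapper`, `mark_compact` and both flags are hypothetical names, and the "slow allocation" step simply stands in for EnsureSweepingCompleted starting the unmapper during evacuation.

```python
class UnmappedPageAccess(Exception):
    """Stands in for the crash on reading flags of an unmapped page."""


class Page:
    def __init__(self, name: str) -> None:
        self.name = name
        self.mapped = True

    def flags(self) -> int:
        if not self.mapped:
            raise UnmappedPageAccess(self.name)
        return 0


class Unmapper:
    def __init__(self) -> None:
        self.queue = []

    def enqueue(self, page: Page) -> None:
        self.queue.append(page)

    def run(self) -> None:
        # Unmap everything that has been queued so far.
        for page in self.queue:
            page.mapped = False
        self.queue.clear()


def mark_compact(enqueue_before_evacuation: bool,
                 slow_allocation_during_evacuation: bool) -> bool:
    large_page = Page("dead large object page")
    # A stale old-to-new slot in the dead array still points into this page.
    slot_targets = [large_page]
    unmapper = Unmapper()

    if enqueue_before_evacuation:
        unmapper.enqueue(large_page)      # the ordering suspected as the bug

    # Evacuation phase.
    if slow_allocation_during_evacuation:
        unmapper.run()                    # unmapper started early
    for page in slot_targets:
        page.flags()                      # page flag check on the slot target

    if not enqueue_before_evacuation:
        unmapper.enqueue(large_page)      # the proposed ordering
    unmapper.run()
    return True


for before in (True, False):
    try:
        mark_compact(before, slow_allocation_during_evacuation=True)
        print("enqueue before evacuation:", before, "-> ok")
    except UnmappedPageAccess as exc:
        print("enqueue before evacuation:", before, "-> crash on", exc)
```

In this model the crash needs both conditions at once: the page is queued before evacuation and the unmapper is started during evacuation. Queuing the page only after evacuation removes the first condition, which is the change proposed above.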
,
Sep 18 2017
The following revision refers to this bug:
https://chromium.googlesource.com/v8/v8.git/+/75877ddb7b3d5b135d5a1ae565e4f1df8b458175

commit 75877ddb7b3d5b135d5a1ae565e4f1df8b458175
Author: Ulan Degenbaev <ulan@chromium.org>
Date: Mon Sep 18 09:39:16 2017

[heap] Do not unmap large pages before evacuation.

See https://bugs.chromium.org/p/chromium/issues/detail?id=762677#c12 for the description of the bug.

Bug: chromium:762677
TBR: mlippautz@chromium.org
Change-Id: If5c4c2c15f2403d336edf34d10679521397db75c
Reviewed-on: https://chromium-review.googlesource.com/670823
Commit-Queue: Ulan Degenbaev <ulan@chromium.org>
Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
Cr-Commit-Position: refs/heads/master@{#48061}

[modify] https://crrev.com/75877ddb7b3d5b135d5a1ae565e4f1df8b458175/src/heap/mark-compact.cc
,
Sep 19 2017
Awesome analysis Ulan!!! Thanks for getting to the bottom of this!
,
Sep 25 2017
Comment 1 by ynovikov@chromium.org, Sep 6 2017