ARM icache fails to flush and V8 crashes on JIT code
Reported by
manjian2...@gmail.com,
Apr 24 2018
Issue description
Steps to reproduce the problem: Collected from users; no idea how to reproduce.
What is the expected behavior? What went wrong? The ARM icache fails to flush via the syscall.
Did this work before? N/A
Chrome version: 66.0.3359.117
Channel: stable
OS Version: 7.0
Flash Version:
The crash can occur on any JIT code, including interpreter bytecode handlers, optimized functions, and builtins like ObjectPrototypeToString.
,
Apr 24 2018
A typical crash on an optimized function is shown below. Another astonishing fact: even after a code VMA has been unmapped from the address space and a new code VMA is mapped at the same virtual address, the icache remains unflushed.
,
Apr 24 2018
@whoever reads the summary: This is a crash happening in UCBrowser. manjian2006@gmail.com: What version of Chromium is the crashing UCBrowser version based on? Do you also have pageload numbers for the chipsets that you provided in comment #3? It would be interesting to see whether the higher number on certain boards is due to higher pageload or a higher crash rate.
,
Apr 24 2018
I added the observation here. The pageload information should be listed in every crash report, but I have to ask my colleagues to sum it up.
,
Apr 24 2018
Sorry, there is no pageload information in our crash reports.
,
Apr 27 2018
(1) Why do you believe that this is related to icache flushing? Do you have any evidence pointing in that direction?
(2) You wrote "The ARM icache fail to flush by the syscall." If a cache flushing syscall did happen but was ineffective, doesn't that make it a kernel issue (rather than something V8 can fix)? Or are you saying that V8 failed to make the syscall when it would have been required?
(3) What OS/kernel was running on the affected machines?
(4) (Repeating #3's question:) Which version of V8 is the browser based on that's reporting these crashes?
,
Apr 28 2018
Questions 1-3 need a judgement after reading and thinking through the data presented by the crash reports. I attach several reports. The pc field points at the raw code bytes; you need to disassemble them yourself. The browser is based on Chromium 57, but V8 is based on 6.3.
,
May 4 2018
It keeps popping up, even with 30,000 active users. The workaround of making the kernel call update_mmu_cache turned out to be equivalent to the original approach. I have no idea why switching the context works but merely triggering the call to update_mmu_cache does not. The next step will be calling nanosleep after each flush.
,
May 22 2018
The final mitigation is to disable code compaction. The crash rate dropped several-fold, and we are happy with that result.
,
May 23 2018
What was your observation on your benchmarks?
,
May 23 2018
As far as I understand, this change accelerates evacuation but enlarges the space consumed by code.
,
May 25 2018
Would it be possible to upload a CL, so we can evaluate if we can upstream your change?
,
May 26 2018
diff --git a/src/flag-definitions.h b/src/flag-definitions.h
index 25cc96a00f..ef44fa13cd 100644
--- a/src/flag-definitions.h
+++ b/src/flag-definitions.h
@@ -663,7 +663,11 @@ DEFINE_INT(v8_os_page_size, 0, "override OS page size (in KBytes)")
DEFINE_BOOL(always_compact, false, "Perform compaction on every full GC")
DEFINE_BOOL(never_compact, false,
"Never perform compaction on full GC - testing only")
+#if defined(UC_BUILD)
+DEFINE_BOOL(compact_code_space, false, "Compact code space on full collections")
+#else
DEFINE_BOOL(compact_code_space, true, "Compact code space on full collections")
+#endif
DEFINE_BOOL(use_marking_progress_bar, true,
"Use a progress bar to scan large objects in increments when "
"incremental marking is active.")
This patch disables code evacuation. I don't think you want to make that upstream.
,
May 28 2018
I was fooled by the data presented; the anon:v8 crash rate only seemed lower because of the misleading UI design. So let's work on a new mitigation.
,
May 28 2018
So far, only increasing the interrupt budget from 144 KB to 144 MB has decreased the crash rate effectively, but this approach is not acceptable.
,
May 29 2018
All the ineffective attempts omitted an important fact: these crashes happen on the ARM64 architecture. That makes the update_mmu_cache approach of flushing the whole icache not work at all, because the ARM64 implementation only calls isb. I will look into why the 64-bit icache flush is not working. I have tried flushing the icache page by page, to keep a page fault from making the flush exit early. Not working.
,
May 29 2018
http://www.mono-project.com/news/2016/09/12/arm64-icache/ Damn it. The mono project has found the cause. The honor is theirs.
,
May 29 2018
I suggest that at least the arm64 version should adopt the cache-flushing code from the mono project:
/* Don't rely on GCC's __clear_cache implementation, as it caches
* icache/dcache cache line sizes, that can vary between cores on
* big.LITTLE architectures. */
guint64 end = (guint64) (code + size);
guint64 addr;
/* always go with cacheline size of 4 bytes as this code isn't perf critical
* anyway. Reading the cache line size from a machine register can be racy
* on a big.LITTLE architecture if the cores don't have the same cache line
* sizes. */
const size_t icache_line_size = 4;
const size_t dcache_line_size = 4;
addr = (guint64) code & ~(guint64) (dcache_line_size - 1);
for (; addr < end; addr += dcache_line_size)
asm volatile("dc civac, %0" : : "r" (addr) : "memory");
asm volatile("dsb ish" : : : "memory");
addr = (guint64) code & ~(guint64) (icache_line_size - 1);
for (; addr < end; addr += icache_line_size)
asm volatile("ic ivau, %0" : : "r" (addr) : "memory");
asm volatile ("dsb ish" : : : "memory");
asm volatile ("isb" : : : "memory");
,
May 29 2018
Here we go: a new way to flush the icache, for testing.
diff --git a/src/arm/cpu-arm.cc b/src/arm/cpu-arm.cc
index f5d2ab19d1..532a387491 100644
--- a/src/arm/cpu-arm.cc
+++ b/src/arm/cpu-arm.cc
@@ -20,10 +20,20 @@
namespace v8 {
namespace internal {
+#if defined(UC_BUILD)
+static void BigBigWorld() {
+ asm volatile(".rept 128 * 1024\n"
+ "nop\n"
+ ".endr\n");
+}
+#endif
+
void CpuFeatures::FlushICache(void* start, size_t size) {
#if !defined(USE_SIMULATOR)
#if V8_OS_QNX
msync(start, size, MS_SYNC | MS_INVALIDATE_ICACHE);
+#elif defined(UC_BUILD)
+ BigBigWorld();
#else
register uint32_t beg asm("r0") = reinterpret_cast<uint32_t>(start);
register uint32_t end asm("r1") = beg + size;
This attempt fills the icache with nops.
,
Jun 4 2018
The bug linked in #17 is already addressed in V8: the cache line size is not cached but read again for every flush. The cache flushing code hasn't changed for a while. Can you provide more details on why you believe this is an issue with arm64 cache maintenance? The screenshots (#2 and #8) are arm32, and so is the patch in #19. Does this issue also appear on arm32 then?
,
Jun 4 2018
Reading it every time is not a good enough solution, because preemption can break it: the thread can be migrated to another core between the read and the flush. arm32 also has this problem, but I have no idea how to fix it yet. Let's address the arm64 problem first.
,
Jun 4 2018
The top 3 CPUs listed in #1 (MSM8976, MT6797, Kirin 950) are big.LITTLE CPUs using Cortex-A53 and Cortex-A72. Both of these cores have the same cache line size (64 bytes), they are not affected by the bug mentioned in #17. The issue may still be with cache management but this is not related to the cache line size on big.LITTLE.
,
Jun 4 2018
This report indicates a boundary: the instruction at f4 is not executed as expected, but the one at f8 is!
,
Jun 5 2018
There is no cache line boundary between f4 and f8; the cache line's last instruction slot is at fc. I agree it does feel like a cache maintenance issue. Did you change the V8 code in any way? 32-bit cache maintenance is done in the kernel, and the same routine is used by the Chrome browser, so it would be surprising for you to see many crashes due to the kernel routine when Chrome does not.
,
Jun 5 2018
I mean it looks like a boundary. No, I did not change any icache-related code.
,
Jun 5 2018
I think Chrome doesn't collect the relevant data, so they don't recognize the pattern. We have every page V8 allocates identified.
,
Jun 5 2018
Could the cause of this problem be that the old kernel uses "dc cvau" instead of "dc civac"?
,
Jun 6 2018
Let's start at the beginning: can you explain in detail why you believe cache maintenance is broken?
,
Jun 6 2018
The crash reports shown above lead me to believe so.
,
Jun 6 2018
The attached patch uses a signal trick to switch an arm32 thread to arm64, flush the cache, then switch back. (This trick does not work on Linux kernel 4.x, which resets the compat flag in valid_compat_regs.) I have observed several phones fail to make the cache clean. The flushing code is basically the same as the one inside the Linux kernel.
,
Jun 6 2018
I had a first look at the patch and if I understand correctly you allocate an executable page and then write into it the arm64 assembly routine to do cache management (PrepareCode(uint32_t*, uint32_t*)). But I do not see where you are syncing the caches after writing this routine. Am I missing something?
,
Jun 6 2018
The code flush routine is CodeFlushSwitcher in cpu-arm.cc. What do you mean by syncing the caches? Isn't that all done by the kernel?
,
Jun 6 2018
Do you mean I have to flush that arm64 page too? I use atomic_thread_barrier, which GCC compiles to "dmb sy".
,
Jun 7 2018
In 32-bit, only the kernel can do cache maintenance, but it is not automatic: you have to make a syscall to request the operation. In your patch you generate the arm64 code, so I would expect to see a syscall afterwards to sync the caches. I don't see it.
,
Jun 7 2018
Sorry, I hadn't seen your reply (#33). dmb is a memory barrier; it does not guarantee the caches are in sync.
,
Jun 7 2018
Yeah, but it turns out to work well for a newly allocated page.
,
Jun 7 2018
You can try it out. It doesn't matter.
,
Jun 7 2018
Writing to a newly allocated page doesn't change the fact that you need to do cache maintenance; otherwise you will get random crashes (sometimes it will work as expected, sometimes it won't).
,
Jun 7 2018
Okay. Let me try it out.
,
Jun 7 2018
I have observed many crashes within 2-3 seconds of startup, which points to the initial flush failing. The crashing code comes from the snapshot, like the interpreter entry. That means even if I flush this code, I have no idea whether the flush works.
,
Jun 7 2018
I added the flush code to PrepareCode and the SignalHandler generation:
register uint32_t beg asm("r0") = reinterpret_cast<uint32_t>(code32);
register uint32_t end asm("r1") = beg + 4096;
register uint32_t flg asm("r2") = 0;
asm volatile(
// This assembly works for both ARM and Thumb targets.
// Preserve r7; it is callee-saved, and GCC uses it as a frame pointer for
// Thumb targets.
" push {r7}\n"
// r0 = beg
// r1 = end
// r2 = flags (0)
" ldr r7, =%c[scno]\n" // r7 = syscall number
" svc 0\n"
" pop {r7}\n"
:
: "r" (beg), "r" (end), "r" (flg), [scno] "i" (__ARM_NR_cacheflush)
: "memory");
It generates the code:
4caacc: 2600 movs r6, #0
...
4cab64: 4620 mov r0, r4
...
4cab8c: f504 5180 add.w r1, r4, #4096 ; 0x1000
...
4cab94: 4632 mov r2, r6
4cab96: b480 push {r7}
4cab98: 4f06 ldr r7, [pc, #24] ; (4cabb4 <v8::internal::AArch64Switcher::PrepareCode()+0xf0>)
4cab9a: df00 svc 0
4cab9c: bc80 pop {r7}
...
4cabb4: 000f0002 .word 0x000f0002
And it still crashes on those phones.
,
Jun 7 2018
OK, so if you believe cache maintenance is the issue, did you write a dedicated test? Something like:
* generate a function (say movw r0, #imm; bx lr)
* do cache maintenance
* run the function
* check the result
* repeat many, many times, changing the imm value each time
If you get a crash or a wrong result, then cache maintenance could be an issue and I can use your test to investigate. Can you give it a try?
,
Jun 7 2018
That approach can never reproduce this crash. The problem is probabilistic, like quantum theory. It accounts for 5% of all our crashes; when I disable the TurboFan optimizer, it drops to 0.00x%. We can only observe it through statistics.
,
Jun 7 2018
The only thing I am sure of is that the reports, especially the ones I posted here, do look like a cache maintenance problem. For the scenario of f4 not being executed while f8 and fc are, in my experience the only explanation is a cache maintenance problem.
,
Jun 7 2018
From May 15th to May 30th, this crashed 100,170 times on Linux kernel 3.x but only 261 times on Linux kernel 4.x. The flush source seems to have no significant change between them; only Privileged Access Never support was added, along with the use of "dc civac".
,
Jun 7 2018
Linux kernel 4.x devices contribute 1/8 of our overall crashes, but contribute very little to this particular crash.
,
Jun 8 2018
Ok, I think I understand better. For the cache issue in 32-bit, do you have the exact Android version? You are right, the cache routines were updated in the 4.9 kernel, but they could have been backported by the Android kernel team. If you are testing on stock Linux, then yes, you'd need version 4.9 to get the fix.

Do you see the same issue on an arm64 browser? arm64 cache maintenance is done in user space and V8 has the correct sequence, so if you still see the issue there, then it isn't a problem with cache maintenance.

Your patch (escape_to_64) intrigues me: if I understand correctly, you are switching to arm64 to do cache maintenance because you believe the 32-bit cache maintenance is broken. As mentioned in previous replies, this cannot work: since you generate the arm64 code, you would need to call the 32-bit cache maintenance first, and if that is broken as you believe, then you cannot expect your arm64 code to run well.

Finally, in #7 you say the browser is based on Chrome 57 but V8 is based on 6.3. V8 and Chrome are usually designed to work in tandem (so Chrome 57 with V8 5.7), and this could be another source of your problem: maybe a cache maintenance call is missing due to the version mismatch.
,
Jun 8 2018
The version mismatch is not the problem. The old 5.7 V8 uses more native code, so it would have had a more severe problem. The cache maintenance is built into V8; if that were a problem, what about Node.js?
,
Jun 8 2018
A broken icache sync routine is not the only possible explanation: if the code is missing calls to the cache sync routine, the symptoms will be the same. What I was suggesting is that the version mismatch may have required some code changes, and those may have resulted in a missing call to cache maintenance. But that is just speculation. Can you run your browser as arm64? Do you still see the issue then?
,
Jun 8 2018
Our products are required to be released only on the arm32 platform.
,
Jun 14 2018
I have worked out a patch in which the arm64 code lives in the arm32 ELF itself, so the code management problem is solved. I also removed the dependency on the signal.
,
Jun 14 2018
This patch shows what I have done.
,
Nov 22
Rodolph, anything actionable from your side here?
,
Nov 22
I am trying a new flush approach in Alipay, looking forward to easing the problem.
,
Nov 27
Nothing on my side. The attached patch is a workaround for old kernels with a bug in their cache maintenance operations. The patch will only work on old kernels anyway; recent ones prevent the 32-bit/64-bit switch in interrupts. AFAICT there are no issues in V8. This can be closed.
,
Nov 27
Thank you, Rodolph!