
Issue 836231

Starred by 4 users

Issue metadata

Status: WontFix
Owner: ----
Closed: Nov 27
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug




ARM icache fails to flush and V8 crashes on JIT code

Reported by manjian2...@gmail.com, Apr 24 2018

Issue description

Steps to reproduce the problem:
Collected from users in the field; we have no idea how to reproduce it.

What is the expected behavior?

What went wrong?
The ARM icache fails to be flushed by the syscall.

Did this work before? N/A 

Chrome version: 66.0.3359.117  Channel: stable
OS Version: 7.0
Flash Version: 

The crash can occur in any JIT code, including interpreter bytecode handlers, optimized functions, and builtins like ObjectPrototypeToString.
 
[Attachment: image.png, 44.1 KB]
The image above shows the CPUs involved in a released app.
A typical crash in an optimized function is shown below. Another astonishing fact: even after a code VMA has been unmapped from the address space and a new code VMA is mapped at the same virtual address, the icache remains unflushed.
[Attachment: F6D0071C-46AC-402f-AF14-8D317D6EC4A2.png, 75.6 KB]
Components: Blink>JavaScript
Labels: -Arch-x86_64 Arch-ARM
@whoever reads the summary: This is a crash happening on UCBrowser.

manjian2006@gmail.com: What version of Chromium is the crashing UCBrowser version based on? Do you also have pageloads for the chipsets that you provided in comment #3? It would be interesting to see whether the higher number on certain boards is due to higher pageload or a higher crash rate.
I'll add the observation here. The pageload information should be listed in every crash report, but I have to ask my colleagues to sum it up.
Sorry, there is no pageload information in our crash reports.
Cc: jkummerow@chromium.org
(1) Why do you believe that this is related to icache flushing? Do you have any evidence pointing in that direction?

(2) You wrote "The ARM icache fail to flush by the syscall." If a cache flushing syscall did happen but was ineffective, doesn't that make it a kernel issue (rather than something V8 can fix)? Or are you saying that V8 failed to make the syscall when it would have been required?

(3) What OS/kernel was running on the affected machines?

(4) (Repeating #3's question:) Which version of V8 is the browser based on that's reporting these crashes?
Questions 1-3 need a judgement call after reading and thinking through the data presented in the crash reports.
I have attached several reports.
The pc field contains the raw code bytes; you will need to disassemble them yourself.

The browser is based on Chromium 57, but V8 is based on 6.3.
[Attachment: crashreports.7z, 154 KB]
It keeps popping up, even with 30,000 active users.
The approach of having the kernel call update_mmu_cache turns out to be equivalent to the original approach.
I have no idea why switching the context works but merely triggering the call to update_mmu_cache does not.

The next step will be to call nanosleep after each flush.

[Attachment: one more.PNG, 233 KB]
The final mitigation is to disable code compaction.
The crash rate dropped several-fold, and we are happy with that result.
What did you observe on your benchmarks?
As far as I understand, this change will speed up evacuation but enlarge the space consumed by code.
Would it be possible to upload a CL, so we can evaluate whether we can upstream your change?
diff --git a/src/flag-definitions.h b/src/flag-definitions.h
index 25cc96a00f..ef44fa13cd 100644
--- a/src/flag-definitions.h
+++ b/src/flag-definitions.h
@@ -663,7 +663,11 @@ DEFINE_INT(v8_os_page_size, 0, "override OS page size (in KBytes)")
 DEFINE_BOOL(always_compact, false, "Perform compaction on every full GC")
 DEFINE_BOOL(never_compact, false,
             "Never perform compaction on full GC - testing only")
+#if defined(UC_BUILD)
+DEFINE_BOOL(compact_code_space, false, "Compact code space on full collections")
+#else
 DEFINE_BOOL(compact_code_space, true, "Compact code space on full collections")
+#endif
 DEFINE_BOOL(use_marking_progress_bar, true,
             "Use a progress bar to scan large objects in increments when "
             "incremental marking is active.")


This patch disables code evacuation. I don't think you want to take that upstream.
I was fooled by the data presented. The anon:v8 crash rate only looked lower because of the confusing UI design.
So let's work on a new mitigation approach.
So far, only increasing the interrupt budget from 144 KB to 144 MB has decreased the crash rate effectively. But this approach is not acceptable.
All the ineffective attempts overlooked an important fact: these crashes happen on ARM64 hardware.
That makes the trick of having update_mmu_cache flush the whole icache a no-op, because the ARM64 implementation only issues an isb.

I will look into why the 64-bit icache flush is not working. I have tried flushing the icache page by page, to prevent a page fault from making the flush exit early.
It did not work.
http://www.mono-project.com/news/2016/09/12/arm64-icache/

Damn it. The mono project has found the cause. The credit is theirs.
I suggest that at least the arm64 version should adopt the cache-flushing approach from the mono project.

	/* Don't rely on GCC's __clear_cache implementation, as it caches
	 * icache/dcache cache line sizes, that can vary between cores on
	 * big.LITTLE architectures. */
	guint64 end = (guint64) (code + size);
	guint64 addr;
	/* always go with cacheline size of 4 bytes as this code isn't perf critical
	 * anyway. Reading the cache line size from a machine register can be racy
	 * on a big.LITTLE architecture if the cores don't have the same cache line
	 * sizes. */
	const size_t icache_line_size = 4;
	const size_t dcache_line_size = 4;

	addr = (guint64) code & ~(guint64) (dcache_line_size - 1);
	for (; addr < end; addr += dcache_line_size)
		asm volatile("dc civac, %0" : : "r" (addr) : "memory");
	asm volatile("dsb ish" : : : "memory");

	addr = (guint64) code & ~(guint64) (icache_line_size - 1);
	for (; addr < end; addr += icache_line_size)
		asm volatile("ic ivau, %0" : : "r" (addr) : "memory");

	asm volatile ("dsb ish" : : : "memory");
	asm volatile ("isb" : : : "memory");

Here we go, a new way to flush the icache for testing.

diff --git a/src/arm/cpu-arm.cc b/src/arm/cpu-arm.cc
index f5d2ab19d1..532a387491 100644
--- a/src/arm/cpu-arm.cc
+++ b/src/arm/cpu-arm.cc
@@ -20,10 +20,20 @@
 namespace v8 {
 namespace internal {
 
+#if defined(UC_BUILD)
+static void BigBigWorld() {
+  asm volatile(".rept 128 * 1024\n"
+  "nop\n"
+  ".endr\n");
+}
+#endif
+
 void CpuFeatures::FlushICache(void* start, size_t size) {
 #if !defined(USE_SIMULATOR)
 #if V8_OS_QNX
   msync(start, size, MS_SYNC | MS_INVALIDATE_ICACHE);
+#elif defined(UC_BUILD)
+  BigBigWorld();
 #else
   register uint32_t beg asm("r0") = reinterpret_cast<uint32_t>(start);
   register uint32_t end asm("r1") = beg + size;

This attempt fills the icache with nops.
The bug linked in #17 is already addressed in V8: the cache line size is not cached but is read again for every flush. The cache-flushing code hasn't changed in a while; can you provide more details on why you believe this is an issue with arm64 cache maintenance?

The screenshots (#2 and #8) are arm32, so is the patch in #19. Does this issue also appear on arm32 then?
Reading it every time is not a good enough solution, because preemption will break it.

arm32 also has this problem, but I have no idea how to fix it yet. Let's address the arm64 problem first.
The top 3 CPUs listed in #1 (MSM8976, MT6797, Kirin 950) are big.LITTLE CPUs using Cortex-A53 and Cortex-A72. Both of these cores have the same cache line size (64 bytes), they are not affected by the bug mentioned in #17.

The issue may still be with cache management but this is not related to the cache line size on big.LITTLE.

This report indicates a boundary. The instruction at f4 is unexpectedly not executed, but the one at f8 is!
[Attachment: 5C15FF3E-8DB3-4365-8CE4-0B1673E6D89A.png, 102 KB]
There is no cache line boundary between f4 and f8; the cache line ends at fc.

I agree it does feel like a cache maintenance issue; did you change the V8 code in any way?

32-bit cache maintenance is done in the kernel, and the same routine is used by the Chrome browser, so it would be surprising for you to see many crashes caused by the kernel routine while Chrome sees none.

I mean it looks like a boundary. No, I did not change any icache-related code.
I think Chrome doesn't collect the proper data, so they cannot recognize the pattern. We have every page V8 allocates identified.
Could the cause of this problem be that the old kernel uses "dc cvau" instead of "dc civac"?
Let's start from the beginning: can you explain in detail why you believe cache maintenance is broken?
The crash reports shown above lead me to believe so.
The attached patch uses a signal trick to switch an arm32 thread to arm64, flush the cache, and then switch back. (This trick does not work on Linux kernel 4.x, which resets the compat flag in valid_compat_regs.)

And I have observed several phones failing to get the cache clean.
The flushing code is basically the same as the one inside the Linux kernel.
 
[Attachment: escaped_to_64.patch, 14.8 KB]
I had a first look at the patch and if I understand correctly you allocate an executable page and then write into it the arm64 assembly routine to do cache management (PrepareCode(uint32_t*, uint32_t*)). But I do not see where you are syncing the caches after writing this routine. Am I missing something?
The code flush routine is in cpu-arm.cc, in CodeFlushSwitcher.
What do you mean by syncing the cache? Isn't that all done by the kernel?
Do you mean I have to flush that arm64 page too? I use atomic_thread_barrier, which GCC compiles to "dmb sy".
In 32-bit, only the kernel can do cache maintenance, but it is not automatic: you have to make a syscall to request the operation. In your patch you generate the arm64 code, so I would expect to see a syscall afterward to sync the caches. I don't see it.
Sorry, I hadn't seen your reply (#33). dmb is a memory barrier; it does not guarantee the caches are in sync.
Yeah. But it turns out to work well for a newly allocated page.
You can try it out. It doesn't matter.
Writing to a newly allocated page doesn't change the fact that you need to do cache management; otherwise you will get random crashes (sometimes it will work as expected, sometimes it won't).
Okay. Let me try it out.
I have observed many crashes within 2-3 seconds of startup, which means the initial flush failed. The crashing code comes from the snapshot, like the interpreter entry. That means even if I flush this code, I have no idea whether the flush works on my code.
I added the flush code to the PrepareCode and SignalHandler generation:
  register uint32_t beg asm("r0") = reinterpret_cast<uint32_t>(code32);
  register uint32_t end asm("r1") = beg + 4096;
  register uint32_t flg asm("r2") = 0;

  asm volatile(
    // This assembly works for both ARM and Thumb targets.

    // Preserve r7; it is callee-saved, and GCC uses it as a frame pointer for
    // Thumb targets.
    "  push {r7}\n"
                                  // r0 = beg
                                  // r1 = end
                                  // r2 = flags (0)
    "  ldr r7, =%c[scno]\n"       // r7 = syscall number
    "  svc 0\n"

    "  pop {r7}\n"
    :
    : "r" (beg), "r" (end), "r" (flg), [scno] "i" (__ARM_NR_cacheflush)
    : "memory");

It generates the code:
  4caacc:	2600      	movs	r6, #0
  ...
  4cab64:	4620      	mov	r0, r4
  ...
  4cab8c:	f504 5180 	add.w	r1, r4, #4096	; 0x1000
  ...
  4cab94:	4632      	mov	r2, r6
  4cab96:	b480      	push	{r7}
  4cab98:	4f06      	ldr	r7, [pc, #24]	; (4cabb4 <v8::internal::AArch64Switcher::PrepareCode()+0xf0>)
  4cab9a:	df00      	svc	0
  4cab9c:	bc80      	pop	{r7}
  ...
  4cabb4:	000f0002 	.word	0x000f0002

And it still crashes on those phones.
OK, so if you believe cache maintenance is the issue did you write a dedicated test? Something like:
 * generate function (say movw r0, #imm; bx lr)
 * do cache maintenance
 * run function
 * check results
 * repeat many, many times, changing the imm value each time
If you get a crash or a wrong result, then cache maintenance could be an issue and I can use your test to investigate.

Can you give it a try?
A test like that can never reproduce this crash.
This problem behaves like quantum theory: it occurs with some probability. It accounts for about 5% of all our crashes.
When I disable the TurboFan optimizer, it drops to 0.00x%.
We can only observe it through statistics.

The only thing I am sure of is that the reports, especially the ones I posted here, do look like a cache maintenance problem.
For the scenario where f4 is not executed but f8 and fc are, in my experience the only explanation is a cache maintenance problem.
From May 15th to May 30th, this crashed 100,170 times on Linux kernel 3.x but only 261 times on kernel 4.x. The kernel source seems to have no significant change; only Privileged Access Never support has been added, and the flush now uses dc civac.
Linux kernel 4.x accounts for about 1/8 of our crashes overall, but contributes very little to this particular crash.
Ok I think I understand better.

For the cache issue in 32-bit, do you have the exact Android version? You are right, the cache routines were updated in the 4.9 kernel but they could have been back ported by the android kernel team. If you are testing on stock Linux then yes you'd need version 4.9 to get the fix.

Do you see the same issue on an arm64 browser? arm64 cache maintenance is done in user space and V8 has the correct sequence, so if you still see the issue then it isn't a problem with cache maintenance.

Your patch (escape_to_64) intrigues me: if I understand correctly, you are switching to arm64 to do cache maintenance because you believe the 32-bit cache maintenance is broken. As mentioned in previous replies, this cannot work: you generate the arm64 code, so you would need to call 32-bit cache maintenance on it first, and if that is broken as you believe, then you cannot expect your arm64 code to run well.

Finally, in #7 you say the browser is based on Chrome 57 but V8 is based on 6.3. I believe V8 and Chrome are usually designed to work in tandem (so Chrome 57 with V8 5.7), and this could be another source of your problem: maybe a cache maintenance call is missing due to the version mismatch.

The version mismatch is not the problem. The old V8 5.7 uses more native code, so it has a more severe problem. The cache maintenance is built into V8; if that were broken, what about Node.js?
A broken icache sync routine is not the only possible explanation; if the code is missing calls to the cache sync routine, the symptoms will be the same. What I was suggesting is that the mismatch may have required some code changes, and those may have resulted in a missing call to cache management. But that is just speculation.

Can you run your browser in arm64? do you still see the issue then?
Our products are required to ship only on the arm32 platform.

I have worked out a patch where the arm64 code lives in the arm32 ELF, so the code management problem is solved. I also removed the dependency on the signal.
This patch shows what I have done:
[Attachment: second .patch, 14.6 KB]
Rodolph, anything actionable from your side here?
I am trying a new flush approach in Alipay, hoping to ease the problem.
Nothing on my side. The attached patch is a workaround for old kernels with a bug in their cache maintenance operations. The patch will only work on old kernels anyway; recent ones prevent the 32-bit/64-bit switch in interrupts.

AFAICT there are no issues in V8. This can be closed.

Status: WontFix (was: Unconfirmed)
Thank you, Rodolph!
