Issue metadata
Sign in to add a comment
|
Chrome: Crash Report - SkImageInfoValidConversion |
||||||||||||||||||||||
Issue descriptionProduct name: Chrome Magic Signature: SkImageInfoValidConversion Current link: https://crash.corp.google.com/browse?q=custom_data.ChromeCrashProto.ptype%3D'renderer'%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D'SkImageInfoValidConversion'%20AND%20ReportID%3D'c6b9439e40000000'&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#3 Search properties: custom_data.chromecrashproto.ptype: renderer custom_data.chromecrashproto.magic_signature_1.name: SkImageInfoValidConversion reportid: c6b9439e40000000 Metadata : Product Name: Chrome Product Version: 59.0.3071.115 Report ID: c6b9439e40000000 Report Time: Mon, 26 Jun 2017 19:15:37 GMT Uptime: 673000 ms Cumulative Uptime: 0 ms User Email: OS Name: Windows NT OS Version: 6.1.7601 23807 CPU Architecture: amd64 CPU Info: family 20 model 2 stepping 0 Stack Trace : Thread 8 (id: 3492) CRASHED [EXCEPTION_ACCESS_VIOLATION_READ @ 0xffffffffffffffff ] MAGIC SIGNATURE THREAD Stack Quality67%Show frame trust levels 0x000007fee7d17b59 (chrome_child.dll -skimageinfopriv.h:68 ) SkImageInfoValidConversion 0x0000002f 0x000007fee7fd317f (chrome_child.dll -skrect.h:821 ) SkRect::round(SkIRect *) 0x000007fee79b1302 (chrome_child.dll -skrasterclip.cpp:390 ) SkRasterClip::op(SkRect const &,SkMatrix const &,SkIRect const &,SkRegion::Op,bool) 0x000007fee797fc60 (chrome_child.dll -skbitmapdevice.cpp:475 ) SkBitmapDevice::onClipRect(SkRect const &,SkClipOp,bool) 0x000007fee797fb9e (chrome_child.dll -skcanvas.cpp:1476 ) SkCanvas::onClipRect(SkRect const &,SkClipOp,SkCanvas::ClipEdgeStyle) 0x000007fee797a37a (chrome_child.dll -raster_source.cc:80 ) cc::RasterSource::PlaybackToCanvas(SkCanvas *,gfx::ColorSpace const &,gfx::Rect const &,gfx::Rect const &,gfx::AxisTransform2d const &,cc::RasterSource::PlaybackSettings const &) 0x000007fee797a9ad (chrome_child.dll -raster_buffer_provider.cc:85 ) cc::RasterBufferProvider::PlaybackToMemory(void *,cc::ResourceFormat,gfx::Size const &,unsigned __int64,cc::RasterSource const *,gfx::Rect const &,gfx::Rect const &,gfx::AxisTransform2d const &,gfx::ColorSpace const &,cc::RasterSource::PlaybackSettings const &) 0x000007fee7be2ae6 (chrome_child.dll -bitmap_raster_buffer_provider.cc:54 ) cc::`anonymous namespace'::RasterBufferImpl::Playback 0x000007fee7be2991 (chrome_child.dll -tile_manager.cc:130 ) cc::`anonymous namespace'::RasterTaskImpl::RunOnWorkerThread 0x000007fee7be16a5 (chrome_child.dll -categorized_worker_pool.cc:362 ) content::CategorizedWorkerPool::RunTaskInCategoryWithLockAcquired(cc::TaskCategory) 0x000007fee7e1a0b3 (chrome_child.dll -categorized_worker_pool.cc:35 ) content::`anonymous namespace'::CategorizedWorkerPoolThread::Run 0x000007fee7e1a024 (chrome_child.dll -simple_thread.cc:68 ) base::SimpleThread::ThreadMain() 0x000007fee7e1a6b3 (chrome_child.dll -platform_thread_win.cc:89 ) base::`anonymous namespace'::ThreadFunc 0x776159cc (kernel32.dll + 0x000159cc ) 0x7784a560 (ntdll.dll + 0x0002a560 ) 0x7769baaf (kernel32.dll + 0x0009baaf ) 0x7769baaf (kernel32.dll + 0x0009baaf ) This crash started on latest Chrome Stable i.e., 59.0.3071.115 there was single crash on 58.0.3029.110, Please find details below : Chrome version and number of crashes : 59.0.3071.115 281.82% 62 58.0.3029.110 4.55% 1 Total: 286.36% 22 Link to above data : https://goto.google.com/wwzso Note : None of the CL's related to Skia were merged between 59.0.3071.109 --> 59.0.3071.115, Please find the change log below : https://chromium.googlesource.com/chromium/src/+log/59.0.3071.109..59.0.3071.115?pretty=fuller&n=10000 Also observed that majority if the crashes are from "family 20" CPU.
,
Jun 26 2017
Looks like all the 59.0.3071.115 are on amd64/family 20 model 1/2 stepping 0, which is probably https://bugs.chromium.org/p/chromium/issues/detail?id=657233. It also matches that there's a large unexplained spike, for one particular version. It'll probably go away (or show up differently) in future builds. It would be good to filter crash queries to ignore anything from those CPUs. (the crash in 58.0.3029.110 is different and could get its its own bug if needed)
,
Jun 26 2017
removing RB flag
,
Jul 3 2017
,
Jul 5 2017
+brucedawson, FYI
,
Jul 5 2017
mmandlis: Could we tag bugs w/ this CPU arch/info with a warning to indicate that they are less actionable?
,
Jul 10 2017
(Stability sheriff triage per Stability-Sheriff-Dekstop tag) Just to summarize, the belief is this is the AMD CPU bug mentioned here" https://bugzilla.mozilla.org/show_bug.cgi?id=772330#c55 "Under a highly specific and detailed set of internal timing conditions, the processor may incorrectly update the branch status when a taken branch occurs where the first or second instruction after the branch is an indirect call or jump. " However, we still see 203838 crashes with this signature on current Windows stable 59.0.3071.115, which has been release for 14 days. That's a lot of crashes! (Especially since we only see crashes from opted in users.) In terms of users, it's 79335 unique users according to crash frontend (https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20product.version%3D%2759.0.3071.115%27%20AND%20custom_data.ChromeCrashProto.channel%3D%27%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27renderer%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D). It seems high enough that we should consider seeing if we could add some kind of workaround to avoid this. Since these are real users having their tabs crash because of this. Assigning back to y fmalita@ per the above.
,
Jul 10 2017
Verified that 100% of the 59.0.3071.115 crashes are still on amd64 family 20 model 1/2. There is no viable workaround that I know of. This crash shows up in various places, depending on the build. Looking at crash data, this particular instance only shows up in 59.0.3071.115: https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20custom_data.ChromeCrashProto.channel%3D%27%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27renderer%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27SkImageInfoValidConversion%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D So technically it is "fixed" already -- just rev the stable version! (I also suspect the crash is present in most/all builds, in some form or another, but it is had to identify due to its varying stack trace. The only things I can think of are a) label crashes from these CPUs as unreliable, as suggested in c#6, and b) don't push such builds to stable.)
,
Jul 10 2017
Thanks for looking more into this. Do we know if the generated machine code for SkImageInfoValidConversion() changed between current stable 59.0.3071.115 and previous stable 59.0.3071.109? Also, it seems 59.0.3071.109 was never rolled out beyond 5% stable... so maybe that contributes to us not seeing it on that version? I'm bringing this up because it's not obvious to me that "So technically it is "fixed" already -- just rev the stable version!" - i.e. I would highly doubt that just changing version would be sufficient. I understand that maybe if enough other things change in the binary and or our compiler version, we could move this crash around - or potentially it could disappear entirely with certain version (i.e. if no machine code tickles the specific AMD bug).
,
Jul 10 2017
I haven't looked at the generated code, but with PGO it wouldn't be surprising to change from rev to rev (brucedawson is more qualified to shed some light there). It's also unclear whether the trigger is necessarily in the leaf code or somewhere up the call stack. Historically, we've seen these crashes pop up semi-randomly in unchanged code, and then subsequently disappear (see issue 657233, for example). This crash doesn't show up in any of the beta/dev/canary builds, so I'd say there's a good chance it will follow the same pattern and mysteriously disappear in the next M59 build. It's also possible that an M59 respin will not shuffle things around sufficiently, but sadly I don't have any better suggestion. Here's a crude query showing crashes which occur > 90% time on buggy CPUs, presumably related: https://crash.corp.google.com/dremel_query_ui?q=SELECT%20custom_data.ChromeCrashProto.magic_signature_1.name%20AS%20sig%2C%20product.version%20AS%20ver%2C%20COUNT(*)%20AS%20total%2C%20SUM(CASE%20WHEN%20cpu.Info%20%3D%20%27family%2020%20model%201%20stepping%200%27%20THEN%201%20WHEN%20cpu.Info%20%3D%20%27family%2020%20model%202%20stepping%200%27%20THEN%201%20ELSE%200%20END)%20AS%20badcpus%20FROM%20crash.prod.latest%20WHERE%20product.name%3D%27Chrome%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27renderer%27%20GROUP%20BY%20sig%2Cver%20HAVING%20badcpus%20%3E%2020%20AND%20(badcpus%20%2F%20total)%20%3E%200.9%20ORDER%20BY%20badcpus%20DESC%3B Looks like Skia really hit the jackpot in 59.0.3071.115, but maybe the large number of crashes for that rev has to do with it being a stable release more than anything else. There are other crashes in there, but the silver lining is it doesn't seem to affect quite as many builds as I would have guessed.
,
Jul 10 2017
There really isn't enough information in the erratum to guess what sort of code rearrangements could affect the bug. However if it truly is timing related then the crashing code itself might not even need to change - it could be enough to change its location relative to cache-line boundaries or its location relative to other pieces of code (since cache lines and branch predictors have a finite number of sets and therefore alias with each other which therefore randomizes timing). So maybe the crash does go away or become far less likely in some builds. That's possible. Or it could move to a different crash signature - I dealt with a lot of changing signatures when looking at VS 2017 builds. That's possible. Either way I'm not sure what we *do* about it. Warn users if they have a vulnerable CPU that they haven't applied the patch to?
,
Jul 13 2017
,
Aug 29 2017
No private stacks in here. No need for RVG.
,
Oct 25 2017
,
Nov 28 2017
Closing out as the issue did seem to spike then resolve, and was related to a specific CPU vs something that could be (reasonably) fixed in our code. |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by robertphillips@chromium.org
, Jun 26 2017