New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 736885 link

Starred by 3 users

Issue metadata

Status: WontFix
Owner: ----
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug-Regression



Sign in to add a comment

Chrome: Crash Report - SkImageInfoValidConversion

Project Member Reported by pbomm...@chromium.org, Jun 26 2017

Issue description

Product name: Chrome
Magic Signature: SkImageInfoValidConversion

Current link:
https://crash.corp.google.com/browse?q=custom_data.ChromeCrashProto.ptype%3D'renderer'%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D'SkImageInfoValidConversion'%20AND%20ReportID%3D'c6b9439e40000000'&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#3


Search properties:
custom_data.chromecrashproto.ptype: renderer
custom_data.chromecrashproto.magic_signature_1.name: SkImageInfoValidConversion
reportid: c6b9439e40000000

Metadata :
Product Name: Chrome
Product Version: 59.0.3071.115
Report ID: c6b9439e40000000
Report Time: Mon, 26 Jun 2017 19:15:37 GMT
Uptime: 673000 ms
Cumulative Uptime: 0 ms
User Email: 
OS Name: Windows NT
OS Version: 6.1.7601 23807
CPU Architecture: amd64
CPU Info: family 20 model 2 stepping 0


Stack Trace :
Thread 8 (id: 3492) CRASHED [EXCEPTION_ACCESS_VIOLATION_READ @ 0xffffffffffffffff ] MAGIC SIGNATURE THREAD
Stack Quality67%Show frame trust levels
0x000007fee7d17b59	(chrome_child.dll -skimageinfopriv.h:68 )	SkImageInfoValidConversion
0x0000002f		
0x000007fee7fd317f	(chrome_child.dll -skrect.h:821 )	SkRect::round(SkIRect *)
0x000007fee79b1302	(chrome_child.dll -skrasterclip.cpp:390 )	SkRasterClip::op(SkRect const &,SkMatrix const &,SkIRect const &,SkRegion::Op,bool)
0x000007fee797fc60	(chrome_child.dll -skbitmapdevice.cpp:475 )	SkBitmapDevice::onClipRect(SkRect const &,SkClipOp,bool)
0x000007fee797fb9e	(chrome_child.dll -skcanvas.cpp:1476 )	SkCanvas::onClipRect(SkRect const &,SkClipOp,SkCanvas::ClipEdgeStyle)
0x000007fee797a37a	(chrome_child.dll -raster_source.cc:80 )	cc::RasterSource::PlaybackToCanvas(SkCanvas *,gfx::ColorSpace const &,gfx::Rect const &,gfx::Rect const &,gfx::AxisTransform2d const &,cc::RasterSource::PlaybackSettings const &)
0x000007fee797a9ad	(chrome_child.dll -raster_buffer_provider.cc:85 )	cc::RasterBufferProvider::PlaybackToMemory(void *,cc::ResourceFormat,gfx::Size const &,unsigned __int64,cc::RasterSource const *,gfx::Rect const &,gfx::Rect const &,gfx::AxisTransform2d const &,gfx::ColorSpace const &,cc::RasterSource::PlaybackSettings const &)
0x000007fee7be2ae6	(chrome_child.dll -bitmap_raster_buffer_provider.cc:54 )	cc::`anonymous namespace'::RasterBufferImpl::Playback
0x000007fee7be2991	(chrome_child.dll -tile_manager.cc:130 )	cc::`anonymous namespace'::RasterTaskImpl::RunOnWorkerThread
0x000007fee7be16a5	(chrome_child.dll -categorized_worker_pool.cc:362 )	content::CategorizedWorkerPool::RunTaskInCategoryWithLockAcquired(cc::TaskCategory)
0x000007fee7e1a0b3	(chrome_child.dll -categorized_worker_pool.cc:35 )	content::`anonymous namespace'::CategorizedWorkerPoolThread::Run
0x000007fee7e1a024	(chrome_child.dll -simple_thread.cc:68 )	base::SimpleThread::ThreadMain()
0x000007fee7e1a6b3	(chrome_child.dll -platform_thread_win.cc:89 )	base::`anonymous namespace'::ThreadFunc
0x776159cc	(kernel32.dll + 0x000159cc )	
0x7784a560	(ntdll.dll + 0x0002a560 )	
0x7769baaf	(kernel32.dll + 0x0009baaf )	
0x7769baaf	(kernel32.dll + 0x0009baaf )	


This crash started on latest Chrome Stable i.e., 59.0.3071.115 there was single crash on 58.0.3029.110, Please find details below :


Chrome version and number of crashes :
59.0.3071.115	281.82%	62	
58.0.3029.110	4.55%	 1	
Total:	       286.36%	22


Link to above data : https://goto.google.com/wwzso 


Note : None of the CL's related to Skia were merged between 59.0.3071.109 --> 59.0.3071.115, Please find the change log below :

https://chromium.googlesource.com/chromium/src/+log/59.0.3071.109..59.0.3071.115?pretty=fuller&n=10000


Also observed that majority if the crashes are from "family 20" CPU.
	
	

 
Cc: fmalita@chromium.org
Looks like all the 59.0.3071.115 are on amd64/family 20 model 1/2 stepping 0, which is probably https://bugs.chromium.org/p/chromium/issues/detail?id=657233.

It also matches that there's a large unexplained spike, for one particular version.  It'll probably go away (or show up differently) in future builds.

It would be good to filter crash queries to ignore anything from those CPUs.

(the crash in 58.0.3029.110 is different and could get its its own bug if needed)

Comment 3 by hcm@chromium.org, Jun 26 2017

Labels: -ReleaseBlock-Stable
removing RB flag
Labels: Stability-Sheriff-Desktop

Comment 5 by w...@chromium.org, Jul 5 2017

Cc: mmand...@chromium.org brucedaw...@chromium.org
Labels: Arch-x86_64
+brucedawson, FYI

Comment 6 by w...@chromium.org, Jul 5 2017

mmandlis: Could we tag bugs w/ this CPU arch/info with a warning to indicate that they are less actionable?
Owner: fmalita@chromium.org
Status: Assigned (was: Available)
(Stability sheriff triage per Stability-Sheriff-Dekstop tag)

Just to summarize, the belief is this is the AMD CPU bug mentioned here"

https://bugzilla.mozilla.org/show_bug.cgi?id=772330#c55

"Under a highly specific and detailed set of internal timing conditions, the processor may incorrectly
update the branch status when a taken branch occurs where the first or second instruction after the
branch is an indirect call or jump. "

However, we still see 203838 crashes with this signature on current Windows stable 59.0.3071.115, which has been release for 14 days. That's a lot of crashes! (Especially since we only see crashes from opted in users.) In terms of users, it's 79335 unique users according to crash frontend (https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20product.version%3D%2759.0.3071.115%27%20AND%20custom_data.ChromeCrashProto.channel%3D%27%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27renderer%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D).

It seems high enough that we should consider seeing if we could add some kind of workaround to avoid this. Since these are real users having their tabs crash because of this.

Assigning back to y fmalita@ per the above.
Cc: bunge...@chromium.org
Components: -Internals>Skia
Owner: ----
Status: Available (was: Assigned)
Verified that 100% of the 59.0.3071.115 crashes are still on amd64 family 20 model 1/2.

There is no viable workaround that I know of.  This crash shows up in various places, depending on the build.  Looking at crash data, this particular instance only shows up in 59.0.3071.115: https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20custom_data.ChromeCrashProto.channel%3D%27%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27renderer%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27SkImageInfoValidConversion%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D


So technically it is "fixed" already -- just rev the stable version!


(I also suspect the crash is present in most/all builds, in some form or another, but it is had to identify due to its varying stack trace.  The only things I can think of are a) label crashes from these CPUs as unreliable, as suggested in c#6, and b) don't push such builds to stable.)

Thanks for looking more into this.

Do we know if the generated machine code for SkImageInfoValidConversion() changed between current stable 59.0.3071.115 and previous stable 59.0.3071.109? Also, it seems 59.0.3071.109 was never rolled out beyond 5% stable... so maybe that contributes to us not seeing it on that version?

I'm bringing this up because it's not obvious to me that "So technically it is "fixed" already -- just rev the stable version!" - i.e. I would highly doubt that just changing version would be sufficient. I understand that maybe if enough other things change in the binary and or our compiler version, we could move this crash around - or potentially it could disappear entirely with certain version (i.e. if no machine code tickles the specific AMD bug).
I haven't looked at the generated code, but with PGO it wouldn't be surprising to change from rev to rev (brucedawson is more qualified to shed some light there).

It's also unclear whether the trigger is necessarily in the leaf code or somewhere up the call stack.

Historically, we've seen these crashes pop up semi-randomly in unchanged code, and then subsequently disappear (see issue 657233, for example).  This crash doesn't show up in any of the beta/dev/canary builds, so I'd say there's a good chance it will follow the same pattern and mysteriously disappear in the next M59 build.  It's also possible that an M59 respin will not shuffle things around sufficiently, but sadly I don't have any better suggestion.

Here's a crude query showing crashes which occur > 90% time on buggy CPUs, presumably related:

https://crash.corp.google.com/dremel_query_ui?q=SELECT%20custom_data.ChromeCrashProto.magic_signature_1.name%20AS%20sig%2C%20product.version%20AS%20ver%2C%20COUNT(*)%20AS%20total%2C%20SUM(CASE%20WHEN%20cpu.Info%20%3D%20%27family%2020%20model%201%20stepping%200%27%20THEN%201%20WHEN%20cpu.Info%20%3D%20%27family%2020%20model%202%20stepping%200%27%20THEN%201%20ELSE%200%20END)%20AS%20badcpus%20FROM%20crash.prod.latest%20WHERE%20product.name%3D%27Chrome%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27renderer%27%20GROUP%20BY%20sig%2Cver%20HAVING%20badcpus%20%3E%2020%20AND%20(badcpus%20%2F%20total)%20%3E%200.9%20ORDER%20BY%20badcpus%20DESC%3B

Looks like Skia really hit the jackpot in 59.0.3071.115, but maybe the large number of crashes for that rev has to do with it being a stable release more than anything else.

There are other crashes in there, but the silver lining is it doesn't seem to affect quite as many builds as I would have guessed.
There really isn't enough information in the erratum to guess what sort of code rearrangements could affect the bug. However if it truly is timing related then the crashing code itself might not even need to change - it could be enough to change its location relative to cache-line boundaries or its location relative to other pieces of code (since cache lines and branch predictors have a finite number of sets and therefore alias with each other which therefore randomizes timing).

So maybe the crash does go away or become far less likely in some builds. That's possible. Or it could move to a different crash signature - I dealt with a lot of changing signatures when looking at VS 2017 builds. That's possible.

Either way I'm not sure what we *do* about it. Warn users if they have a vulnerable CPU that they haven't applied the patch to?

Labels: -Stability-Sheriff-Desktop

Comment 13 by wfh@chromium.org, Aug 29 2017

Labels: -Restrict-View-Google
No private stacks in here. No need for RVG.
Components: Internals>Skia

Comment 15 by hcm@chromium.org, Nov 28 2017

Status: WontFix (was: Available)
Closing out as the issue did seem to spike then resolve, and was related to a specific CPU vs something that could be (reasonably) fixed in our code.

Sign in to add a comment