New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 742145 link

Starred by 2 users

Issue metadata

Status: Duplicate
Merged: issue 596348
Owner:
Long OOO (go/where-is-mgiuca)
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug-Regression

Blocked on:
issue 590882
issue 596348
issue 745355



Sign in to add a comment

Chrome: Crash Report - gfx::Image::AddRepresentation

Project Member Reported by cr...@system.gserviceaccount.com, Jul 13 2017

Issue description

reported-by: kavvaru@google.com

Magic Signature: gfx::Image::AddRepresentation

Crash link: https://crash.corp.google.com/browse?q=product.name%3D'Chrome'%20AND%20product.version%3D'60.0.3112.66'%20AND%20custom_data.ChromeCrashProto.channel%3D'beta'%20AND%20custom_data.ChromeCrashProto.ptype%3D'browser'%20AND%20ReportID%3D'dc11c9be40000000'%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D'gfx%3A%3AImage%3A%3AAddRepresentation'&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#3

-------------------------------------------------------------------------------
Sample Report
-------------------------------------------------------------------------------
Product name: Chrome
Magic Signature : gfx::Image::AddRepresentation
Product Version: 60.0.3112.66
Report ID: dc11c9be40000000
Report Url: https://crash.corp.google.com/dc11c9be40000000
Report Time: 2017-07-12T22:42:45-07:00
Upload Time: 2017-07-12T22:43:01.64-07:00
Uptime: 277000 ms
CumulativeProductUptime: 0 ms
OS Name: Windows NT
OS Version: 6.1.7600 16385
CPU Architecture: x86
CPU Info: GenuineIntel family 6 model 23 stepping 10

-------------------------------------------------------------------------------
Crashing thread: thread index: 10. Stack Quality: 100%. Thread id : 5972.
-------------------------------------------------------------------------------
0x5b29a6af (chrome.dll - image.cc: 769)	gfx::Image::AddRepresentation(std::unique_ptr<gfx::internal::ImageRep,std::default_delete<gfx::internal::ImageRep> >)
0x5b29a297 (chrome.dll - image.cc: 523)	gfx::Image::ToImageSkia()
0x5b29a210 (chrome.dll - image.cc: 486)	gfx::Image::ToSkBitmap()
0x5b29a497 (chrome.dll - image.cc: 657)	gfx::Image::AsBitmap()
0x5b2a4749 (chrome.dll - image_family.cc: 129)	gfx::ImageFamily::CreateExact(int,int)
0x5b29aae3 (chrome.dll - icon_util.cc: 80)	`anonymous namespace'::BuildResizedImageFamily
0x5b29b36c (chrome.dll - icon_util.cc: 467)	IconUtil::CreateIconFileFromImageFamily(gfx::ImageFamily const &,base::FilePath const &,IconUtil::WriteType)
0x5ba0b53f (chrome.dll - web_app_win.cc: 67)	`anonymous namespace'::SaveIconWithCheckSum
0x5ba0c307 (chrome.dll - web_app_win.cc: 432)	web_app::internals::CheckAndSaveIcon(base::FilePath const &,gfx::ImageFamily const &,bool)
0x5ba32583 (chrome.dll - update_shortcut_worker_win.cc: 208)	web_app::UpdateShortcutWorker::UpdateShortcutsOnFileThread()
0x5affbd2d (chrome.dll - task_annotator.cc: 59)	base::debug::TaskAnnotator::RunTask(char const *,base::PendingTask *)
0x5afb8e10 (chrome.dll - message_loop.cc: 409)	base::MessageLoop::RunTask(base::PendingTask *)
0x5afb97a9 (chrome.dll - message_loop.cc: 508)	base::MessageLoop::DoWork()
0x5affc88b (chrome.dll - message_pump_win.cc: 475)	base::MessagePumpForIO::DoRunLoop()
0x5affbee4 (chrome.dll - message_pump_win.cc: 56)	base::MessagePumpWin::Run(base::MessagePump::Delegate *)
0x5afb8a1e (chrome.dll - message_loop.cc: 360)	base::MessageLoop::Run()
0x5afd9a7f (chrome.dll - run_loop.cc: 111)	base::RunLoop::Run()
0x5afb768a (chrome.dll - thread.cc: 255)	base::Thread::Run(base::RunLoop *)
0x5ab32a73 (chrome.dll - browser_thread_impl.cc: 252)	content::BrowserThreadImpl::FileThreadRun(base::RunLoop *)
0x5ab32c30 (chrome.dll - browser_thread_impl.cc: 305)	content::BrowserThreadImpl::Run(base::RunLoop *)
0x5afb7802 (chrome.dll - thread.cc: 338)	base::Thread::ThreadMain()
0x5af92ccc (chrome.dll - platform_thread_win.cc: 89)	base::`anonymous namespace'::ThreadFunc
0x76391173 (kernel32.dll + 0x00051173)	BaseThreadInitThunk
0x775cb3f4 (ntdll.dll + 0x0005b3f4)	__RtlUserThreadStart
0x775cb3c7 (ntdll.dll + 0x0005b3c7)	_RtlUserThreadStart

 
Components: Blink>Image
Labels: -Type-Bug M-60 Type-Bug-Regression
Owner: mgiuca@chromium.org
Status: Assigned (was: Untriaged)
This crash is first stared from 56.0.2924.87 and on latest beta seeing 2 from 2 different clients so far.

61.0.3128.2	0.10%	2	
61.0.3124.11	0.05%	1	
61.0.3124.10	0.05%	1	
60.0.3112.66	0.10%	2	
60.0.3112.50	0.65%	13	
59.0.3071.115	18.23%	366	

Link to the list of builds
https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27browser%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27gfx%3A%3AImage%3A%3AAddRepresentation%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#samplereports:5,productversion:1000

Through code search on file "image.cc" suspecting the below change
https://chromium.googlesource.com/chromium/src/+/ce72b2661074ea7d583a276248d50e9763df5ede%5E%21/

mgiuca@ Could you please look into this issue if it is related to your change,else please help us in finding the appropriate owner for this issue.

Note::
1.This is top #23 browser crash windows beta
2.This crash is observed only on Windows OS.

Thanks,
Components: -Blink>Image UI>GFX

Comment 3 by mgiuca@chromium.org, Jul 14 2017

Cc: tapted@chromium.org tzik@chromium.org
OK I'll have a look since I've done a bunch of work on Image and ImageFamily.

Crash happens on FILE thread.

Crash is on this line (confirmed on the exact revisions in the crash logs):

769   CHECK(result.second) << "type was already in map.";

in Image::AddRepresentation. This means that rep->type() is already in the map.

Image::ToImageSkia:
1. Checks that there is no existing rep with kImageRepSkia in the map (if there is, it returns it).
2. Creates a rep with kImageRepSkia.
3. Calls Image::AddRepresentation which crashes if there is already a kImageRepSkia in the map.

This means there is a race condition happening where two kImageRepSkias are being added to the map at the same time (which can happen if Image::ToImageSkia is called on two different threads).

This can't possibly be related to r381607, a) because it's 16 months old, and b) because it doesn't change any control flow, just a minor refactor around how pointers are passed.

+tapted and +tzik who have been refactoring around the threading of images. Could ToImageSkia be getting called on the UI thread at the same time?

Comment 4 by mgiuca@chromium.org, Jul 14 2017

I just realised that the reason this CHECK was added in the first place (r381141) was because we *were* seeing mysterious dangerous crashes (use-after-free) here, so I added a CHECK "so it will now crash cleanly."

What we are seeing is the clean crashes.

This was discussed extensively in  Issue 590882 . Comment #11 is my last summary where I concluded that we needed to make UpdateShortcutWorker take a full copy of the image.

Trent, you've been messing with this stuff lately. Does it fit into your refactorings or should I look at it again?

Comment 5 by mgiuca@chromium.org, Jul 14 2017

Blockedon: 590882

Comment 6 by tapted@chromium.org, Jul 14 2017

I.. actually haven't touched update_shortcut_worker_win.cc yet :)

I'm not sure it's the right fix, but one possibility is to modify ImageFamily::Add(..) so that it always calls image.AsBitmap() on its argument to ensure an ImageSkia exists. 

I suspect gfx::Image is trying to be helpful by not decoding the png data until the last possible moment, and then we try to do that on both the UI thread and the FILE thread.

Comment 7 by mgiuca@chromium.org, Jul 14 2017

#6 I think that would probably solve this but only by changing the architecture so we pre-compute Skias. Arguably, you could say Image should do the same thing (but there would be performance problems maybe).

So stepping back from that change, we could just add a method to Image and ImageFamily "EnsureImageSkiasExist" which simply calls ToImageSkia() on all images. Then call it from the web_app code on the UI thread before handoff to FILE thread. Still a hack IMHO but it should work around this.

Comment 8 by mgiuca@chromium.org, Jul 14 2017

(Note that the code in ImageFamily::CreateExact has been carefully written to not trigger thread safety issues by not incrementing the refcount. But it unfortunately makes the assumption that all the images have Skias.)

Comment 9 by mgiuca@chromium.org, Jul 14 2017

I tried doing this but there's another problem: the skias themselves need to be made thread-safe.

This is probably a sort of a separate problem (we can probably avoid this crash without fixing that issue). But it's still troubling.

Sigh. I have been dealing with thread-safety of gfx::Image (and specifically the favicons in ShortcutInfo) for four and a half years.
My not very educated opinion: It seems wrong for a user of gfx::Image to have to know about the implementation details of gfx::Image. This seems to point to conversions between gfx::Image types being thread safe.
#10: I agree and I think gfx::Image should be more helpful here. I've tried a couple of approaches in the past but they haven't panned out. (First I tried making gfx::Image fully thread safe but rsesek shot it down because of performance; fair enough. Then I tried a more ambitious refactor to make gfx::Image non-ref-counting but I think I bit off too much. Maybe will try something less ambitious.)
Image::RepresentationMap is a std::map which holds max 4 pointers.

It would be pretty straightforward to make that a lock-free data structure wrapping an array of 4 pointers. Perhaps give it a SequenceChecker to ensure it's created and destroyed on the same "thread".

AddRepresentation then just does a compare-and-swap and deletes its argument if there's a data race.

I daresay removing the std::map in this case would be a boon to performance..
Labels: -Restrict-View-EditIssue
-RVE; not security since it hits a CHECK.
Blockedon: 596348
Cc: rsesek@chromium.org
#12 OK I did this. It works but it's a lot of code (very complex reasoning and the comments in base/atomicops.h say to only use this if you have a good reason) and I don't think it solves the threading issue.

gfx::Image has two thread-safety issues:
1. scoped_refptr<internal::ImageStorage> storage_ is not thread-safe. Trivially fixable by changing ImageStorage from RefCounted to RefCountedThreadSafe, but with non-trivial performance implications.
2. ImageStorage's RepresentationMap is not thread-safe. That's what I've worked around by rewriting it as a struct with CompareAndSwap.

So we can solve #2 but we'd still have to make it RefCountedThreadSafe to actually fix this. I don't think it's worth solving #2 without #1 also.

Note I tried to do this in https://codereview.chromium.org/240293006/ and while it wasn't exactly shot down by rsesek, he raised concerns about performance and I don't have a good way of measuring this.

I found  Issue 596348  which is also about this. I think we can solve this more directly in UpdateShortcutWorker without fixing the thread safety of gfx::Image. I'd still like to find some solution to making this class usable without having to think so hard about threading though.
Blockedon: 745355
When does #1 come up?

Is it an option to ensure each gfx::Image is only ever created and destroyed on the same sequence? (i.e. sequence checker in gfx::Image constructor and destructor.. it should probably have that now anyway)

I guess it would be somewhat similar to what we do for web_app::ShortcutInfo, whereby you can can only ever pass a raw pointer to a worker thread, along with a Reply that owns the "real" gfx::Image (and ImageStorage reference).
#16:

> When does #1 come up?

Whenever an Image is copied or passed by-value (i.e., all the time). The scoped_refptr is being inc'd/dec'd all the time which is not thread-safe. The reason we aren't seeing that live is that the non-atomic critical section is very small, versus the extremely wide CS of ToImageSkia (which is the entire computation of converting a PNG to a Skia). So the probability of hitting problem #2 is far higher than #1, but it's still a problem.

In fact, we could make a trivial fix to ToImageSkia that performs a non-atomic check after creating the ImageSkia, which would make this issue virtually go away. It would still be wrong, but perhaps worth doing as a quick-fix before M61.
The ImageSkiaStorage in image_skia.cc is base::RefCountedThreadSafe
#18 Yes, in my above analysis I am treating the ImageSkia as a thread-safe object. Those other "safety holes" are still an issue.
So:
1. It turns out that RefCounted already DCHECKS on non-thread-safe uses of refcounts, so the above "issue number 1" is already checked and apparently we aren't hitting that. Note that none of the code I have looked at is copying gfx::Images in the non-UI thread, so we are on thin ice but technically not violating thread safety of the refcount itself. That leaves only the internal map.
2. I've come up with a new proposal to add a sequence checker to gfx::Image:  Issue 600229 . That should properly highlight where we are seeing non-threadsafe uses of the internal map.
3. We can fix these by creating shallow copies of the gfx::Image as necessary (my preferred approach because it avoids thread safety issues entirely), or alternatively as tapted suggested in #12 by making the map into an atomic-update struct. That would mean we actually do not need an additional sequence checker, but it would mean we're only OK if we never copy the gfx::Image on a thread other than UI.
Mergedinto: 596348
Status: Duplicate (was: Assigned)
> Note I tried to do this in https://codereview.chromium.org/240293006/ and while it wasn't exactly shot down by rsesek, he raised concerns about performance and I don't have a good way of measuring this.

I recently discovered that we have //base:base_perftests. It could be useful to have a shootout-style test between base::RefCounted and base::RefCountedThreadSafe. If the delta between them is marginal, then that'd speak to at least making the storage RefCountedThreadSafe. That doesn't solve the separate issue of manipulating the storage from multiple threads, by acquiring a lock around it, but it would help with the lifetime issues.
#22 I have a CL to make the storage manipulation (the map itself) atomic without using locking (https://chromium-review.googlesource.com/583689). It adds some complexity but we can solve both of these thread safety issues without locking.

Sign in to add a comment