Issue metadata
Sign in to add a comment
|
Chrome: Crash Report - gfx::Image::AddRepresentation |
||||||||||||||||||||||||
Issue descriptionreported-by: kavvaru@google.com Magic Signature: gfx::Image::AddRepresentation Crash link: https://crash.corp.google.com/browse?q=product.name%3D'Chrome'%20AND%20product.version%3D'60.0.3112.66'%20AND%20custom_data.ChromeCrashProto.channel%3D'beta'%20AND%20custom_data.ChromeCrashProto.ptype%3D'browser'%20AND%20ReportID%3D'dc11c9be40000000'%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D'gfx%3A%3AImage%3A%3AAddRepresentation'&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#3 ------------------------------------------------------------------------------- Sample Report ------------------------------------------------------------------------------- Product name: Chrome Magic Signature : gfx::Image::AddRepresentation Product Version: 60.0.3112.66 Report ID: dc11c9be40000000 Report Url: https://crash.corp.google.com/dc11c9be40000000 Report Time: 2017-07-12T22:42:45-07:00 Upload Time: 2017-07-12T22:43:01.64-07:00 Uptime: 277000 ms CumulativeProductUptime: 0 ms OS Name: Windows NT OS Version: 6.1.7600 16385 CPU Architecture: x86 CPU Info: GenuineIntel family 6 model 23 stepping 10 ------------------------------------------------------------------------------- Crashing thread: thread index: 10. Stack Quality: 100%. Thread id : 5972. ------------------------------------------------------------------------------- 0x5b29a6af (chrome.dll - image.cc: 769) gfx::Image::AddRepresentation(std::unique_ptr<gfx::internal::ImageRep,std::default_delete<gfx::internal::ImageRep> >) 0x5b29a297 (chrome.dll - image.cc: 523) gfx::Image::ToImageSkia() 0x5b29a210 (chrome.dll - image.cc: 486) gfx::Image::ToSkBitmap() 0x5b29a497 (chrome.dll - image.cc: 657) gfx::Image::AsBitmap() 0x5b2a4749 (chrome.dll - image_family.cc: 129) gfx::ImageFamily::CreateExact(int,int) 0x5b29aae3 (chrome.dll - icon_util.cc: 80) `anonymous namespace'::BuildResizedImageFamily 0x5b29b36c (chrome.dll - icon_util.cc: 467) IconUtil::CreateIconFileFromImageFamily(gfx::ImageFamily const &,base::FilePath const &,IconUtil::WriteType) 0x5ba0b53f (chrome.dll - web_app_win.cc: 67) `anonymous namespace'::SaveIconWithCheckSum 0x5ba0c307 (chrome.dll - web_app_win.cc: 432) web_app::internals::CheckAndSaveIcon(base::FilePath const &,gfx::ImageFamily const &,bool) 0x5ba32583 (chrome.dll - update_shortcut_worker_win.cc: 208) web_app::UpdateShortcutWorker::UpdateShortcutsOnFileThread() 0x5affbd2d (chrome.dll - task_annotator.cc: 59) base::debug::TaskAnnotator::RunTask(char const *,base::PendingTask *) 0x5afb8e10 (chrome.dll - message_loop.cc: 409) base::MessageLoop::RunTask(base::PendingTask *) 0x5afb97a9 (chrome.dll - message_loop.cc: 508) base::MessageLoop::DoWork() 0x5affc88b (chrome.dll - message_pump_win.cc: 475) base::MessagePumpForIO::DoRunLoop() 0x5affbee4 (chrome.dll - message_pump_win.cc: 56) base::MessagePumpWin::Run(base::MessagePump::Delegate *) 0x5afb8a1e (chrome.dll - message_loop.cc: 360) base::MessageLoop::Run() 0x5afd9a7f (chrome.dll - run_loop.cc: 111) base::RunLoop::Run() 0x5afb768a (chrome.dll - thread.cc: 255) base::Thread::Run(base::RunLoop *) 0x5ab32a73 (chrome.dll - browser_thread_impl.cc: 252) content::BrowserThreadImpl::FileThreadRun(base::RunLoop *) 0x5ab32c30 (chrome.dll - browser_thread_impl.cc: 305) content::BrowserThreadImpl::Run(base::RunLoop *) 0x5afb7802 (chrome.dll - thread.cc: 338) base::Thread::ThreadMain() 0x5af92ccc (chrome.dll - platform_thread_win.cc: 89) base::`anonymous namespace'::ThreadFunc 0x76391173 (kernel32.dll + 0x00051173) BaseThreadInitThunk 0x775cb3f4 (ntdll.dll + 0x0005b3f4) __RtlUserThreadStart 0x775cb3c7 (ntdll.dll + 0x0005b3c7) _RtlUserThreadStart
,
Jul 13 2017
,
Jul 14 2017
OK I'll have a look since I've done a bunch of work on Image and ImageFamily. Crash happens on FILE thread. Crash is on this line (confirmed on the exact revisions in the crash logs): 769 CHECK(result.second) << "type was already in map."; in Image::AddRepresentation. This means that rep->type() is already in the map. Image::ToImageSkia: 1. Checks that there is no existing rep with kImageRepSkia in the map (if there is, it returns it). 2. Creates a rep with kImageRepSkia. 3. Calls Image::AddRepresentation which crashes if there is already a kImageRepSkia in the map. This means there is a race condition happening where two kImageRepSkias are being added to the map at the same time (which can happen if Image::ToImageSkia is called on two different threads). This can't possibly be related to r381607, a) because it's 16 months old, and b) because it doesn't change any control flow, just a minor refactor around how pointers are passed. +tapted and +tzik who have been refactoring around the threading of images. Could ToImageSkia be getting called on the UI thread at the same time?
,
Jul 14 2017
I just realised that the reason this CHECK was added in the first place (r381141) was because we *were* seeing mysterious dangerous crashes (use-after-free) here, so I added a CHECK "so it will now crash cleanly." What we are seeing is the clean crashes. This was discussed extensively in Issue 590882 . Comment #11 is my last summary where I concluded that we needed to make UpdateShortcutWorker take a full copy of the image. Trent, you've been messing with this stuff lately. Does it fit into your refactorings or should I look at it again?
,
Jul 14 2017
,
Jul 14 2017
I.. actually haven't touched update_shortcut_worker_win.cc yet :) I'm not sure it's the right fix, but one possibility is to modify ImageFamily::Add(..) so that it always calls image.AsBitmap() on its argument to ensure an ImageSkia exists. I suspect gfx::Image is trying to be helpful by not decoding the png data until the last possible moment, and then we try to do that on both the UI thread and the FILE thread.
,
Jul 14 2017
#6 I think that would probably solve this but only by changing the architecture so we pre-compute Skias. Arguably, you could say Image should do the same thing (but there would be performance problems maybe). So stepping back from that change, we could just add a method to Image and ImageFamily "EnsureImageSkiasExist" which simply calls ToImageSkia() on all images. Then call it from the web_app code on the UI thread before handoff to FILE thread. Still a hack IMHO but it should work around this.
,
Jul 14 2017
(Note that the code in ImageFamily::CreateExact has been carefully written to not trigger thread safety issues by not incrementing the refcount. But it unfortunately makes the assumption that all the images have Skias.)
,
Jul 14 2017
I tried doing this but there's another problem: the skias themselves need to be made thread-safe. This is probably a sort of a separate problem (we can probably avoid this crash without fixing that issue). But it's still troubling. Sigh. I have been dealing with thread-safety of gfx::Image (and specifically the favicons in ShortcutInfo) for four and a half years.
,
Jul 14 2017
My not very educated opinion: It seems wrong for a user of gfx::Image to have to know about the implementation details of gfx::Image. This seems to point to conversions between gfx::Image types being thread safe.
,
Jul 17 2017
#10: I agree and I think gfx::Image should be more helpful here. I've tried a couple of approaches in the past but they haven't panned out. (First I tried making gfx::Image fully thread safe but rsesek shot it down because of performance; fair enough. Then I tried a more ambitious refactor to make gfx::Image non-ref-counting but I think I bit off too much. Maybe will try something less ambitious.)
,
Jul 17 2017
Image::RepresentationMap is a std::map which holds max 4 pointers. It would be pretty straightforward to make that a lock-free data structure wrapping an array of 4 pointers. Perhaps give it a SequenceChecker to ensure it's created and destroyed on the same "thread". AddRepresentation then just does a compare-and-swap and deletes its argument if there's a data race. I daresay removing the std::map in this case would be a boon to performance..
,
Jul 17 2017
-RVE; not security since it hits a CHECK.
,
Jul 18 2017
#12 OK I did this. It works but it's a lot of code (very complex reasoning and the comments in base/atomicops.h say to only use this if you have a good reason) and I don't think it solves the threading issue. gfx::Image has two thread-safety issues: 1. scoped_refptr<internal::ImageStorage> storage_ is not thread-safe. Trivially fixable by changing ImageStorage from RefCounted to RefCountedThreadSafe, but with non-trivial performance implications. 2. ImageStorage's RepresentationMap is not thread-safe. That's what I've worked around by rewriting it as a struct with CompareAndSwap. So we can solve #2 but we'd still have to make it RefCountedThreadSafe to actually fix this. I don't think it's worth solving #2 without #1 also. Note I tried to do this in https://codereview.chromium.org/240293006/ and while it wasn't exactly shot down by rsesek, he raised concerns about performance and I don't have a good way of measuring this. I found Issue 596348 which is also about this. I think we can solve this more directly in UpdateShortcutWorker without fixing the thread safety of gfx::Image. I'd still like to find some solution to making this class usable without having to think so hard about threading though.
,
Jul 18 2017
,
Jul 18 2017
When does #1 come up? Is it an option to ensure each gfx::Image is only ever created and destroyed on the same sequence? (i.e. sequence checker in gfx::Image constructor and destructor.. it should probably have that now anyway) I guess it would be somewhat similar to what we do for web_app::ShortcutInfo, whereby you can can only ever pass a raw pointer to a worker thread, along with a Reply that owns the "real" gfx::Image (and ImageStorage reference).
,
Jul 19 2017
#16: > When does #1 come up? Whenever an Image is copied or passed by-value (i.e., all the time). The scoped_refptr is being inc'd/dec'd all the time which is not thread-safe. The reason we aren't seeing that live is that the non-atomic critical section is very small, versus the extremely wide CS of ToImageSkia (which is the entire computation of converting a PNG to a Skia). So the probability of hitting problem #2 is far higher than #1, but it's still a problem. In fact, we could make a trivial fix to ToImageSkia that performs a non-atomic check after creating the ImageSkia, which would make this issue virtually go away. It would still be wrong, but perhaps worth doing as a quick-fix before M61.
,
Jul 19 2017
The ImageSkiaStorage in image_skia.cc is base::RefCountedThreadSafe
,
Jul 20 2017
#18 Yes, in my above analysis I am treating the ImageSkia as a thread-safe object. Those other "safety holes" are still an issue.
,
Jul 20 2017
So: 1. It turns out that RefCounted already DCHECKS on non-thread-safe uses of refcounts, so the above "issue number 1" is already checked and apparently we aren't hitting that. Note that none of the code I have looked at is copying gfx::Images in the non-UI thread, so we are on thin ice but technically not violating thread safety of the refcount itself. That leaves only the internal map. 2. I've come up with a new proposal to add a sequence checker to gfx::Image: Issue 600229 . That should properly highlight where we are seeing non-threadsafe uses of the internal map. 3. We can fix these by creating shallow copies of the gfx::Image as necessary (my preferred approach because it avoids thread safety issues entirely), or alternatively as tapted suggested in #12 by making the map into an atomic-update struct. That would mean we actually do not need an additional sequence checker, but it would mean we're only OK if we never copy the gfx::Image on a thread other than UI.
,
Jul 21 2017
,
Jul 24 2017
> Note I tried to do this in https://codereview.chromium.org/240293006/ and while it wasn't exactly shot down by rsesek, he raised concerns about performance and I don't have a good way of measuring this. I recently discovered that we have //base:base_perftests. It could be useful to have a shootout-style test between base::RefCounted and base::RefCountedThreadSafe. If the delta between them is marginal, then that'd speak to at least making the storage RefCountedThreadSafe. That doesn't solve the separate issue of manipulating the storage from multiple threads, by acquiring a lock around it, but it would help with the lifetime issues.
,
Jul 25 2017
#22 I have a CL to make the storage manipulation (the map itself) atomic without using locking (https://chromium-review.googlesource.com/583689). It adds some complexity but we can solve both of these thread safety issues without locking. |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by kavvaru@chromium.org
, Jul 13 2017Labels: -Type-Bug M-60 Type-Bug-Regression
Owner: mgiuca@chromium.org
Status: Assigned (was: Untriaged)