Project: chromium Issues People Development process History Sign in
New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.
Issue 666707 Chrome: Crash Report - ThreadWatcherList::ParseCommandLineCrashOnHangThreads
Starred by 5 users Project Member Reported by a...@chromium.org, Nov 18 Back to list
Status: Fixed
Owner:
Closed: Dec 2
Cc:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug-Regression



Sign in to add a comment
Product name: Chrome
Magic Signature: ThreadWatcherList::ParseCommandLineCrashOnHangThreads

Current link:
https://crash.corp.google.com/browse?q=product.name%3D'Chrome'%20AND%20product.version%3D'56.0.2924.0'%20AND%20custom_data.ChromeCrashProto.ptype%3D'browser'%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D'ThreadWatcherList%3A%3AParseCommandLineCrashOnHangThreads'&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#reports


Search properties:
product.name: Chrome
product.version: 56.0.2924.0
custom_data.chromecrashproto.ptype: browser

Stack trace:
============
Thread 0 CRASHED [EXCEPTION_ILLEGAL_INSTRUCTION @ 0x0fb2f720 ] MAGIC SIGNATURE THREAD
Stack Quality100%Show frame trust levels
0x0fb2f720	(chrome.dll -math.h:642 )	ceilf
0x113e4a48	(chrome.dll -thread_watcher.cc:523 )	ThreadWatcherList::ParseCommandLineCrashOnHangThreads(std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &,unsigned int,unsigned int,std::map<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,ThreadWatcherList::CrashDataThresholds,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,ThreadWatcherList::CrashDataThresholds> > > *)
0x113e47f8	(chrome.dll -thread_watcher.cc:494 )	ThreadWatcherList::ParseCommandLine(base::CommandLine const &,unsigned int *,std::map<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,ThreadWatcherList::CrashDataThresholds,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> > const ,ThreadWatcherList::CrashDataThresholds> > > *)
0x113e43d3	(chrome.dll -thread_watcher.cc:367 )	ThreadWatcherList::StartWatchingAll(base::CommandLine const &)
0x1065c202	(chrome.dll -chrome_browser_main.cc:1804 )	ChromeBrowserMainParts::PreMainMessageLoopRunImpl()
0x1065afe7	(chrome.dll -chrome_browser_main.cc:1249 )	ChromeBrowserMainParts::PreMainMessageLoopRun()
0x102e83c6	(chrome.dll -browser_main_loop.cc:967 )	content::BrowserMainLoop::PreMainMessageLoopRun()
0x0fc1749e	(chrome.dll -callback.h:64 )	base::internal::RunMixin<base::Callback<bool ,1,1> >::Run()
0x102e7f66	(chrome.dll -browser_main_loop.cc:852 )	content::BrowserMainLoop::CreateStartupTasks()
0x102eae14	(chrome.dll -browser_main_runner.cc:126 )	content::BrowserMainRunnerImpl::Initialize(content::MainFunctionParams const &)
0x102e6668	(chrome.dll -browser_main.cc:42 )	content::BrowserMain(content::MainFunctionParams const &)
0x10623c34	(chrome.dll -content_main_runner.cc:774 )	content::ContentMainRunnerImpl::Run()
0x0fb25d02	(chrome.dll -chrome_main.cc:108 )	ChromeMain
0x00d758e8	(chrome.exe -main_dll_loader_win.cc:174 )	MainDllLoader::Launch(HINSTANCE__ *,base::TimeTicks)
0x00d71c38	(chrome.exe -chrome_exe_main_win.cc:249 )	wWinMain
0x00ddbfc7	(chrome.exe -exe_common.inl:253 )	__scrt_common_main_seh
0x779aee6b	(kernel32.dll + 0x0004ee6b )	BaseThreadInitThunk
0x77ba3ab2	(ntdll.dll + 0x00063ab2 )	__RtlUserThreadStart
0x77ba3a85	(ntdll.dll + 0x00063a85 )	_RtlUserThreadStart

Link to the list of the builds:
==============================
https://crash.corp.google.com/browse?q=custom_data.ChromeCrashProto.ptype%3D%27browser%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27ThreadWatcherList%3A%3AParseCommandLineCrashOnHangThreads%27%20AND%20product.name%3D%27Chrome%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D

 
Cc: manoranj...@chromium.org a...@chromium.org
Components: Internals>TaskScheduler
Labels: ReleaseBlock-Dev Stability-Sheriff-Desktop
Owner: robliao@chromium.org
Status: Assigned
Windows canary(56.0.2924.0) has been live for 1 hour and has shown 100 crashes from 73 clients till now.This is regression from the last canary(56.0.2920.0), hence considering below as the 

Changelog:
==========
https://chromium.googlesource.com/chromium/src/+log/56.0.2922.0..56.0.2924.0?pretty=fuller&n=10000

robliao@: Could you please take a look at these crashes and confirm if https://codereview.chromium.org/2506693002 could be related.

Cc'ing  fdoray@ as well for https://codereview.chromium.org/2445763002.


https://codereview.chromium.org/2445763002 r432913 is not responsible for this crash since it was committed and reverted in the 56.0.2922.0..56.0.2924.0 range https://codereview.chromium.org/2517443002 r432989

The crash report has an illegal instruction exception. Did we change something that could have allowed this instruction in our binary?

Disassembly:
--- c:\b\depot_tools\win_toolchain\vs_files\d5dc33b15d1b2c086f2f6632e2fd15882f80dbd3\win_sdk\include\10.0.10586.0\ucrt\math.h 
5D9CF71A  push        ebp  
5D9CF71B  mov         ebp,esp  
5D9CF71D  sub         esp,0Ch  
5D9CF720  vmovss      xmm0,dword ptr [_X]  <------ Illegal instruction error
5D9CF725  vcvtss2sd   xmm0,xmm0,xmm0  
5D9CF729  vmovsd      qword ptr [ebp-0Ch],xmm0  
math.h:642 looks pretty innocent
    _Check_return_ __inline float __CRTDECL ceilf(_In_ float _X)
    {
        return (float)ceil(_X);
    }

Disk corruption?
Chrome 56.0.2924.1 Looks like this:
0:000> u 1010769d
chrome!ceilf:
1010769d 55              push    ebp
1010769e 8bec            mov     ebp,esp
101076a0 83e4c0          and     esp,0FFFFFFC0h
101076a3 83ec40          sub     esp,40h
101076a6 f30f104508      movss   xmm0,dword ptr [ebp+8]
101076ab 83ec08          sub     esp,8
101076ae 0f5ac0          cvtps2pd xmm0,xmm0
101076b1 f20f11442440    movsd   mmword ptr [esp+40h],xmm0

The crash looks like this:
0:000> u 6159f71a          
chrome_614a0000!ceilf
6159f71a 55              push    ebp
6159f71b 8bec            mov     ebp,esp
6159f71d 83ec0c          sub     esp,0Ch
6159f720 c5fa104508      vmovss  xmm0,dword ptr [ebp+8]
6159f725 c5fa5ac0        vcvtss2sd xmm0,xmm0,xmm0
6159f729 c5fb1145f4      vmovsd  qword ptr [ebp-0Ch],xmm0
6159f72e dd45f4          fld     qword ptr [ebp-0Ch]
6159f731 51              push    ecx

Strange change of instructions for the same code.
Cc: fdoray@chromium.org brucedaw...@chromium.org
Components: -Internals>TaskScheduler
Dropping Internals > TaskScheduler since it's not really a task scheduler bug.

Adding brucedawson@ who may have seen something like this before.
I just talked to robliao@/fdoray@ about this. The problem is almost certainly an ODR-type violation caused by a failure to inline an inline function that is called by translation units compiled normally and compiled with /AVX (or some variant). Two or more versions of _ceilf are generated (some using AVX instructions, some not) and the linker is free to choose whichever ones it wants. This means that non-AVX code paths end up using AVX, and crashes happen.

As far as I can tell it is almost impossible to safely build part of a binary with /AVX.

This issue was hit elsewhere in Chrome some weeks ago and the VC++ team helped explain the issue.
This looks like a duplicate (albeit perhaps a more serious variant) of  crbug.com/654213 .

For context, these links were helpful in understanding the issue, including a Connect bug which Microsoft commented on.

1. Reproduction example: https://bitbucket.org/chromiumembedded/cef/issues/attachments/1999/chromiumembedded/cef/1474399239.3/1999/vs2015u3-ltcg-bug-1.zip
2. Quick analysis: https://bitbucket.org/chromiumembedded/cef/issues/1999#comment-30763602
3. Microsoft Connect entry: https://connect.microsoft.com/VisualStudio/feedback/details/3103806

This all assumes that vmovss is an AVX or AVX2 instruction which I should probably double check...
Yup. vmovss is an AVX instruction.
VMOVSS xmm1, m32 (Opcode: VEX.LIG.F3.0F.WIG 10 /r)
CPUID Feature Flag: AVX
vmovss is an AVX instruction.  (Anything with a v-prefix is.)

We've wrapped all SkNx types in anonymous namespaces, and force-inlined all methods.  They should be safe now both by law (the namespace) and doubly so in practice (force-inlining).

Is the problem here that math.h does not mark ceilf, etc as static?
Cc: mtklein@chromium.org reed@google.com fmalita@chromium.org
Issue 666722 has been merged into this issue.
The problem is that if ceilf is ever called by a translation unit that is compiled with /AVX then we are potentially doomed.

If the compiler decides not to inline ceilf then non-inline code will be generated and at link time the linker gets to choose between all non-inlined versions of ceilf.

So, it is not enough to use anonymous namespaces or forceinline in skia code. This has to be applied to *all* code that is included in /AVX translation units.

Uggh.

Unsurprisngly...
0:000> !cpuid
CP  F/M/S  Manufacturer     MHza
 0  6,23,10 GenuineIntel    2000
 1  6,23,10 GenuineIntel    2000

From what I can gather from the web, this is a
Pentium(R) Dual-Core CPU T4300 @ 2.10GHz
Penryn Core. This predates AVX.
This problem goes away if the functions are static.  They're no longer the same ceilf, and the linker has no choices to make.

Every function that's defined in a header needs to be marked static, or you risk ODR violation.  Here it's just easy because the AVX instructions are so different, but it's easy to construct if the code differs depending on a preprocessor define or some other compile-time bit.

Do you happen to have the definition of ceilf from this math.h handy?  Is it marked static?
Oh, right, it's up there in comment 3.  No static.  That's the bug.
Here is the VC++ definition of ceilf:

_Check_return_ __inline float __CRTDECL ceilf(_In_ float _X)
{
    return (float)ceil(_X);
}

Defining all header-file functions as static would indeed avoid this, but is that really standard practice? Absent a requirement to do this I think Microsoft would (and in fact has) argued that the mistake is ours.

Practically speaking we will need to come up with a fix other than marking ceilf as static, but if there is a C++ standard requirement or recommendation that inline functions be static then that could affect future VC++ versions.
Yes, as far as I know, all functions that GCC and Clang define in headers are static.  I presume it's for this reason, but we might want to get someone who can speak with more authority there.

Here's what I'm thinking, first in Skia, and then if that goes OK, similarly in Chromium: https://skia-review.googlesource.com/c/5005/
robliao@, Would it be possible to revert the change ASAP and merge it to 2924? we have some other hiccups with qualifying a good M56 dev candidate and we are planning to trigger one from 2922 after the revert.

Appreciate your help!
Cc: anan...@chromium.org bustamante@chromium.org
Typo: we are planning to trigger one from 2924 after the revert.
I don't think reverting any CL mentioned on this thread will help anything.
I agree with mtklein@ that reverting is unlikely to help. The change listed in comment #16 is step one towards resolving the crashes, although note that it is probably part one of two.
Project Member Comment 22 by bugdroid1@chromium.org, Nov 18
The following revision refers to this bug:
  https://skia.googlesource.com/skia.git/+/c89e2438ae24c862087f8ca1cb076052edbb27ea

commit c89e2438ae24c862087f8ca1cb076052edbb27ea
Author: Mike Klein <mtklein@chromium.org>
Date: Fri Nov 18 17:55:21 2016

Turn off /arch:AVX[2] on Windows builds.

This canaries a similar change Chrome may need.

BUG= chromium:666707 

GOLD_TRYBOT_URL= https://gold.skia.org/search?issue= 5005 

Change-Id: Ibf7f9941968d905d865b9be1e63ebbf768870175
Reviewed-on: https://skia-review.googlesource.com/5005
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>

[modify] https://crrev.com/c89e2438ae24c862087f8ca1cb076052edbb27ea/BUILD.gn
[modify] https://crrev.com/c89e2438ae24c862087f8ca1cb076052edbb27ea/gn/BUILD.gn

Just so we're all clear, the change landed in #22 doesn't affect Chrome.  That GN build is completely separate from Chrome's.  I just wanted to land this first to test the effect on Skia-in-isolation before we change Skia-in-Chrome.

If that seems sane on our tree, we will want to make an analogous change to src/skia/BUILD.gn.

I imagine we want to remove all /arch:AVX and /arch:AVX2 flags in Chrome, not just Skia's right?  Are there others beyond libvpx?  Do you know if they can function without /arch:AVX?
Agreed. with analysis on revert. The root cause problem here is the use of an AVX instruction on a non AVX capable CPU.

mtklein: What's the typical timeframe for Skia to know this change is okay?

skia, libvpx, and possibly pdfium (although that is just a skia\BUILD.gn file in pdfium and is probably not relevant).
Probably an hour or so, but I can start the Chromium CQ in parallel.

Right, I think the pdfium you're seeing is for its standalone build, not as built by Chrome.
Cc: robliao@chromium.org
Owner: mtklein@chromium.org
Sounds good. I wonder if there will be a non-trivial performance impact to removing the AVX flags (not that we can do anything about it in one day).

Reassigning to mtklein in the meantime since he's doing the legwork already.
Yeah, very likely there's a performance impact here, but worst case we can go back to maxing out at SSE4.1 on Windows and just use AVX+ on Mac, Linux, etc.  It's sort of moot to worry about speed if it's gonna crash.

clang-cl may or may not have this problem (depending on the exact libraries it uses), so anything we undo here we can revisit if we start seriously using Clang on Windows.
The Skia tree's looking pretty happy (https://status.skia.org/?commitLabel=author&commit_label=author&filter=search&search_value=Win) so I'm going to let https://codereview.chromium.org/2518473003 land.

Bruce has gamely volunteered to handle the equivalent changes for libvpx.
Cc: riajiang@chromium.org
Issue 666708 has been merged into this issue.
A change to disable /arch:AVX and /arch:AVX2 in libvpx\BUILD.gn is going through the try bots - crrev.com/2516703002. A quick glance suggests that most of the code uses intrinsics so we may not lose much performance. On the other hand, I think that switching between AVX and SSE without vzeroupper/vzeroall can harm performance, so it's hard to know what will happen.

https://software.intel.com/en-us/articles/intel-avx-state-transitions-migrating-sse-code-to-avx

Yeah, the SSE <-> AVX transition is what had me suspecting we'll see a perf hit from this in Skia.  We've seen this throttle performance to a degree that we had to entirely redesign the system we're building to stay in AVX all the time, even if it otherwise would be possible to call some shared SSE code instead.

To be fair, it's a whole lot worse if you _don't_ vzerouppper...  Apparently this is not so bad since Skylake, but I don't have a machine to confirm that.

I think I'm gonna have to link to crrev.com/2516703002 if this ever comes up again... that describes the problem as I understand it perfectly.
The really fun thing here is, I don't think ceilf() ever has to be called from the AVX code in order for it to break non-AVX code.

All we need is the AVX compiled source to include the math.h with the non-static ceilf()... at that point it's the same as if the AVX source file defined ceilf() itself.  Whether it's otherwise used by code in that file is moot, right?
With the VC++ toolchain that concern (ceilf being emitted even if not called) was real, but with VC++ 2013 and beyond as long as you specify "/Zc:inline" (which we do) then unreferenced COMDATs (functions, for our purposes here) are discarded from .obj files.
Ah, good.  I read "Removes unreferenced functions or data that are COMDATs or only have internal linkage." and thought, "okay, these are not internal linkage... I wonder what a COMDAT is".
Ah, so, we may need to figure out clang-cl now instead of later.

https://codereview.chromium.org/2518473003

clang-cl wants /arch:AVX or -mavx to use intrinsics.  Any idea which math.h it uses?  MSVC's?  Its own?
Cc: tha...@chromium.org
Adding thakis@ for clang-cl perspective.

Temporary fix would be to pass /arch:AVX when is_clang == true. We may get no crashes, or we may get crashes when clang-cl binaries are run on non-AVX machines. Either way it's a huge improvement.

I am fairly certain that clang-cl uses Microsoft's header files, including math.h. But the lack of LTCG and other differences means that the bug appearing in VC++ does not imply that it will appear in clang-cl builds.
I just used /showIncludes on a clang-cl build to confirm that math.h comes from Microsoft in clang-cl builds:

including file: c:\src\depot_tools\win_toolchain\vs_files\d5dc33b15d1b2c086f2f6632e2fd15882f80dbd3\win_sdk\bin\..\..\win_sdk\Include\10.0.10586.0\ucrt\math.h

Yeah, so whether or not the bug bites us, it's still there with clang-cl.

I'll update my CL to turn AVX/AVX2 back on when is_clang, but I'm not looking forward to the conversation we're going to have a few months.  Seems like the only way to work around this problem with clang-cl is to use a different libc or to turn off AVX entirely.
Agreed.
Cc: r...@chromium.org
clang-cl uses msvc's headers for most stuff; it has a handful of built-in headers (https://github.com/llvm-mirror/clang/tree/master/lib/Headers – many of them do some stuff and then forward to the ms headers though).

Can someone summarize what you want to do? What exactly is the clang-cl question here? Passing different flags for cl and clang-cl is in general a bad idea.

re inline functions: This is different in C and C++. If your header should be used from c and c++ files I think you do need to make your functions inline static; I believe C++ guarantees that inline functions are identical in all TUs. I'm not sure how this is supposed to interact with different /arch settings if a .h is built in different TUs. Maybe this means inline functions must not take advantage of /arch flags?
From reading the bug, it sounds like skia has some files that use avx intrinsics. You build those with /arch:AVX, and then the compiler figures it can use that for some function from math.h, and then the linker picks the AVX'd function from math.h when it merges inline functions.

clang-cl requires /arch:avx to be able to use avx intrinsics.

Hm. Not sure what the best fix is, but to get things not crashy again, I'd suggest not building the avx parts of skia for now (I gather they're new, else we would've had this crash for longer), and then figure out what to do.
That's exactly the problem we're seeing here.  The same non-static inline ceilf() from urcrt\math.h is compiled differently in two translation units, one with VEX-encoded vmovss and one with non-VEX-encoded movss.  We link those together and get one of the two arbitrarily.  Things crash on older machines if we happened to pick the VEX encoded one.

I think the distinction here is, C++ doesn't guarantee that inline functions are identical in all TUs... it requires it.  It's very easy to violate that requirement by compiling an inline function with two different /arch settings.  It's not hard, though, to do the same with the preprocessor, e.g. checking #ifdef NDEBUG.
Well, if you have different definitions of an inline in two TUs that's an ODR violation. What's interesting here is that the different definition arises because of a compiler flag, and because the function is in a system header...
Yeah, exactly.

Ordinarily these things are my fault and I can fix them directly with more static or an anonymous namespace... in this case, I claim the bug is that ceilf() is defined inline in ucrt\math.h but not marked static.  I don't know how to fix that.
Yep, ODR violation is the basic problem.

The question is *whose fault is it*? Is it our fault for compiling ceilf with two different /arch: settings? Or Microsoft's fault for not marking ceilf as static?

Given the definition of ceilf the /arch:AVX flag seems almost useless.

Some of Microsoft's thoughts on this are here:
https://connect.microsoft.com/VisualStudio/feedback/details/3103806

If the clang-cl concern about was wrt /Zc:inline: Are you sure clang-cl doesn't discard unreferenced comdats? I wouldn't thought we do (and if not, that seems like a bug?)
https://connect.microsoft.com/VisualStudio/feedback/details/3103806 doesn't mention _ceilf and that this happens in system headers, so it seems somewhat tangential.

To me it feels like a MS toolchain bug if compiling to distinct TUs (with distinct contents) with different /arch settings and linking them together breaks your program. But given that system headers currently contain non-static inline functions, I guess it's not currently safe to use different /arch flags on different TUs if those TUs include system headers.
The basic clang-cl concern is, unlike MSVC, we have no way to express "we'll use AVX intrinsics, but you compiler please stay away from AVX" as a sort of halfway fix.  Under that strategy, ceilf() would be compiled always using the same settings, and no ODR violation happens, but we can still use AVX in files that are specifically guarded by namespaces and runtime CPU checks.

With clang-cl, we either pass -mavx, and both we and the compiler can use AVX, or not, and neither can.
namespace {
#include <math.h>
}

:-/
That'll break anything that's _not_ inline, won't it?
Ah, yeah.

#define __inline static __inline

?
Yikes.  Nothing here is sounding particularly robust.  I think I'd better mention the alternative we're all thinking but no one is nuts enough to really suggest, which is to rewrite all the Windows libc headers.  #define __inline static __inline is kind of a dynamic version of that that works when you remember to do it...

I think what I'd like to do for now is disable our Skia's AVX parts on Windows with some stubs (CL incoming shortly) to buy me some time to breathe and ponder.
I think the Real Fix (tm) is to report that to Microsoft and have them add static to their inlines in their headers.

Then again, `find ~/src/depot_tools/win_toolchain/vs_files/d5dc33b15d1b2c086f2f6632e2fd15882f80dbd3/ -name '*.h' | xargs grep inline | grep -v static` prints 5837 lines that are in need of auditing.

I agree that none of the suggestions are very robust. Another non-robust thing from the msconnect link Bruce linked to: It sounds like the linker picks the inline function it finds first, so making sure a sse-built TU including <math.h> is in front of the avx-built ones on the link command might also work.


As said upbug, it sounds like the most robust way to do this is to not include system headers in the files that want to use avx.
Agreed on all points.

I'll start looking at avoiding system headers Monday.
Labels: -Restrict-View-Google
Project Member Comment 57 by sheriffbot@chromium.org, Nov 18
Labels: FoundIn-M-56 Fracas
Users experienced this crash on the following builds:

Win Canary 56.0.2924.0 -  21.92 CPM, 88 reports, 79 clients (signature skia::`anonymous namespace'::ResizeFilter::ComputeFilters)

If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates.

- Go/Fracas
Project Member Comment 58 by sheriffbot@chromium.org, Nov 18
Labels: FoundIn-M-56 Fracas
Users experienced this crash on the following builds:

Win Canary 56.0.2924.0 -  491.54 CPM, 1973 reports, 1124 clients (signature ThreadWatcherList::ParseCommandLineCrashOnHangThreads)

If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates.

- Go/Fracas
Project Member Comment 59 by sheriffbot@chromium.org, Nov 18
Users experienced this crash on the following builds:

Win Canary 56.0.2924.0 -  57.05 CPM, 229 reports, 142 clients (signature gfx::ScaleToEnclosingRect)
Win Canary 56.0.2924.0 -  12.46 CPM, 50 reports, 35 clients (signature `anonymous namespace'::ScaleOffset)
Win Canary 56.0.2924.0 -  117.84 CPM, 473 reports, 280 clients (signature gfx::ToRoundedInt)

If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates.

- Go/Fracas
Project Member Comment 60 by bugdroid1@chromium.org, Nov 18
The following revision refers to this bug:
  https://skia.googlesource.com/skia.git/+/17b6e487b71b7ed541d3239f194e121fbe8efd1d

commit 17b6e487b71b7ed541d3239f194e121fbe8efd1d
Author: Mike Klein <mtklein@chromium.org>
Date: Fri Nov 18 22:11:41 2016

Revert "Turn off /arch:AVX[2] on Windows builds."

This reverts commit c89e2438ae24c862087f8ca1cb076052edbb27ea.

Reason for revert: 

I'm going to stub this code out in Chrome instead for now.  Chrome's not going to land something that looks like this, so I'd rather undo it than leave Skia in this odd state.

Original change's description:
> Turn off /arch:AVX[2] on Windows builds.
> 
> This canaries a similar change Chrome may need.
> 
> BUG= chromium:666707 
> 
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue= 5005 
> 
> Change-Id: Ibf7f9941968d905d865b9be1e63ebbf768870175
> Reviewed-on: https://skia-review.googlesource.com/5005
> Reviewed-by: Mike Klein <mtklein@chromium.org>
> Commit-Queue: Mike Klein <mtklein@chromium.org>
> 

TBR=mtklein@chromium.org,brucedawson@chromium.org,reviews@skia.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true

Change-Id: I9562f7877749c61f6ba4d48d6c4b557f09876128
Reviewed-on: https://skia-review.googlesource.com/5069
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>

[modify] https://crrev.com/17b6e487b71b7ed541d3239f194e121fbe8efd1d/BUILD.gn
[modify] https://crrev.com/17b6e487b71b7ed541d3239f194e121fbe8efd1d/gn/BUILD.gn

Yeah, subtarget flags are kind of a huge problem for the C++ compilation model. Going forward, I think we'll want to add and use attributes to enable subtarget features without enabling them in headers. The usage model would look something like:

// foo.cpp
#include "foo.h"
#include "bar.h"
#include "immintrin.h"
#pragma enable_awesome_avx512_features
void doit() { ... }

// foo.h
#include "bar.h"
#include "immintrin.h"
#pragma push enable_awesome_avx512_features
inline void dothat() { ... }
inline void dothose() { ... }
#pragma pop enable_awesome_avx512_features

I don't know if the VC folks have any plans to add these kinds of features, but this is definitely something we want to do in clang. It's already sort of available with __attribute__((target)).
Project Member Comment 62 by bugdroid1@chromium.org, Nov 19
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2ea76e2274404af7e53cc4bb5d212b22856ee455

commit 2ea76e2274404af7e53cc4bb5d212b22856ee455
Author: mtklein <mtklein@chromium.org>
Date: Sat Nov 19 07:08:59 2016

Stub out Skia AVX and HSW opts.

This is hopefully temporary while we mull better options.

BUG= 666707 

Review-Url: https://codereview.chromium.org/2512093003
Cr-Commit-Position: refs/heads/master@{#433407}

[modify] https://crrev.com/2ea76e2274404af7e53cc4bb5d212b22856ee455/skia/BUILD.gn
[add] https://crrev.com/2ea76e2274404af7e53cc4bb5d212b22856ee455/skia/ext/SkOpts_avx_stub.cc
[add] https://crrev.com/2ea76e2274404af7e53cc4bb5d212b22856ee455/skia/ext/SkOpts_hsw_stub.cc

Just to confirm that this bug seems to be fixed (once again ref:  Issue 654213 ). Update Version 57.0.2925.0 canary (64-bit) on Win 67 64 Bit with Intel DX58SO2 MB + I7 990-x (see CPU information in attachments.
Att1.jpg
243 KB View Download
Att2.jpg
236 KB View Download
Project Member Comment 64 by sheriffbot@chromium.org, Nov 19
Labels: FoundIn-M-57
Users experienced this crash on the following builds:

Win Canary 57.0.2925.0 -  21.29 CPM, 98 reports, 86 clients (signature skia::`anonymous namespace'::ResizeFilter::ComputeFilters)

If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates.

- Go/Fracas
Project Member Comment 65 by sheriffbot@chromium.org, Nov 19
Users experienced this crash on the following builds:

Win Canary 57.0.2925.0 -  380.53 CPM, 1752 reports, 1050 clients (signature ThreadWatcherList::ParseCommandLineCrashOnHangThreads)

If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates.

- Go/Fracas
Project Member Comment 66 by sheriffbot@chromium.org, Nov 19
Users experienced this crash on the following builds:

Win Canary 57.0.2925.0 -  92.96 CPM, 428 reports, 267 clients (signature gfx::ToRoundedInt)
Win Canary 57.0.2925.0 -  7.82 CPM, 36 reports, 28 clients (signature `anonymous namespace'::ScaleOffset)
Win Canary 57.0.2925.0 -  46.70 CPM, 215 reports, 113 clients (signature gfx::ScaleToEnclosingRect)

If this update was incorrect, please add "Fracas-Wrong" label to prevent future updates.

- Go/Fracas
Project Member Comment 67 by bugdroid1@chromium.org, Nov 20
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8aa53c9b50de7f662697e64c1fe590d59ce6edcc

commit 8aa53c9b50de7f662697e64c1fe590d59ce6edcc
Author: skia-deps-roller <skia-deps-roller@chromium.org>
Date: Sun Nov 20 16:11:06 2016

Roll src/third_party/skia/ eaef61537..792d0f13d (18 commits).

https://skia.googlesource.com/skia.git/+log/eaef615377bc..792d0f13d6cb

$ git log eaef61537..792d0f13d --date=short --no-merges --format='%ad %ae %s'
2016-11-20 egdaniel Revert "switched skslc from std::string to SkString"
2016-11-20 egdaniel Revert "fixed iOS build failure"
2016-11-18 krasin Avoid unnecessary cast on a garbage data.
2016-11-18 mar.kazmierczak Fix typo in GrGLCaps
2016-11-18 reed android does not need XFERMODE_PUBLIC flag
2016-11-18 mtklein Revert "Turn off /arch:AVX[2] on Windows builds."
2016-11-18 mtklein mirror tiling
2016-11-18 bsalomon Make GrSwizzle::GrSwizzle() constexpr
2016-11-18 bsalomon Remove unnecessary attribute and varying type modifiers
2016-11-18 mtklein Build fiddle and public_headers_warnings_check only when skia_enable_tools.
2016-11-18 brianosman VS SLN script: Automatically determine which folder to use/copy
2016-11-18 mtklein Turn off /arch:AVX[2] on Windows builds.
2016-11-18 mtklein update G3 build after crrev.com/2500113004
2016-11-18 bsalomon Make GrBufferAccess a nested class of GrProcessor
2016-11-18 liyuqian Add test for QuickFDot6Div
2016-11-18 ethannicholas fixed iOS build failure
2016-11-17 ethannicholas switched skslc from std::string to SkString
2016-11-18 robertphillips Add handling for instantiate failure up the call stack

BUG= 666707 , 665681 , 665500 , 665621 

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls

CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel
TBR=egdaniel@google.com

Review-Url: https://codereview.chromium.org/2516183002
Cr-Commit-Position: refs/heads/master@{#433454}

[modify] https://crrev.com/8aa53c9b50de7f662697e64c1fe590d59ce6edcc/DEPS

You can ignore #67.  Nothing interesting there except me putting things back to Full Dangerous for local Skia testing, to help me track down the problem outside Chrome.

The "Stub out ..." CL from #62 should be making its way out in 57.0.2926.0.  So we should expect to see crashes on builds cut before that.
Cc: -fdoray@chromium.org
Issue 666934 has been merged into this issue.
Labels: Merge-Approved-56
Can we merge the fixes into M56 (build 2924)?  It looks like the crash is no longer happening in 57.0.2926.0, which goes along with #68.

This is currently blocking the Dev release.
Cc: ligim...@chromium.org
Project Member Comment 73 by bugdroid1@chromium.org, Nov 22
Labels: -merge-approved-56 merge-merged-2924
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/bc63dc10b5972528108c2a5eafde9f2919e1a087

commit bc63dc10b5972528108c2a5eafde9f2919e1a087
Author: Bruce Dawson <brucedawson@chromium.org>
Date: Tue Nov 22 00:45:00 2016

Stub out Skia AVX and HSW opts.

This is hopefully temporary while we mull better options.

BUG= 666707 

Review-Url: https://codereview.chromium.org/2512093003
Cr-Commit-Position: refs/heads/master@{#433407}
(cherry picked from commit 2ea76e2274404af7e53cc4bb5d212b22856ee455)

Review URL: https://codereview.chromium.org/2519083004 .

Cr-Commit-Position: refs/branch-heads/2924@{#47}
Cr-Branched-From: 3a87aecc31cd1ffe751dd72c04e5a96a1fc8108a-refs/heads/master@{#433059}

[modify] https://crrev.com/bc63dc10b5972528108c2a5eafde9f2919e1a087/skia/BUILD.gn
[add] https://crrev.com/bc63dc10b5972528108c2a5eafde9f2919e1a087/skia/ext/SkOpts_avx_stub.cc
[add] https://crrev.com/bc63dc10b5972528108c2a5eafde9f2919e1a087/skia/ext/SkOpts_hsw_stub.cc

Labels: -ReleaseBlock-Dev
The CL is merged to M56,  removing the blocker label.We will verify once the build is deployed to Dev channel.
Project Member Comment 75 by bugdroid1@chromium.org, Nov 28
Project Member Comment 76 by bugdroid1@chromium.org, Nov 29
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b4ee4bffa0c144adfb16968ca3645bc624e938db

commit b4ee4bffa0c144adfb16968ca3645bc624e938db
Author: skia-deps-roller <skia-deps-roller@chromium.org>
Date: Tue Nov 29 04:21:16 2016

Roll src/third_party/skia/ d5de01364..99ab92b59 (22 commits).

https://skia.googlesource.com/skia.git/+log/d5de01364378..99ab92b5958a

$ git log d5de01364..99ab92b59 --date=short --no-merges --format='%ad %ae %s'
2016-11-28 raftias Moved A2B0 profile parsing before XYZ
2016-11-28 mtklein Consistent naming.
2016-11-28 reed use raster-pipeline in readPixels
2016-11-28 ethannicholas added support for layout(offset=...) to skslc
2016-11-28 benjaminwagner Merge changes from internal cl/140385880.
2016-11-28 mtklein simplify
2016-11-28 liyuqian Revert "Add the missing shift to the dy"
2016-11-28 mtklein Convert blitter over to new style from_srgb, to_srgb.
2016-11-28 lsalzman use __BYTE_ORDER__ macro to detect endianness when available
2016-11-28 brianosman Narrow the SkImageGenerator interface
2016-11-28 bsalomon Remove old driver bug workaround for glTexStorage.
2016-11-28 borenet Roll recipe DEPS
2016-11-28 ethannicholas unified ASTLayout/Layout and ASTModifiers/Modifiers
2016-11-28 ethannicholas removed textureProj() and legacy texture functions from sksl
2016-11-28 reed simplify SkConfig8888 logic: just fall-through if memcpy case isn't supported
2016-11-28 mtklein Split srgb out of accum stages.
2016-11-28 raftias Fuzzer fix for overflow in some Lut8 profiles.
2016-11-28 mtklein Fix unpremul stage.
2016-11-28 liyuqian Add the missing shift to the dy
2016-11-28 brianosman GrTextureProducer cleanup, phase two: Producer, Adjuster, Maker
2016-11-22 mtklein Guard against buggy ucrt\math.h.
2016-11-22 ethannicholas baked in a few more precision modifiers

BUG= 668784 , 668907 , 666707 

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls

CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel
TBR=brianosman@google.com

Review-Url: https://codereview.chromium.org/2532083003
Cr-Commit-Position: refs/heads/master@{#434889}

[modify] https://crrev.com/b4ee4bffa0c144adfb16968ca3645bc624e938db/DEPS

Project Member Comment 77 by bugdroid1@chromium.org, Nov 29
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0dcd8a3ef5f7cdcdd556aeeed1db09e86c3e5ae5

commit 0dcd8a3ef5f7cdcdd556aeeed1db09e86c3e5ae5
Author: mtklein <mtklein@chromium.org>
Date: Tue Nov 29 14:24:25 2016

Revert of Stub out Skia AVX and HSW opts. (patchset #1 id:1 of https://codereview.chromium.org/2512093003/ )

Reason for revert:
Now that we're guarding against ucrt\math.h in Skia and that's rolled into Chromium, it's as good a time as ever to try turning back on Skia's AVX+ code at head.  (I intend to just leave it stubbed out like this in M56.)

Original issue's description:
> Stub out Skia AVX and HSW opts.
>
> This is hopefully temporary while we mull better options.
>
> BUG= 666707 
>
> Committed: https://crrev.com/2ea76e2274404af7e53cc4bb5d212b22856ee455
> Cr-Commit-Position: refs/heads/master@{#433407}

TBR=brucedawson@chromium.org,thakis@chromium.org
# Not skipping CQ checks because original CL landed more than 1 days ago.
BUG= 666707 

Review-Url: https://codereview.chromium.org/2539453003
Cr-Commit-Position: refs/heads/master@{#434983}

[modify] https://crrev.com/0dcd8a3ef5f7cdcdd556aeeed1db09e86c3e5ae5/skia/BUILD.gn
[delete] https://crrev.com/cea883202f2e33a6b48b643b3d57221a2ef6187e/skia/ext/SkOpts_avx_stub.cc
[delete] https://crrev.com/cea883202f2e33a6b48b643b3d57221a2ef6187e/skia/ext/SkOpts_hsw_stub.cc

Looks like 57.0.2937.0 has the stub revert from #77.
Labels: -Stability-Sheriff-Desktop
It's probably too early to tell anything about 57.0.2937.0 which only has 120 crashes ATM.

It looks like this issue is well understood so I'm removing the sheriff label.

FWIW the original crash signature last appeared in 57.0.2925.0
and 56.0.2924.0.

I have been looking at illegal instruction crash counts by version and CPU [1]. There was only one in 57.0.2926.0 (against 1526 crashes total).

[1] https://crash.corp.google.com/dremel_query_ui?q=SELECT%20product.version%2C%20cpu.Info%2C%20COUNT(*)%0AFROM%20crash.prod.latest%0AWHERE%20product.name%3D%27Chrome%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27browser%27%20AND%20crash.Reason%3D%27EXCEPTION_ILLEGAL_INSTRUCTION%27%0AAND%20product.version%20%3C%20%276%27%20AND%20product.version%20%3E%20%2756%27%0AGROUP%20BY%201%2C%202%20ORDER%20BY%201%20DESC%2C%202
FWIW, template functions behave like inline functions, and you can have the same problem with (say) std::min(). Requiring those to be static is probably somewhere between unreasonably and non-standards-conformant :-(
Interesting.  I'd have thought std::min and std::max would be static too.  It's hard for me to imagine a good reason that anything, function or data, defined in a header is non-static, especially with ICF.

In any case, we typically don't use std::min or std::max, especially in this code.  There we're more likely to be using intrinsics like _mm_min_ps, _mm256_max_ps, etc.

I think I'll just give up on MSVC for good if those intrinsics ever start giving us ODR problems...
But std::min() would be a problem for non-MSVC too, right?
Yes, anything's a problem if it's defined in a header and not at least one of static, force-inlined, or in an anonymous namespace.

I may not have been clear about what I was going on about with intrinsics.  Clang's x86 intrinsics are all static and force-inlined, doubly safe.  GCC's are extern and force-inlined, which I think is safe but worrying.  Don't know about MSVC's.
Status: Fixed
Feels fixed right?  I'm not seeing anything worrying in crash for 2937+.
I filed https://connect.microsoft.com/VisualStudio/feedback/details/3114220 to request that the functions in math.h be marked as static.

Regarding comment #81, I would agree that std::min and std::max also need the treatment, and probably many other functions as well.

I would not expect intrinsics to be a problem because they aren't "really" functions.
Project Member Comment 86 by bugdroid1@chromium.org, Jan 12
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9f158899c5bec9e547cbcc49aff6f087620f6129

commit 9f158899c5bec9e547cbcc49aff6f087620f6129
Author: skia-deps-roller <skia-deps-roller@chromium.org>
Date: Thu Jan 12 16:45:36 2017

Roll src/third_party/skia/ 0c2997b6d..d32303e72 (2 commits).

https://skia.googlesource.com/skia.git/+log/0c2997b6d85e..d32303e7276b

$ git log 0c2997b6d..d32303e72 --date=short --no-merges --format='%ad %ae %s'
2017-01-10 mtklein disable runtime detected AVX2 raster pipelines
2017-01-12 djsollen Remove defunct include/images directory from GN.

BUG=679147, 654213 ,664864, 666707 ,etc.

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls

CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel
TBR=kjlubick@google.com

Review-Url: https://codereview.chromium.org/2633483002
Cr-Commit-Position: refs/heads/master@{#443259}

[modify] https://crrev.com/9f158899c5bec9e547cbcc49aff6f087620f6129/DEPS

Microsoft has officially said that they will *not* be marking inline functions as static. Their closing comment was:

"The Universal CRT team has discussed this and will not be declaring the functions in math.h as static inline. The C standard is clear that all library functions should have external linkage.

In order to solve this scenario, we recommend using a separate binary for /arch:AVX compiled code."

https://connect.microsoft.com/VisualStudio/feedback/details/3114220

FWIW
Bummer.
Sign in to add a comment