New issue
Advanced search Search tips

Issue 612288 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocking:
issue 673919
issue 771995
issue 777555



Sign in to add a comment

Script the disabling of fast_clz on ffmpeg windows

Project Member Reported by wolenetz@chromium.org, May 16 2016

Issue description

See https://chromium-review.googlesource.com/#/c/344671/ which did this manually, but that will be lost on the next ffmpeg roll.

This bug tracks implementing an automated solution for use during ffmpeg rolling. Perhaps one of:
1) run ffmpeg ./configure for windows in a VM with the oldest CPU we support (credit: thakis@chromium.org)
2) figure out a --cpu to pass to ffmpeg ./configure for windows which won't munge a bunch of other bits (credit: dalecurtis@chromium.org)
3) add a new munge script specifically for ia32 builds: https://code.google.com/p/chromium/codesearch#chromium/src/third_party/ffmpeg/chromium/scripts/munge_config_optimizations.sh These scripts (we don't have any right now) are normally called in build_ffmpeg.py when generating the configuration options. The latter while slightly messier seems like it might be safer. (credit: dalecurtis@chromium.org)

This bug doesn't absolutely block the next FFmpeg roll, but ideally it would be fixed as part of it. Otherwise, each roll will need manual intervention to disable fast_clz on windows.
 
thakis@/dalecurtis@: Is this truly specific to just Windows? Or do linux/mac configs also need munging?
Note that https://chromium-review.googlesource.com/#/c/344671/ may actually fix this bug (following route #2 or #3). It's iterating in CR right now.

Comment 3 by thakis@chromium.org, May 16 2016

Since configure probes the local system, it probably applies to non-Windows too -- but the intrinsic setup on non-Windows until very recently was so that you could only use e.g. SSE intrinsics if you compiled your code with a --target cpu that supports SSE. On Windows, it's historically possible to use e.g. SSE intrinsics in files where the compiler isn't allowed to use SSE for the code it creates, and so it's easier to (accidentally and intentionally) mix-and-match SSE and non-SSE code in one TU. So it's probably more pressing on Windows.

https://chromium-review.googlesource.com/#/c/344671/ does (2) for now (but we can probably do better later).
Ok. I'll keep this bug open for possibly implementing (3) (for ia32 and x64 windows/linux/mac/etc) in the future. Once https://chromium-review.googlesource.com/#/c/344671 lands with implementation of (2), this bug will no longer block  bug 591845 .

Comment 5 by thakis@chromium.org, May 16 2016

Sounds good. Please let me know if the --cpu flag causes any issues. Happy to help in that case.
Blocking: -591845
Cc: wolenetz@chromium.org
Labels: -Pri-2 -M-52 Pri-3
Status: Available (was: Assigned)
@#4: It landed and rolled (https://chromium.googlesource.com/chromium/src/+/476f6c9166db498c42887fb78229b6de54f6e148), so this bug no longer blocks  bug 591845 .

Keeping open for possibly implementing (3) in future.
Blocking: 673919
Owner: hubbe@chromium.org
Status: Assigned (was: Available)
=> hubbe@ for potentially implementing (3) in the M-57 roll.
Owner: liber...@chromium.org
=> liberato's doing the M-57 roll instead.
Cc: liber...@chromium.org
Owner: ----
Status: available (was: Assigned)
Blocking: 771995
Owner: xhw...@chromium.org
Status: Assigned (was: Available)
xhwang@ - Is this something you could fix during your M65 roll? (If not, please remark as blocking the next ffmpeg roll beyond yours).
Cc: xhw...@chromium.org
Labels: M-65
Owner: sande...@chromium.org
Assign to sandersd@ since the ffmpeg rotation has shifted and now sandersd@ will work on the M65 roll.
Owner: dalecur...@chromium.org
Blockedon: 777555
ffmpeg uses runtime cpu detection for most things, so we're mostly just worried about this one edge case with the MSVC specific intrinsic. For this case does anyone remember why we didn't just check defined(__BMI__) ?
Seems no __BMI__ in MSVC, and even with clang -mavx2 doesn't seem to imply -mbmi; so that's a bit surprising. In any case, looking over the discussion on cfe-dev@ hans says that BMI isn't required for _tzcnt_*:

http://lists.llvm.org/pipermail/cfe-dev/2016-October/051331.html

...but that still isn't fixed on trunk it seems:

../../third_party/ffmpeg/libavcodec/h2645_parse.c(202,17):  error: implicit declaration of function '_tzcnt_u32' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        size -= ff_ctz(v) + 1;

I think a simple patch to x86/intmath.h to only define these when !__clang__ would be upstreamable. Will send it and see.


Cc: h...@chromium.org
I don't remember details, but I think on our end we concluded that what ffmpeg is doing is really questionable and we shouldn't change the compiler. So if you're changing ffmpeg upstream, maybe you could make things just not behave weirdly. Hans, do you remember what that meant concretely?
Blockedon: -777555
Blocking: 777555
Happy to try if it's truly questionable, but we don't see any crashes in this code prior to your fix (and the number of opterons is non-zero), so I think hans@ is right:

https://crash.corp.google.com/browse?q=product.name%20CONTAINS%20%27Chrome%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.file_path%20CONTAINS%20%27ffmpeg%2Flibavutil%2Fmathematics.c%27%20AND%20product.Version%20CONTAINS%20%2751.0.%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D

Either we don't have any Opteron users or it would seem either _tzcnt_* maps correctly on non-BMI hardware with MSVC.
For historical context, hans@ originally landed http://llvm.org/viewvc/llvm-project?view=revision&revision=253358 based on this discussion on ffmpeg-devel:  https://ffmpeg.org/pipermail/ffmpeg-devel/2015-November/183404.html -- but I guess it got reverted at some point.

There was a later attempt by thakis@ as part of https://bugs.llvm.org/show_bug.cgi?id=30506, but it appears this never landed.

Per the e-mail it seems pretty clear ffmpeg is okay with the "rep bsf" encoding tzcnt degrades to (and answers my question about why we didn't have crashes).

Probably the best I can do from the ffmpeg side is to add a __clang__ #define if clang-cl isn't planning to match MSVC behavior here. 
Sent http://ffmpeg.org/pipermail/ffmpeg-devel/2017-October/218445.html upstream which I think is as much as I can get away with.
Per the discussion on https://reviews.llvm.org/D26335#inline-227758 , iirc the argument went the rep bsf doesn't check for 0, so if you really want tzcnt you need to check for 0 yourself first, and at that point it really doesn't buy you anything to use the fallback encoding. So it's not clear that they get any perf wins from what they're currently doing (and I think the commit message that added that code was like "might as well..." and not anything rigorous)
Upstream took my patch http://git.videolan.org/?p=ffmpeg.git;a=commit;h=50e30d9bb71e1dff27be16c264fac90e362b9896 so we can drop the cpu switch with the next ffmpeg roll.
Labels: -M-65 M-64
Status: Fixed (was: Assigned)
Fixed in latest roll. No more cpu=opteron and all builds are done with Clang, so we shouldn't get any bad flags again.

Sign in to add a comment