New issue
Advanced search Search tips

Issue 917355 link

Starred by 2 users

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 3
Type: Bug

Blocking:
issue 124610
issue 899438
issue 383340



Sign in to add a comment

openmax_dl doesn't use clang's integrated assembler

Project Member Reported by thakis@chromium.org, Dec 21

Issue description

Clang by default calls its internal assembler, but -fno-integrated-as is passed it shells out to gas.

We use the integrated assembler almost everywhere. It has a bunch of advantages:
- No additional process means perf is a bit better
- The assembler is updated with clang rolls and we can change it quickly

The one exception is openmax_dl in chrome/android/arm builds.

See https://chromium-review.googlesource.com/c/chromium/src/+/1388544/1 for an example of how the 2nd point currently costs us: I added a new assembler flag and I want to use it, but since openmax_dl still uses gas I now need to jump through a bunch of hoops (add the new flag to a config, add that config by default, remove it again in openmax_dl).

Making openmax_dl work with clang's integrated assembler is fairly mechanical but also a bit tedious: https://webrtc-review.googlesource.com/c/deps/third_party/openmax/+/57160 is a demo CL I made converting one of the ~200 assembly files. If you manage to do 6 CLs like this per day (an aggressive pace), converting them all will take 1.5 months.


Since it's a bit of a project, we should come up with some plan for this. openmax_dl owners, is this something you could spend a bunch of time on? If not, could we contract this out to an arm expert or something?

(...oh, looks like kma@ left the project? rtoy, you might want to find an openmax_dl co-owner.)


(spun off from issue 124610).

 
I could imagine a script that will take the defines' length, remove but keep in a dictionary and then, whenever found use of them in instructions, append the size modifier after verifying that all such defines lengths were the same for every instruction. The special case would be loads and stores that would only need the width, not the type info.

This should apply cleanly on every file if they're all like the three in the URL above and would be a two-day project.

Additional testing would be a longer time, of course, but not longer than with manual replacement.
Blocking: 383340 899438
Once this is done, we can undo most of https://webrtc-review.googlesource.com/c/deps/third_party/openmax/+/115440 and make https://chromium-review.googlesource.com/c/chromium/src/+/1388544 much simpler (see patch set 1).

This also blocks having fully build-dir-independent deterministic builds with symbols on Android.

Comment 3 by dbbrooks@chromium.org, Today (12 hours ago)

Status: Untriaged (was: Unconfirmed)
tmathmeyer@, just taking a stab at this one since it's "Unconfirmed" and under the Internals>Media component. Since it's marked as OS Android, could you confirm whether Videos Stack should keep an eye on this and remove/leave the Internals>Media component depending? Thanks!

Moving to Untriaged since this was created by a developer.

Comment 4 by dbbrooks@chromium.org, Today (12 hours ago)

Owner: tmathmeyer@chromium.org

Comment 5 by dalecur...@chromium.org, Today (12 hours ago)

Components: -Internals>Media Blink>WebAudio
Owner: ----
This should be in Blink>WebAudio.

Comment 6 by rtoy@chromium.org, Today (12 hours ago)

Step #1:  Chrome only uses the floating-point FFTs.  We should first stop building the integer FFTs with chrome. Other users might be using them, so I don't want to remove the code, but we can at least stop building them for Chrome.

As a side-effect, this reduces the size of Chrome Android apk by quite a bit.

Comment 7 by rtoy@chromium.org, Today (11 hours ago)

Also, ARM has a new open source library called Ne10 that has a C/C++ FFT library using intrinsics that is at as fast as this.  We could consider replacing this huge assembly library with Ne10.  Supports both arm32 and arm64.

I also considered using the open-source PFFFT library that is also equally fast, and supports X86, which SIMD for both ARM and x86. I think this only supports arm32 but not arm64.

If there's not a huge rush on getting this to work with clang, I can work on adding one of these libraries instead.

Comment 8 by thakis@chromium.org, Today (11 hours ago)

There's no rush, but seeing some progress within the next 6 months or so would be nice.

Sign in to add a comment