Issue metadata
Sign in to add a comment
|
run x86 unittests through QEMU too to handle board/build ISA differences |
||||||||||||||||||||||
Issue descriptionThe illegal instruction error is found in https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/1066977 where the unittest phase failed for board Grunt. The same set of unittests passed with board Eve. Grunt uses AMD Stoney Ridge (arch=amdfam10) while Eve uses Intel Kaby Lake (arch=corei7). Here is the illegal instruction: => 0x00005555555657f9 <+825>: insertq $0x8,$0x8,%xmm1,%xmm2 Repro step: FEATURES=test emerge-grunt newblue
,
Jun 26 2018
> like turning off cmov for alex/zgb sorry, i meant movbe, not cmov. but the overall issue is the same.
,
Jun 26 2018
if we need something for M69, turning off unittests for grunt is one option. not sure if CI can shuffle bots so that grunt/etc... boards only build on amd64 bots.
,
Jun 26 2018
This is a ss4a instruction. vapier@ should be didable this for amd boards ( -mno-sse4a)
,
Jun 26 2018
And enable sse4.2 instead?
,
Jun 26 2018
to be clear, i'm not suggesting we keep going down this path of "let's turn off all the ISAs our bots don't support". i think the movbe approach was a mistake.
,
Jun 26 2018
Ok, so unassigning me and assigning to jclinton.
,
Jun 26 2018
As a short term solution, turning off the affected unit test on grunt doesn't seem too bad -- assuming that the unit test isn't ISA specific I'd agree with vapier that it isn't an ideal solution to turn off certain optimizations just because our bots can't run the affected unit tests. We could either run the unit tests on actual devices as part of HW tests (or in a similar setup), or run the unit tests under qemu (similar to what we've done for arm) by extending the qemu support in platform2_test.
,
Jun 26 2018
we could hack newblue to check CFLAGS for "amd" and skip running tests altogether. are we enabling newblue in other boards so we don't lose unittest coverage entirely ?
,
Jun 26 2018
Re #9: how about adding a "$(get-flag march) != amd*" check, in additional to the "use x86/amd64" check in the ebuild?
,
Jun 26 2018
that'd work to unblock the BT work (assuming unittest coverage is still provided by other boards). we're also betting that only the BT project currently (for whatever reason) is hitting this scenario.
,
Jun 27 2018
,
Jul 1
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/44946a0885a4e24b611e757654183b07b8a19a8d commit 44946a0885a4e24b611e757654183b07b8a19a8d Author: Sonny Sasaka <sonnysasaka@chromium.org> Date: Sun Jul 01 00:21:30 2018 newblue: Skip tests for AMD CPUs. SSE4a optimization causes tests to not run properly on Intel bots. BUG=chromium:856686 TEST=This should skip test: FEATURES=test emerge-grunt newblue Change-Id: If7fc56752ec32f2bcf42f6a3545289d2683a2601 Reviewed-on: https://chromium-review.googlesource.com/1121424 Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com> Tested-by: Sonny Sasaka <sonnysasaka@chromium.org> Reviewed-by: Mike Frysinger <vapier@chromium.org> [modify] https://crrev.com/44946a0885a4e24b611e757654183b07b8a19a8d/chromeos-base/newblue/newblue-9999.ebuild
,
Jul 18
Build Team has been discussing ways that we could support non-host-CPU-target unittest execution but this currently a feature request, not a bug. (This applies to any non-Intel CPU architecture, not just AMD.)
,
Jul 18
mmm that's not really true we support running unittests on native CPUs today and are actively using it for ARM. you can see it being used on our ARM builders now: https://logs.chromium.org/v/?s=chromiumos%2Fbb%2Fchromiumos%2Fdaisy-incremental%2F43888%2F%2B%2Frecipes%2Fsteps%2FUnitTest%2F0%2Fstdout it's just that the logic currently is only based on the architecture, not the SoC. so chromite.lib.qemu.DetectArch sees that the board is x86_64 and the build system is x86_64, so it says "don't use qemu" at which point we execute everything directly. this is for obvious speed reasons.
,
Jul 18
I am not fully clear convinced the value of the unittest executions. They make the server/build slower. They don't run on the DUT. If they were to find a problem they would break the build (instead of having a valid build and then marking is as suspect at a later stage). Do we have examples where the ARM unittests caught something the x86 unittests did not? And if they showed an architecture specific result, why would we not want them to run on DUT instead where they could test what we ship? And if it is not that architecture specific, could we just have a single builder that executes the unittests representing all the boards (and make all other builds faster)?
,
Jul 18
> They make the server/build slower. i don't get this. any unittests are going to, by definition, make the build slower. > They don't run on the DUT. that's entirely by design. we don't have DUT resources to run in the PreCQ for every board. > If they were to find a problem they would break the build err, that's the entire point of unittests ? > Do we have examples where the ARM unittests caught something the x86 unittests did not? off hand, no, but i do recall it coming up in the past. at this point, we don't track unittest failures anymore because they largely occur in the PreCQ and devs triage/fix their code first. we do have ARM-only boards which have unique packages (e.g. jetstream), so by definition, yes it wouldn't have been caught for x86. > And if they showed an architecture specific result, why would we not want them to run on DUT instead where they could test what we ship? we don't have the lab resources. if you fix that, we can figure out a way to make it happen. ironically, i'm not entirely convinced we'd see a huge speed-up overall when DUT interaction today is not fast. > And if it is not that architecture specific, could we just have a single builder that executes the unittests representing all the boards (and make all other builds faster)? our generic builders try to build a lot of things, but at the end of the day, no. CrOS partner projects do not mix in for example.
,
Jul 18
,
Jul 18
> > If they were to find a problem they would break the build > err, that's the entire point of unittests ? In my view a world in which building and testing are separate stages is better than what we have (mixing tests into each package, depending one package on the next). The current approach where a simple failure blows up the entire build and prevents further testing of other packages does not scale well. Why should a package not get built/tested at all, just because another package was incidentally scheduled earlier and failed? Ideally there should be a building loop (builders churning out binaries as fast as they can) and a testing loop (infra testing as fast as they can, but presumably slower than the builders). > ironically, i'm not entirely convinced we'd see a huge speed-up overall when DUT interaction today is not fast. As I wrote, the build would be faster, as it would not include testing. Testing would be a fully independent issue (from build) to tackle. And yes, we do have lab resources. They are just poorly used due to software complexity issues.
,
Jul 18
your description doesn't seem to match what the bots are doing today we have a dedicated BuildPackages stage. this stage builds packages and that's it. it does not run any tests. only once BuildPackages finishes do we kick off the UnitTest stage. thus it's impossible for any unittest failure to break the building of all the packages. if you're talking about any test failure halting the overall bot execution, unittests is not unique in this regard. plus, it's what we've historically decided/designed for: if unittests fail, then the build is broken, so don't waste precious resources like DUTs. in the future world where there are bots churning out builds and other bots running tests, our current stage design already fits nicely into that model. but i don't see how future CI architecture decisions has any relevance to the question of running unittests. we absolutely don't have the DUT capacity today, and it's going to be a while in the future where we're at the point to even consider this. so until then, we need something now. plus, i'm not confident we'll ever be at the point where every build has DUT resources. we often develop hw & sw in parallel, and partners built off of CrOS frequently never have HW tests.
,
Aug 30
,
Nov 26
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/8b16fc480c877d0073f66c727e9ff127b02b294a commit 8b16fc480c877d0073f66c727e9ff127b02b294a Author: Chris McDonald <cjmcdonald@chromium.org> Date: Mon Nov 26 18:00:20 2018 chromeos_config: Add template for unit test only buildbot BUG=chromium:884613, chromium:856686 TEST=None Change-Id: I537f2586bc34c9e06ea99b6f841e250a358f4721 Reviewed-on: https://chromium-review.googlesource.com/1333835 Commit-Ready: Chris McDonald <cjmcdonald@chromium.org> Tested-by: Chris McDonald <cjmcdonald@chromium.org> Reviewed-by: Chris McDonald <cjmcdonald@chromium.org> [modify] https://crrev.com/8b16fc480c877d0073f66c727e9ff127b02b294a/config/chromeos_config.py [modify] https://crrev.com/8b16fc480c877d0073f66c727e9ff127b02b294a/config/waterfall_layout_dump.txt [modify] https://crrev.com/8b16fc480c877d0073f66c727e9ff127b02b294a/config/config_dump.json
,
Dec 17
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/869f9d7dc1c7b3870be26cde790f10b193aabb2b commit 869f9d7dc1c7b3870be26cde790f10b193aabb2b Author: Chris McDonald <cjmcdonald@chromium.org> Date: Mon Dec 17 18:14:33 2018 chromeos_config: Disable unit tests in grunt build BUG=chromium:856686, chromium:884613 TEST=None CQ-DEPEND=CL:1337441 Change-Id: I90d5422d0d644b0c165692ec704f7179b6e4df8a Reviewed-on: https://chromium-review.googlesource.com/1333836 Commit-Ready: Chris McDonald <cjmcdonald@chromium.org> Tested-by: Chris McDonald <cjmcdonald@chromium.org> Reviewed-by: Jason Clinton <jclinton@chromium.org> [modify] https://crrev.com/869f9d7dc1c7b3870be26cde790f10b193aabb2b/config/chromeos_config.py [modify] https://crrev.com/869f9d7dc1c7b3870be26cde790f10b193aabb2b/config/config_dump.json |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by vapier@chromium.org
, Jun 26 2018