Issue 659157 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 30
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 1
Type: Bug


Participants' hotlists:
speed-ops-high-priority


mac_chromium_rel_ng should use Mac10.10 or Mac10.11

Project Member Reported by nedngu...@google.com, Oct 25 2016

Issue description

I found that for a catapult roll like https://codereview.chromium.org/2453433002, telemetry_perf_unittest is not triggered on mac_chromium_rel_ng (https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_ng/builds/321763).

This is a very serious loss of coverage. Emily or Dave, can either of you take a look?
 

Comment 1 by eyaich@chromium.org, Oct 25 2016

That is odd; it doesn't seem to have triggered that test suite in a while. It is included in the generated JSON:

https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_ng/builds/321763/steps/read%20test%20spec%20%28chromium.mac.json%29/logs/json.output

Maybe I don't understand how the tryservers work. Do they automatically prune the tests that need to be run based on the directories touched by the set of changes? Maybe the logic there is invalid?

I will continue investigating.
Owner: eyaich@chromium.org
Status: Assigned (was: Untriaged)
Thanks Emily!

Comment 3 by eyaich@chromium.org, Oct 25 2016

As Ned pointed out from the recent catapult roll (https://catapult-roll.skia.org/), the test is not being pruned out on linux or win, just mac:

https://codereview.chromium.org/2453433002

I am not sure whether there is different pruning logic per platform, but this seems unlikely to be related to that.

I am currently trying to compare steps in the recipes for this run on linux vs mac to see what the difference is between one triggering it and one not.

Comment 4 by kbr@chromium.org, Oct 25 2016

Cc: dpranke@chromium.org
Components: Build
Labels: Build-Tools-GN
The tryservers decide which tests to run based on which build targets are affected by the CL. It sounds like there might be a bug. The analysis was rewritten for GN recently by dpranke@ who can probably help.

Comment 5 by eyaich@chromium.org, Oct 25 2016

As far as I can tell from the builder steps, everything looks the same right up until the tests are triggered. Even at the last step, "isolate_tests", the isolate is listed with its hash:

https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_ng/builds/321763/steps/isolate%20tests/logs/json.output
and the swarming targets file lists telemetry_perf_unittests:

https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_ng/builds/321763/steps/generate_build_files%20%28with%20patch%29/logs/swarming-targets-file.txt

Maybe Dirk can offer more insight on where to start debugging it on the GN side.
Cc: iannucci@chromium.org mar...@chromium.org martiniss@chromium.org phajdan.jr@chromium.org
Labels: -Build-Tools-GN
I don't think this is a GN/analyze issue. Catapult changes are whitelisted in //testing/buildbot/trybot_analyze_config.json and so analyze is skipped (and we require the recipe to build and test everything). And, you can see that telemetry_perf_unittests_run is being built in the compile step.

I don't know what is going wrong in isolate_tests or in the subsequent test-triggering steps to cause the test not to be triggered; I don't know that part of the code that well.

cc'ing some others that might have ideas.
Owner: martiniss@chromium.org
Status: Started (was: Assigned)
I'll investigate this. Going to try to reproduce this locally.

Comment 8 by eyaich@chromium.org, Oct 25 2016

This is very hard to diagnose since the builder page is not giving us anything to go on in previous steps to indicate a problem. The only symptom we have seen so far is that the test is not being triggered, even though it is present in the json and the target is being built.

Nothing has changed in the SwarmingIsolatedScript test in chromium_test/steps.py except some changes to how we handle the results (work I am doing to swarm the perf tests), but if there were an issue there we would see the job trigger and then see an error on processing the results.

Have there been any recent changes to the swarming api that could have caused this? Or is there a way to tell when this last ran successfully on mac?
According to dremel, the last time "telemetry_perf_unittests" was run on this builder was 1469465903000 unix milliseconds, which is about 7/25/2016, 10:35:25 AM GMT-7:00 DST. So quite a while ago.
The query I used is:

SELECT *
FROM chrome_infra.completed_steps
WHERE
builder = 'mac_chromium_rel_ng' AND
step_name = 'telemetry_perf_unittests'
ORDER BY build_number DESC
LIMIT 100
;
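
The Unix-millisecond timestamp quoted above can be checked with a short conversion. This is a minimal sketch, not part of the original query: the helper name `ms_to_datetime` is hypothetical, and the fixed -7 hour offset is an assumption matching the GMT-7:00 zone named in the comment.

```python
from datetime import datetime, timedelta, timezone

# Timestamp quoted in the comment, in Unix milliseconds.
LAST_RUN_MS = 1469465903000


def ms_to_datetime(ms, utc_offset_hours=0):
    """Convert Unix milliseconds to an aware datetime at a fixed UTC offset.

    Hypothetical helper for illustration; a fixed offset does not model
    daylight-saving transitions.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(ms / 1000, tz=tz)


last_run_utc = ms_to_datetime(LAST_RUN_MS)
# Assumed offset: GMT-7:00, as stated in the comment.
last_run_pdt = ms_to_datetime(LAST_RUN_MS, utc_offset_hours=-7)

print(last_run_utc.isoformat())
print(last_run_pdt.isoformat())
```

Both values describe the same instant; only the displayed offset differs.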
Cc: benhenry@chromium.org sullivan@chromium.org
Is there any movement on this bug? As of today, mac_chromium_rel_ng is still not running telemetry_perf_unittest: https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_ng/builds/333625

This is a very important bug to fix; I just want to make sure that we don't drop it.
Ping, any update on this bug?
Owner: ----
Status: Available (was: Started)
I don't remember much about this bug. I can take a look this week maybe, but I'm doing a fixit so I don't know how much time I'll have to spend on this.
Summary: mac_chromium_rel_ng should use Mac10.10 or Mac10.11 (was: mac_chromium_rel_ng is not running telemetry_perf_unittest)
Err, silly me. I just checked and found that mac_chromium_rel_ng uses Mac10.9, and I removed support for running telemetry_perf_unittest on Mac10.9 in https://chromium.googlesource.com/chromium/src/+/4c9960c8c03e272829395c4b3d0910bc8524a1ae

However, since Chromium policy is that we don't support Mac versions older than the two latest, I think the CQ should be updated to use Mac10.10 or Mac10.11. Retitling this bug to reflect this.
We definitely still need to support 10.9 for now.

It is true that the bulk of the fleet should be on 10.11 (or, soon, 10.12), but 10.9 still needs to work.
To #15, my proposal is that we should move the CQ coverage to Mac10.10 & Mac10.11 because these align with our policy of support. I have no comment on whether we should keep Mac 10.9 on the main waterfall.
Agreed that we should move the CQ to 10.11 (and then 10.12 asap). I thought we had another bug on file for that already, but I'm not seeing it at the moment, so I'm fine w/ using this bug to track that.
Ping. Dirk - what's the plan?

Comment 19 by kbr@chromium.org, Feb 16 2017

I'll point out that the physical GPU bots have been running 10.12 for some time with no issues. Also, the Mac Minis in the Swarming pool are all running 10.12 as well -- these run both GPU tests and iOS tests.

I think it should be feasible to update the Mac VMs in the Swarming pool to 10.12 too.

I recently had all 10.10 osx bots on all perf waterfalls upgraded to 10.12.2, which is why there's new urgency here.
Owner: dpranke@chromium.org
Status: Assigned (was: Available)
Cc: erikc...@chromium.org
I think we'll probably move straight to 10.12 but we have a list of bugs to work out first (see crbug.com/624049). So, I'm changing what I wrote in comment #17.

@erikchen - I know we're not ready to move everything to 10.12, but are there reasons we can't move mac_chromium_rel_ng or the swarming pools to 10.12 now?

We could just dupe this into bug 624049 (the rollup bug for upgrading to 10.12), but that would lose sight of the original reason this bug was filed, which is that telemetry_perf_unittests isn't running on the current builders, which are 10.9.

As to nednguyen's comments in #14 and #17, they're just wrong. 10.9 is still supported and we've announced no EOL date for it yet. So, maybe support for 10.9 should be added back to the tests? I don't see why it was removed apart from some testing being noted as failing in bug 630765 and nednguyen@ and sullivan@ deciding they didn't want to run perf tests on that platform.

> @erikchen - I know we're not ready to move everything to 10.12, but are there reasons we can't move mac_chromium_rel_ng or the swarming pools to 10.12 now?
mac_chromium_rel_ng currently runs webkit_tests. I believe that blink_tests do not yet pass on 10.12. Do we have a tracking bug for that? I can't find one.

swarming pools: Are you also referring to changing the trybot configuration so that the tests are run on 10.12 instead? We don't even have a main waterfall 10.12 bot yet so let's get that up first? Luckily, we already have an FYI bot running 10.12 [force mac toolchain: https://build.chromium.org/p/chromium.fyi/builders/Chromium%20Mac%2010.11%20Force%20Mac%20Toolchain] which runs most tests. There appears to only be a single flaky test - we've fixed the rest.

> As to nednguyen's comments in #14 and #17, they're just wrong. 10.9 is still supported and we've announced no EOL date for it yet. So, maybe support for 10.9 should be added back to the tests? I don't see why it was removed apart from some testing being noted as failing in bug 630765 and nednguyen@ and sullivan@ deciding they didn't want to run perf tests on that platform.
I believe that the speed team only wants to support 2 macOS versions for performance testing. Obviously support for all macOS versions is better, but they are better aware of their capacity to deal with multiple macOS versions.
To answer my own question: 10.12 layout tests: https://bugs.chromium.org/p/chromium/issues/detail?id=697971


okay, well, "only support last two versions" isn't compatible with "most of the fleet is still on 10.9". Understandably that leads to the request to upgrade to 10.10 or 10.11, but that also impacts the rest of the team, so that isn't something the Perf team should decide by itself.
chatted with sullivan offline a bit. It sounds like I've taken us around in a circle here, sorry!

I will follow up with some people offline and figure out the plan.
Labels: OS-Mac
Status: Started (was: Assigned)
Owner: erikc...@chromium.org
Okay, I think we've concluded that we can just wait for 10.12, and then we'll roll this into the plan to upgrade everything to 10.12.

@erikchen, since you're owning the other 10.12-related bugs, I'll punt this one to you as well, but we can dedupe this or whatever as you see fit.
In Speed's defense, and since I was the reason we moved all of the 10.10 bots to 10.12: Speed wants to reflect the distribution of our users as closely as possible. There are more users on 10.12 and 10.11.

https://uma.googleplex.com/timeline_v2?sid=0ba92a3664c4358c34a5bd71fc04090e
This is important not just for speed but for correctness testing of Chrome as well. IMO, the closer our test machines are to our users' machines, the fewer risks we have in terms of Security/Speed/Stability.
I don't think there's any doubt that we should have more or even most of our bots running on 10.12. There's a reason we're emphasizing getting a lot better about timely upgrades in ops.

However, running most of your tests on a version that we're not testing on the waterfall just seems like asking for trouble to me. In the future I'd like to see us coordinate testing on versions better across the whole fleet, rather than having subteams press ahead on their own.
Components: Infra>Client>Chrome
mac_chromium_rel_ng is now running LayoutTests on Swarming on 10.12

Is it just a matter of making builds happen only on 10.12 now?
Are we still running the non-layout-tests on 10.9?
Oh, never mind, I see your update to the other bug now. Then, yeah, probably just upgrading the builders is fine.
Cc: -iannucci@chromium.org iannu...@google.com
mac_chromium_rel_ng is currently on 10.13 w/ the exception of layout tests, which are on 10.12 and are going to be moved to 10.13 in https://bugs.chromium.org/p/chromium/issues/detail?id=853356

Can we close this out?
Status: Fixed (was: Started)
I think so.
