system_health.common_mobile failing with BattOr error on Android One
Issue description

First failure appeared in: https://luci-milo.appspot.com/buildbot/chromium.perf/Android%20One%20Perf/533

The error is:

Traceback (most recent call last):
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py", line 97, in _RunStoryAndProcessErrorIfNeeded
    test.WillRunStory(state.platform)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/web_perf/timeline_based_measurement.py", line 276, in WillRunStory
    platform.tracing_controller.StartTracing(self._tbm_options.config)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/core/tracing_controller.py", line 43, in StartTracing
    self._tracing_controller_backend.StartTracing(tracing_config, timeout)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py", line 91, in StartTracing
    started = agent.StartAgentTracing(config, timeout)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/battor_tracing_agent.py", line 73, in StartAgentTracing
    self._battor.StartTracing()
  File "/b/swarming/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py", line 235, in StartTracing
    self._SendBattOrCommand(self._START_TRACING_CMD)
  File "/b/swarming/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py", line 359, in _SendBattOrCommand
    'Outputted: %s' % (cmd, status))
BattOrError: BattOr did not complete command 'StartTracing' correctly. Outputted: [0926/051857.103554:ERROR:serial_io_handler.cc(149)] Failed to open serial port: FILE_ERROR_ACCESS_DENIED

https://chromium-swarm.appspot.com/task?id=38d540c48def3a10&refresh=10&show_raw=1&wide_logs=true

Always happening when trying to run the first story.

CL range: http://test-results.appspot.com/revision_range?start=503198&end=503280
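The last line of the traceback comes from serial_io_handler.cc failing to open the BattOr's serial device. As I understand it, Chromium reports a POSIX open() failing with EACCES or EPERM as FILE_ERROR_ACCESS_DENIED. A minimal sketch for reproducing the check by hand as the bot user, assuming the BattOr shows up as a tty node such as /dev/ttyUSB0 (a hypothetical path; the real device node is not given in this issue):

```python
import errno
import os


def probe_serial_port(path):
    """Try to open `path` read/write the way a serial client would.

    Returns "ok" on success, otherwise the symbolic errno name, e.g.
    "EACCES" when the caller lacks permission on the device node.
    """
    # O_NOCTTY: don't make the device our controlling terminal.
    # O_NONBLOCK: don't block waiting on modem control lines.
    flags = os.O_RDWR | os.O_NOCTTY | os.O_NONBLOCK
    try:
        fd = os.open(path, flags)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return "ok"


if __name__ == "__main__":
    # Hypothetical device path; substitute the node the BattOr actually uses.
    print(probe_serial_port("/dev/ttyUSB0"))
```

Running it as chrome-bot on the host and again inside the device's container would show whether the two environments disagree about access.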
Sep 26 2017
Unassigning, let's see if the return code bisect can figure this out.
Sep 26 2017
=== BISECT JOB RESULTS ===
NO Test failure found

Bisect Details
  Configuration: android_one_perf_bisect
  Benchmark    : system_health.common_mobile
  Metric       : cpuTimeToFirstMeaningfulPaint_avg/background_social/background_social_facebook

  Revision          Exit Code    N
  chromium@503197   0 +- N/A     10    good
  chromium@503280   0 +- N/A     10    bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=background.social.facebook system_health.common_mobile

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8967386777507944496

For feedback, file a bug with component Speed>Bisection
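Both endpoints of the range exited with code 0 on the bisect bot, so a return-code bisect has no good/bad signal to narrow down. A minimal sketch of the idea, assuming a hypothetical run_test(revision) callable that returns the benchmark's exit code and treating revisions as integers (this is not the actual Chromium bisect implementation):

```python
def return_code_bisect(good, bad, run_test):
    """Find the first revision in (good, bad] whose test run exits non-zero.

    Returns None when the "bad" endpoint does not actually fail, which is
    the "NO Test failure found" case: there is nothing to narrow down.
    """
    if run_test(good) != 0:
        raise ValueError("the 'good' revision already fails")
    if run_test(bad) == 0:
        return None  # The failure does not reproduce at the bad end.
    # Invariant: `good` passes and `bad` fails.
    while bad - good > 1:
        mid = (good + bad) // 2
        if run_test(mid) == 0:
            good = mid
        else:
            bad = mid
    return bad
```

If the failure comes from the perf bot's setup rather than from a code change, as suspected below, every revision passes on the bisect bot and the sketch returns None, matching the result above.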
Sep 26 2017
Bisect found nothing :( Could be something that changed recently on the setup of the perf bots? Randy, hope you can have a look.
Sep 26 2017
If possible, I think we should just try to disable BattOr testing on Android One to lower the priority of this bug (unless the fix is really trivial). Charlie will reevaluate the BattOr lab testing in Q4.
Sep 26 2017
I believe the problem is that when the BattOrs were added to the A1 devices, chrome-bot was not properly added to the dialout group. Vince, did you get a chance to run the commands I gave you in that doc last week?
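Why group membership matters: USB serial nodes are typically owned by root:dialout with mode 0660, so a non-root user can only open them read/write through the group bits. A simplified sketch of the kernel's owner/group/other check (ignoring ACLs and capabilities), using gid 20 for dialout as reported by the `id` output in this thread; the device's owner and mode here are assumptions:

```python
import stat


def can_open_rw(st_mode, st_uid, st_gid, uid, gids):
    """Approximate the read/write permission check for a file or device.

    st_mode/st_uid/st_gid describe the node; uid/gids describe the process
    (gids should include its primary and supplementary groups).
    Simplified: root always passes; ACLs and capabilities are ignored.
    """
    if uid == 0:
        return True
    # Only one class applies: owner first, then group, then other.
    if uid == st_uid:
        needed = stat.S_IRUSR | stat.S_IWUSR
    elif st_gid in gids:
        needed = stat.S_IRGRP | stat.S_IWGRP
    else:
        needed = stat.S_IROTH | stat.S_IWOTH
    return (st_mode & needed) == needed
```

For a crw-rw---- root:dialout node, chrome-bot (uid 1000) is only allowed when 20 is among its groups; otherwise it falls through to the "other" bits, which grant nothing, and open() fails with EACCES.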
Sep 26 2017
Just talked to Vince, he ran them. I wonder if they need to be run from inside the Docker instance.
Sep 26 2017
I think the command may only have been run on one of the trybots. I just ran it on the main A1 bot, so if it starts passing BattOr tests we will know that it worked. On the bot I'm seeing this behavior right now, which makes me think it might not:

Checking if chrome-bot is in the dialout group at the top level, and it is:

[1] DOCKER chrome-bot@build17-b1:(Linux 14.04):~$ id chrome-bot
uid=1000(chrome-bot) gid=1000(chrome-bot) groups=1000(chrome-bot),20(dialout),29(audio),44(video),999(docker)

Logging into the Docker instance for one of the Android bots and checking, it is not:

[1] DOCKER chrome-bot@build17-b1:(Linux 14.04):~$ docker exec -it android_AG860440G8CI0GC /bin/bash
root@build17-b1--device2:/# id chrome-bot
uid=1000(chrome-bot) gid=1000(chrome-bot) groups=1000(chrome-bot),29(audio),44(video),999(docker)

I also cannot add it:

root@build17-b1--device2:/# sudo usermod -a -G dialout chrome-bot
usermod: cannot open /etc/passwd

I don't know if it'll be an issue, because the top level does have it in the group. I'll keep an eye on the next run, and if it doesn't work I'll add Ben to this conversation to see if there is anything special needed to do that in Docker.
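The host-versus-container comparison above can be checked mechanically. A minimal sketch that parses the groups= field of `id chrome-bot` output (for the container, the text printed by `docker exec <container> id chrome-bot`); the parsing format is inferred from the two samples in this comment:

```python
import re


def parse_id_groups(id_output):
    """Return {gid: name} from the groups= field of `id <user>` output."""
    match = re.search(r"groups=(\S+)", id_output)
    if not match:
        return {}
    groups = {}
    for entry in match.group(1).split(","):
        # Each entry looks like "20(dialout)".
        m = re.fullmatch(r"(\d+)\(([^)]*)\)", entry)
        if m:
            groups[int(m.group(1))] = m.group(2)
    return groups


def in_group(id_output, group_name):
    """True if `group_name` appears in the parsed groups= field."""
    return group_name in parse_id_groups(id_output).values()


host = ("uid=1000(chrome-bot) gid=1000(chrome-bot) "
        "groups=1000(chrome-bot),20(dialout),29(audio),44(video),999(docker)")
container = ("uid=1000(chrome-bot) gid=1000(chrome-bot) "
             "groups=1000(chrome-bot),29(audio),44(video),999(docker)")
print(in_group(host, "dialout"), in_group(container, "dialout"))  # True False
```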
Sep 26 2017
Actually, looking at the nexus5x bots (which have running BattOrs), I can see that chrome-bot is in the dialout group inside the Docker instance on that bot:

[1] DOCKER chrome-bot@build73-b1:(Linux 14.04):~$ docker exec -it android_00d093103eb3ad71 /bin/bash
root@build73-b1--device5:/# id chrome-bot
uid=1000(chrome-bot) gid=1000(chrome-bot) groups=1000(chrome-bot),20(dialout),29(audio),44(video),999(docker)

Ben, do you remember how we added it to the dialout group on the Nexus 5X bots?
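Checking containers one at a time by hand does not scale across devices. A minimal sketch that lists every running container on a host and reports those in which chrome-bot is not in dialout, assuming the Docker CLI is available on the host and that all running containers are device containers; `id -nG <user>` prints the user's group names separated by spaces. The runner is injectable so the logic can be exercised without Docker:

```python
import subprocess


def run(cmd):
    """Run a command and return its stdout as text (raises on failure)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def containers_missing_group(group="dialout", user="chrome-bot", runner=run):
    """Return the names of running containers where `user` lacks `group`."""
    names = runner(["docker", "ps", "--format", "{{.Names}}"]).split()
    missing = []
    for name in names:
        # Query the group database inside each container.
        id_output = runner(["docker", "exec", name, "id", "-nG", user])
        if group not in id_output.split():
            missing.append(name)
    return missing
```

Run on each host with `print(containers_missing_group())`; an empty list means every container already sees the group.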
Sep 26 2017
I don't think we did anything special; we just added chrome-bot to dialout in the way you listed in your setup doc. It should get propagated into the containers automatically. Though, it might not take effect until the container gets restarted (every 4-6 hours) or maybe once the host reboots (every 24 hours). Looking at syslogs on build17-b1, it looks like chrome-bot was added to the group earlier today at 09:16:25. Since that was pretty recent, I'm willing to bet we need to wait for each container to restart before they pick it up.
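If, as suggested here, containers only pick up the new group when they restart (every 4 to 6 hours) or when the host reboots (every 24 hours), the worst-case wait is bounded by the shorter of the two maximum intervals. A small sketch of that arithmetic, assuming the syslog timestamp and the restart intervals quoted in this comment, in the host's local time:

```python
from datetime import datetime, timedelta


def latest_pickup(changed_at, container_restart_max_h=6, host_reboot_h=24):
    """Latest time by which every container should see a group change.

    Assumes a container sees the change only after it restarts, which
    happens at least every `container_restart_max_h` hours, with the host
    reboot every `host_reboot_h` hours as a backstop.
    """
    wait = min(container_restart_max_h, host_reboot_h)
    return changed_at + timedelta(hours=wait)


changed = datetime(2017, 9, 26, 9, 16, 25)  # syslog time quoted in this comment
print(latest_pickup(changed))  # 2017-09-26 15:16:25
```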
Sep 26 2017
That was me messing around this morning. Vince said (unless I misunderstood) that he also ran the command back when installing the BattOrs last week.
Sep 27 2017
power.idle_platform has started to pass. I think it's because its container has been restarted and the others maybe haven't been yet. I'll check in after the next run.
Sep 27 2017
Bah. I forgot that there are 3 hosts for the Android bots, not one. I added chrome-bot to the dialout group on the other 2 as well. They should stop having the issue soon.
Sep 28 2017
Oct 16 2017
Comment 1 by 42576172...@developer.gserviceaccount.com, Sep 26 2017