
Issue 768806

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




system_health.common_mobile failing with BattOr error on Android One

Project Member Reported by perezju@chromium.org, Sep 26 2017

Issue description

First failure appeared in:
https://luci-milo.appspot.com/buildbot/chromium.perf/Android%20One%20Perf/533

The error is:
Traceback (most recent call last):
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py", line 97, in _RunStoryAndProcessErrorIfNeeded
    test.WillRunStory(state.platform)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/web_perf/timeline_based_measurement.py", line 276, in WillRunStory
    platform.tracing_controller.StartTracing(self._tbm_options.config)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/core/tracing_controller.py", line 43, in StartTracing
    self._tracing_controller_backend.StartTracing(tracing_config, timeout)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py", line 91, in StartTracing
    started = agent.StartAgentTracing(config, timeout)
  File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/battor_tracing_agent.py", line 73, in StartAgentTracing
    self._battor.StartTracing()
  File "/b/swarming/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py", line 235, in StartTracing
    self._SendBattOrCommand(self._START_TRACING_CMD)
  File "/b/swarming/w/ir/third_party/catapult/common/battor/battor/battor_wrapper.py", line 359, in _SendBattOrCommand
    'Outputted: %s' % (cmd, status))
BattOrError: BattOr did not complete command 'StartTracing' correctly.
Outputted: [0926/051857.103554:ERROR:serial_io_handler.cc(149)] Failed to open serial port: FILE_ERROR_ACCESS_DENIED
https://chromium-swarm.appspot.com/task?id=38d540c48def3a10&refresh=10&show_raw=1&wide_logs=true

Always happening when trying to run the first story.
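The FILE_ERROR_ACCESS_DENIED message above suggests the process could not open the BattOr's serial device. As a rough diagnostic sketch (not part of Telemetry or the BattOr wrapper), the following reports who owns a device node and whether the current process can read and write it. The function name `describe_serial_access` and the device path `/dev/ttyUSB0` are assumptions for illustration; the real device path on the bots is not given in this report.

```python
import grp
import os
import stat


def describe_serial_access(device_path):
    """Summarize whether the current process can open device_path read/write."""
    info = {"path": device_path, "exists": os.path.exists(device_path)}
    if not info["exists"]:
        return info

    st = os.stat(device_path)
    try:
        owning_group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        # The gid has no entry in the group database; fall back to the number.
        owning_group = str(st.st_gid)

    info["owning_group"] = owning_group
    info["mode"] = stat.filemode(st.st_mode)
    # Groups the running process actually holds (these are fixed when the
    # process starts, so later edits to the group database are not reflected).
    info["process_in_group"] = (
        st.st_gid in os.getgroups() or st.st_gid == os.getegid()
    )
    info["can_read_write"] = os.access(device_path, os.R_OK | os.W_OK)
    return info


if __name__ == "__main__":
    # Hypothetical device path; substitute the BattOr's actual serial port.
    print(describe_serial_access("/dev/ttyUSB0"))
```

If `can_read_write` is false while the device's owning group is dialout, group membership of the user running the benchmark would be the first thing to check.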

CL range:
http://test-results.appspot.com/revision_range?start=503198&end=503280
 
Cc: rnep...@chromium.org
Owner: ----
Status: Untriaged (was: Assigned)
Unassigning, let's see if the return code bisect can figure this out.
Project Member

Comment 3 by 42576172...@developer.gserviceaccount.com, Sep 26 2017


=== BISECT JOB RESULTS ===
NO Test failure found

Bisect Details
  Configuration: android_one_perf_bisect
  Benchmark    : system_health.common_mobile
  Metric       : cpuTimeToFirstMeaningfulPaint_avg/background_social/background_social_facebook

Revision             Exit Code      N
chromium@503197      0 +- N/A       10      good
chromium@503280      0 +- N/A       10      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=background.social.facebook system_health.common_mobile

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8967386777507944496


For feedback, file a bug with component Speed>Bisection
Owner: rnep...@chromium.org
Status: Assigned (was: Untriaged)
Bisect found nothing :(

Could be something that changed recently on the setup of the perf bots?

Randy, hope you can have a look.
If possible, I think we should just disable BattOr testing on Android One to lower the priority of this bug (unless the fix is really trivial).

Charlie will reevaluate the BattOr lab testing in Q4.
Cc: vhang@chromium.org
The problem (I believe) is that when the BattOrs were added to the A1 devices, chrome-bot was not properly added to the dialout group on those hosts.

Vince, did you get a chance to run those commands I gave you in that doc last week?
Just talked to Vince, he ran them. I wonder if they need to be run from inside the Docker instance. 
I think the command may only have been run on one of the trybots. I just ran it on the main a1 bot, so if it starts passing battor tests we will know that it worked.

On the bot I'm seeing this behavior right now, which makes me think it might not have:

Checking if chrome-bot is in the dialout group at the top level, and it is:
[1] DOCKER chrome-bot@build17-b1:(Linux 14.04):~$ id chrome-bot
uid=1000(chrome-bot) gid=1000(chrome-bot) groups=1000(chrome-bot),20(dialout),29(audio),44(video),999(docker)

Logging into the Docker instance for one of the Android bots and checking, it is not:
[1] DOCKER chrome-bot@build17-b1:(Linux 14.04):~$ docker exec -it android_AG860440G8CI0GC /bin/bash
root@build17-b1--device2:/# id chrome-bot
uid=1000(chrome-bot) gid=1000(chrome-bot) groups=1000(chrome-bot),29(audio),44(video),999(docker)

I also cannot add it. I don't know if it'll be an issue, because the top level does have it in the group:
root@build17-b1--device2:/# sudo usermod -a -G dialout chrome-bot
usermod: cannot open /etc/passwd


I'll keep an eye on the next run and if it doesn't work I'll add Ben to this conversation to see if there is any special thing needed to do that in docker. 
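The host-versus-container comparison above can be automated by parsing the output of `id`. This is a minimal sketch, assuming the `id` output has the same shape as the two transcripts in this comment; the helper names `groups_from_id_output` and `missing_in_container` are made up for illustration, and the two sample strings are copied from the transcripts.

```python
import re


def groups_from_id_output(id_output):
    """Return the set of group names listed in the groups= field of `id` output."""
    match = re.search(r"groups=(\S+)", id_output)
    if not match:
        return set()
    # Each entry looks like 20(dialout); keep only the name in parentheses.
    return set(re.findall(r"\(([^)]+)\)", match.group(1)))


def missing_in_container(host_output, container_output):
    """Groups the user has on the host but not inside the container."""
    return groups_from_id_output(host_output) - groups_from_id_output(
        container_output
    )


# Outputs copied from the build17-b1 host and its container, as shown above.
HOST_ID = (
    "uid=1000(chrome-bot) gid=1000(chrome-bot) "
    "groups=1000(chrome-bot),20(dialout),29(audio),44(video),999(docker)"
)
CONTAINER_ID = (
    "uid=1000(chrome-bot) gid=1000(chrome-bot) "
    "groups=1000(chrome-bot),29(audio),44(video),999(docker)"
)

if __name__ == "__main__":
    print(missing_in_container(HOST_ID, CONTAINER_ID))
```

For the two outputs above, the only group present on the host and missing in the container is dialout.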


Cc: bpastene@chromium.org
Actually, looking at the nexus5x bots (which have running battors) I can see that the chrome-bot is in the dialout group inside the docker instance on that bot:

[1] DOCKER chrome-bot@build73-b1:(Linux 14.04):~$ docker exec -it android_00d093103eb3ad71 /bin/bash
root@build73-b1--device5:/# id chrome-bot
uid=1000(chrome-bot) gid=1000(chrome-bot) groups=1000(chrome-bot),20(dialout),29(audio),44(video),999(docker)

Ben, do you remember how we added that to the dialout group on the nexus 5x bots?
I don't think we did anything special; we just added chrome-bot to dialout in the way you listed in your setup doc. It should get propagated into the containers automatically. Though, it might not take effect until the container gets restarted (every 4-6 hours) or maybe once the host reboots (every 24 hours).

Looking at syslogs on build17-b1, it looks like chrome-bot was added to the group earlier today at 09:16:25. Since that was pretty recent, I'm willing to bet we need to wait for each container to restart before they pick it up.
That was me messing around this morning. Vince said (unless I misunderstood) that he also ran the command back when installing the BattOrs last week.


power.idle_platform has started to pass. I think it's because its container has been restarted and maybe the others haven't yet. I'll check in after the next run.
Bah. I forgot that there are 3 hosts on the android bots, not one. I added the other 2 to the dialout group. They should stop having the issue soon.
Status: Fixed (was: Assigned)
Cc: crouleau@chromium.org johnchen@chromium.org
Issue 768533 has been merged into this issue.
