
Issue 715463 link

Starred by 2 users

Issue metadata

Status: WontFix
Owner:
Closed: Oct 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




Add instrumentation to host scheduler

Project Member Reported by pmalani@chromium.org, Apr 26 2017

Issue description

https://chromium.googlesource.com/chromiumos/third_party/autotest/+/master/scheduler/host_scheduler.py is the script in question.

Having finer-grained instrumentation that can tell us how much time each stage of the tick() routine is taking will help us recognize the hotspots, and also help us understand whether any improvements we make are having the desired effect consistently.
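
As a rough illustration of what I have in mind (all names and stage boundaries below are placeholders, not the real host_scheduler stages or metrics client):

import contextlib
import logging
import time

# Illustrative only; a real metrics client would replace this plain dict.
_stage_seconds = {}

@contextlib.contextmanager
def timed_stage(name):
    """Accumulate wall-clock seconds spent in one named stage of a tick."""
    start = time.time()
    try:
        yield
    finally:
        _stage_seconds[name] = _stage_seconds.get(name, 0.0) + (time.time() - start)

def tick():
    # Stage names are placeholders, not the actual host_scheduler stages.
    with timed_stage('acquire_hosts'):
        pass  # ... look up HQEs that need hosts and match them ...
    with timed_stage('lease_hosts'):
        pass  # ... commit the assignments ...
    logging.info('tick breakdown (seconds): %r', _stage_seconds)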

I've added David Riley to the CC-list, as he may have information about how to add instrumentation, and create dashboards that observe said metrics.
 

Comment 1 by pwang@chromium.org, Apr 26 2017

Cc: pwang@chromium.org
David, could you kindly point me to some canonical example or doc for adding metrics (and how to create dashboards for them) for such scripts that run on the test servers?

Thanks!
Cc: pho...@chromium.org
Labels: Restrict-View-Google
Adding phobbs for best examples of metrics -- he may have written documentation as well. I'd recommend creating details in a bug and getting some combination of akeshet/phobbs/myself to review, detailing the type of metric and the fields you're planning to add. Changing things after the fact is somewhat problematic, so it makes sense to get things right in the first place.

For the Viceroy dashboards, I'd form your queries in panopticon, and then the dashboard code exists in: google3/monitoring/viceroy/dashboards/chromeos

I'm not sure how much you know about Mash/Monarch, but there should be plenty of documentation at go/monarch-userguide and go/mash-reference.
I've written up some notes on first steps, as well as open questions. I'm still unclear on what the big picture is here (where and how the host_scheduler fits into the test infrastructure):

https://docs.google.com/a/google.com/document/d/13UFXCgJnCBVBm3WMKIRE_yN-7AQbJUyKf-kYtvUctGs/edit?usp=sharing

I would greatly appreciate some pointers as comments.

Comment 5 by snanda@chromium.org, Apr 27 2017

Cc: akes...@chromium.org dshi@chromium.org
We have a tick rate counter for the host scheduler; you can see an example of its output here: https://viceroy.corp.google.com/chromeos/deputy-view#_VG__BnnYtbi

The big swings to/from zero that you see are an artifact of a bug in the metric that has been fixed but not yet pushed to prod (https://chromium-review.googlesource.com/c/486249/)

I think we have a worse problem with the scheduler tick than with the host-scheduler tick. The slowest host-scheduler looks to be around 12 ticks per minute, i.e. one tick every 5 seconds, which means we're paying a one-time 5-second cost for each job. Scheduler tick is a much bigger lever, because we pay that cost for each transition in the job lifecycle, of which there are several.
Cc: pprabhu@chromium.org
Yep, please use the RuntimeBreakdownCounter for these metrics (like the scheduler).
Any dashboarding needs will be taken care of uniformly if you do that.
Here's the example client code: http://shortn/_lrdxwYacuH
Where would the scheduler tick code be located?
Re #10: Linked to in #9.
host_scheduler.py should do something very similar to that.
I can help review.
So I am looking at the scheduler code alongside an execution strace of it. A lot of the time is spent servicing small requests like these:

18:16:25 write(3, "*\0\0\0\3SELECT * FROM afe_jobs WHERE ID=113762073", 46) = 46 <0.000121>

This is because each of these requests creates a DBObject: DBObject is the base class for the Job object, and we are creating a lot of Job objects.

So we need to instrument what is creating that many job objects.
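
One cheap way to find out locally (just a sketch; the wrapped class and helper names are assumptions, not existing code):

import collections
import sys

# Tally which call sites construct the object in question.
_creation_sites = collections.Counter()

def counting_init(original_init):
    """Wrap a class __init__ so each construction records its caller."""
    def wrapper(self, *args, **kwargs):
        caller = sys._getframe(1)
        _creation_sites['%s:%d' % (caller.f_code.co_filename,
                                   caller.f_lineno)] += 1
        return original_init(self, *args, **kwargs)
    return wrapper

# Applied to the real class it might look like (class path assumed):
#   scheduler_models.Job.__init__ = counting_init(scheduler_models.Job.__init__)
# After a few ticks, _creation_sites.most_common(10) shows the hotspots.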
Re: Comment #13. Thanks David.

Is it possible to add a dashboard with metrics pertaining to the host_scheduler.py script? There are decorators which suggest these metrics are getting exported, but I don't know if they're available by default in Panopticon:

https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/scheduler/host_scheduler.py?rcl=7c0bccc268fc4a87fa3b586387a6414ee508299b&l=324
Status: Started (was: Untriaged)
Forking discussion back from https://chromium-review.googlesource.com/c/517249/

Caching job_ids in the host_scheduler is unlikely to work. OTOH, host_scheduler can definitely be a _lot_ smarter about how it obtains job information.

Roughly, a single host scheduler tick does the following:
- Look up HQEs that need host matching
- Look up free hosts
- match hosts to HQEs -- update HQEs

To reduce the number of DB hits, host_scheduler should run batched (Django) queries instead of using Django models directly (in my experience, Django models don't do batching very well).
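
For illustration, the difference being suggested is roughly the following (assuming a plain DB-API cursor; the real change would go through the scheduler's own database layer):

def fetch_jobs_individually(cursor, job_ids):
    # One SELECT (and one round trip) per job id -- the current pattern.
    rows = []
    for job_id in job_ids:
        cursor.execute('SELECT * FROM afe_jobs WHERE id = %s', (job_id,))
        rows.append(cursor.fetchone())
    return rows

def fetch_jobs_batched(cursor, job_ids):
    # A single IN query covering the whole batch -- one round trip total.
    if not job_ids:
        return []
    placeholders = ', '.join(['%s'] * len(job_ids))
    cursor.execute(
        'SELECT * FROM afe_jobs WHERE id IN (%s)' % placeholders,
        list(job_ids))
    return cursor.fetchall()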


But, taking a step back: is the goal here to _instrument_ host_scheduler to find out where improvements are needed? Are we convinced that host_scheduler is a good place to put effort into?
IOWs, my experience so far is that host_scheduler is not the bottleneck in the lab (master / shards)... 

It _is_ creating DB traffic that we can do without, so that's worth addressing. If we want to do just that, we should first instrument the Django layer to tell us how many distinct DB queries are being made per tick.
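
A quick way to get that number locally (using Django's query log, which is only populated when settings.DEBUG is enabled, so this is a debugging aid rather than something for prod):

from django.db import connection, reset_queries

def count_queries_per_tick(tick_fn):
    reset_queries()
    tick_fn()
    statements = [q['sql'] for q in connection.queries]
    print('%d queries issued, %d distinct' % (
        len(statements), len(set(statements))))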

Labels: -Restrict-View-Google
Nothing here is RVG.

Comment 17 by ihf@chromium.org, Jun 1 2017

host_scheduler is single-threaded and usually uses close to a full core on a 32-core shard, hence it is a bottleneck in the lab.

Comment 18 by dshi@chromium.org, Jun 1 2017

Re #17

host_scheduler is a bottleneck on the master scheduler, but not on a shard. A shard manages around 100 DUTs and schedules far fewer jobs than the master.

Also, as we've sharded most DUTs, the host scheduler should run much faster on the master too. Do we know what the host scheduler tick time on the master is, to conclude that it's actually a bottleneck?

Comment 19 by ihf@chromium.org, Jun 1 2017

A shard manages 150-200 DUTs and will have more when we assign more boards. The shard scheduler is a bottleneck.
RE: Comment #15 : Yes, the original intent of this bug was stated as instrumenting host_scheduler. Some stracing and print-debugging has revealed that the SQL querying seems sub-optimal, and if the work is significant, we can fork a separate bug.

Can we consider having a simple daemon through which all RDB querying is performed, and have the write-through caching and batching done there? While it would be a single process, we could make it multi-threaded if that helps performance. There's probably some work involved in updating all services that make RDB queries to use the new model, but caching does seem like it would save several RDB calls (I'd be happy to help write this daemon and convert existing calls to use it :) )

I'm going to conduct some rudimentary analysis of the time taken for the scheduler tick when using in-memory caching vs. the current production method (while the new implementation can't be used in prod, I seem to be able to run it for long enough locally to get some measurements). That should give some insight into how much time caching actually saves.
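
The comparison itself would be no more than a crude harness along these lines (tick_fn and the iteration count are stand-ins for whatever the local setup exposes):

import time

def average_tick_seconds(tick_fn, iterations=50):
    # Crude wall-clock average over a number of ticks; run once against the
    # cached implementation and once against the current one, then compare.
    start = time.time()
    for _ in range(iterations):
        tick_fn()
    return (time.time() - start) / iterations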
Re #20: Separate bug for fixing is not necessary (we can rename this ;) )

imo, https://chromium-review.googlesource.com/c/521687/ is a great example of some discussion on this bug leading to improvements that are future proof.

Specific to your idea in #20, my first reaction is that it is at odds with the direction autotest infra code is going. We are trying to:
- short term: Make monitor_db multiprocess so that each job roughly has its own process. All these processes will interact with the database independently.
- medium term: Have these processes not interact with the database at all, but with a different synchronization service elsewhere.

If interested in the details, please see go/monitor_db_per_job_refactor

I am concerned about adding another process hop here that doesn't fit into the future architecture.
OTOH, batching of DB queries like the CL mentioned above will give us similar benefits and will continue to do so in the future.
Thanks for the comments Prathmesh,

Re: Comment #21, WRT the short-term optimizations you mentioned, I'm wondering what latencies are involved. I did a rough measurement (using time.time() around each SQL call in _fetch_row_from_db()), and the time taken varies between 0.350 ms and 0.550 ms.

In your experience, what would you expect IPC times to be if we were to house an in-memory cache of the DB? I'm interested in knowing whether it makes sense latency-wise, as a thought experiment.
(To clarify, Comment #22 asks about the IPC latencies from the point of view of having a table cache daemon)
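
For a rough sense of the IPC cost in that thought experiment, a socketpair echo micro-benchmark (purely illustrative, not tied to any existing daemon) gives a local round-trip number to compare against the 0.350-0.550 ms SQL figures:

import socket
import threading
import time

def local_ipc_round_trip_us(iterations=1000):
    """Average microseconds for a small message echoed over a socketpair."""
    a, b = socket.socketpair()

    def echo():
        while True:
            data = b.recv(64)
            if not data:
                return
            b.sendall(data)

    t = threading.Thread(target=echo)
    t.start()
    start = time.time()
    for _ in range(iterations):
        a.sendall(b'ping')
        a.recv(64)
    elapsed = time.time() - start
    a.close()
    t.join()
    b.close()
    return elapsed / iterations * 1e6

print('%.1f us per round trip' % local_ipc_round_trip_us())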
Project Member

Comment 24 by bugdroid1@chromium.org, Jun 10 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/085c1c5b39365553cda5fcafa935b1a18680d077

commit 085c1c5b39365553cda5fcafa935b1a18680d077
Author: Po-Hsien Wang <pwang@chromium.org>
Date: Sat Jun 10 04:44:02 2017

scheduler_models.py: group multiple afe_jobs queries

Override the fetch method in HostQueueEntry to prevent HostQueueEntry
fetching its own jobs from DB individually.

Use
'SELECT * FROM afe_jobs WHERE ID IN (jid1, ...) '
'ORDER BY FIELD (ID, jid1, jid2)'
to fetch all the jobs details from afe_jobs at once.

BUG= chromium:715463 
TEST=[1] unittests
TEST=[2] python host_scheduler.py on guado_moblab
         (after setting inline_host_acquisition: False in moblab_config.ini)

BEFORE (`strace | grep afe_jobs` per tick)
write("SELECT afe_host_queue_entries.* FROM afe_host_queue_entries...") = 371
write("SELECT * FROM afe_jobs WHERE ID=667") = 40
read ("def\24chromeos_autotest_db\10afe_jobs\...") = 3305
write("SELECT * FROM afe_jobs WHERE ID=669") = 40
read ("def\24chromeos_autotest_db\10afe_jobs\...") = 3278
...
...
write("SELECT afe_jobs.id, afe_acl_groups_users.aclgroup_id...") = 616
read ("def\24chromeos_autotest_db\10afe_jobs\...") = 997
write("SELECT job_id, label_id FROM ...") = 432
read ("def\24chromeos_autotest_db\...") = 2313

AFTER (`strace | grep afe_jobs` per tick)
write("SELECT afe_host_queue_entries.* FROM afe_host_queue_entries...") = 371
write("SELECT * FROM afe_jobs WHERE ID IN (667...") = 279
read ("def\24chromeos_autotest_db\10afe_jobs\...") = 8192
write("SELECT afe_jobs.id, afe_acl_groups_users.aclgroup_id...") = 384
read ("def\24chromeos_autotest_db\10afe_jobs\...") = 417
write("SELECT job_id, label_id FROM ...") = 200
read ("def\24chromeos_autotest_db\...") = 896

Change-Id: I657dbf29e679046452710de7627364f26fc1a81d
Reviewed-on: https://chromium-review.googlesource.com/521687
Commit-Ready: Po-Hsien Wang <pwang@chromium.org>
Tested-by: Po-Hsien Wang <pwang@chromium.org>
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>

[modify] https://crrev.com/085c1c5b39365553cda5fcafa935b1a18680d077/scheduler/scheduler_models_unittest.py
[modify] https://crrev.com/085c1c5b39365553cda5fcafa935b1a18680d077/scheduler/scheduler_models.py

Comment 25 by pwang@chromium.org, Jun 19 2017

Aside from the afe_jobs queries, after I added more DUTs to my local moblab I found several afe_hosts queries, one afe_hosts query for each grouped afe_jobs query.

Though there is already a cache mechanism in place, it seems the cache hit ratio is only about 20-50% for each host.
https://pcon.corp.google.com/p#chrome-infra/queryplayground?yAxisMin=0&yAxisMin2=0&oldHeatmap=false&query=CAEY-AOKAXAKZAgPemAKXjJcChltb25hcmNoLmFjcXVpc2l0aW9ucy5UYXNrEj8KPS9jaHJvbWUvaW5mcmEvY2hyb21lb3MvYXV0b3Rlc3Qvc2NoZWR1bGVyL3JkYi9jYWNoZS9oaXRfcmF0aW8qAiAAMICkp9oGkgEXCgYIARigjQYSABoJaG9zdF9uYW1lYAA&names=Query%201
It would be interesting to know exactly what is being cached. Without that context, it is difficult to judge whether that hit ratio is reasonable.

Comment 27 by pwang@chromium.org, Jun 19 2017

The code that manages the afe_hosts cache:
http://cs/chromeos_public/src/third_party/autotest/files/scheduler/rdb_cache_manager.py

I'm not sure whether we can improve it by caching a more complete Hosts/Labels/ACLs data structure in memory, managed by ourselves.
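
If we went that route, the shape I have in mind is roughly the following (a sketch only; the names are illustrative and not the rdb_cache_manager API):

class HostInfoCache(object):
    """Write-through cache of per-host rows, labels and ACLs.

    `fetch_batch` and `persist` are stand-ins for whatever batched query and
    write path the real code would use.
    """

    def __init__(self, fetch_batch):
        # fetch_batch(host_ids) -> {host_id: (host_row, labels, acls)}
        self._fetch_batch = fetch_batch
        self._entries = {}

    def refresh(self, host_ids):
        # Fill the cache for this tick with one batched query.
        self._entries = self._fetch_batch(host_ids)

    def get(self, host_id):
        return self._entries.get(host_id)

    def update(self, host_id, entry, persist):
        # Write-through: update the cache and the backing store together.
        self._entries[host_id] = entry
        persist(host_id, entry)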


Status: WontFix (was: Started)
I'm not actively working on this issue, so I'm going to go ahead and close it.
