
Issue 621716

Starred by 2 users

Issue metadata

Status: WontFix
Owner:
Closed: Jul 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




add a metric about devserver artifact "caching efficiency"

Project Member Reported by akes...@chromium.org, Jun 20 2016

Issue description

Easy way: After a build is staged, increment a counter every time the devserver replies True to an is_staged request (and possibly increment a separate per-artifact download counter for every download request). When the build expires and is un-staged, send these counts to monarch / statsd.
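A minimal sketch of the easy way, assuming Python. The class name, method names, and the `report` callback are all hypothetical; `report` stands in for whatever monarch / statsd export ends up being used and is not a real devserver or ts_mon API.

```python
import collections
import threading


class StagedHitCounter:
    """Counts is_staged hits and downloads per build (hypothetical helper)."""

    def __init__(self, report):
        # report: callable(build, hits, downloads). Placeholder for the
        # monarch / statsd export; the real export API is not shown here.
        self._report = report
        self._lock = threading.Lock()
        self._hits = collections.Counter()
        self._downloads = collections.Counter()

    def record_is_staged_hit(self, build):
        """Call each time is_staged answers True for this build."""
        with self._lock:
            self._hits[build] += 1

    def record_download(self, build, artifact):
        """Call for each download request, keyed by build and artifact."""
        with self._lock:
            self._downloads[(build, artifact)] += 1

    def on_unstaged(self, build):
        """Call when the build expires; reports the counts and forgets them."""
        with self._lock:
            hits = self._hits.pop(build, 0)
            downloads = {
                artifact: self._downloads.pop((b, artifact))
                for (b, artifact) in list(self._downloads)
                if b == build
            }
        self._report(build, hits, downloads)
```

The counts live only in memory, so a devserver restart before a build expires would lose them; that may be acceptable for a quick first metric.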

Harder way, long term: Emit a data point to a database for every build that is staged and every build that is downloaded, from every devserver. This allows us to make more sophisticated queries and to figure out which builds or artifact types are not being cached efficiently.

Starting with dgarrett@, who may have a way to quickly implement the easy way.
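To illustrate the kind of query the harder way would allow, here is a rough sketch using sqlite3 as a stand-in database. The table layout, column names, and function names are assumptions made up for this example; nothing in this thread settles a schema.

```python
import sqlite3
import time


def open_db(path=":memory:"):
    """Opens the (hypothetical) events database and creates the table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " ts REAL, devserver TEXT, build TEXT, artifact TEXT, kind TEXT)")
    return conn


def emit(conn, devserver, build, artifact, kind, ts=None):
    """Records one data point; kind is 'staged' or 'downloaded'."""
    if kind not in ("staged", "downloaded"):
        raise ValueError("unknown event kind: %r" % (kind,))
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
        (time.time() if ts is None else ts, devserver, build, artifact, kind))
    conn.commit()


def downloads_per_stage(conn):
    """Returns (build, artifact, times_staged, times_downloaded) rows."""
    rows = conn.execute(
        "SELECT build, artifact,"
        " SUM(kind = 'staged'), SUM(kind = 'downloaded')"
        " FROM events GROUP BY build, artifact"
        " ORDER BY build, artifact")
    return rows.fetchall()
```

Rows with many stagings and few downloads would point at builds or artifact types that are not being cached efficiently.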
 

Comment 1 by autumn@chromium.org, Jun 21 2016

Labels: -current-issue
Labels: Hotlist-CrOS-DevServerLoad
Labels: -dut-health
So... I know where the code that stages artifacts is now.

How do I report stats about it in a way that is useful?
Cc: pho...@chromium.org
My preferred approach would be to use ts_mon from the devserver to export to monarch. I think jrbarnette is working on an example CL in autotest that will do that from the scheduler. The devserver doesn't have autotest, but I think it has chromite available to import.

Look at cidb.py use of ts_mon for an example. We can discuss schema in person.
The devserver doesn't currently import from chromite. It seems to be pretty standalone.


Cc: dshi@chromium.org
I think there must be something on the devserver that is importing from chromite, because we do have devserver stats in statsd, which I think we get from chromite these days. Maybe it's a separate watchdog process and not devserver.py.

dshi@ do you know?

Also, you may want to base your metric around this wrapper class: https://chromium-review.googlesource.com/#/c/354691/
Ah... devserver itself does not, but some libraries in the same directory tree do. I don't THINK any of those libraries are imported to devserver proper. It would really surprise me if chromite was available for cases like "cros flash".

grep -R chromite 
host/lib/tools.py:from chromite.lib import cros_build_lib
host/lib/tools.py:from chromite.lib import git
host/lib/cros_output.py:from chromite.lib import terminal
host/willis:  import chromite.lib.git
host/willis:    import chromite.lib.git
host/willis:    print 'Failed to import chromite library'
host/willis:import chromite.lib.terminal
host/willis:Colorizer = chromite.lib.terminal.Color(os.isatty(sys.stdout.fileno()))
host/willis:      if chromite.lib.git.IsSHA1(project['revision']):
host/willis:  repo_path = chromite.lib.git.FindRepoDir(os.getcwd())
host/willis:    repo_path = chromite.lib.git.FindRepoDir(os.path.dirname(sys.argv[0]))
host/willis:  manifest = chromite.lib.git.ManifestCheckout(root)
checkfiles/devserver/gs_check.py:from chromite.lib import cros_build_lib
checkfiles/devserver/gs_check.py:    # Additionally, we cannot use chromite's timeout_util as it is


Comment 9 by dshi@chromium.org, Jun 22 2016

Re #7
Devserver stats in graphite are reported in autotest. Most of the data is reported based on the check_health call.
I had a thought about this last night.

The current devserver logs show when something is staged. To quickly identify staging problems in production, we can just grep logs.

Then build a better answer when there isn't such a rush.
After going through the logs at some length, I found we don't have enough information there to see whether we are doing excessive staging.

We carefully log whenever we ask to stage something, and we log the basename of every file we download. But there is no good way to tie those logs together to see what builds we are downloading files for.

To fix this, we'd need to update this log message appropriately:
  https://cs.corp.google.com/chromeos_public/src/platform/dev/build_artifact.py?rcl=57d181772c8d3cf99b383ccdb9864f2ffc1cbade&l=327
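As a sketch of the idea only: if the download log line carried the build as well as the file basename, grepping could tie the two together. The message format, logger name, and function names below are invented for illustration and are not the actual message in build_artifact.py.

```python
import collections
import logging
import re

_LOG = logging.getLogger("devserver_sketch")  # hypothetical logger name

# Hypothetical message format that includes the build next to the basename.
_PATTERN = re.compile(r"Downloaded artifact build=(\S+) file=(\S+)")


def log_download(build, basename):
    """Logs a download in a form that can be matched back to its build."""
    _LOG.info("Downloaded artifact build=%s file=%s", build, basename)


def downloads_by_build(log_lines):
    """Groups matching log lines into {build: Counter({basename: count})}."""
    counts = collections.defaultdict(collections.Counter)
    for line in log_lines:
        match = _PATTERN.search(line)
        if match:
            counts[match.group(1)][match.group(2)] += 1
    return counts
```

A file downloaded more than once for the same build on one devserver would then show up directly as a count greater than one.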
Status: WontFix (was: Assigned)
We didn't pull out enough information to be useful as part of this project, but an unrelated metrics effort should address the same problem in a more generalized way.
