
Issue 765777

Starred by 1 user

Issue metadata

Status: Fixed
Owner: ----
Closed: Jan 7
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocking:
issue 757933
issue 765776




Add log data to json test results produced by telemetry

Project Member Reported by martiniss@chromium.org, Sep 15 2017

Issue description

Telemetry currently produces json test results. IIRC this was implemented by ashleymarie@. Apparently the ability to include logs for each test already exists, and is used by other test suites. We'd like to use this ability for perf benchmarks, as we're changing how we execute our tests, and our current system for viewing logs (looking at the swarming log) isn't very scalable given the new architecture.

It's unclear exactly how this works; https://build.chromium.org/p/chromium.linux/builders/Linux%20Tests%20%28dbg%29%281%29/builds/66591 is an example of this happening. I'll investigate this more.
 
There is a bit of a chicken & egg problem here if we implement this inside Telemetry. By the time we generate the JSON test results in Telemetry, the program is still running, so the log is not complete yet.

I looked into how https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=webkit_layout_tests&showExpectations=true&tests=http%2Ftests%2Fnavigation%2Fstart-load-during-provisional-loader-detach.html shows results. It turns out that it's a rough version of my original plan.

It looks like the tests upload their data to a google storage bucket, with a known prefix in test-results (https://cs.chromium.org/chromium/infra/go/src/infra/appengine/test-results/frontend/static/dashboards/js/flakiness_dashboard.js?q=flakiness_dashboard.js&sq=package:chromium&l=38), which is then used by the flakiness dashboard to fetch the results and display them. It uses google storage as a backend, rather than logdog, but that's roughly what we had decided to do. But there's nothing in the test result json file which indicates where the logs are, so I think the first order of business is to add that capability. Does that sound correct, Dirk?
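For illustration only, here's a rough sketch (not existing code) of what that capability could look like on the Telemetry side: pointing one test's entry in the JSON test results at a log uploaded for that story. The "log_location" key, the bucket name, and the path layout are all assumptions about a possible format, and I'm assuming per-test entries are nested under 'tests' by benchmark and story name.

LOG_BUCKET = 'gs://example-test-logs'  # placeholder bucket, for illustration only

def attach_log_location(results, benchmark_name, story_name, build_id):
  # results is the parsed JSON Test Results (version 3) dict; the
  # 'log_location' key is an assumed addition, not an existing field.
  test_entry = results['tests'][benchmark_name][story_name]
  test_entry['log_location'] = '%s/%s/%s/%s.log' % (
      LOG_BUCKET, build_id, benchmark_name, story_name)
  return results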

Comment 3 by eyaich@chromium.org, Sep 18 2017

Reply to comment #1: 

In regard to the chicken & egg problem:

1) Where do we generate the results in telemetry?
2) At that point, what data don't we have?
3) What data do we actually need? Do we have enough at that point?
4) There were discussions around what we actually care to look at in SoM and from the swarming logs. Do we care about a bunch of debug statements? That is the exceptional debug case (i.e. really knowledgeable bot health sheriffs like Stephen). If we want to pass this off to Chromium sheriffs, they want a high-level view.

I think only telemetry, not recipe code, should have to know about telemetry and what is needed to debug telemetry. Let's try to push it in there if we can.

Comment 4 by eyaich@chromium.org, Sep 18 2017

In reply to comment #2: 

Stephen, can you clarify, for the benefit of those on this bug who weren't in the Friday meeting, what was the original plan you reference?

I agree with you Stephen that the next step is to add the capability to indicate where the logs are in the test results format (ashleymarie@ will hopefully guide us on where to add these in telemetry).

One of the other open questions in my mind is whether or not google storage is the right path for these links. If we are to add links to the test results format, where do those links lead? Google storage? logdog? I know we talked about not doing logdog for one buildbot step right now, but is that the right path in the future? I thought logdog was backed by google storage anyway, so why don't we just use the built-in python library Robbie was referring to in the meeting instead of doing an intermediate solution?

Cc: perezju@chromium.org
#3:
1) The code for the json test results is in https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/results/json_3_output_formatter.py?dr=CSs.

The method for generating the results is invoked in https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/story_runner.py?rcl=49fbcfa16b5d148a595cc50036d5e8354739f13f&l=364

and in https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/story_runner.py?rcl=49fbcfa16b5d148a595cc50036d5e8354739f13f&l=304 (for the case where the whole benchmark is disabled).

2) At that point, we wouldn't have the log of anything that happens after it. Most of that doesn't matter, except for things like:
+ A stack trace if there is a crash in the code that generates results.
+ A stack trace if there is a crash in the code that uploads Telemetry's artifacts to the perf dashboard.
+ Log messages related to Telemetry's best efforts to shut down lingering processes (we do this through https://cs.chromium.org/chromium/src/third_party/catapult/common/py_utils/py_utils/atexit_with_log.py?dr&q=atexit+file:%5Esrc/third_party/catapult/+package:%5Echromium$&l=1)

3 & 4) These are great questions to ask. For simplicity, I can totally see us dividing Telemetry's log messages into two categories:
+ Non-expert: Telemetry's logging data for each test, stack traces, Chrome browser crash stacks, screenshots.
+ Expert: the full Telemetry log, which includes the things in (2) and things that happen outside of the story test run loop.
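Just to make that split concrete, a minimal sketch (an assumed setup, not existing Telemetry code) of how the two categories could map onto standard python logging handlers:

import logging

def configure_benchmark_logging(full_log_path, story_log_path):
  root = logging.getLogger()
  root.setLevel(logging.DEBUG)

  # "Expert" log: everything, including setup/cleanup noise outside the
  # story loop; kept as the one huge log for the whole benchmark run.
  full_handler = logging.FileHandler(full_log_path)
  full_handler.setLevel(logging.DEBUG)
  root.addHandler(full_handler)

  # "Non-expert" log: only INFO and above, aimed at per-test debugging;
  # crash stacks and screenshots would be attached separately.
  story_handler = logging.FileHandler(story_log_path)
  story_handler.setLevel(logging.INFO)
  root.addHandler(story_handler)
  return story_handler  # returned so it can be swapped per story later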


**Background on Telemetry's test life cycle. Telemetry would run tests in the following way:

The command line for running a Telemetry benchmark is invoked.

Telemetry does setup work to prepare for the benchmark suite run (discovering the platform to run on, picking the browser, ...)

Start run story 1
...
Finish run story 1
Start run story 2
...
Finish run story 2
Start run story 3
...
Finish run story 3
...
...
Start run story N
...
Finish run story N

Telemetry does cleanup after all stories are run: generating the test results, uploading files to cloud storage, killing off lingering processes, ...
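To make the chicken & egg point concrete, here's a rough runnable sketch of that life cycle (all function and step names are placeholders, not Telemetry APIs): the results are generated before the last two steps, so logging from those steps, and from atexit handlers, can never appear in the results.

import logging

def run_benchmark(stories):
  logging.info('preparing platform and browser')         # setup before the loop
  for story in stories:
    logging.info('start run story %s', story)
    logging.info('finish run story %s', story)
  logging.info('generating json test results')           # <-- results written here
  logging.info('uploading artifacts to cloud storage')   # logged too late to be
  logging.info('killing lingering processes')            # included in the results

if __name__ == '__main__':
  logging.basicConfig(level=logging.INFO)
  run_benchmark(['story_1', 'story_2', 'story_3'])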

+Juan in case I missed anything.

Comment 6 by eyaich@chromium.org, Sep 18 2017

Blocking: 765776

Comment 7 by eyaich@chromium.org, Sep 18 2017

Blocking: 757933
I feel like I'm missing a bit of context on what the current plans are, so some questions/comments follow:

- When adding logs to the json results, does that mean we can assign an individual log fragment to each individual story run? It would be amazing if one could quickly go from a failed story link (on buildbot or SoM) to the log of _that_ particular story.

- I think it's fine if we miss some things outside of the main story-run loop, as long as those are kept in the main "one huge log" for the entire benchmark run somewhere.

- What to keep in the per-story logs? At least browser info, and exception/crash info plus a screenshot in case of failure (i.e. what Ned described as the "non-expert" log sgtm).
> - When adding logs to the json results, does that mean we can assign an 
> individual log fragment to each individual story run? 

Yes, I think that's the goal.
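A minimal sketch of one way that could work (an assumed approach, not existing code): attach a dedicated logging handler for the duration of each story run and record the resulting fragment path in that story's result entry. The 'log_location' key matches the assumption in the earlier sketch.

import contextlib
import logging
import os

@contextlib.contextmanager
def per_story_log(log_dir, story_name, test_entry):
  # Capture everything logged while one story runs into its own file.
  path = os.path.join(log_dir, '%s.log' % story_name)
  handler = logging.FileHandler(path)
  handler.setLevel(logging.INFO)
  logging.getLogger().addHandler(handler)
  try:
    yield
  finally:
    logging.getLogger().removeHandler(handler)
    handler.close()
    # Later this could be rewritten to a google storage / logdog URL.
    test_entry['log_location'] = path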
Project Member Comment 10 by sheriffbot@chromium.org, Sep 24

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Status: Fixed (was: Untriaged)
