
Issue 809408


Issue metadata

Status: Fixed
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 812941
issue 794136




Win7 Tests (dbg)(1) consistently (?) fails to upload layout test results

Project Member Reported by mstensho@chromium.org, Feb 6 2018

Issue description

See e.g. https://ci.chromium.org/buildbot/chromium.win/Win7%20Tests%20(dbg)(1)/66190 - an otherwise successful build. Step 107 ("Upload to test-results [webkit_layout_tests]") is orange, and the stdout ends with:

ssl.SSLError: The read operation timed out
step returned non-zero exit code: 1
 
Components: Infra>Client>Chrome
Owner: martiniss@chromium.org
Status: Assigned (was: Untriaged)
@martiniss - can you look at this one, too?
Cc: seanmccullough@chromium.org
http://shortn/_1E5a5X9Oda is a link to the logs for the test results server. It looks like the file is malformed somehow, although I'm not sure how exactly. Sean might know more.
re #4: Yes, that has errors in it, but that request log is for a different build and step than the one reported in the original bug description.

build 34343, step telemetry_perf_unittests
vs
build 66190 test type webkit_layout_tests

Oh whoops, sorry. Thanks for catching that.
I looked into this again today. As far as I can tell, the upload request is failing because the instances handling the request run out of memory. I'm not sure why, though. The JSON file is only 12 MB, and the instances hit a 1 GB memory limit.

https://pantheon.corp.google.com/logs/viewer?project=test-results-hrd&minLogLevel=0&expandAll=false&timestamp=2018-02-12T18:04:34.299481000Z&interval=PT1H&resource=gae_app&logName=projects%2Ftest-results-hrd%2Flogs%2Fappengine.googleapis.com%252Frequest_log&advancedFilter=resource.type%3D%22gae_app%22%0AlogName%3D%22projects%2Ftest-results-hrd%2Flogs%2Fappengine.googleapis.com%252Frequest_log%22%0Aoperation.id%3D%225a81d73200ff0491d972ec358d0001737e746573742d726573756c74732d687264000131333937342d38323165623233000100%22&dateRangeEnd=2018-02-12T20:50:50.510Z shows a sample request that does this.

According to PLX there have only been a handful of successes in the last 50 builds.
Blocking: 794136
Blocking: 812941
Labels: -Pri-2 Pri-1
More flakes reported by that bot, but I still cannot see the results.
Cc: robertma@chromium.org
Looks like the problem still exists. (An example from today: https://ci.chromium.org/buildbot/chromium.win/Win7%20Tests%20(dbg)(1)/66190)
I found the problematic request and dug into the test results code. The log shows that the app hits the App Engine memory limit during the request and OOMs. My guess right now is that some byte slice handling in the App Engine code is copying data unnecessarily and keeping it alive longer than needed. That's just my guess, though.
Status: Fixed (was: Assigned)
OK, I think we fixed this in another bug. We found a semi memory leak: the app was allocating a ton of memory to parse JSON. We optimized that, and the test results server doesn't return 500s nearly as often anymore.
