Win7 Tests (dbg)(1) consistently (?) fails to upload layout test results
Issue description
See e.g. https://ci.chromium.org/buildbot/chromium.win/Win7%20Tests%20(dbg)(1)/66190 - an otherwise successful build. Step 107 ("Upload to test-results [webkit_layout_tests]") is orange, and the stdout ends with:
ssl.SSLError: The read operation timed out
step returned non-zero exit code: 1
,
Feb 6 2018
http://shortn/_1E5a5X9Oda is a link to the logs for the test results server. It looks like the file is malformed somehow, although I'm not sure exactly how. Sean might know more.
,
Feb 6 2018
This looks like the actual test-results server log entry for this particular build and step: https://pantheon.corp.google.com/logs/viewer?project=test-results-hrd&resource=gae_app&minLogLevel=0&expandAll=false&advancedFilter=resource.type%3D%22gae_app%22%0Aresource.labels.zone%3D%22us12%22%0Aresource.labels.project_id%3D%22test-results-hrd%22%0Aresource.labels.version_id%3D%2213735-2f315a7%22%0Aresource.labels.module_id%3D%22default%22%0Atimestamp%3D%222018-02-05T05%3A28%3A11.765788000Z%22%0AinsertId%3D%225a77eb7600074aad85813c4e%22&logName=projects%2Ftest-results-hrd%2Flogs%2Fappengine.googleapis.com%252Frequest_log&timestamp=2018-02-05T05%3A28%3A11.765788000Z&dateRangeEnd=2018-02-06T02%3A22%3A56.000Z&interval=JUMP_TO_TIME
(The link in #2 is for "build 34343, test type telemetry_perf_unittests".)
I'm not sure what's going on, since the server logs don't appear to report the return code for that request.
,
Feb 6 2018
https://pantheon.corp.google.com/logs/viewer?project=test-results-hrd&minLogLevel=0&expandAll=false&timestamp=2018-02-06T01%3A23%3A11.809581000Z&dateRangeStart=2018-02-06T00%3A22%3A56.000Z&dateRangeEnd=2018-02-06T02%3A22%3A56.000Z&interval=JUMP_TO_TIME&resource=gae_app&logName=projects%2Ftest-results-hrd%2Flogs%2Fappengine.googleapis.com%252Frequest_log&advancedFilter=resource.type%3D%22gae_app%22%0Aresource.labels.version_id%3D%2213735-2f315a7%22%0Aresource.labels.module_id%3D%22default%22%0Aresource.labels.zone%3D%22us12%22%0Aresource.labels.project_id%3D%22test-results-hrd%22%0Atimestamp%3D%222018-02-06T01%3A22%3A54.859815000Z%22%0AinsertId%3D%225a79036f000b0ee3dbdd3916%22 is the log link I found. Sorry for messing up the link before. That has some errors in it.
,
Feb 6 2018
Re #4: Yes, that has errors in it, but that request log is for a different build and step than the one reported in the original bug description: build 34343, step telemetry_perf_unittests vs. build 66190, test type webkit_layout_tests.
,
Feb 6 2018
Oh whoops, sorry. Thanks for catching that.
,
Feb 12 2018
I looked into this again today. As far as I can tell, the upload request is failing because the instances handling the request run out of memory. I'm not sure why, though. The JSON file is only 12 MB, and the instances hit a 1 GB memory limit. https://pantheon.corp.google.com/logs/viewer?project=test-results-hrd&minLogLevel=0&expandAll=false&timestamp=2018-02-12T18:04:34.299481000Z&interval=PT1H&resource=gae_app&logName=projects%2Ftest-results-hrd%2Flogs%2Fappengine.googleapis.com%252Frequest_log&advancedFilter=resource.type%3D%22gae_app%22%0AlogName%3D%22projects%2Ftest-results-hrd%2Flogs%2Fappengine.googleapis.com%252Frequest_log%22%0Aoperation.id%3D%225a81d73200ff0491d972ec358d0001737e746573742d726573756c74732d687264000131333937342d38323165623233000100%22&dateRangeEnd=2018-02-12T20:50:50.510Z shows a sample request that does this. According to PLX there have only been a handful of successes in the last 50 builds.
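For anyone trying to reason about the gap between file size and memory use: a parsed JSON document usually occupies noticeably more memory than its serialized text, before any extra copies are counted. Below is a minimal sketch of measuring that, written in Python purely for illustration. The payload shape, the test names, and the helpers `make_payload` and `peak_parse_bytes` are all hypothetical and not taken from the test-results server, whose numbers will differ.

```python
import json
import tracemalloc


def make_payload(num_tests):
    """Build a made-up results document (hypothetical shape, not the real format)."""
    tests = {
        "test_%d.html" % i: {"results": [[1, "P"]], "times": [[1, 0]]}
        for i in range(num_tests)
    }
    return json.dumps({"tests": tests})


def peak_parse_bytes(text):
    """Return the peak number of bytes allocated while json.loads parses text."""
    tracemalloc.start()
    try:
        json.loads(text)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    payload = make_payload(50000)
    peak = peak_parse_bytes(payload)
    print("serialized: %d bytes" % len(payload))
    print("peak while parsing: %d bytes" % peak)
```

Comparing the two printed numbers for a real results file would give a rough baseline for how much of the memory use is plain parsing overhead and how much comes from something else in the request handler.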
,
Feb 13 2018
,
Feb 16 2018
,
Feb 16 2018
More flakes reported by that bot, but I still cannot see the results.
,
Feb 28 2018
Looks like the problem still exists. (An example from today: https://ci.chromium.org/buildbot/chromium.win/Win7%20Tests%20(dbg)(1)/66190)
,
Feb 28 2018
I found the problematic request, and dug into the test results code. The log shows that the app is hitting the App Engine memory limit during the request and OOMing. My guess right now is that some byte slice handling in the app engine code is copying data around unnecessarily, which keeps everything alive in memory. That's just my guess, though.
,
Mar 30 2018
OK, I think we fixed this in another bug. We found something close to a memory leak: the app was allocating a ton of memory to parse JSON. We optimized that, and test results doesn't return HTTP 500s nearly as often anymore.
Comment 1 by dpranke@chromium.org, Feb 6 2018
Owner: martiniss@chromium.org
Status: Assigned (was: Untriaged)