Timeout when running several WebGL offscreencanvas cases
Issue description
OS: Windows 10
In total, 4 cases time out:
WebglConformance_conformance_offscreencanvas_context_lost_restored_worker
WebglConformance_conformance_offscreencanvas_context_lost_worker
WebglConformance_conformance2_offscreencanvas_context_creation_worker
WebglConformance_conformance_offscreencanvas_context_creation_worker
What steps will reproduce the problem?
(1) Set up a local HTTP server, such as "python -mSimpleHTTPServer", in the src/ directory.
(2) Open one of the above cases in the latest Canary (my version is 71.0.3578.0), like http://127.0.0.1:8000/third_party/webgl/src/sdk/tests/conformance/offscreencanvas/context-lost-restored-worker.html?webglVersion=2
(3) The case will time out. If you open the console, you will see a message like "Refused to execute script from 'http://127.0.0.1:8000/third_party/webgl/src/sdk/tests/js/tests/canvas-tests-utils.js' because its MIME type ('text/plain') is not executable."
What is the expected result?
The cases pass.
What happens instead?
The cases time out.
I dug a bit more, and below are the findings so far:
1. These cases pass on my laptop, but time out on the 2 desktops I tried. I don't know which exact difference between them causes the issue, so here I just list a few configurations.
[laptop]
Type: HP EliteBook 840 G3
GPU: HD Graphics 520 (I don't think it's an issue related to GPU)
OS: Win10 1709 (OS Build 16299.665)
[desktop]
Type: Skylake (CPU: i7-6700K)
GPU: NVIDIA GTX 1060
OS: Win10 1803 (OS Build 17134.345)
2. On desktop, it's actually a regression. I tried it with various channels of latest Chrome, and below are the results:
Stable: 69.0.3497.100 PASS
Beta: 70.0.3538.54 PASS
Dev: 71.0.3573.0 TIMEOUT
Canary: 71.0.3578.0 TIMEOUT
I further bisected it: the last good revision is r588847, and the first bad one is r588854. No build in between is available on your server. As I will be off for the following 2 weeks, I may not have time to go further soon.
3. The issue is not related to the Python version used to set up the HTTP server. I tried both the Python I downloaded from python.org and the one in depot_tools, and the results are the same.
4. This is not related to the DevTools protocol, as I can reproduce it with a plain HTTP server and run the case directly in the browser.
5. For the headers of canvas-tests-utils.js (Developer Tools -> Network -> Click "canvas-tests-utils.js" -> Headers), I got:
[good]
Request URL: http://127.0.0.1:10133/third_party/webgl/src/sdk/tests/js/tests/canvas-tests-utils.js
Request Method: GET
Status Code: 200 OK (from disk cache)
Remote Address: 127.0.0.1:10103
Referrer Policy: no-referrer-when-downgrade
Content-Encoding: gzip
Content-Length: 6375
Content-Type: application/javascript
Date: Mon, 08 Oct 2018 16:06:17 GMT
Last-Modified: Fri, 31 Aug 2018 04:58:50 GMT
Server: SimpleHTTP/0.6 Python/2.7.13
Provisional headers are shown
Referer: http://127.0.0.1:10133/third_party/webgl/src/sdk/tests/conformance/offscreencanvas/context-creation-worker.js
Sec-Metadata: destination=script, site=same-origin
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3567.0 Safari/537.36
[bad]
Request URL: http://127.0.0.1:8763/third_party/webgl/src/sdk/tests/js/tests/canvas-tests-utils.js
Request Method: GET
Status Code: 200 OK (from disk cache)
Remote Address: 127.0.0.1:8755
Referrer Policy: no-referrer-when-downgrade
Content-Length: 31403
Content-Type: text/plain
Date: Mon, 08 Oct 2018 16:03:13 GMT
Last-Modified: Fri, 31 Aug 2018 04:58:50 GMT
Server: SimpleHTTP/0.6 Python/2.7.6
Provisional headers are shown
Referer: http://127.0.0.1:8763/third_party/webgl/src/sdk/tests/conformance/offscreencanvas/context-lost-restored-worker.js
Sec-Metadata: destination=script, site=same-origin
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3567.0 Safari/537.36
,
Oct 12
I'm at home and have only limited means to check right now, but since Ken says it's urgent:
1. The CL you identified turns on a runtime-enabled feature by default. You should be able to verify whether that's the right change by running whatever you're running with something like: --disable-blink-feature=WorkerNosniffBlock (See: https://www.chromium.org/blink/runtime-enabled-features)
2. I can't conceive how this would have anything to do with Unix file permissions. The change checks the MIME type associated with Worker scripts. If there's a worker script involved, and if the scripts it runs don't have a <script/>-compatible MIME type associated, then this could well be it.
3. There was a vaguely similar complaint just a few days ago (reported in crbug.com/890316, fixed in crrev.com/c/1256947). I'd appreciate it if you could check whether that maybe fixes this situation as well. (The "fix" there is to relax the check and to never check non-network URLs, including file: URLs in particular.)
Since it's Friday >8pm over here I'm checking out now. I can promise to look at it first thing on Monday; I'd appreciate it if by then you could have checked against a build that includes the fix in #3 above.
,
Oct 12
Thanks vogelheim@ - no stress, looking into this next week is fine. Thanks for all the pointers. The reason I thought Unix file permissions might have something to do with it was that perhaps Python's built-in HTTP server might decide to serve up .js files with MIME type text/plain if they were not executable, or something else weird. Please note that there are no file: URLs in use here. The conformance suite is served up over HTTP from localhost inside this test harness. The importScripts call from one of the failing tests is here: https://cs.chromium.org/chromium/src/third_party/webgl/src/sdk/tests/conformance/offscreencanvas/context-lost-restored-worker.js?type=cs&sq=package:chromium&g=0&l=24 and since this file is pulled in from this HTML file over http: https://cs.chromium.org/chromium/src/third_party/webgl/src/sdk/tests/conformance/offscreencanvas/context-lost-restored-worker.html?type=cs&sq=package:chromium&g=0 then I don't think the change in #3 should affect this. Yunchao, is there any way you can help diagnose this? We don't have machines here that reproduce the problem Yang was experiencing.
,
Oct 13
Ah... if these are HTTP(S) requests, then we should check whether it's maybe an intentional result of the change: the change blocks (some) JavaScript that the server has advertised (via Content-Type:) as being not JavaScript (e.g. text/plain) rather than JavaScript (text/javascript). If so, then I'd suggest fixing the server. (Even if we were to revert the patch, this would come back sooner or later.) So please check which MIME type the server sends along with the scripts that get blocked. The easiest way to check is probably either curl (on the command line) or the dev inspector.
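As an alternative to curl or the dev inspector, the same check can be sketched in a few lines of Python. This is only an illustration: the helper names and the JS_MIME_TYPES set are hypothetical, and the set is an illustrative subset, not the list Chrome actually uses for the worker check.

```python
import urllib.request

# Illustrative subset of JavaScript MIME types (an assumption, not
# Chrome's actual allow-list).
JS_MIME_TYPES = {
    "application/javascript",
    "text/javascript",
    "application/x-javascript",
    "application/ecmascript",
    "text/ecmascript",
}


def is_javascript_mime(content_type_header):
    """Return True if a Content-Type header value names a JavaScript type."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    essence = content_type_header.split(";", 1)[0].strip().lower()
    return essence in JS_MIME_TYPES


def served_content_type(url):
    """Fetch url and return its raw Content-Type header ("" if absent)."""
    with urllib.request.urlopen(url) as response:
        return response.headers.get("Content-Type", "")
```

For example, calling served_content_type() on the canvas-tests-utils.js URL from the issue description and passing the result to is_javascript_mime() would show whether the local server advertises the script as text/plain.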
,
Oct 13
Thank you both for the quick response! I forgot to mention that the exact same case hosted at khronos.org passes.
vogelheim@, for your first comment, do you mean adding the option "--disable-blink-feature=WorkerNosniffBlock" when running Chrome? I tried it but the case still didn't pass. Per your 3rd suggestion, the fix crrev.com/c/1256947 is r596660. I downloaded r596668 from your server, but it still timed out.
kbr@, I tried both python.bat and vpython.bat in depot_tools, like "depot_tools\python.bat -mSimpleHTTPServer" and "depot_tools\vpython.bat -mSimpleHTTPServer", but had no luck. Let's wait for feedback from our driver team. I guess it's related to some system configuration within Intel, as I tried 2 machines and both had the problem.
,
Oct 13
Changed the title as r588854 is a reasonable change.
,
Oct 13
vogelheim@, it's text/plain. Please see details in my 5th finding in issue description.
,
Oct 15
Yang, thanks for the analysis at #5 - #7. I don't know much about Python's SimpleHTTPServer, but there are several examples in the code base where it's used, often with a wrapper that tells it explicitly to map .js to a JavaScript MIME type. E.g. here: https://cs.chromium.org/chromium/src/third_party/skia/experimental/canvaskit/serve.py?sq=package:chromium&l=14&dr=C
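A wrapper of that kind might look roughly like the sketch below. It is not the linked serve.py; it uses Python 3's http.server (the successor of the Python 2 SimpleHTTPServer module used in this thread), and the class name JsAwareHandler and the serve() helper are hypothetical.

```python
import http.server


class JsAwareHandler(http.server.SimpleHTTPRequestHandler):
    # Copy the inherited map so the base class stays untouched, then pin
    # the .js type explicitly instead of trusting the OS configuration.
    extensions_map = dict(http.server.SimpleHTTPRequestHandler.extensions_map)
    extensions_map.update({".js": "application/javascript"})


def serve(port=8000):
    """Serve the current directory on localhost until interrupted."""
    server = http.server.HTTPServer(("127.0.0.1", port), JsAwareHandler)
    server.serve_forever()
```

Calling serve() from the src/ directory would then answer requests for .js files with Content-Type: application/javascript, whatever the system-wide MIME configuration says.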
,
Oct 15
Adding some more labels. The HTTP server that serves up these test files is buried deep in Telemetry's internals, and I'm not sure where its configuration files are. Ned or Caleb, do you know how we could reliably add this MIME type for JS files served up by Telemetry's built-in HTTP server?
,
Oct 15
Yang, could you try using vpython with our main test harness entry point (run_gpu_integration_test.py webgl_conformance ...) and use --test-filter to try running just these few tests? That harness will almost surely set up the web server differently than Python's SimpleHTTPServer will, and running inside the run_gpu_integration_test harness is the configuration we really care about. Thanks.
,
Oct 15
I had no luck with vpython and run_gpu_integration_test.py. I renamed the folder of the system Python so that it couldn't be found at all via %PATH%, and ran the command below: "d:\workspace\project\chromium-webgl\depot_tools\vpython content/test/gpu/run_gpu_integration_test.py webgl_conformance --browser=exact --browser-executable=out/Default/chrome.exe --test-filter=conformance_offscreencanvas_context_lost_restored_worker --webgl-conformance-version=2.0.1". I got the same error messages. BTW, I'm not sure about the difference between vpython.bat and vpython, so I ran them both and got the same errors. I suspect it might be related to the Windows version. Did you use Redstone 4 (Win10 1803) for testing? If not, can you find a machine to run the above command on? Our driver team reported that the script run_gpu_integration_test.py doesn't work with Redstone 5 at all. We ran a test today, and it seemed to show the reported issue (I'm on vacation so I just saw the report, but didn't check the details).
,
Oct 16
Ned: thanks for the pointer. Do you think that the content_type is supposed to be guessed as application/javascript already by the mimetypes library? https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/core/memory_cache_http_server.py?rcl=5aac72d05c7ed1238c420660d0786d98da9d73da&l=186
,
Oct 16
I think so:
>>> mimetypes.guess_type('foo.js')
('application/javascript', None)
Though I would add logging into that code to see if Telemetry sent the wrong content_type
,
Oct 16
I've attempted to add some logging to the memory_cache_http_server in https://chromium-review.googlesource.com/1282273 but it doesn't seem to be getting hit and I don't understand why.
,
Oct 16
Ken: that is because the memory_cache_http_server is created as a subprocess (https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/core/local_server.py?rcl=2be20fdd2d702dc3081b2d051b7b95ebc12d9e74&l=78). You would want to dump the log into a file instead
,
Oct 16
Ned, could you please help me by showing me how to get logging out of that class? The attached patch doesn't work on Linux - logfile.txt is always zero bytes when Telemetry shuts down. Thanks.
,
Oct 16
,
Oct 18
Here is how I get the log to work: https://chromium-review.googlesource.com/c/catapult/+/1289710 Passing this back to Ken to triage
,
Oct 30
Thanks Ned. Confirming I can see the requests being serviced with your patch. Yang: everything looks OK when I run these tests locally. Could you please try applying Ned's patch to your src/third_party/catapult and run for example: ./content/test/gpu/run_gpu_integration_test.py webgl_conformance --browser=dev --test-filter=conformance_offscreencanvas_context_creation_worker (replace dev with the browser version you're testing and modify Ned's patch to specify a valid path on your Windows workstation) Thanks.
,
Oct 30
There are some errors in my command-line output; please see the attached cmd-log.txt. I have also attached log.txt, the output of the script (nothing interesting there).
,
Oct 30
Thanks Yang for testing. Could you invoke run_gpu_integration_test.py with depot_tools' vpython instead of python? That should clear up the warnings about numpy, at least. Could you add more logging to your memory_cache_http_server.py and print the result of the call to mimetypes.guess_type() for the .js files in AddFileToResourceMap? It looks like it's returning text/plain for JavaScript files and I don't understand why it would be.
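The extra logging requested here could be sketched as below. This is a standalone illustration, not a patch to memory_cache_http_server.py: make_file_logger and log_guessed_type are hypothetical helpers, and writing to a file rather than stdout follows the earlier observation that the server runs as a subprocess.

```python
import logging
import mimetypes


def make_file_logger(log_path, name="mime_debug"):
    """Return a logger that appends records to log_path."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Keep records away from root handlers, whose output a subprocess
    # may never surface.
    logger.propagate = False
    if not logger.handlers:
        logger.addHandler(logging.FileHandler(log_path))
    return logger


def log_guessed_type(logger, file_path):
    """Log and return what mimetypes.guess_type() reports for file_path."""
    content_type, encoding = mimetypes.guess_type(file_path)
    logger.info("guess_type(%r) -> (%r, %r)", file_path, content_type, encoding)
    return content_type
```

Calling log_guessed_type() for each .js file as it is added to the resource map would record whether the server process sees text/plain or a JavaScript type.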
,
Nov 2
Sorry about the delayed reply; I've been a bit occupied by other stuff. Today I spent some time debugging this and I think I know the root cause now.
The return value of mimetypes.guess_type() for ".js" files on my machine is always "text/plain". In depot_tools/win_tools-2_7_6_bin/python/bin/Lib/mimetypes.py, the related code is as below:
def guess_type(self, url, strict=True):
    ......
    types_map = self.types_map[True]
    if ext in types_map:
        return types_map[ext], encoding
Though mimetypes.py provides a built-in mapping from extension to MIME type, the initialization of types_map reads the Windows registry first. The related code is as below:
db = MimeTypes()
if files is None:
    if _winreg:
        db.read_windows_registry()
    files = knownfiles
......
types_map = db.types_map[True]
And in my Windows registry, "Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.js\Content Type" is "text/plain"!!! I checked the 2 desktops at hand, and they are the same. These desktops are used as our test machines, with an image provided by Intel. I think this is set on purpose, maybe for security reasons.
After I changed the registry value to "application/javascript", everything is OK now.
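To check another machine for the same setting, the registry value can be read directly. The sketch below is an illustration that assumes Python 3, where the module is named winreg (_winreg on Python 2); registry_content_type is a hypothetical helper and returns None on non-Windows platforms or when the value is not set.

```python
import sys


def registry_content_type(extension=".js"):
    """Return the "Content Type" value under HKEY_CLASSES_ROOT\\<extension>.

    Returns None when not running on Windows or when the value is missing.
    """
    if not sys.platform.startswith("win"):
        return None
    import winreg  # Windows-only module, imported lazily

    try:
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, extension) as key:
            value, _value_type = winreg.QueryValueEx(key, "Content Type")
    except OSError:
        # The key or the value does not exist.
        return None
    return value
```

On a machine configured like the desktops described above, registry_content_type(".js") would be expected to return "text/plain".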
,
Nov 2
Wow, having the MIME type depend on the OS's config seems problematic. I wonder if we should supply our own default type map to mimetypes.init(...) to avoid this problem.
,
Nov 2
I should have mentioned that "Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.js\Content Type" is the same as "Computer\HKEY_CLASSES_ROOT\.js\Content Type". I just checked the latest Python code: it reads a different registry entry, and I have no problem with it.
[current code]
with _winreg.OpenKey(_winreg.HKEY_CLASSES_ROOT, '') as hkcr:
[latest code]
with _winreg.OpenKey(_winreg.HKEY_CLASSES_ROOT,
                     r'MIME\Database\Content Type') as mimedb:
I also checked that the latest Python 2.7 (2.7.15) doesn't include this change.
,
Nov 6
,
Nov 6
@yang.gu thanks for your analysis. @nednguyen it sounds like a good idea to prepopulate the MIME type table in memory_cache_http_server.py to make it more robust to system level bugs like this. Who can take that responsibility?
,
Nov 7
The following revision refers to this bug:
https://chromium.googlesource.com/catapult/+/bdce91287f2dc5d2ed4d474125005fd16e298b21
commit bdce91287f2dc5d2ed4d474125005fd16e298b21
Author: Nghia Nguyen <nednguyen@google.com>
Date: Wed Nov 07 12:28:35 2018
Add a fixed mimetypes file to be used by Telemetry's memory_cache_http_server
By default, memory_cache_http_server's mimetypes module rely on the system's mimetypes. Different system can have different mimetypes file, which can cause non-deterministic behavior. We check in a fixed mime.types file which is used for init mimetypes module to make Telemetry more deterministic.
Bug: chromium:894868
Change-Id: Id7a0bdb47809aaf5be5c412a1039ea2b614d1f87
Reviewed-on: https://chromium-review.googlesource.com/c/1320733
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Reviewed-by: Caleb Rouleau <crouleau@chromium.org>
[add] https://crrev.com/bdce91287f2dc5d2ed4d474125005fd16e298b21/telemetry/telemetry/core/mime.types
[modify] https://crrev.com/bdce91287f2dc5d2ed4d474125005fd16e298b21/telemetry/telemetry/core/memory_cache_http_server.py
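The idea behind this change, taking MIME types from a fixed file rather than from system configuration, can be sketched as follows. This is not the catapult patch itself: load_fixed_mime_db and write_example_mime_types are hypothetical helpers, and the two-line mime.types content is a made-up example. It relies on MimeTypes.read() overriding any mapping that was loaded earlier for the same extension.

```python
import mimetypes
import os


def write_example_mime_types(directory):
    """Write a tiny mime.types file (example content) and return its path."""
    path = os.path.join(directory, "mime.types")
    with open(path, "w") as f:
        f.write("application/javascript js\n")
        f.write("text/html html htm\n")
    return path


def load_fixed_mime_db(mime_types_path):
    """Build a MimeTypes database in which the fixed file's entries win."""
    db = mimetypes.MimeTypes()
    # Entries read here replace earlier mappings for the same extension,
    # including any that came from system configuration.
    db.read(mime_types_path)
    return db
```

A server that calls guess_type() on the returned object, rather than on the module-level functions, would then get the same answer for .js files on every machine.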
,
Nov 8
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/f749872e90d84d3d9df95fd1e257d0e177c77801
commit f749872e90d84d3d9df95fd1e257d0e177c77801
Author: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Date: Thu Nov 08 05:11:46 2018
Roll src/third_party/catapult 1a1b38dabca6..026f83d49289 (5 commits)
https://chromium.googlesource.com/catapult.git/+log/1a1b38dabca6..026f83d49289
git log 1a1b38dabca6..026f83d49289 --date=short --no-merges --format='%ad %ae %s'
2018-11-08 nednguyen@google.com Revert "Remove legacy timeline based metrics (TBMv1) & all related code"
2018-11-07 benjhayden@chromium.org Permit lack of value.name in convertChartJson.
2018-11-07 pasko@chromium.org androidStartupMetric: workarund for missing main entry point marker
2018-11-07 sergiyb@chromium.org Decrease timeout only if there is a watchdog timer configured
2018-11-07 nednguyen@google.com Add a fixed mimetypes file to be used by Telemetry's memory_cache_http_server
Created with: gclient setdep -r src/third_party/catapult@026f83d49289
The AutoRoll server is located here: https://autoroll.skia.org/r/catapult-autoroll
Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md
If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary.
CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
BUG=chromium:900878, chromium:902391 ,chromium:899721, chromium:894868
TBR=sullivan@chromium.org
Change-Id: I7d6e937f277bb7938ea7b66843d5415b8054f487
Reviewed-on: https://chromium-review.googlesource.com/c/1325373
Reviewed-by: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Commit-Queue: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#606350}
[modify] https://crrev.com/f749872e90d84d3d9df95fd1e257d0e177c77801/DEPS
,
Nov 8
I think this is fixed?
,
Nov 8
Thank you Ned for fixing this, and Yang for getting to the root of it! Yang, please verify when you have a chance. Thanks.
,
Nov 8
I started the test against r606350 (the roll of catapult) when I left the office yesterday, and it works like a charm. Let me change the status to verified. Thanks to nednguyen@ and kbr@ for all the help and the fix!
Comment 1 by kbr@chromium.org, Oct 12
Cc: mkwst@chromium.org yunchao...@intel.com
Labels: -Type-Bug -Pri-3 Pri-2 Type-Bug-Regression
Owner: vogelheim@chromium.org