Find source of suspicious query |
||||||
Issue descriptionWe found this query running: SELECT COUNT(DISTINCT `afe_jobs`.`id`) FROM `afe_jobs` WHERE (NOT EXISTS (SELECT 1 from afe_jobs AS sub_query WHERE parent_job_id IS NOT NULL AND (sub_query.parent_job_id=afe_jobs.id OR sub_query.id=afe_jobs.id)) ) for 13.7 days. So that places it being created ~6/23.
,
Jul 7 2016
I bet this is from frontend/afe/models.py:1260 See https://bugs.chromium.org/p/chromium/issues/detail?id=592704 This is used as part of the web front end, to render the XXX/YYYYY jobs showing when listing jobs.
,
Jul 7 2016
> I bet this is from frontend/afe/models.py:1260
The call ultimately comes from get_num_jobs(). The
key point of interest is this bit:
WHERE (NOT EXISTS (SELECT 1 from afe_jobs AS sub_query ...
That comes from frontend/afe/rpc_utils.py:
if suite:
where.append('id IN ' + filter_common % parent_job_id)
elif sub:
where.append('id IN ' + filter_common % child_job_id)
elif standalone:
where.append('NOT EXISTS (SELECT 1 from afe_jobs AS sub_query '
'WHERE parent_job_id IS NOT NULL'
' AND (sub_query.parent_job_id=afe_jobs.id'
' OR sub_query.id=afe_jobs.id))')
else:
return extra_args
Which means that get_num_jobs() was called with `standalone=True`,
which is different from all the examples in bug 592704. It's also
different from bug 599267.
There's no Python code to make that call. I haven't figured out
whether any of the GWT could make that call.
,
Jul 7 2016
As best I can tell, the GWT code can set 'standalone=True'.
On the main job list page:
https://ubercautotest.corp.google.com/afe/#tab_id=job_list&state_filter=all&type_filter=all
You can find a set of radio buttons labeled "Parent Jobs",
"Child Jobs", "Standalone Jobs", and "All Jobs". Those buttons
control the "standalone" parameter.
,
Jul 7 2016
I tried actually selecting the "Standalone Jobs" radio
button. It ran for quite a while, before finally hitting
a 502 error. The error details claimed a timeout on the
backend.
The MySQL `show processlist;` command showed two queries
while my browser was processing:
| 576697106 | cros-infra-admin | chromeos-server2.mtv.corp.google.com:48012 | chromeos_autotest_db | Query | 63 | Sending data | SELECT COUNT(DISTINCT `afe_jobs`.`id`) FROM `afe_jobs` WHERE (NOT EXISTS (SELECT 1 from afe_jobs AS |
| 576697110 | cros-infra-admin | chromeos-server2.mtv.corp.google.com:48017 | chromeos_autotest_db | Query | 63 | Sending data | SELECT DISTINCT `afe_jobs`.`id`, `afe_jobs`.`owner`, `afe_jobs`.`name`, `afe_jobs`.`priority`, `afe_ |
My conclusions:
* Calling get_jobs(standalone=True) is problematic. If we
ban that call (or fix whatever bug it's provoking), we'll
stop the problem from repeating.
* The problem query _might_ have come through the AFE from
a browser, but I'm not wholly convinced. Both times I tried
it, the browser timed out, and the queries ended. Moreover,
the AFE triggers _two_ queries, but in the original event,
we only killed one.
,
Jul 7 2016
+1 to banning the problematic query. Also, can we set a query timeout inside of mysql, such that it will kill any query that takes longer than an hours (say)?
,
Jul 7 2016
,
Jul 12 2016
,
Jul 12 2016
Charlene's got some work on killing queries as a general fix, moving to fixit for this specific issue.
,
Mar 2 2017
,
Mar 8 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/541debad7b2f90f8add9b4ae69b1489ba6752a0d commit 541debad7b2f90f8add9b4ae69b1489ba6752a0d Author: Aviv Keshet <akeshet@chromium.org> Date: Wed Mar 08 00:18:22 2017 autotest: web front end (JobListView) ban "standalone" radio button There's some evidence that the query for standalone jobs is particularly expensive, often the source of forever-hung queries in prod. Eliminate the UI element for it. BUG= chromium:626198 TEST=None Change-Id: I9d46e74201cf01585d320fa0be3b7fb8f4965fbd Reviewed-on: https://chromium-review.googlesource.com/448703 Commit-Ready: Aviv Keshet <akeshet@chromium.org> Tested-by: Aviv Keshet <akeshet@chromium.org> Reviewed-by: Simran Basi <sbasi@chromium.org> [modify] https://crrev.com/541debad7b2f90f8add9b4ae69b1489ba6752a0d/frontend/client/src/autotest/afe/JobListView.java
,
Mar 8 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/541debad7b2f90f8add9b4ae69b1489ba6752a0d commit 541debad7b2f90f8add9b4ae69b1489ba6752a0d Author: Aviv Keshet <akeshet@chromium.org> Date: Wed Mar 08 00:18:22 2017 autotest: web front end (JobListView) ban "standalone" radio button There's some evidence that the query for standalone jobs is particularly expensive, often the source of forever-hung queries in prod. Eliminate the UI element for it. BUG= chromium:626198 TEST=None Change-Id: I9d46e74201cf01585d320fa0be3b7fb8f4965fbd Reviewed-on: https://chromium-review.googlesource.com/448703 Commit-Ready: Aviv Keshet <akeshet@chromium.org> Tested-by: Aviv Keshet <akeshet@chromium.org> Reviewed-by: Simran Basi <sbasi@chromium.org> [modify] https://crrev.com/541debad7b2f90f8add9b4ae69b1489ba6752a0d/frontend/client/src/autotest/afe/JobListView.java
,
Mar 9 2017
The above CL removed the UI element for one particularly expensive JobListView filter. The other ones are expensive too, but this one was particularly bad and also not very useful.
,
Mar 31 2017
There are too many undead bugs about mysql slowness. Let's call this quits because the particular query this talks about is no longer possible. |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by sbasi@chromium.org
, Jul 7 2016Labels: -Pri-3 Pri-1