New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 626198 link

Starred by 4 users

Issue metadata

Status: Fixed
Owner: ----
Closed: Mar 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 697730
issue 625225



Sign in to add a comment

Find source of suspicious query

Project Member Reported by sbasi@chromium.org, Jul 7 2016

Issue description

We found this query running: SELECT 
COUNT(DISTINCT `afe_jobs`.`id`) FROM `afe_jobs` WHERE (NOT EXISTS (SELECT 1 from afe_jobs AS sub_query WHERE parent_job_id IS NOT NULL AND (sub_query.parent_job_id=afe_jobs.id OR sub_query.id=afe_jobs.id))
)                                                     

for 13.7 days. So that places it being created ~6/23.
 

Comment 1 by sbasi@chromium.org, Jul 7 2016

Components: Infra>Client>ChromeOS
Labels: -Pri-3 Pri-1
I bet this is from frontend/afe/models.py:1260

See https://bugs.chromium.org/p/chromium/issues/detail?id=592704

This is used as part of the web front end, to render the XXX/YYYYY jobs showing when listing jobs.
> I bet this is from frontend/afe/models.py:1260

The call ultimately comes from get_num_jobs().  The
key point of interest is this bit:
  WHERE (NOT EXISTS (SELECT 1 from afe_jobs AS sub_query ...

That comes from frontend/afe/rpc_utils.py:
    if suite:
        where.append('id IN ' + filter_common % parent_job_id)
    elif sub:
        where.append('id IN ' + filter_common % child_job_id)
    elif standalone:
        where.append('NOT EXISTS (SELECT 1 from afe_jobs AS sub_query '
                     'WHERE parent_job_id IS NOT NULL'
                     ' AND (sub_query.parent_job_id=afe_jobs.id'
                     ' OR sub_query.id=afe_jobs.id))')
    else:
        return extra_args

Which means that get_num_jobs() was called with `standalone=True`,
which is different from all the examples in bug 592704.  It's also
different from bug 599267.

There's no Python code to make that call.  I haven't figured out
whether any of the GWT could make that call.

As best I can tell, the GWT code can set 'standalone=True'.
On the main job list page:
    https://ubercautotest.corp.google.com/afe/#tab_id=job_list&state_filter=all&type_filter=all

You can find a set of radio buttons labeled "Parent Jobs",
"Child Jobs", "Standalone Jobs", and "All Jobs".  Those buttons
control the "standalone" parameter.

I tried actually selecting the "Standalone Jobs" radio
button.  It ran for quite a while, before finally hitting
a 502 error.  The error details claimed a timeout on the
backend.

The MySQL `show processlist;` command showed two queries
while my browser was processing:
    | 576697106 | cros-infra-admin | chromeos-server2.mtv.corp.google.com:48012  | chromeos_autotest_db | Query   |   63 | Sending data         | SELECT COUNT(DISTINCT `afe_jobs`.`id`) FROM `afe_jobs` WHERE (NOT EXISTS (SELECT 1 from afe_jobs AS  |
    | 576697110 | cros-infra-admin | chromeos-server2.mtv.corp.google.com:48017  | chromeos_autotest_db | Query   |   63 | Sending data         | SELECT DISTINCT `afe_jobs`.`id`, `afe_jobs`.`owner`, `afe_jobs`.`name`, `afe_jobs`.`priority`, `afe_ |

My conclusions:
  * Calling get_jobs(standalone=True) is problematic.  If we
    ban that call (or fix whatever bug it's provoking), we'll
    stop the problem from repeating.
  * The problem query _might_ have come through the AFE from
    a browser, but I'm not wholly convinced.  Both times I tried
    it, the browser timed out, and the queries ended.  Moreover,
    the AFE triggers _two_ queries, but in the original event,
    we only killed one.

+1 to banning the problematic query.

Also, can we set a query timeout inside of mysql, such that it will kill any query that takes longer than an hours (say)?
Blocking: 625225

Comment 8 by ihf@chromium.org, Jul 12 2016

Cc: ihf@chromium.org

Comment 9 by autumn@chromium.org, Jul 12 2016

Labels: Hotlist-Fixit
Charlene's got some work on killing queries as a general fix, moving to fixit for this specific issue. 
Blocking: 697730
Project Member

Comment 11 by bugdroid1@chromium.org, Mar 8 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/541debad7b2f90f8add9b4ae69b1489ba6752a0d

commit 541debad7b2f90f8add9b4ae69b1489ba6752a0d
Author: Aviv Keshet <akeshet@chromium.org>
Date: Wed Mar 08 00:18:22 2017

autotest: web front end (JobListView) ban "standalone" radio button

There's some evidence that the query for standalone jobs is particularly
expensive, often the source of forever-hung queries in prod.

Eliminate the UI element for it.

BUG= chromium:626198 
TEST=None

Change-Id: I9d46e74201cf01585d320fa0be3b7fb8f4965fbd
Reviewed-on: https://chromium-review.googlesource.com/448703
Commit-Ready: Aviv Keshet <akeshet@chromium.org>
Tested-by: Aviv Keshet <akeshet@chromium.org>
Reviewed-by: Simran Basi <sbasi@chromium.org>

[modify] https://crrev.com/541debad7b2f90f8add9b4ae69b1489ba6752a0d/frontend/client/src/autotest/afe/JobListView.java

Project Member

Comment 12 by bugdroid1@chromium.org, Mar 8 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/541debad7b2f90f8add9b4ae69b1489ba6752a0d

commit 541debad7b2f90f8add9b4ae69b1489ba6752a0d
Author: Aviv Keshet <akeshet@chromium.org>
Date: Wed Mar 08 00:18:22 2017

autotest: web front end (JobListView) ban "standalone" radio button

There's some evidence that the query for standalone jobs is particularly
expensive, often the source of forever-hung queries in prod.

Eliminate the UI element for it.

BUG= chromium:626198 
TEST=None

Change-Id: I9d46e74201cf01585d320fa0be3b7fb8f4965fbd
Reviewed-on: https://chromium-review.googlesource.com/448703
Commit-Ready: Aviv Keshet <akeshet@chromium.org>
Tested-by: Aviv Keshet <akeshet@chromium.org>
Reviewed-by: Simran Basi <sbasi@chromium.org>

[modify] https://crrev.com/541debad7b2f90f8add9b4ae69b1489ba6752a0d/frontend/client/src/autotest/afe/JobListView.java

The above CL removed the UI element for one particularly expensive JobListView filter.

The other ones are expensive too, but this one was particularly bad and also not very useful.
Status: Fixed (was: Untriaged)
There are too many undead bugs about mysql slowness.
Let's call this quits because the particular query this talks about is no longer possible.

Sign in to add a comment