Issue 714216

Issue metadata

Status: Archived
Owner:
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




Investigate sql performance in the scheduler

Project Member Reported by marc...@chromium.org, Apr 21 2017

Issue description

Right now, a lot of the scheduler's overhead is in database requests:

- One part is issue 713841, where the SQL database has to hit the disk for almost every request because the SQL cache is too small. That issue is understood and we have a solution.

- Another part is the fact that Django makes lots of very small SQL requests. Each of these requests has significant overhead, and in aggregate this costs us a lot of time on the scheduler side. That is what this issue is about.

I suspect we can do something about it, for example by optimizing the Django requests or by adding prefetching to Django. Also, newer versions of Django support persistent connections, so upgrading would help there as well (see the sketch after the TODO list below).
 
TODO: 
- strace the host scheduler to see what we are doing exactly
- look into connection persistence in django
- look into request prefetching in django
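
For reference, a minimal sketch of what the last two TODO items could look like on the Django side. This is only illustrative: the Job model and its relation names are hypothetical stand-ins for the afe models, and CONN_MAX_AGE requires Django >= 1.6.

# settings.py: persistent connections (available since Django 1.6).
# CONN_MAX_AGE keeps a connection open for N seconds instead of
# reconnecting for every request; the default of 0 disables reuse.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'chromeos_autotest_db',
        'CONN_MAX_AGE': 300,
    }
}

# Query-side prefetching, in code that has a (hypothetical) Job model:
# fetch related rows up front instead of issuing one tiny query per
# object (the N+1 pattern).
jobs = Job.objects.select_related('owner')            # foreign key: one JOIN
jobs = Job.objects.prefetch_related('hostqueueentry_set')  # reverse FK: one extra query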

Comment 2 by ihf@chromium.org, Apr 22 2017

Components: Infra>Client>ChromeOS
Labels: M-60 OS-Chrome
strace from chromeos-server100. It is a large file.
Attachment: strace.txt.xz (8.6 MB)
Summary: Investigate sql performance in the scheduler (was: Investigate django performance in the scheduler)
The log from #2 has 1.9 million lines and covers a time span of 16 minutes. Over these 16 minutes, we are doing 300,000 database requests. The vast majority of these look like this:

18:29:11 write(3, "*\0\0\0\3SELECT * FROM afe_jobs WHERE ID=113765244", 46) = 46 <0.000146>
18:29:11 read(3, "\1\0\0\1\30>\0\0\2\3def\24chromeos_autotest_db\10afe_jobs\10afe_jobs\2id\2id\f?\0\v\0\0\0\3\3B\0\0\0D\0\0\3\3def\24chromeos_autotest_db\10afe_jobs\10afe_jobs\5owner\5owner\f!\0\375\2\0\0\375\10@\0\0\0B\0\0\4\3def\24chrome

These requests seem to originate from autotest/files/scheduler/scheduler_models.py:228 _fetch_row_from_db()

So the bottleneck has nothing to do with Django, and I am updating the title to reflect this. I'll keep investigating.
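
Incidentally, a minimal sketch of what batching those per-row fetches could look like. This is not the actual scheduler code: the helper name is made up, and cursor stands in for whatever DB-API cursor the scheduler already holds.

def fetch_jobs_batched(cursor, job_ids):
    # One IN query instead of len(job_ids) single-row SELECTs.
    if not job_ids:
        return []
    placeholders = ', '.join(['%s'] * len(job_ids))
    cursor.execute('SELECT * FROM afe_jobs WHERE id IN (%s)' % placeholders,
                   tuple(job_ids))
    return cursor.fetchall()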


To find out how much time is spent in the afe_jobs SQL requests, I ran:

 cat strace.txt | grep -A 4 "SELECT \* FROM afe_jobs WHERE ID=" | cut -d "<" -f 2 | cut -f 1 -d ">" | grep ^0\. | awk '{s+=$1} END {print s}'

(Each request is followed by 4 more syscalls for the reply/commit, so we have to count those too.)

The sum of these requests over the 16 minutes accounts for 190.937 seconds.
Btw, since the database only accounts for ~20% of the time (190.937 s out of 960 s), there are probably other optimizations in the code that we should look at after this...
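
As a cross-check, here is a Python equivalent of the afe_jobs pipeline above (assuming strace was run with -T, so lines end in a <seconds> duration, and that the file is the strace.txt attached in #2):

import re

DURATION = re.compile(r'<(0\.\d+)>')

total = 0.0
remaining = 0
with open('strace.txt') as f:
    for line in f:
        if 'SELECT * FROM afe_jobs WHERE ID=' in line:
            remaining = 5  # the request line plus the 4 reply/commit syscalls
        if remaining > 0:
            m = DURATION.search(line)
            if m:
                total += float(m.group(1))
            remaining -= 1
print('%.3f seconds in afe_jobs requests' % total)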
Cc: pprabhu@chromium.org dshi@chromium.org
Owner: pwang@chromium.org
I think pwang@ is working on this, e.g. https://chromium-review.googlesource.com/#/c/521687/

Comment 7 by pwang@chromium.org, Jun 19 2017

I'm curious about the timestamp in the strace file.

I ran:
cat strace.txt | cut -d "<" -f 2 | cut -f 1 -d ">" | grep ^0\. | awk '{s+=$1} END {print s}'

And it seems only 243.271 seconds have passed in total, instead of 960 seconds (16 minutes)?
Labels: akeshet-pending-downgrade
ChromeOS Infra P1 Bugscrub.

P1 Bugs in this component should be important enough to get weekly status updates.

Is this already fixed? -> Fixed
Is this no longer relevant? -> Archived or WontFix
Is this not a P1, based on go/chromeos-infra-bug-slo rubric? -> lower priority.
Is this a Feature Request rather than a bug? Type -> Feature
Is this missing important information or scope needed to decide how to proceed? -> Ask question on bug, possibly reassign.
Does this bug have the wrong owner? -> reassign.

Bugs that remain in this state next week will be downgraded to P2.
Labels: -akeshet-pending-downgrade Pri-2
ChromeOS Infra P1 Bugscrub.

Issue untouched in a week after previous message. Downgrading to P2.
Status: Archived (was: Untriaged)
