New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 675048 link

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Jun 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

Drop dangerous call from scheduler to host_scheduler.schedule_host_job

Project Member Reported by pprabhu@chromium.org, Dec 16 2016

Issue description

hosts may be assigned to jobs via two independent paths when inline_host_acquisition is False:

Normal path: Via host_scheduler process. This goes via HostScheculer.tick --> HostScheduler->_schedule_jobs --> (HostScheduler.find_hosts_for_jobs, HostScheduler._reocrd_host_assignments, HostScheduler.schedule_host_jobs)
[http://shortn/_mY1kPHrRI6]

But, there is a second path. If BaseDispatcher finds jobs waiting for hosts, it can try to assign hosts to jobs via: BaseDispatcher._schedule_new_jobs -> (HostScheduler.find_hosts_for_jobs, HostScheduler.schedule_host_job)


The second path is taken despite _inline_host_acquisition being False if we find an non matched host_jobs [http://shortn/_6iyaG1T1SB]

There are two problems with this:
(1) HostScheduler is not written to be thread safe. Both the host_scheduler and monitor_db processes could try to assign hosts to the same job and step over each other's feet.
(2) The path from monitor_db doesn't exactly reflect the one from HostScheduler.tick. So, it could confuse future assignment of hosts from HostScheduler.


Now, we do have metrics about how often we assign hosts via each path, and it looks like we _never_ take the monitor_db path:http://shortn/_ZUAqxl89mc

There is also supposed to be an email alert whenever we take that path, and there are 0 instances of this alert being raised in either of the email aliases: http://shortn/_ZdDdBqBMLu http://shortn/_v9JEf2egTI

So, I propose we nix this code path.
 
Cc: jrbarnette@chromium.org
Cc: dshi@chromium.org
+dshi, +jrbarnette to catch me before I cause disaster, if I'm reading this incorrectly.
Status: Assigned (was: Started)

Comment 4 by dshi@chromium.org, Dec 16 2016

Just curious, under what condition does the scheduler do the host assignment bypassing host scheduler? If that's the condition we don't need to handle, that sounds like dead code we can remove. If there is some edge case that host scheduler won't handle, we might want to fix that.
Status: Started (was: Assigned)
Continuing from a question I asked the list: http://shortn/_bgviCRsD4W

test_that doesn't (directly) depend on _inline_host_acquisition=True either.
It sets up an in memory DB and then walks through the job life cycle itself. It doesn't use the scheduler at all, so should not be affected by this.

So, there is no good reason to keep supporting _inline_host_acquisition that I know of at this point. I'll try to delete it and see what happens. ;)
moblab is a blocker. It uses inline host acquisition.
No separate host_scheduler process is launched: https://cs.corp.google.com/chromeos_public/src/overlays/project-moblab/chromeos-base/chromeos-bsp-moblab/files/init/


Again, this is pretty easy to change. It'll let us not have to support both.
Project Member

Comment 8 by bugdroid1@chromium.org, Dec 22 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/3c6a3bfd883d9770af66f9e14fa595271774c903

commit 3c6a3bfd883d9770af66f9e14fa595271774c903
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Tue Dec 20 19:29:02 2016

scheduler: Don't schedule host_jobs when inline acquisition is false.

host_scheduler, when run as an independent service assumes that it is
the sole service scheduilng host_jobs. A fallback path in scheduler
would try to schedule these jobs. This situation was never possible in
the normal case, but the fallback was dangerous -- it could leave the DB
corrupted.
Instead, report errors in the fallback path and refuse to schedule the
host jobs.

BUG= chromium:675048 
TEST=unittests, local jobs.

Change-Id: I60bb56ded41c34484f6e281fdd12ace5601e739d
Reviewed-on: https://chromium-review.googlesource.com/422535
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@google.com>

[modify] https://crrev.com/3c6a3bfd883d9770af66f9e14fa595271774c903/scheduler/monitor_db.py

Labels: -Hotlist-Fixit
Status: Assigned (was: Started)
Turns out this requires pesky fixes for moblab. Like everything else, it's hard to do this for moblab. Not Fixit size.
Status: WontFix (was: Assigned)
I don't care about inline_host_acquisition=False.
I don't.

Sign in to add a comment