
Issue 735118


Issue metadata

Status: WontFix
Owner:
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Feature




Shards SSH into devservers per RPC

Project Member Reported by dgarr...@chromium.org, Jun 20 2017

Issue description

According to Richard, the shards SSH into devservers on a per-RPC basis.

1) This looks like something that would be flake-prone (no numbers).
2) This looks like something that would hurt performance (no numbers).
3) This makes it impossible (on the devserver) to see which shard generated a request.

Even if the system is 100% robust, this will introduce a 1-2 second delay into every single shard -> devserver RPC call.

With some development, we could replace this system with SSH port forwards. Once the development churn settles, I would expect that to speed things up considerably and to improve our ability to distinguish error causes on the shard.
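To make the port-forward idea concrete, here is a minimal sketch of what the shard side might look like once a persistent forward is in place (e.g. `ssh -N -L 18082:localhost:8082 <devserver>` started once, not per call). The helper name, hostnames, and ports are illustrative assumptions, not actual infrastructure code:

```python
# Hypothetical sketch: with a long-lived ssh port forward already
# running, each RPC only needs its devserver URL rewritten to point
# at the local end of the tunnel -- no per-RPC ssh setup cost.
from urllib.parse import urlsplit, urlunsplit

def via_tunnel(devserver_url, local_port):
    """Rewrite a devserver RPC URL to go through a local ssh tunnel.

    The tunnel process itself is assumed to be managed separately;
    this only covers the per-call URL rewriting a shard would do.
    """
    parts = urlsplit(devserver_url)
    return urlunsplit(parts._replace(netloc='localhost:%d' % local_port))

# An RPC that previously required an ssh-wrapped curl becomes a
# plain local HTTP call against the forwarded port.
print(via_tunnel('http://devserver27:8082/stage?archive_url=x', 18082))
```

The per-RPC work shrinks to a string rewrite; the connection-setup cost is paid once when the tunnel is created.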

This leads to two questions.

A) Is the performance improvement worth the development cost / stability churn?
B) Is there a robustness issue here?
 
 
Labels: Chase-Pending
Cc: xixuan@chromium.org
Owner: dgarr...@chromium.org
Status: Assigned (was: Untriaged)
^ I'm not convinced this fits the Chase-Pending criteria (outage preventing, short scope) but I'll let you make your case in meeting.

Adding xixuan who worked on the ssh-wrapped-curl rpcs.

Assigned to dgarrett as part of devserver deep dive.
Labels: -Chase-Pending
> 1) This looks like something that would be flake-prone (no numbers).

History to date hasn't demonstrated any flake associated with
this implementation.  I'm not prepared to say "this will never
be flaky".  I'm just saying "hasn't happened yet."


> 2) This looks like something that would hurt performance (no numbers).

The implementation is definitely slower than other things we could
do.  However, the calls in question are relatively low volume for
the operations they're part of.  A common case for these calls is
during provisioning.  There's only a handful of devserver calls on
performance critical paths, so I'd expect the savings we'd get to be
a few seconds saved out of an operation that takes 10 minutes to complete.
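Treating "a handful" as roughly five calls, the back-of-envelope math behind that estimate looks like this (every figure below is an assumption taken from this thread, not a measurement):

```python
# Rough sanity check of the savings estimate above.
ssh_overhead_s = 2          # worst-case per-RPC ssh setup cost cited earlier
calls_per_provision = 5     # "a handful" of devserver calls (assumed)
operation_s = 10 * 60       # provisioning takes ~10 minutes

savings_s = ssh_overhead_s * calls_per_provision
print(savings_s, savings_s / operation_s)  # ~10 seconds, under 2% of the operation
```

So even under the pessimistic 2-second figure, the win is small relative to the whole provisioning operation.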


> 3) This makes it impossible (on the devserver) to see which shard generated a request.

True.  To date, we haven't had a debug problem where this was
an important consideration.  However, as a practical matter,
I'm not sure that this problem can be fixed:  Our available
alternative to the current implementation is to create an ssh
tunnel to the devserver.  I'd expect that connecting through
a tunnel would still appear to apache as a connection from
localhost.

If 1) or 2) were significant enough, we could switch to using a
tunnel.  However, creating and managing the tunnel processes poses
its own challenges and risks, and it's harder to do than the current
implementation.  The certainty of added cost with no demonstrated
need or benefit is the reason we've left the implementation as-is
up to now.
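To illustrate the "creating and managing the tunnel processes" cost mentioned above: the shard would have to supervise a long-lived ssh process, notice when it dies, and restart it. A minimal sketch of that supervision loop, with illustrative names and no claim to match actual infrastructure code:

```python
# Hypothetical supervisor for a long-lived tunnel process. This is the
# extra machinery the per-RPC ssh-wrapped-curl approach avoids.
import subprocess

class Tunnel:
    def __init__(self, cmd):
        # e.g. cmd = ['ssh', '-N', '-L', '18082:localhost:8082', host]
        self.cmd = cmd
        self.proc = None

    def ensure_running(self):
        """(Re)start the tunnel process if it is not currently alive.

        Callers would invoke this before each RPC (or from a watchdog),
        which is exactly the added complexity discussed above.
        """
        if self.proc is None or self.proc.poll() is not None:
            self.proc = subprocess.Popen(self.cmd)
        return self.proc.pid

    def close(self):
        """Tear the tunnel down cleanly, e.g. at end of the job."""
        if self.proc and self.proc.poll() is None:
            self.proc.terminate()
            self.proc.wait()
```

Even this small sketch has failure modes to reason about (orphaned tunnels, restart storms, stale forwarded ports), which is the "added cost with no demonstrated benefit" weighed in the comment above.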

Status: WontFix (was: Assigned)
