Shards SSH into devservers per RPC
Issue description

According to Richard, the shards SSH into devservers on a per-RPC basis.

1) This seems likely to be flake-prone (no numbers).
2) This seems likely to hurt performance (no numbers).
3) This makes it impossible (on the devserver) to see which shard generated a request.

Even if the system is 100% robust, this will add a 1-2 second delay to every single shard -> devserver RPC call. With some development, we could replace this system with SSH port forwards. Once the development churn settles, I would expect that to speed things up a lot and to improve our ability to distinguish error causes on the shard.

This leads to two questions:

A) Is the performance improvement worth the development cost and stability churn?
B) Is there a robustness issue here?
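To make the two models concrete, here is a minimal Python sketch of the commands each one would run. It assumes the devserver serves HTTP on port 8082 and that the per-RPC path wraps `curl` in `ssh`; the function names, the host `devserver1`, the local port 18082, and the request path are all hypothetical. In the per-RPC model every call pays a fresh SSH handshake and authentication. In the port-forward model one long-lived `ssh -N -L` process carries many plain HTTP requests.

```python
import shlex
from typing import List

# Assumption: the devserver serves HTTP on this port; 8082 is illustrative only.
DEVSERVER_PORT = 8082


def per_rpc_command(host: str, path: str, port: int = DEVSERVER_PORT) -> List[str]:
    """Per-RPC model: each call opens a new SSH session and runs curl remotely.

    ssh joins the remote arguments into one string for the remote shell,
    so the URL is shell-quoted to survive characters like '?' and '&'.
    """
    url = f"http://localhost:{port}/{path.lstrip('/')}"
    return ["ssh", "-o", "BatchMode=yes", host, "curl", "-s", shlex.quote(url)]


def tunnel_command(host: str, local_port: int, port: int = DEVSERVER_PORT) -> List[str]:
    """Port-forward model: one long-lived SSH process forwards a local port.

    -N runs no remote command; -L binds local_port on this machine and
    forwards it to localhost:port as resolved on the devserver.
    """
    return [
        "ssh", "-N",
        "-o", "BatchMode=yes",
        "-o", "ExitOnForwardFailure=yes",
        "-L", f"{local_port}:localhost:{port}",
        host,
    ]


def tunneled_url(local_port: int, path: str) -> str:
    """With the tunnel up, each RPC is a local HTTP request with no new SSH handshake."""
    return f"http://localhost:{local_port}/{path.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical host and request path, for illustration only.
    print(per_rpc_command("devserver1", "stage?archive_url=gs://bucket/build"))
    print(tunnel_command("devserver1", 18082))
    print(tunneled_url(18082, "stage?archive_url=gs://bucket/build"))
```

Note that in both models the HTTP request reaches the devserver from `localhost`, which is why neither one by itself tells the devserver which shard sent it.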
Jun 20 2017
^ I'm not convinced this fits the Chase-Pending criteria (outage-preventing, short scope), but I'll let you make your case in the meeting. Adding xixuan, who worked on the ssh-wrapped-curl RPCs. Assigned to dgarrett as part of the devserver deep dive.
Jun 26 2017
Jul 11 2017
> 1) This appears like something that would be flake prone (no numbers).

History to date hasn't shown any flakiness associated with this implementation. I'm not prepared to say "this will never be flaky"; I'm just saying it hasn't happened yet.

> 2) This appears like something that would hurt performance (no numbers).

The implementation is definitely slower than other things we could do. However, the calls in question are relatively low volume compared with the operations they're part of. A common case for these calls is during provisioning. There are only a handful of devserver calls on performance-critical paths, so I'd expect the savings to be a few seconds out of an operation that takes 10 minutes to complete.

> 3) This makes it impossible (on the devserver) to see which shard generated a request.

True. To date, we haven't had a debugging problem where this was an important consideration. However, as a practical matter, I'm not sure this problem can be fixed. Our available alternative to the current implementation is to create an SSH tunnel to the devserver, and I'd expect a connection through a tunnel to still appear to Apache as a connection from localhost.

If 1) or 2) were significant enough, we could switch to using a tunnel. However, creating and managing the tunnel processes poses its own challenges and risks, and it's harder to do than the current implementation. The certainty of added cost with no demonstrated need or benefit is why we've left the implementation as-is up to now.
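As a rough illustration of the "creating and managing the tunnel processes" cost, here is a hypothetical Python sketch of the bookkeeping one persistent port forward would need. The `SshTunnel` class, its method names, and the port numbers are assumptions for illustration, not part of the existing code. Even this minimal version has to track a child process, restart it when it dies, keep its local port from colliding with other tunnels, and clean it up. It also leaves out readiness checking, which a real caller would need.

```python
import subprocess
from typing import List, Optional


class SshTunnel:
    """Hypothetical manager for one long-lived `ssh -N -L` port forward."""

    def __init__(self, host: str, local_port: int, remote_port: int) -> None:
        self.host = host
        self.local_port = local_port  # must be unique per devserver tunnel
        self.remote_port = remote_port
        self._proc: Optional[subprocess.Popen] = None

    def command(self) -> List[str]:
        return [
            "ssh", "-N",
            "-o", "BatchMode=yes",
            # Exit instead of running without the forward if the local port is taken.
            "-o", "ExitOnForwardFailure=yes",
            # Probe the connection so a dead link makes ssh exit rather than hang.
            "-o", "ServerAliveInterval=30",
            "-L", f"{self.local_port}:localhost:{self.remote_port}",
            self.host,
        ]

    def is_alive(self) -> bool:
        # poll() returns None while the child process is still running.
        return self._proc is not None and self._proc.poll() is None

    def ensure_running(self) -> None:
        # (Re)start the tunnel if it was never started or has exited.
        # This returns before the forward is listening; a real caller would
        # also have to wait until local_port accepts connections.
        if not self.is_alive():
            self._proc = subprocess.Popen(self.command(), stdin=subprocess.DEVNULL)

    def close(self) -> None:
        proc = self._proc
        if proc is not None and proc.poll() is None:
            proc.terminate()
            proc.wait(timeout=10)
        self._proc = None

    def base_url(self) -> str:
        # On the devserver, sshd makes the forwarded connection to localhost,
        # so the web server still sees a local peer, not the shard.
        return f"http://localhost:{self.local_port}"
```

The per-RPC approach avoids all of this state, since each `ssh` invocation starts and exits with its call. That trade-off is the one weighed above.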
Sep 26 2017
Comment 1 by ayatane@chromium.org, Jun 20 2017