Audit all TKO tables for AUTO_INCREMENT hitting ceiling |
||||||
Issue descriptionWe had an outage this week due to an AUTO_INCREMENT column hitting its limit (issue 877917). This bug is a follow up. - Audit all TKO tables to figure out if we're close to hitting any other AUTO_INCREMENT limit. - Consider adding an alert in the prod-TKO instance for this condition on various columns
,
Sep 11
,
Sep 11
Through an application of magic devised here: http://code.openark.org/blog/mysql/checking-for-auto_increment-capacity-with-single-query, I can safely say that only one table has any chance whatsoever of hitting the ceiling, which is tko_job_keyvals. tko_job_keyvals.id is at 43% of id capacity, tko_tests.test_idx is at 20% of capacity, tko_jobs.job_idx and tko_task_references.id are nearing 5% of capacity. All others register as 00.00% of ids taken. I'm only ~80% certain that this means tko_job_keyvals is in no danger; handing to Richard to confirm, hand it back to me if there's any work to be done.
,
Sep 11
The nominal goal is along the lines of "we'll be dead (or at least retired) before this is a problem." A more objective measure would be "good for at least 50 years, even at a significant rate of growth." None of these tables is more than 10 years old, so if we assume linear growth, 40% used in a 10 year space means we have 15 years to go. But "assume linear growth" is unrealistic (quadratic or exponential is more like it). And you, at least, should still be working 15 years from now. So, I'd recommend we grow the key size for all four of the columns with a non-zero "AUTO_INCREMENT_RATIO".
,
Sep 11
Last I heard, the TKO has less than five years to live. (And probably less than *one*.) Expanding TKO tables to cover a 10-year span, let alone 50 years, is a definite YAGNI.
,
Sep 11
> Last I heard, the TKO has less than five years to live. ? This is news to me. If it's true, then yes, the current schema probably doesn't need revisiting.
,
Sep 11
I may have misunderstood, but I thought we were abandoning it as a later step of the move to Skylab.
,
Sep 17
Need to know which tables will die in how long (TKO Will be around for > 5 years), so that we can decide the order in which to handle them.
,
Sep 24
Want a table of expected table death on the bug. Punt until next week.
,
Sep 24
Results: tko_jobs and tko_tests, by started_time with NULLs ignored: tko_jobs job_idx 168499490 at Unix time 1521788401 job_idx 211105961 at Unix time 1537829343 2.656 idx per second estimated time of death: 2067 tko_tests test_idx 856946024 at Unix time 1537830425 test_idx 700695476 at Unix time 1522220400 10.01 idx per second estimated time of death: 2029 tko_task_references and tko_job_keyvals, by the two-week span since #c3; much lower confidence estimate tko_task_references rate: 3.08 per second estimated time of death: 2040 tko_job_keyvals rate: 63.8 per second estimated time of death: April 2019 So tko_job_keyvals is P1, tko_tests is P2, tko_jobs and tko_task_references are P3+
,
Oct 1
Ok, so the most forseeable failure is the recurrence of the tko_job_keyvals failure: tko_job_keyvals rate: 63.8 per second estimated time of death: April 2019 The solution there is to make it a 64bit key, which we can do with no downtime. Will file a separate chase bug for that and link here. Removing chase here locally.
,
Oct 1
Follow up action at Issue 890970 |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by jrbarnette@chromium.org
, Aug 27