sql::Connection auto recovery support |
|
Issue descriptionCurrently there is no a well-established practice of using sql::Recovery in Chromium for every DB. This is sad, because there is currently a lot of DBs that have SQLITE_CORRUPT errors in production. Also, the recoverability of every new DB that is created in Chromium is solely depends on the knowing of a developer about the sql::Recovery helpers. There were some attempts to solve the problem locally for LoginDatabase but they didn't make it to the master: https://chromium-review.googlesource.com/c/chromium/src/+/528139 https://chromium-review.googlesource.com/c/chromium/src/+/565112 I would like to implement a generic auto recovery logic that would be accesible via sql::Connection interface, could work automatically without a custom error handler and could be integrated with existing error handlers that do the recovery manually. The idea looks like this: 1. Introduce an enum with a recovery policy: enum class RecoveryPolicy { kNone, // Does nothing. kRaze, // Does Raze. kRecover, // Executes sql::RecoverDatabase, fallbacks to Raze if failed. kRecoverWithMeta, // Executes sql::RecoverDatabaseWithMetaVersion, fallbacks // to Raze if failed. }; 2. Add RecoveryPolicy as an additional parameter into sql::Connection::Open: bool Open(const base::FilePath& path, RecoveryPolicy recovery_policy); 3. Add a call to the recovery logic in sql::Connection::OnSqliteError with some restrictions: a) If error_callback_ is set, try to do the recovery after a callback run. b) Do the recovery only if a DB is not poisoned, is opened and we're at the first-level error in the call stack. 4. The recovery logic will not poison the DB if it was recovered on Open (i.e. retry_flag == RETRY_ON_POISON), otherwise it will always poison the DB. This will prevent a sql::Connection user from DB bugs if it's not ready to handle an unexpected data change. Optionally we can implement a on_poisoned/recovered_event callback so user can reinitialize the connection and continue to work in the current session. pwnall@ what do you think? Can we do this? Maybe you have your own thoughts on this topic? If the idea looks good in general, I can make a CL and we can discuss details there.
,
Jul 28
Sorry I wasn't able to respond earlier. I was at offsites for the past few of days. Immediate logistic troubles aside, I really like the overall idea of standardizing recovery. The timing is pretty fortunate, as well -- I've recently had time to finally go through some of the code in //sql, and I have a better idea of what's going on. Last year, I had just taken over from shess@. At a lower level, I don't think the proposed API meshes well with the current design of sql::Connection. The general approach for configuring a sql::Connection is setters that must be called before Open, not making Open take a lot of arguments. I don't particularly like the approach I've just described either -- it seems to bundle a builder together with the built class. I hope you're fine with changes at this level of detail. Before unifying recovery for all //sql users, I would like to understand the current situation. Would you happen to have the time and willingness to put together an inventory of all //sql users and their current strategies for recovery?
,
Aug 6
Hey, @pwnall. I've looked at all classes that do set_error_callback() on sql::Database (previously sql::Connection). Here is the result: chrome/browser/extensions/activity_log/activity_database.cc Checks for sql::IsErrorCatastrophic() and does RazeAndClose(). components/blacklist/opt_out_blacklist/sql/opt_out_store_sql.cc Checks sql::Recovery::ShouldRecover() and follows a standard recovery procedure without meta table: 1. Resets error callback. 2. Runs sql::Recovery::RecoverDatabase. 3. Calls sql::Database::IsExpectedSqliteError(). components/history/core/browser/history_backend.cc Here we have something a little more complicated. First it checks for sql::IsErrorCatastrophic() and then it does PostTask which Razes and then Closes the DB with some intermediate logic to notify its clients about the DB is being closed. components/history/core/browser/thumbnail_database.cc Generates some diagnostics for every error. Checks sql::Recovery::ShouldRecover() and follows a standard recovery procedure with meta table. components/history/core/browser/top_sites_database.cc Checks sql::Recovery::ShouldRecover() and does a MANUAL recovery procedure which includes: 1. Call to sql::Recovery::BeginRecoverDatabase. 2. Meta table version check. 3. Database content check/update/remove if some logical constraints are broken. components/omnibox/browser/shortcuts_database.cc Checks sql::Recovery::ShouldRecover() and follows a standard recovery procedure without meta table, but *meta table exists*. components/password_manager/core/browser/android_affiliation/affiliation_database.cc Checks for sql::IsErrorCatastrophic() and does RazeAndClose(). components/password_manager/core/browser/login_database.cc Doesn't have any recovery logic, just prints a log message on SQLITE_CONSTRAINT error. components/sync/syncable/directory_backing_store.cc Doesn't have any recovery logic, generates only a log message (see LoggingUnrecoverableErrorHandler in sync_client.cc) components/webdata/common/web_database_backend.cc Checks for sql::IsErrorCatastrophic() and does RazeAndClose(). content/browser/appcache/appcache_database.cc Detects corruption using sql::IsErrorCatastrophic(), generates metric and closes DB (no recovery logic). net/extras/sqlite/sqlite_channel_id_store.cc Checks for sql::IsErrorCatastrophic() and does RazeAndClose() via PostTask. net/extras/sqlite/sqlite_persistent_cookie_store.cc Checks for sql::IsErrorCatastrophic() and does RazeAndClose() via PostTask (same as previous). Summary: Only one class does a manual recovery via sql::Recovery::BeginRecoverDatabase with a content check. Some users do RazeAndClose, some do sql::Recovery::RecoverDatabase[WithMetaVersion]. Some users do an error handling via PostTask. Nit: sql::IsErrorCatastrophic() is basically mirrors sql::Recovery::ShouldRecover(). I think we can remove one of these. |
|
►
Sign in to add a comment |
|
Comment 1 by se...@yandex-team.ru
, Jul 27