Issue 921236

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Process small batches of data synchronously in add_histograms

Project Member Reported by benjhayden@chromium.org, Jan 11

Issue description

HistogramSet JSON is posted to /add_histograms, which writes the data to Cloud Storage and posts a task to /add_histograms/process. That task reads the data back from Cloud Storage and dispatches many smaller /add_histograms_queue tasks.
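
A minimal sketch of the current asynchronous flow, for illustration only. The in-memory BLOBS dict and TASKS list are hypothetical stand-ins for Cloud Storage and the task queue, and the blob naming scheme and fixed chunk_size split are assumptions; the real handlers may split the data differently.

```python
import json
import uuid

# Hypothetical in-memory stand-ins for Cloud Storage and the task queue.
BLOBS = {}
TASKS = []


def add_histograms(body):
    """Store the uploaded JSON and enqueue a /add_histograms/process task."""
    blob_name = "/add_histograms/" + uuid.uuid4().hex  # Hypothetical naming scheme.
    BLOBS[blob_name] = body
    TASKS.append(("/add_histograms/process", {"blob": blob_name}))
    return blob_name


def process(params, chunk_size=100):
    """Read the JSON back and fan it out into smaller /add_histograms_queue tasks.

    Splitting into fixed-size chunks is an assumption made for this sketch.
    """
    dicts = json.loads(BLOBS[params["blob"]])
    for start in range(0, len(dicts), chunk_size):
        TASKS.append(("/add_histograms_queue", {"dicts": dicts[start:start + chunk_size]}))
```

With 250 uploaded entries and chunk_size=100, the process step would enqueue three /add_histograms_queue tasks, and none of their outcomes is visible to the original caller.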

These task queue tasks are necessary for large batches of data that cannot be processed within the 60-second request timeout (task queue tasks get a 10-minute timeout), but they effectively hide from the caller any errors that occur while processing the data.

For small batches of data the timeout is unlikely to fire, so add_histograms could process the data synchronously, which would surface any errors to the caller. add_histograms already has a synchronous pathway, used when running in dev_appserver precisely so that errors reach the caller. This bug proposes taking that sync path whenever the data is "small".
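
A sketch of why the sync path matters for error reporting. process_histograms(), enqueue_async(), and the HTTP status codes below are hypothetical stand-ins, not the dashboard's actual handlers; the point is only that the async path answers before any processing happens, while the sync path can return the processing error itself.

```python
import json


def process_histograms(dicts):
    """Hypothetical stand-in for the real processing; raises on malformed input."""
    for d in dicts:
        if "name" not in d:
            raise ValueError("histogram is missing 'name'")


def enqueue_async(body):
    """Hypothetical stand-in for writing to storage and posting /add_histograms/process."""


def add_histograms(body, is_small):
    """Return a (status, message) pair for an upload of HistogramSet JSON."""
    if not is_small:
        # Async path: the caller is told the upload succeeded before any
        # processing happens, so later errors never reach the caller.
        enqueue_async(body)
        return 200, "queued"
    # Sync path: processing errors are reported directly to the caller.
    try:
        process_histograms(json.loads(body))
    except ValueError as e:
        return 400, str(e)
    return 200, "processed"
```

Because json.JSONDecodeError is a subclass of ValueError, malformed JSON is also reported to the caller on the sync path in this sketch.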

Defining "small" could be tricky. A good proxy for processing time is the number of histograms. The length of the JSON might be a good enough proxy for that, but parsing it with json.loads() probably won't take anywhere near 60 seconds and would give an exact count of the histograms.
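
One way to measure "small" by histogram count rather than byte length, sketched under the assumption that histogram entries in HistogramSet JSON are the dicts carrying a "name" key and that shared diagnostics lack one. MAX_SYNC_HISTOGRAMS is a hypothetical threshold that would need tuning against real processing times.

```python
import json

MAX_SYNC_HISTOGRAMS = 1000  # Hypothetical threshold; would need tuning.


def count_histograms(body):
    """Count histogram entries in HistogramSet JSON.

    Assumes histograms are the dicts with a "name" key and that shared
    diagnostics in the same list have no such key.
    """
    return sum(1 for d in json.loads(body) if isinstance(d, dict) and "name" in d)


def is_small(body, max_histograms=MAX_SYNC_HISTOGRAMS):
    """True if the upload has few enough histograms to process synchronously."""
    return count_histograms(body) <= max_histograms
```

Unlike len(body), this count is not skewed by large diagnostics or long sample lists, at the cost of parsing the whole payload up front.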

If it turns out that loads() can take more than 60 seconds, then add_histograms could fall back to the same code path it would take if there were too many histograms to process in 60 seconds: write the JSON to Cloud Storage and post a task to /add_histograms/process.
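
A sketch of that fallback decision, assuming a hypothetical loads_budget_s set well under the 60-second request timeout so there is still time to write to storage and post the task. json.loads() cannot be interrupted, so elapsed time is only checked after it returns. The histogram-counting rule reuses the same "name"-key assumption as above, and the clock parameter exists only to make the example testable.

```python
import json
import time


def choose_path(body, max_histograms, loads_budget_s, clock=time.monotonic):
    """Return "sync" or "async" for an upload of HistogramSet JSON."""
    start = clock()
    dicts = json.loads(body)  # Cannot be interrupted; timing is checked afterwards.
    if clock() - start > loads_budget_s:
        # Parsing alone used up the budget: write to storage and post a task.
        return "async"
    # Assumption: histogram entries are the dicts with a "name" key.
    num_histograms = sum(1 for d in dicts if isinstance(d, dict) and "name" in d)
    if num_histograms > max_histograms:
        # Too many histograms to process within the request timeout.
        return "async"
    return "sync"
```

Both slow parsing and a large histogram count lead to the same async path, so the handler needs only one fallback branch.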

This sync path would incentivize callers to batch their data before uploading, which would be more efficient for the dashboard anyway.