
Issue 658746

Issue metadata

Status: WontFix
Owner: ----
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug




Add metrics to track abandoned downloads in downloads home

Project Member Reported by rachelis@google.com, Oct 24 2016

Issue description

Need clarification about what an "abandoned download" is.

Comment 2 by dah...@chromium.org, Oct 24 2016

I clarified in the doc. An abandoned download is one that has been paused, but never resumed. Presumably we would want to clear that from the UI after some point. However, we don't want to pre-optimize and build a big feature around that until we know this is an issue. Does that make sense?
Need more clarification on "never".  After what length of time should a download contribute to the metric?

Engineering notes:
Need to be careful about when this metric gets recorded; if we record it when Download Home is opened, the user could potentially decide to resume it at that point.  Probably want to record it when Download Home closes, or once per day or something.

Comment 4 by dah...@chromium.org, Oct 24 2016

What if we did DaysSincePausedWithoutResumption histogram? That way we don't have to set a binary cutoff.
Cc: asanka@chromium.org
1) You'd have to worry about one download filling up multiple slots if you're looking at 7DA or higher metrics: if we track how many days something has been sitting there, an ignored download would register in every bucket from 0-6 over the course of a week, inflating the numbers substantially.

2) That metric also doesn't really tell us anything about a _specific_ user's behavior; I'd just picture that the metric would show a long tail that gets fatter and fatter.

3) Frontend code shouldn't be making decisions about whether to show an abandoned download: the UI is just supposed to show whatever the state of the backend is.  If anything, the backend should be tracking this state and deleting abandoned downloads properly (in the future) so that behavior is consistent with desktop.  (+asanka for comments on this one)
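To illustrate point 1, here is a minimal Python sketch of the inflation. It assumes a hypothetical client that records the paused age (in days) once per day; `record_daily_age` is an illustrative name, not a Chrome API.

```python
from collections import Counter


def record_daily_age(paused_days: int) -> Counter:
    """Simulate one download that stays paused for `paused_days` days.

    Assumption: the client emits one sample per day, keyed by the
    download's current age in days.
    """
    histogram = Counter()
    for age in range(paused_days):
        histogram[age] += 1  # same download, a different bucket each day
    return histogram


# A single download ignored for a week lands once in each of buckets 0-6.
week = record_daily_age(7)
print(sorted(week.items()))
print(sum(week.values()))  # 7 samples from a single download
```

Summing those daily histograms over a 7-day window therefore counts the same download seven times.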

Comment 6 by dah...@chromium.org, Oct 24 2016

1.) Yes, this metric wouldn't be something that you could aggregate over multiple days. Is there any precedent for that type of metric in Chrome? BTW, the same aggregation problem is going to happen even if we choose a binary number, won't it?

2.) I think you are correct about the shape of the distribution. I think the analysis we would like to do is to say X% of the population has a download over Y weeks old. We can make our cutoff where X is reasonably small (<5%) and Y is reasonably large (>2 weeks). One alternative is to simplify the above idea into a count of users who have a paused download older than 2 weeks, 3 weeks, 4 weeks, 5 weeks, 6+ weeks.

I can definitely see that this would be a backend feature more than a frontend feature. The only question is whether we would want to add UI to inform the user that the file is going to be deleted. I think the answer is no, but I just want to think through it more.
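The "X% of the population has a download over Y weeks old" analysis from point 2 could be sketched as below. This assumes we already have, per user, the age in days of that user's oldest paused download; the function name and input shape are hypothetical.

```python
def share_of_users_over(oldest_paused_days: list[int], weeks: int) -> float:
    """Fraction of users whose oldest paused download is older than `weeks` weeks.

    `oldest_paused_days` holds one entry per user (an assumed input shape).
    """
    if not oldest_paused_days:
        return 0.0
    cutoff_days = weeks * 7
    over = sum(1 for days in oldest_paused_days if days > cutoff_days)
    return over / len(oldest_paused_days)


# Sweep the candidate cutoffs mentioned above (2 to 6 weeks).
ages = [1, 10, 15, 50]
for weeks in range(2, 7):
    print(weeks, share_of_users_over(ages, weeks))
```

Picking the cutoff then amounts to finding the smallest Y for which the share drops below the target X.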

1) Yeah, same problem would happen.  Don't personally know of any metrics where we track age for each individual object.  Another weird metric is number of tabs a user has open whenever the user goes to the background, which would cause a user to be bucketed multiple times in potentially multiple places (even for the same day): Tab.TotalTabCount.BeforeLeavingApp.

2) A different pair of metrics you might consider is "number of days since a download was resumed" and "number of paused downloads".  That way you'll see if anyone actually bothers to resume something past a certain number of days (or weeks).

Comment 8 by dah...@chromium.org, Oct 24 2016

I love the idea of tracking the latest time when someone actually resumes a file! Combining that with the first data gives us a real idea of where the cutoff should be. 

Let me get some feedback from the Chrome metrics team on the first metric and see if they have some suggestions.

Comment 9 by asanka@chromium.org, Oct 24 2016

#7: This one is tricky to measure. Usually you can track the size of a container via metrics if you make a measurement each time you increment the size. See Download.HistorySize2 for an example. You can then post-process the result to get at a useful distribution of sizes accounting for the "streaking" caused by a single entity touching multiple buckets along the way to its latest value.
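A rough sketch of the kind of post-processing meant here, under a simplifying assumption: every entity records one sample at each value from 0 up to its latest value, and buckets are one unit wide. Real Chrome histograms use wider, non-uniform buckets, so this is an illustration of the idea rather than how Download.HistorySize2 is actually analyzed.

```python
def final_value_counts(streaked: list[int]) -> list[int]:
    """Undo streaking: turn per-bucket sample counts into counts of
    entities whose latest value is exactly that bucket.

    streaked[i] is the number of samples seen in bucket i. Under the
    assumption above it equals the number of entities that reached at
    least i, so those that stopped at i number
    streaked[i] - streaked[i + 1].
    """
    result = []
    for i, count in enumerate(streaked):
        next_count = streaked[i + 1] if i + 1 < len(streaked) else 0
        result.append(count - next_count)
    return result


# Three entities with latest values 0, 2 and 2 produce samples [3, 2, 2].
print(final_value_counts([3, 2, 2]))  # [1, 0, 2]
```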

I'd be interested in what the metrics folks have to say. The method used in HistorySize2 is pretty dated at this point, and I'm sure other teams have tried to measure similar things.

I also agree that cleanup should be something handled by the core. I'm happy to take a whack at the cleanup aspect of this bug.

One thing to note is that the core won't care about removing stale entries from history, since that's what the history is supposed to contain. The focus would be more on making sure that all the .crdownload files on disk are accounted for and the growth of these files is capped.

Trying to come up with a hard limit on stale files is hard and it's very likely that the age of the file is not what we'll end up using. Limits are going to be very specific to an individual. It's perfectly fine for someone to start downloading something, and then resume it 2 weeks later when they are back on a free wifi, for example. Capping the age at 7 days or so would unnecessarily discard this person's download even if there was plenty of storage to spare and the temporary file wasn't causing any harm.

More likely, we should try to discard older files as necessary to make way for new files.
Cc: mpear...@chromium.org
Labels: -Pri-3 Pri-2
Owner: asanka@chromium.org
Status: Assigned (was: Untriaged)
Based on the thread with Chrome-metrics, here is a proposal for how we should capture this information:

-We collect the histogram at startup of each Chrome session.
-The histogram captures the time delta between when the download was paused and when the metric was collected.
-We bucket the data in days. I think 30 days is enough (e.g., 1-29, 30+). Any reason why we should go longer?
-The count is the number of paused files that fall into the bucket.

It's OK that aggregating the counts across days is meaningless; that aggregation isn't really necessary because we get the full distribution every day. Aggregation of users is meaningful.

We should also capture the age of a resumed file (TimeFileResumed-TimeFilePaused) whenever a download is resumed.
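A minimal sketch of the two measurements proposed above, assuming we have the pause timestamps at hand. Function names are illustrative; the sketch also keeps a day-0 bucket for downloads paused less than a day ago, which the proposal does not mention.

```python
from collections import Counter
from datetime import datetime, timedelta

MAX_DAYS = 30  # overflow bucket, i.e. "30+"


def paused_age_histogram(paused_at: list[datetime], now: datetime) -> Counter:
    """Collected once at startup: count of paused downloads per age in days."""
    histogram = Counter()
    for paused in paused_at:
        days = (now - paused).days
        # Clamp clock skew to 0 and everything past 30 days into "30+".
        histogram[min(max(days, 0), MAX_DAYS)] += 1
    return histogram


def resumed_age_days(paused: datetime, resumed: datetime) -> int:
    """Recorded on resume: TimeFileResumed - TimeFilePaused, in whole days."""
    return (resumed - paused).days


now = datetime(2016, 11, 30)
paused = [now - timedelta(days=1), now - timedelta(days=45)]
print(paused_age_histogram(paused, now))
```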

asanka@, does this make sense? Can you add this to your backlog? As mpearson suggested, we should add him to the CL to review the XML.

By "Paused" are you referring to "interrupted" downloads? Also did you see #9?

Either way, if you are trying to figure out a cutoff for how long to keep interrupted downloads in downloads home, then we could measure user engagement with downloads in downloads home, i.e. measure the age of downloads that the user touches in DH. This way you can gauge the drop-off in user interest as a download ages, interrupted or not. This will be affected by the ordering of downloads in DH, but at least you'll get some signal about user interest.

Some interrupted downloads are in a terminal state (e.g. an attempt to download something from a nonexistent host or a nonexistent URL). These cannot be resumed. Not a big deal, but something to keep in mind when trying to measure user engagement.

My interest here is to improve space usage for interrupted downloads. For that the metrics mentioned in #10 aren't that useful. The count of downloads is interesting, but discarding an interrupted download with no partial file is not the same as discarding one with 50MB of partial data. The end goal of download resumption is to minimize data waste. That usually boils down to a trade-off between data on the wire vs. data on disk. To help us make that trade-off we can measure the net progress in bytes for a resumed download vs. age of partial state. This is a straightforward measurement since it aggregates along the age axis without duplication and would give us an idea of how the usefulness of partial state drops off with age.
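The "net progress in bytes vs. age of partial state" measurement could look roughly like this. The sketch assumes net progress means bytes on disk after the resumption attempt minus bytes on disk before it (so a restart from scratch that ends up with less data shows as negative); that definition and all names are assumptions, not taken from the thread.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Resumption:
    age_days: int      # age of the partial state when resumption started
    bytes_before: int  # partial bytes on disk before resuming
    bytes_after: int   # bytes on disk once the attempt finished


def net_progress_by_age(events: list[Resumption]) -> dict[int, int]:
    """Sum net byte progress per age bucket.

    Each resumption contributes exactly once, to the bucket for its age,
    so the totals aggregate along the age axis without duplication.
    """
    totals: dict[int, int] = defaultdict(int)
    for event in events:
        totals[event.age_days] += event.bytes_after - event.bytes_before
    return dict(totals)
```

Plotting the totals against age would show how quickly partial state stops being useful.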

There are other ways that we can optimize disk usage including the size of the history DB, but I don't think that discussion is useful here.

Owner: ----
Status: Untriaged (was: Assigned)
Status: WontFix (was: Untriaged)
We are undergoing a larger telemetry update and we will add any necessary data there.
