
Issue 1751 link

Starred by 50 users

Issue metadata

Status: WontFix
Owner: ----
Closed: May 2013
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 3
Type: Feature

Blocking:
issue 68358




Add support for Metalink XML Download Description Format

Reported by anthonyb...@gmail.com, Sep 7 2008

Issue description

Product Version      : 0.2.149.29
URLs (if applicable) : http://www.metalinker.org/
Other browsers tested:
Add OK or FAIL after other browsers where you have tested this issue:
     Safari 3: FAIL
    Firefox 3: FAIL
         IE 7: FAIL

What steps will reproduce the problem?
1. Download a .metalink, such as http://www.metalinker.org/samples/OOo/OOo_2.4.1_Win32Intel_install_en-US.exe.metalink

What is the expected result?

The download agent parses the XML file and uses a resource URL to start the 
download. The partial file checksums can be used to detect errors during or 
after the download. If a server goes down during the transfer, backup URLs 
can be switched over to. All this is done transparently for the person at 
the browser with no interaction required.

What happens instead?

The .metalink file is downloaded but not processed and the files referred 
to in the .metalink are not downloaded.

Please provide any additional information below. Attach a screenshot if 
possible.

Metalink is currently supported by around 35 programs, mostly download programs, 
download managers, FTP clients, & web browsers. It is used mostly by 
open source projects like OpenOffice.org, openSUSE, Ubuntu, Fedora, 
cURL, and others. For most people to take advantage of Metalink, it needs 
to be included in browsers. 

Chromium could be the first open source browser to add support. Metalinks 
give more definite information about a download, which should be more 
conducive to searching.

http://en.wikipedia.org/wiki/Metalink
http://www.metalinker.org/

 
OOo_2.4.1_Win32Intel_install_en-US.exe.metalink
20.1 KB
Labels: -Type-Bug -Area-Unknown Type-Feature Area-Compat

Comment 2 by jon@chromium.org, Jan 8 2009

Labels: -Area-Compat Area-BrowserBackend Mstone-X
Status: Available

Comment 3 by mar...@chromium.org, Jan 30 2009

This feature request is fairly involved. First, it requires rewriting the current 
download manager. A quick search for "metalink" at http://aria2.svn.sourceforge.net/viewvc/aria2/trunk/src/ turns up 58 different 
files. (OK, there are probably too many source files in that implementation.) That is in 
addition to the multipart download logic that would need to be added.

I wouldn't object to a contribution so I'll leave this issue available.
Some of the download programs that support metalink support multipart/multi-source
downloads, but that is not a requirement. (aria2 is the most advanced metalink client).

From what I've read, Chromium already uses libxml2 (which aria2 also uses) for XML
and NSS for checksums. These are the foundations of metalink support. I can't dispute
that this feature request is fairly involved. But perhaps it could be approached in
stages, starting with the easiest/simplest and moving on from there.

Stage 1 could include extracting a single FTP or HTTP URL from the metalink XML, then
downloading from that URL (no multipart download, just single source).

Stage 2, checksum the whole file to see if the file has been corrupted.

Stage 3, failover to alternate URLs if a server becomes unreachable.

Stage 4, use chunk checksums to tell if there are errors in a download, and only
re-get the chunks with error so the download can be repaired.

Without getting too insanely complicated, or supporting multipart downloads, by stage
2 you've helped many people, especially those on Windows who don't have native
checksumming tools (md5sum, etc).
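
Stages 1 and 2 above can be sketched roughly as follows. This is only an illustration in Python, not Chromium code: it assumes the RFC 5854 (Metalink 4) XML layout rather than the older Metalink 3 format, and the sample document, mirror URLs, and payload are made up for the demo.

```python
import hashlib
import xml.etree.ElementTree as ET

# Namespace of the RFC 5854 (Metalink 4) format. Older Metalink 3 files
# use a different namespace and layout and are not handled by this sketch.
NS = {"ml": "urn:ietf:params:xml:ns:metalink"}

PAYLOAD = b"hello metalink"  # made-up file content for the demo

# Hypothetical metalink document describing PAYLOAD on two mirrors.
SAMPLE = """<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="example.ext">
    <hash type="sha-256">{digest}</hash>
    <url priority="2">http://mirror2.example.com/example.ext</url>
    <url priority="1">http://mirror1.example.com/example.ext</url>
  </file>
</metalink>""".format(digest=hashlib.sha256(PAYLOAD).hexdigest())


def parse_metalink(xml_text):
    """Stage 1: return (name, sha256 hex or None, URLs best-first) per file."""
    root = ET.fromstring(xml_text)
    files = []
    for file_el in root.findall("ml:file", NS):
        sha256 = None
        for hash_el in file_el.findall("ml:hash", NS):
            if hash_el.get("type") == "sha-256":
                sha256 = (hash_el.text or "").strip().lower()
        # A lower priority number means a more preferred mirror.
        url_els = sorted(
            file_el.findall("ml:url", NS),
            key=lambda u: int(u.get("priority", "999999")),
        )
        urls = [(u.text or "").strip() for u in url_els]
        files.append((file_el.get("name"), sha256, urls))
    return files


def verify(data, expected_sha256):
    """Stage 2: compare the whole-file hash against the metalink's hash."""
    return hashlib.sha256(data).hexdigest() == expected_sha256
```

A client would download from the first URL returned and call verify() on the result before telling the user the download is complete.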

Many of the people who download OpenOffice.org, openSUSE, Ubuntu, Fedora, Sabayon,
and other projects that use metalink are on Windows. The first step in dealing with
problems for downloads is usually a manual checksum verification which is a support
nightmare when dealing with inexperienced people. 

Metalink aims to make downloads much easier and to be able to recover from
transmission errors, servers going down, etc, and complete without the user needing
to know anything went wrong.
To parse a metalink file, libmetalink (https://launchpad.net/libmetalink) is very handy.
It is a C library that parses a metalink file into a C structure from which you can
retrieve URLs. libmetalink uses libxml2. Simple example code is included in the archive. 
FYI, the mulk project (https://sourceforge.net/projects/mulk) uses libmetalink.

I agree with Anthony that multi-part downloading is not a requirement; rather, it is an
optional "bonus" feature for metalink.

Comment 6 by pe...@poeml.de, Jun 18 2009

I second this feature request! Metalinks are extremely helpful when it comes to downloads, and there is no 
alternative technology that achieves something similar. The transparent content verification, and switchover to 
a different mirror in case of network failures and mirror failures, is extremely valuable. The larger the 
download, the more indispensable it becomes. It has the potential to improve the usability of the Internet at 
large scale, particularly for users in countries with poor Internet connectivity.

The stages which Anthony gave, in order of importance, make sense. Each is worthwhile on its own. As an 
example, the Firefox extension "DownThemAll!" (http://www.downthemall.net/) implements stages 1-3, 
which is already very helpful. It is implemented in JavaScript in surprisingly few lines of code.

Looking at the chromium source code, src/chrome/browser/download seems to be where the download 
manager lives, and src/chrome/browser/dom_ui for the UI. Would these be the principal places to look at?
Is there some plugin or extension api which could be an alternative?

From experience as a content provider (openSUSE project) I would like to add some notes regarding concerns 
that inevitably pop up when familiarizing oneself with metalinks. Technologies called "download accelerators" 
typically open multiple connections to single servers, which is not considered good netizenship. While 
metalinks offer the potential to open multiple connections, this is by no means required and a good 
client-side default is opening one connection, or two at most. It may be a tunable configuration parameter but 
should be treated carefully and maybe it shouldn't even be exposed to the average user. When used at large 
scale, metalinks are most valuable for their failover and content verification features. In addition, it makes 
sense to differentiate between opening multiple connections to single mirrors (which should be avoided) and 
parallel downloading from multiple mirrors (which may be used in a very careful, conservative fashion). The 
latter has an obvious potential of hogging bandwidth to the disadvantage of other users in the same network. 
At openSUSE, we are deploying the use of metalinks for all automated downloads (thousands of clients are 
updating in parallel from about 150 mirrors), and we use a conservative configuration of 2 connections in 
parallel. In my opinion, any implementation in a widely used download client (like a web browser) needs to be 
similarly conservative. I offer this as a caveat, and also as background so that nobody is put off by possible drawbacks, 
which are not inherent to the metalink specification but could be introduced by abusive implementations.
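
The conservative policy described above (at most two connections, never more than one per mirror) might look like this in a client. It is a minimal sketch in Python; the function name and the default of 2 are assumptions for illustration, not part of any metalink specification.

```python
from urllib.parse import urlsplit


def pick_mirrors(urls, max_connections=2):
    """Choose at most max_connections URLs, never two on the same host.

    urls is assumed to be ordered best-first already.
    """
    chosen = []
    seen_hosts = set()
    for url in urls:
        if len(chosen) >= max_connections:
            break
        host = urlsplit(url).hostname
        if host in seen_hosts:
            # Skip: this would open a second connection to one mirror.
            continue
        chosen.append(url)
        seen_hosts.add(host)
    return chosen
```

Keeping the limit as a parameter with a low default leaves room for tuning without exposing it to the average user.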

Comment 7 by pe...@poeml.de, Jun 24 2009

I would like to add these notes that Bram Neijt took when looking for possible places where to hook in: http://groups.google.com/group/metalink-discussion/msg/2c63ced761bc95e6?hl=en

Comment 8 by oritm@chromium.org, Dec 17 2009

Labels: -Area-BrowserBackend Area-Internals
Replacing labels:
   Area-BrowserBackend by Area-Internals

The Internet Draft version of Metalink is in IETF Last Call.

It would be great to have review from anyone, but browser people would be especially
nice.

http://tools.ietf.org/html/draft-bryan-metalink
RFC 5854 'The Metalink Download Description Format' is out.

http://tools.ietf.org/html/rfc5854

This is something that's probably best handled by an extension.  We'd want a download manager extension API that's rich enough to implement something like metalink or bittorrent.  I think this is the current proposal:

http://www.chromium.org/developers/design-documents/extensions/downloads-api

If you're interested in implementing metalink support as an extension, you might want to take a look at that API and email chromium-extensions@chromium.org with any feedback.
Thanks for the info about the downloads API. I sent some feedback.

Comment 14 Deleted

Nobody seems to be mentioning that, if support for Metalink were to be implemented  as a feature in WebKit, it'd be a great, low-overhead alternative to .zip files for things like GMail's "download all attachments" link.

Yes, checksumming and multi-source downloading are important, but we still have no proper download equivalent to HTML5 <input type="file" multiple> and, at the very least, checksumming does have a simpler proposed alternative (Content-MD5 HTTP header).

There are quite a few desktop applications that, if they're to ever be comfortably implemented in the browser, need that kind of functionality and extending the drag-and-drop API won't cut it.

(BIG hassle to HAVE to open a file manager window just to save something. Drag-and-drop is for when you've already got a file manager on hand to use as the source or destination.)
Content-MD5 is fundamentally broken, due to early HTTP specification-makers doing incorrect analogies between HTTP and MIME. If you're going to do any checksumming, use RFC 3230.
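
For reference, an RFC 3230 instance digest is carried in a Digest header as an algorithm token followed by the base64-encoded hash. A rough Python sketch of producing and checking such a value is below; it assumes the SHA-256 token (as in the Metalink/HTTP example later in this thread) and base64 of the raw digest bytes, and the helper names are made up.

```python
import base64
import hashlib


def digest_header(body):
    """Build a Digest header value such as 'SHA-256=<base64 of raw hash>'."""
    raw = hashlib.sha256(body).digest()
    return "SHA-256=" + base64.b64encode(raw).decode("ascii")


def matches_digest(body, header_value):
    """Check body against the SHA-256 entry of a Digest header value.

    The header may list several comma-separated algorithms; entries for
    other algorithms are ignored in this sketch.
    """
    for item in header_value.split(","):
        # Split on the first '=' only: base64 padding also uses '='.
        algo, _, value = item.strip().partition("=")
        if algo.upper() == "SHA-256":
            return digest_header(body) == "SHA-256=" + value
    return False
```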
Metalink is in Google Summer of Code this year and is looking for a mentor from Chromium to aid one of our students in adding native metalink support. we want to use our own GSoC slot on this.

the student has already written an extension with metalink support.

if anyone is interested, please contact me! :)
Hi! I'm the primary author of the downloads extension API, and I also work on the downloads system in general.
We've been thinking about implementing chunked downloading in C++ in src/content/browser/download at some point, but it isn't on any roadmap.
I'm not sure if I have the cycles to be a mentor. I'll need to talk to my team.
Even if I can't be the mentor, please feel free to contact me directly if you have any questions about the extension API or the C++ downloads system.

Comment 19 by Deleted ...@, Apr 10 2012

Hey, this is Sundaram working on the chrome extension for downloading metalinks. For downloading the file, I use the chrome experimental downloads API. I have a few doubts regarding the flexibility of the API.

I believe that the extensions API has the following set of limitations. Please correct me if I'm wrong with any of this.

1. Metalinks provide the ability to check for errors in the downloaded file. However, I would have to checksum the file using XHR and download the file using experimental downloads API. This means the same file would have to be downloaded twice to check for errors in the data. 

2. Metalinks provide information about multiple mirrors. Thus, multi-sourced downloads are theoretically possible. However, the downloads API does not have options for that.
3. If a particular piece is erroneous, the API does not allow you to download the piece alone from another mirror. 

I don't expect the API to support all of this as it extends only the browser's core functionality.

Basically, what the extension does is really really minimal and we 
would want to expand on it to make use of all the advantages of 
metalinks. 

So at this stage, we can probably look at 1. NPAPI plugin. 2. Native Chromium support.
NPAPI has its share of disadvantages. Plus, it will be an addon at the end of the day. So, native chromium support would be ideal.
Thanks
To checksum chunks without downloading them twice, you can write the chunks to an HTML5 FileSystem blob, then download() the blob. This means that the download's URL and referrer are useless unless you maintain your own database and provide your own manager UI that incorporates that database. This is certainly a drawback, but it still seems like a possibility.
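
The idea of hashing chunks as they arrive, instead of downloading twice, can be shown independently of the extension API. This is a language-neutral sketch written in Python; the chunk source and output file are stand-ins for whatever the extension would actually use (for example an XHR stream and a FileSystem writer).

```python
import hashlib
import io


def save_and_hash(chunks, out_file):
    """Write each chunk to out_file while feeding the same bytes to SHA-256.

    Returns the hex digest of everything written, so the data only has to
    be transferred once.
    """
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)
        out_file.write(chunk)
    return hasher.hexdigest()


# Usage with in-memory stand-ins for the network and the file.
buffer = io.BytesIO()
digest = save_and_hash([b"first chunk, ", b"second chunk"], buffer)
```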

I'm not sure what you mean by multi-sourced downloads. The onChanged event contains error information if a download fails; a handler for this event might fallback to a mirror.
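
A fallback of that kind boils down to trying the mirrors in order until one succeeds. The Python sketch below shows only the control flow; fetch is a hypothetical callable standing in for the real download (in an extension, the retry would be triggered from the onChanged handler instead).

```python
def download_with_failover(urls, fetch):
    """Try each mirror in order; return (url, data) from the first success.

    fetch(url) is assumed to return the payload bytes or raise OSError
    (urllib's URLError is a subclass of OSError) when a mirror fails.
    """
    errors = []
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as exc:
            # Remember the failure and move on to the next mirror.
            errors.append((url, str(exc)))
    raise OSError("all %d mirrors failed: %r" % (len(urls), errors))
```

With the standard library, fetch could be something like lambda url: urllib.request.urlopen(url, timeout=30).read().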

So, let's talk about extending content/browser/download to handle metalinks. The issue here is that c/b/d has accrued a significant amount of technical debt. There are some long-standing bugs in c/b/d, and it's difficult to add new features cleanly. It's improved in the last year and a half or so, but it's still a long way from being lean enough that I would be comfortable with adding the significant complexity that metalinks requires. (I haven't talked to my team about this, so I'll update if they feel differently.)
We're planning on starting a massive refactoring effort soon to pay down this technical debt. I hope I'm being pessimistic, but I would be surprised if we finished all the refactoring tasks before summer 2013. Of course, we may reach diminishing returns and decide that c/b/d is lean enough to support additional complexity before then.

The other side of the seesaw is the importance of metalinks. I do not feel like I know how important metalinks is to the web. If every web admin and every other major browser supports metalinks and the user experience is significantly better with metalinks than without, if chrome is holding the web back, then we might be able to afford adding the complexity before/during the refactoring marathon. I understand all the technical advantages of metalinks, and I see that DTA and FlashGot support most of it, but if most users and servers aren't using metalinks, then we have a harder time justifying the additional complexity and engineer-hours. I imagine that the truth is somewhere between those extremes, I just don't know where.

If c/b/d were already in better condition and if I had a better idea of how important metalinks is to the web, then I would probably say "Patches welcome!" As it is, however, the more prudent course of action seems to me to be to try to take the extension API as far as it will go, release the metalinks extension, gauge/drum up interest, and try again for native support in a few months when c/b/d is in better condition and we can see how many users install and use the metalinks extension. (Again, I haven't talked to my team, so I'll update if they feel otherwise.)
> I do not feel like I know how important metalinks is to the web.

This is a perfect example of a chicken-and-egg problem. I also like what metalinks are able to do, e.g. I'm managing the mirrors of a big (.5 gb) open source program and in some parts of the world the internet is so flaky and slow that you have no chance to get this file just with simple html or ftp. Resume capabilities are important and selecting good mirrors automatically is nice.

BUT: There is also another technology in the wild that is in my eyes a replacement for metalinks. It's the "WebSeed" [ws] feature for torrents. BitTorrent is much more popular and if your client supports webseeds (many do!) you can use them equally well for the same purpose as metalinks. So, if chrome supported everything needed to create a bittorrent extension with webseeds, you would get everything you need. Note: For this OSS project, I also create torrents with webseeds and I have no complaints.

[ws] http://getright.com/seedtorrent.html (but there are other implementations)
Labels: -Mstone-X Feature-Downloads
It is a bit of a chicken-egg problem.
The solution to the actual chicken-egg problem is, of course, the Zen/quantum state of "neither and both": chickens evolved gradually, so there wasn't a 'first' chicken or a 'first' chicken egg that was significantly different from its parent almost-chicken/almost-chicken egg.
The same solution applies here: let's evolve chrome's metalink support one step at a time. Chrome's metalink support may be "complete enough" (enough like a chicken) when a "complete enough" extension is in the webstore, even if the totally complete native implementation (modern chicken) isn't justifiable until much later. The extension may spawn (the justification for) the native implementation.
Ben, thanks for the comments and help. I understand where you're coming from. we appreciate your tips and will work to improve the extension & take it as far as an extension can go! :)

a few questions: 1) how stable is the downloads extension API or when will it be non-experimental? 2) extensions that use experimental APIs can't be in the webstore, right? (it seems that, for users this could be even more perplexing than installing an external download manager - first you'd have to enable the experimental APIs then find it somewhere besides the webstore).

to the importance of metalinks, it is admittedly niche, but inclusion in a browser will solve problems for many people, as far as downloads go. will every site use metalinks? certainly not, but many download sites would like to take advantage of it if minimal features are supported in a mainstream browser. it's supported by around 40 download applications, mostly download managers but some p2p, browser, FTP, and system update utilities (all Red Hat/CentOS/etc systems use it). support in more mainstream apps is coming, thanks to Google Summer of Code too! :) quite a few linux distributions use it for ISO (or more) downloads like Ubuntu, Fedora, & openSUSE. other projects like KDE, LibreOffice, OpenOffice, FSF, Sugar, Xfce, & XBMC use it too.

so you have 7 years of highly technical early adopters

I understand some of the metalink features add complexity, but I too would like to start and stay as simple as possible. the first step is verifying a file's hash is correct after download. this seems like a minimal change and that alone will help many people who are not familiar with hashing files or whose OSes offer no reasonable way (for average people) to even attempt it. I think the next step would be switching to another mirror if a server went down during download, and the final step could be repairing downloads with the partial file hashes. that's more complex but optional of course, and would help many people as well.
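
The final step (repair with partial file hashes) could work roughly like this. It is a Python sketch under stated assumptions: fixed-length pieces hashed with SHA-256, as in the pieces element of RFC 5854, and a hypothetical fetch_range(start, end) callable that re-downloads one byte range from another mirror.

```python
import hashlib


def bad_pieces(data, piece_length, piece_hashes):
    """Return the indexes of pieces whose SHA-256 does not match."""
    bad = []
    for index, expected in enumerate(piece_hashes):
        piece = data[index * piece_length:(index + 1) * piece_length]
        if hashlib.sha256(piece).hexdigest() != expected:
            bad.append(index)
    return bad


def repair(data, piece_length, piece_hashes, fetch_range):
    """Re-fetch only the damaged pieces and return the repaired bytes.

    Assumes data already has the right total length and that
    fetch_range(start, end) returns exactly end - start bytes.
    """
    buf = bytearray(data)
    for index in bad_pieces(data, piece_length, piece_hashes):
        start = index * piece_length
        end = min(start + piece_length, len(buf))
        buf[start:end] = fetch_range(start, end)
    return bytes(buf)
```

Only the damaged ranges are transferred again, which is what makes this attractive for large files on flaky connections.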

Use cases: 
* downloading large files (error repair) like Linux distributions, software, and games ( https://forums.eveonline.com/default.aspx?g=posts&m=51440 )
* downloading files available on a CDN or mirror network w/ failover if some nodes are unreachable
* webmail could use metalinks for "Download All Attachments" instead of putting all files in a .ZIP archive. 
* downloading a whole album & creating a directory structure instead of putting already compressed files in an archive 
* sites publish mirror/CDN info so caches can recognize more hits properly. 

as far as webseeds go (this doesn't concern Chrome), that's a feature complementary to metalink. BitTorrent is obviously insanely popular for certain things, but there are reasons why no major browser supports it. metalink also offers a way to transparently go from a normal web download to a BitTorrent or p2p download too (without having to click on a torrent). (some of the architects of the web have called it a "bootstrap into a peer-to-peer system" for the web: http://www.w3.org/2001/tag/2012/01/06-minutes#item02 )

and finally, this bug doesn't mention Metalink/HTTP (RFC 6249) which provides mirrors & hashes in HTTP headers:

   Link: <http://www2.example.com/example.ext>; rel=duplicate
   Link: <http://example.com/example.ext.asc>; rel=describedby;
   type="application/pgp-signature"
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
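
On the client side, pulling the mirrors out of such Link headers takes only a little parsing. The Python sketch below is deliberately simplified and is not a full Link header parser (it does not handle quoted parameter values that contain commas or semicolons); the function names are made up.

```python
import re

# Matches '<url>' followed by zero or more '; key=value' parameters.
LINK_RE = re.compile(r"<([^>]*)>\s*((?:;\s*[^;,]+)*)")


def parse_links(link_values):
    """Return (url, params) pairs from a list of Link header values."""
    links = []
    for value in link_values:
        for match in LINK_RE.finditer(value):
            params = {}
            for part in match.group(2).split(";"):
                key, sep, val = part.strip().partition("=")
                if sep:
                    params[key.lower()] = val.strip('"')
            links.append((match.group(1), params))
    return links


def mirror_urls(link_values):
    """Mirrors are the links advertised with rel=duplicate."""
    return [url for url, params in parse_links(link_values)
            if params.get("rel") == "duplicate"]
```

A client that understands Metalink/HTTP would then compare the downloaded bytes against the Digest value before trusting any mirror's copy.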
Yes, extensions using experimental APIs are not allowed in the webstore. We still plan to release the downloads API more or less as spec'd at http://goo.gl/6hO1n in waves, a few features at a time. Some features (e.g. acceptDanger()) will probably change slightly for better security. There is still the possibility that the API will change significantly before it is all released, though I hesitate to guess how likely that might be. I can't guarantee any specific timeline. The entire API may be released by the end of the summer, maybe not.

> quite a few linux distributions use it for ISO (or more) downloads like Ubuntu, Fedora, & openSUSE. other projects like KDE, LibreOffice, OpenOffice, FSF, Sugar, Xfce, & XBMC use it too.

That's what I was looking for. Thanks!

Even checksumming is non-trivial and brings up several sub-feature and interaction design questions. We've been working on resuming interrupted downloads for a while, and that's raised some similar issues, so we have a sense of the size and complexity of the landscapes of sub-feature and interaction design issues surrounding downloads-related features.
Another reason for prototyping features in extensions is that it's *much* easier for extensions to experiment with feature-sets and interaction design than native code, and it's easier for users to vote directly on what features matter in extensions than in native code.

I'm a little confused about the use cases for Metalink/HTTP. I thought that one of the primary benefits of the metalink xml file is that it can be served from one server that might not be the payload server, so that if any payload server goes down or is blocked somehow, the client can still access the metalink xml file and another mirror. But Metalink/HTTP requires the metalink server to be the same as the payload server, so that if that server goes down, the client has no way to find a fall back mirror unless the server goes down after the client begins the download. This drawback might be ameliorated if the Metalink/HTTP response is a redirect to a payload server, but then it seems as if the only difference between metalinks and metalinks/HTTP is the file format. Am I missing something?

Is there a site like imgur or youtube for metalinks? I'm imagining pasting a link or two into a form that generates a metalink file and hosts it at a short url. It could also help users without metalink clients by hosting a small page with a direct link to a random or nearby mirror and maybe voting buttons to signal that a mirror worked or failed. It could also generate a webseeded multi-tracker bittorrent file for those good citizens who use their uplink. This app could understand the mirror structures of a few popular hosting sites like sourceforge and automatically include those mirrors in the metalink. It could link to the best metalink clients for the user's browser/OS. It could be an appengine app.
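
The core of such a generator would be small. Here is a rough Python sketch that turns a list of pasted mirror URLs into a minimal metalink document; it assumes the RFC 5854 element names, assigns priorities in the order the URLs were given, and leaves out everything else (size, piece hashes, hosting).

```python
import xml.etree.ElementTree as ET

NS = "urn:ietf:params:xml:ns:metalink"  # RFC 5854 namespace


def build_metalink(name, urls, sha256_hex=None):
    """Return a minimal metalink XML string for one file."""
    # Serialize the metalink namespace as the default (unprefixed) one.
    ET.register_namespace("", NS)
    root = ET.Element("{%s}metalink" % NS)
    file_el = ET.SubElement(root, "{%s}file" % NS, {"name": name})
    if sha256_hex:
        hash_el = ET.SubElement(file_el, "{%s}hash" % NS, {"type": "sha-256"})
        hash_el.text = sha256_hex
    # The first URL pasted gets priority 1, i.e. it is preferred.
    for priority, url in enumerate(urls, start=1):
        url_el = ET.SubElement(file_el, "{%s}url" % NS,
                               {"priority": str(priority)})
        url_el.text = url
    return ET.tostring(root, encoding="unicode")
```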

Another random idea is implementing metalinks(+/HTTP) in a lower level library such as net, so that it might be used for any resource, not just downloads. This might not be useful for small text files such as HTML/CSS/JS, but it might be useful for embedded blob media such as video, audio, images, PDFs, STLs, or large datasets loaded by javascript. CDNs may be interested in some of metalinks' features, though they may have special constraints. Metalinks may also be an easy/redundant mechanism for site-level fail-over.
This is just an idea that I wanted to record somewhere on the off chance that it lights a fire under somebody.

Have you considered whether the webRequest API may be useful? I'm still not very familiar with metalinks, but webRequest is a very powerful API.
http://code.google.com/chrome/extensions/trunk/webRequest.html

As I understand it, the advantage to Metalink/HTTP is that, if you serve up Metalink/HTTP headers in a 200 or 3xx response, you can provide a single link which will seamlessly provide the best possible experience whether or not the user agent supports Metalink... without requiring the user to know or care what Metalink is or whether they have it.

Ben, thanks for the info on the API & your thoughts. good to have that clarified. Sundaram is looking into the webRequest API & your other suggestions.

for interaction design for the extension, we can probably take some cues from what download managers have been doing all these years. usually we keep things simple with options to investigate more for advanced users. if the download completes we don't show anything special, although we could show some type of indicator when the checksum is correct.

you're right about metalink/HTTP, there's reliance on a central server. it also lessens the dependency on XML, so it's optional for chunk checksums. for an organization serving small files with a few mirrors and a whole file checksum, it could be simpler. that would be cool to have support in a lower level library - as you said, it's more general & not as specific on 'downloading a file' so it can be used for any resource or for site-level failover like you say. 

no, there is no imgur like site for metalinks (but some file hosting sites use it). that would be great! I'll share that idea & see what we can come up with. there wouldn't need to be voting on mirrors since the clients will automatically cycle through & not rely on the ones that failed.

Stephan, you can use either the XML or HTTP variety of Metalink to "provide a single link which will seamlessly provide the best possible experience whether or not the user agent supports Metalink... without requiring the user to know or care what Metalink is or whether they have it." I think that's one of the strong points of Metalink!
@27: I'd been trying to figure out how to do fallbacks for non-Metalink user agents with the XML variety off and on (in among other work) for a week but I'd concluded it must have been done using Metalink/HTTP because the only places I could even find mentions of it were old discussions about how one might make it possible.

Could you point me to the relevant documentation?
@28: yes, it requires server side stuff. :) either the client request contains

Accept: application/metalink4+xml

for transparent content negotiation (RFC 2295)...

And/or the server advertises that a metalink is available (using the HTTP Link header field, part of Metalink/HTTP as you mentioned):

Link: <http://example.com/example.ext.meta4>; rel=describedby; type="application/metalink4+xml"

here is a video of it in action:

http://youtu.be/A4-03-Dn4R8
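
The server-side decision described above can be sketched as follows. This is only an illustration in Python with made-up function names: it ignores q-values in the Accept header, and a real deployment would do this in the web server or mirror redirector.

```python
METALINK_TYPE = "application/metalink4+xml"


def accepts_metalink(accept_header):
    """True if the client's Accept header lists the metalink media type."""
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type == METALINK_TYPE:
            return True
    return False


def response_headers(accept_header, metalink_url):
    """Pick headers for a download request.

    Metalink-aware clients get the metalink itself; everyone else gets
    the plain file plus a Link header advertising the metalink.
    """
    if accepts_metalink(accept_header):
        return {"Content-Type": METALINK_TYPE}
    link = '<%s>; rel=describedby; type="%s"' % (metalink_url, METALINK_TYPE)
    return {"Link": link}
```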
Note that REST purists will surely tell you to use Link and not content negotiation, because the metalink is not semantically an alternative representation of the same resource.
@29: Thanks. That's what I'm looking for. Where should I have been looking and which search keywords should I have been trying in order to find that information myself?

@30: I'm generally very big on ReST, but I'm practical enough to recognize that, if the client is Accept:-ing application/metalink4+xml, it understands the implicit rel=describedby and there's always the chance that some client will support only that approach.

I'll probably use a mix of all three approaches (Metalink/HTTP, Accept, Link) to ensure widest compatibility when I use Metalink.
Metalink Downloader, a Chrome extension, is ready for use! http://code.google.com/p/metalink-chrome-extension/

this is just the 0.1 release, but please try it out & file issues & comments there. there's much more work to do, but this is a great start and I think the best we can do with an experimental extension that can't be in the web store. enable the experimental extension APIs as described on the download page to try it out.

@31 the use of the Link header for transparent metalinks probably isn't spelled out explicitly enough in RFC 6249, we'll have to fix that. TCN is deprecated but some of the mirror redirectors liked it because it meant one less request when they were already doing 40 million a day.
Blocking: chromium:68358
Project Member

Comment 34 by bugdroid1@chromium.org, Mar 10 2013

Labels: -Feature-Downloads -Area-Internals Cr-Internals Cr-UI-Browser-Downloads
Labels: -Pri-2 Pri-3
Status: WontFix
I decided to mark this as WontFix, rather than a low priority available bug.

The litmus test for this is: Even if someone provided a perfect patch for this feature, would we want to take it? In this case, I don't think we would - each new feature adds some amount of code bloat, build and test time cycles, and long-term maintenance. We decided not to support torrent files for a similar reason.

The Chrome download extension API should work for this and it's available in dev channel and beta in Chrome 28. If this is not sufficient, it would be great to get feedback on why that is.

Alternately, users should be able to specify "Always Open Files of this Type" in the download context menu, associate a default program on their OS with the metalink file type, and have a fairly seamless download experience.
@#36:

I don't know about the download extension API (my concern is as a web developer, not a Chrome extension developer), but "Always Open Files of this Type" is useless for Metalink/HTTP since Chrome will just blindly use the HTTP direct download fallback without even realizing that a metalink file full of mirrors is being offered.

Comment 38 by fad...@gmail.com, Feb 27 2016

Metalink support is more vital now than ever with the internet expanding to new areas.  Raise your hand if you've tried to download a linux distribution on an internet line in a developing country.....

If not, I'd like to tell you that getting the OS is often more difficult than using it.  This support could also help with media distribution, and a lot more.  

So, eight years after the first bug report, I implore you to rethink the "wontfix" status.  Metalink could improve the internet experience for billions of people.  
