February 14, 2009

Search engines to support 'canonical urls', or: how to reinvent the wheel

Google, Yahoo and Microsoft have all announced support for so-called canonical URLs, allowing us developers to avoid duplicate listings in search engines.

Great feature, but it seems that we already had a standard for it.
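To make the comparison concrete, here is a minimal sketch of the two mechanisms side by side: the new rel="canonical" link element that goes into the HTML head, and an HTTP Content-Location header, which is the existing standard the comments below discuss. The helper names and the example.com URL are made up for illustration. Whether Content-Location really carries the same meaning is exactly what the comments debate, so treat this as a picture of the syntax, not a recommendation.

```python
from html import escape


def canonical_link_tag(url):
    """Build the HTML <link> element that marks `url` as the canonical address.

    Hypothetical helper: the element goes inside <head> of every duplicate page.
    """
    return '<link rel="canonical" href="{}">'.format(escape(url, quote=True))


def content_location_header(url):
    """Build an HTTP Content-Location header as a (name, value) pair.

    Hypothetical helper: this only shows the header syntax; it does not claim
    that search engines treat the header as a canonical URL.
    """
    return ("Content-Location", url)


# Assumed example URL, not taken from the announcement.
PREFERRED = "http://www.example.com/articles/canonical-urls"

print(canonical_link_tag(PREFERRED))
print("{}: {}".format(*content_location_header(PREFERRED)))
```

The first line printed belongs in the page markup, the second in the response headers. That difference is the point of the argument: one lives at the HTML level, the other at the HTTP level.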

Comments

Simon Harris • Feb 13, 2009
That's interesting, thanks for raising this. I wonder if it can be applied to the issue of engines listing either of two versions of a site, i.e. with and without the "www." prefix.

Incidentally, my (tenuous) understanding of the document makes me think that the Content-Location header doesn't provide a standard for specifying canonical URLs, as:

"The Content-Location value is not a replacement for the original requested URI; it is only a statement of the location of the resource corresponding to this particular entity at the time of the request"

That said, it's fairly obtusely worded and, to my mind, specifying a canonical URL does belong at the HTTP level, rather than, say, Google simply inventing yet more hacks to HTML.
Vahur • Feb 14, 2009
Nice to know. I hope that it has an impact on PageRank and SERPs as well.
Simon Reinhardt • Feb 15, 2009
I wouldn't say that Content-Location is quite the same. It is mostly used for content negotiation, linking from a generic resource to a specific resource: http://www.w3.org/DesignIssues/Generic.html (and therefore not re-usable if you already do content negotiation since it will contain different URIs depending on the request).
But that's not really what you want to achieve here; rather, you'd want to link in the other direction so that your generic resource appears in the search results.
I think however that they could've just used base URIs. Using them for that purpose doesn't seem to contradict their semantics in my opinion. And they're just as easy to implement in HTML: http://www.w3.org/TR/html401/struct/links.html#h-12.4
The only downside I can see with that is that the base URI has to be absolute, whereas rel="canonical" allows the user to use relative URIs.
Content-Location sets the base URI as well, btw. Although I think that this has quite the opposite effect to merging URLs, especially if you use content negotiation.