Canonical URL Issues

There's a potential canonical URL issue that we've rarely touched on, if ever. It's the kind of thing that can cause indexing problems or split PageRank into separate "piles" - and even, potentially, create duplicate URL problems.

This canonical problem comes from adding a period to the end of a domain name - http://www.example.com. - and that can trigger a cascade of problems. When a request arrives with the trailing period on the domain name and the site's navigation uses relative URLs, every relative link is resolved against that same hostname. So the extra period gets carried forward, and forward, and forward, through succeeding links.
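To see why the period propagates, here is a minimal Python sketch using the standard library's urllib.parse.urljoin, which resolves relative links against a base URL following the RFC 3986 rules that browsers and spiders generally apply. The page path and link targets are hypothetical.

```python
from urllib.parse import urljoin, urlsplit

def resolve_links(base_url, hrefs):
    """Resolve relative hrefs against the page's URL, as a browser or spider would."""
    return [urljoin(base_url, href) for href in hrefs]

# A page reached through a link whose hostname ends in a trailing period.
base = "http://www.example.com./articles/page.html"

# Relative links as the site's navigation might emit them (hypothetical paths).
links = resolve_links(base, ["other.html", "../index.html", "/contact.html"])

for url in links:
    # Every resolved URL inherits the host "www.example.com." from the base URL.
    print(url, urlsplit(url).hostname)
```

Each resolved link points back into the "www.example.com." version of the site, which is how a single stray link can seed a whole parallel set of URLs.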

There's a new thread in our Apache Forum that touches on the issue and also shares a fix (http://www.webmasterworld.com/apache/3718084.htm). As moderator jdMorgan observes, even google.com. has this problem!
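That thread deals with Apache. The following is not the fix from the thread, but a minimal sketch of one way a site running as a Python WSGI application might handle the problem: 301-redirect any request whose Host header ends in a period to the same path and query string without it. The middleware name and the demo application are hypothetical.

```python
from urllib.parse import quote

def strip_trailing_dot(app):
    """WSGI middleware: 301-redirect any request whose Host ends in '.'."""
    def middleware(environ, start_response):
        host = environ.get("HTTP_HOST", "")
        # Split off an optional port ("www.example.com.:8080"); IPv6
        # literals are simply passed through in this sketch.
        hostname, sep, port = host.partition(":")
        if hostname.endswith("."):
            fixed_host = hostname.rstrip(".") + sep + port
            # Rebuild the path and query string so the visitor lands on the
            # same page, just without the trailing period.
            path = quote(environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", ""))
            location = f"{environ['wsgi.url_scheme']}://{fixed_host}{path}"
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            start_response("301 Moved Permanently",
                           [("Location", location), ("Content-Type", "text/plain")])
            return [b"Moved Permanently\n"]
        # Hostname is already clean: pass the request through untouched.
        return app(environ, start_response)
    return middleware

# Usage sketch with hand-built WSGI environs (hypothetical requests).
def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]

responses = []
def record(status, headers):
    responses.append((status, dict(headers)))

app = strip_trailing_dot(hello_app)
app({"HTTP_HOST": "www.example.com.", "PATH_INFO": "/dir/page.html",
     "QUERY_STRING": "id=7", "wsgi.url_scheme": "http"}, record)
app({"HTTP_HOST": "www.example.com", "PATH_INFO": "/dir/page.html",
     "QUERY_STRING": "", "wsgi.url_scheme": "http"}, record)
print(responses)
```

A permanent (301) redirect is used rather than a 302 so that the trailing-period URL is consolidated into the normal one instead of being kept as a separate address.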

This kind of link can be generated innocently enough by forum software that automatically turns URL-like text into links: when the URL ends a sentence, the sentence's closing period can end up inside the link. And many servers will resolve that URL, extra period and all, without complaint.
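A minimal sketch of how that happens, using a hypothetical auto-linker rather than the code of any particular forum package: a naive pattern that takes every non-space character after http:// swallows the sentence's closing period, while trimming trailing punctuation avoids it.

```python
import re

# Naive auto-link pattern: "http://" or "https://" followed by any non-space run.
NAIVE_URL = re.compile(r"https?://\S+")
# Punctuation that usually belongs to the sentence, not the URL (an assumption).
TRAILING_PUNCTUATION = ".,;:!?)"

def find_urls_naive(text):
    return NAIVE_URL.findall(text)

def find_urls_trimmed(text):
    # Strip sentence punctuation from the end of each match before linking it.
    return [url.rstrip(TRAILING_PUNCTUATION) for url in NAIVE_URL.findall(text)]

post = "Our new home page is http://www.example.com. Please update your bookmarks."
print(find_urls_naive(post))    # ['http://www.example.com.']
print(find_urls_trimmed(post))  # ['http://www.example.com']
```

Trimming is a trade-off: it would also cut a legitimate closing parenthesis or period from a URL that really ends with one.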

So, for the sake of a complete reference, I'd like to collect all the potential canonical URL issues in one place.

Canonical URL Issues

Different domain names serving the same content (302 redirects can make this kind of mess)
Different hostnames within one domain, such as "with-www" and "no-www" versions
With and without "index.html" for the domain root or a subdirectory root
Different protocols - https and http
Trailing period on the domain name
Double forward slash in the file path - http://example.com//page.html
Swapping the order of query string parameters
URL rewrites that accept typos in the "keyworded" virtual directory name
Any forum software or CMS that generates alternate URLs for the same content
URLs that include session parameters, click path tracking, etc.
Adding an explicit default port number to the domain name: http://example.com:80 or https://example.com:443
URLs with unneeded query strings or extra parameters in the query string
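Several items on this list are mechanical enough to normalize in code. Here is a minimal Python sketch of a canonicalization function, assuming a site whose preferred form is https on the www hostname; the scheme choice, the alias map, the index file names and the list of session/tracking parameter names are all hypothetical placeholders. It does not attempt typo-tolerant rewrites or CMS-specific alternate URLs, which depend on each site's own URL scheme.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumptions for illustration only; substitute your own site's choices.
CANONICAL_SCHEME = "https"                          # hypothetical preferred protocol
HOST_ALIASES = {"example.com": "www.example.com"}   # hypothetical: no-www -> www
DEFAULT_PORTS = {80, 443}                           # treated as redundant (a simplification)
INDEX_FILES = ("index.html", "index.htm")           # hypothetical directory index names
IGNORED_PARAMS = {"phpsessid", "sid", "sessionid", "utm_source", "utm_medium"}  # hypothetical

def canonicalize(url):
    """Map common URL variants of one page onto a single canonical form."""
    parts = urlsplit(url)

    # Hostname: .hostname is already lowercased; strip any trailing period,
    # then fold alias hosts (e.g. the no-www version) onto the preferred one.
    host = (parts.hostname or "").rstrip(".")
    host = HOST_ALIASES.get(host, host)

    # Drop explicit default ports such as :80 or :443; keep any other port.
    port = parts.port
    netloc = host if port is None or port in DEFAULT_PORTS else f"{host}:{port}"

    # Collapse runs of slashes ("//page.html" -> "/page.html").
    path = re.sub(r"/{2,}", "/", parts.path) or "/"

    # Treat ".../index.html" as the bare directory ".../".
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break

    # Remove session/tracking parameters and sort the rest so that
    # "?b=2&a=1" and "?a=1&b=2" become the same URL.
    pairs = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() not in IGNORED_PARAMS]
    query = urlencode(sorted(pairs))

    # Fragments (#...) are never sent to the server, so they are dropped.
    return urlunsplit((CANONICAL_SCHEME, netloc, path, query, ""))

variants = [
    "http://example.com/index.html",
    "https://www.example.com./",
    "http://www.example.com:80//",
    "https://WWW.EXAMPLE.COM:443/?sid=123",
]
for v in variants:
    print(v, "->", canonicalize(v))
```

On a live site, one way to use a function like this is to compare canonicalize() of the requested URL with the URL actually requested and answer with a 301 redirect whenever they differ.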
