URL Referrer Tracking

There may be instances when you want to track the source of a request, and a common way of doing so is by using tracking parameters in URLs. Unfortunately, implementing referrer tracking this way can result in significant issues with search engines. In particular, it can cause duplicate content issues (since the search engine bot finds multiple valid URLs that point to the same page) and ranking issues (since the links to the page aren't all pointing to the same URL).

Let's say that Jane and Robot uploaded two different online training seminars to YouTube as part of a viral marketing effort to drive more traffic to our site. To gauge our return on investment from each of these seminars, we've added a tracking parameter to the link within each YouTube description that a customer can click on to learn more. Here are the two URLs: http://janeandrobot.com/?from=promo-seminar-1 and http://janeandrobot.com/?from=promo-seminar-2. Each would bring the customer to our home page (the same page served by http://janeandrobot.com), and we would track the conversions based on the from parameter in the URL.

While this solution may seem to work well initially, it can result in low quality tracking data and impact our search acquisition. Here's a summary of the major problems:

1. Duplicate content - search engines sometimes have difficulty determining whether two URLs contain the exact same page (see canonicalization for more information). In this case, we're creating this problem ourselves because we've created multiple URLs for the same page. Search engines are likely to find all three URLs for the home page and store/rank them as separate content within their index. This could cause the search engine robots to crawl the page three times instead of just once (which may not be a big deal if we are only tracking two promotions, but could become a big problem if we used similar tracking parameters for many other campaigns and URLs). Not only are the robots using more bandwidth than is necessary, but since they don't crawl a site indefinitely, they could spend all their allotted time crawling duplicate pages and never get to some of the good, unique pages on the site.
2. Ranking - search engines use the number of quality links pointing to a URL as a major signal in determining the authority and usefulness of that content. Because we now have three different URLs pointing to the same page, people have three choices when linking to it. The result is a lower rank for all of the variations of the URL. Search engines generally filter out duplicates, so for instance, if the original (canonical) home page has 100 incoming links and each URL with a tracking parameter has 25 links, then search engines might filter out the two URLs with fewer links and show only the canonical URL, ranking it at position eight for a particular query based on those 100 incoming links. If all incoming links were to the same URL, then search engines would count 150 links to the home page and might rank it at position three for that same query.

Another danger is that if one of the YouTube promo videos becomes exceptionally popular, its promo URL might gain more links than the original home page URL. Using this same example, if one of the promo URLs gained 200 links, search engines might choose to display it in the search results over the original home page. This could cause a confusing experience for potential customers who are looking for your home page (http://janeandrobot.com/?from=promo-seminar-1 doesn't look like a home page and searchers might be less likely to click on it, thinking it's not the page they're looking for). It's also not ideal from a branding perspective.
3. Reporting quality - as social networking sites become more popular, we become more of a sharing culture online. Many people use browser bookmarks, online bookmarking sites such as Delicious, email, and sharing sites such as Facebook, Twitter, and FriendFeed to save and share URLs. They'll click on a URL, and if they like it, copy and paste it from the browser's address bar. If the link they're saving/sharing happens to be one of our promotional links, then they have preserved this link for all time, and everyone who clicks through it will look identical to someone coming through the promo. This skews the reporting numbers of who went to the site after viewing the video -- which was why we set up the tracking parameters in the first place!

Implementation Options

Unfortunately there is no perfect solution for this scenario, and what works best for you depends on your infrastructure and situation. Here we've listed several common solutions that you can choose from to improve your own implementation. We generally recommend the first solution (Redirects), but there are pros and cons to each option that you should review carefully before making your decision.
Redirects (and Cookies)

The first option strives to solve the problem by trapping all of the promotional requests, recording the tracking information, then removing the tracking parameter from the URL. This can be time consuming to implement, but it is the best all-around option for addressing the three major issues listed above.

If you want to get fancy and track a user's entire session based on your referral parameter, you can use this method as well: simply set a cookie on the client machine at the same time you trap the request. This is recommended if you want to understand the value of traffic from different sources. In either case, here are the steps you'll need to take:

1. Trap the incoming request - find where your web site application's logic processes the HTTP request for your page. Trap each request at that point and check if it has a tracking parameter. If it does, record this in your internal referral tracking system. You can record this either in your server logs or in a custom referral tracking database you maintain on your own.

* If you would also like to track the user's entire session, then you should use this opportunity to set a cookie on the client.

2. Implement the redirect - the next step is to implement a 301 redirect from the current URL to the same page without the tracking parameter (the canonical URL). Don't forget to use the Cache-Control header in the HTTP response to ensure that all the requests come to your server and don't get handled automatically by some network-based cache. Here's what a sample redirect response might look like:

HTTP/1.1 301 Moved Permanently
Location: http://janeandrobot.com/
Cache-Control: max-age=0

Note that ASP.NET and IIS both use 302 redirects by default, so you may need to manually set the 301 response code.

The way this works is that when a search engine encounters a promotional URL (http://janeandrobot.com/?from=promo-seminar-1) it issues an HTTP GET request to the URL. The HTTP response tells the search engine that this page has been permanently moved (301 Redirect) and provides the new address (the same as the old address but without the tracking parameter). The search engine then discards the first URL (with the tracking code) and only stores the second URL (without the tracking code). And everything is right in the world.
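
To make the two steps above concrete, here is a minimal sketch of the trap-and-redirect flow, assuming a Node/Express application (your own stack - ASP.NET, PHP, and so on - will differ, but the shape is the same). The recordReferral function and the referral_source cookie name are hypothetical placeholders for whatever internal tracking system you maintain.

import express from "express";

const app = express();

// Hypothetical stand-in for your own referral tracking log or database.
function recordReferral(source: string): void {
  console.log(`referral recorded: ${source}`);
}

app.get("/", (req, res) => {
  const from = req.query.from;
  if (typeof from === "string" && from.length > 0) {
    // Step 1: trap the request and record the tracking parameter.
    recordReferral(from);

    // Optional: set a cookie so the rest of the session can be attributed to the promo.
    res.cookie("referral_source", from, { maxAge: 30 * 24 * 60 * 60 * 1000 });

    // Step 2: 301 redirect to the canonical URL, and tell intermediate caches
    // not to reuse this response for other requests.
    res.set("Cache-Control", "max-age=0");
    return res.redirect(301, "/");
  }

  res.send("Home page");
});

app.listen(3000);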

This implementation is one of the best options, but it does have some limitations:

* Because this method traps the referral parameters and removes them from the URL before the page actually loads, third-party referral tracking applications like Google Analytics, Omniture, WebTrends, or Microsoft adCenter Analytics will not be able to track these referrals. That means you'll have to manage your own referral tracking system.

URL Fragment

A simple and elegant option is to place the tracking parameter behind a hash mark in the URL, creating a URL fragment. Traditionally, fragments are used to denote links within a page and are ignored completely by search engines. In fact, search engines simply truncate the fragment from the URL.

Old URLs

* http://janeandrobot.com/?from=promo-seminar-1
* http://janeandrobot.com/?from=promo-seminar-2

New URLs with URL Fragments

* http://janeandrobot.com/#from=promo-seminar-1
* http://janeandrobot.com/#from=promo-seminar-2

By default, Google Analytics will ignore the fragment as well; however, there is a simple workaround, provided to us by Avinash Kaushik, Google's web metrics evangelist, using the following JavaScript:

var pageTracker = _gat._getTracker("UA-12345-1");

// Use one of the following calls, depending on your URL structure.

// Solution for domain-level URLs only
pageTracker._trackPageview(document.location.pathname + "/" + document.location.hash);

// If you have a path (and query string) included in the URL as well
pageTracker._trackPageview(document.location.pathname + document.location.search +
    "/" + document.location.hash);

You can create a few additional variations of this if you also have additional queries in the URL you would like to track. Check with your web analytics provider to find out if you need to customize your implementation to account for using URL fragments for tracking.

Does this sound too simple and easy to be true? There are a couple downsides to this approach:

* This option fixes issues 1 (duplicate content) and 2 (ranking) listed above, but it will not address issue 3 (reporting). You could still encounter some reporting problems with this method if people bookmark or email around the URL.
* Typically you'll have to write some custom code to parse the URL fragment (see the sketch after this list). Since this is a non-standard use of the URL, standard query-string parsing methods may not support it.
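
For example, here is a small client-side sketch of what that custom parsing might look like, assuming the fragment keeps the same key=value form as the URLs above (the function name is ours, not part of any standard library):

// Parse a tracking fragment such as "#from=promo-seminar-1" into key/value pairs.
// window.location.hash is an empty string when the URL has no fragment.
function parseFragmentParams(hash: string): Record<string, string> {
  const params: Record<string, string> = {};
  const fragment = hash.startsWith("#") ? hash.slice(1) : hash;
  for (const pair of fragment.split("&")) {
    if (!pair) continue;
    const [key, value = ""] = pair.split("=");
    params[decodeURIComponent(key)] = decodeURIComponent(value);
  }
  return params;
}

// On http://janeandrobot.com/#from=promo-seminar-1 this returns { from: "promo-seminar-1" }
const tracking = parseFragmentParams(window.location.hash);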

Robots Exclusion Protocol

Another relatively simple solution is to use robots.txt to ensure that search engines are not indexing URLs that contain tracking parameters. This method helps ensure that the original (canonical) version of the URL is always the one indexed, and it avoids the duplicate content problems around indexing and crawl bandwidth.

Assuming that all of our tracking parameters will follow a similar pattern to this:

http://janeandrobot.com/?from=

we can easily create a pattern that will match it. Below is a robots.txt file that implements the pattern:

# Sample robots.txt file, single query parameter
User-agent: *
Disallow: /?from=

The first line means that this rule applies to all search engines (or robots crawling your site), and the second line tells them not to crawl or index any URL that starts with 'janeandrobot.com/?from=' followed by a promotional code of any length. See the complete information on using the Robots Exclusion Protocol. Use this pattern instead if you have multiple query parameters:

# Sample robots.txt file, multiple query parameters
User-agent: *
Disallow: /*from=

Once you've implemented the pattern appropriate for your site, you can easily check whether it is working correctly by using the Google Webmaster Tools robots.txt analysis tool. It enables you to test specific URLs against a test robots.txt file. Note that although this tool tests Googlebot specifically, all the major search engines support the same pattern matching rules. In Google Webmaster Tools:

1. Add the site, then click Tools > Analyze robots.txt. (Unlike most features in Google Webmaster Tools, you don't need to verify ownership of the site to use the robots.txt analysis tool). The tool displays the current robots.txt file.
2. Modify this file with the Disallow line for the tracking parameter. (If the site doesn't yet have a robots.txt file, you'll need to copy in both the User-agent and Disallow lines.)
3. In the Test URLs box, add a couple of the URLs you want to block. Also add a few URLs you do want indexed (such as the original version of the URL that you're adding tracking parameters to).
4. Click Check. The tool displays how Googlebot would interpret the robots.txt file and if each URL you are testing would be blocked or allowed.

At this point you may be thinking, wow, I can do all this and not have to write any new code? Unfortunately, there are even more downsides to this approach than the others:

* This option will fix issue 1 (duplicate content), but not issues 2 (ranking) and 3 (reporting). This can be a good interim solution while you're implementing the more complete redirects solution, but it often isn't useful enough on its own.
* This will likely take a bit of extra testing to ensure you get the patterns correct in your robots.txt file and don't inadvertently block content you want indexed.

Yahoo Site Explorer

Yahoo provides an online tool designed to solve this scenario. However, the solution only helps with Yahoo search traffic. To use the Yahoo fix, simply go to http://siteexplorer.search.yahoo.com and create an account for your web site in the Yahoo Site Explorer tool. Once you've verified ownership of your web site, you can use their Dynamic URL Rewriting tool to indicate which parameters in your URLs Yahoo should ignore.

[Screenshot: the Dynamic URL Rewriting settings in Yahoo Site Explorer]

Simply specify the name of the parameter you use for referral tracking (in our example it is 'from') and set the action to 'Remove from URLs'. Yahoo will then remove that parameter from all of your URLs while processing them and give you a handy little report about how many URLs were impacted.

This is another solution that seems too easy to be true, and indeed there are some significant limitations to this approach:

* At the end of the day this is still a Yahoo-only solution. With approximately 20% market share, it is likely this will not meet all of your needs. However, if you do get some percentage of your traffic from Yahoo, there is no harm in doing this in the short term while you implement another method in the longer term.
* The other problem with this solution is that it doesn't solve issue #3 (reporting), so you are still susceptible to reporting errors due to folks bookmarking and emailing your URLs with tracking codes.

Common Pitfalls
Cloaking & Conditional Redirects

Some web sites and SEO consultants attempt to solve this with a technique called cloaking, or conditional redirects. Essentially, these methods check whether the HTTP GET request is coming from a search engine and then show it something different than normal users see. This something different could be a simple 301 redirect back to the page without the tracking parameter, similar to our first solution above. The difference is that our solution implemented the redirect for all requesters, while cloaking/conditional redirects implement it only for search engines.

The big problem with this implementation method is that cloaking and conditional redirects are explicitly prohibited in the webmaster guidelines for Google, Yahoo, and Live Search. If you use this method, you risk your pages being penalized or banned by the search engines. The primary reason they prohibit this behavior is that they want to know exactly what content they are presenting to searchers using their service. When a web site shows something different to a search engine robot than to a general user, a search engine can never be sure what the user will see when they go to the web site. So, even if you're thinking of implementing cloaking for what seems to be a valid, non-deceptive reason, it's still a technique search engines strongly discourage.

This leads to the second major problem with this implementation method: it adds significant complication, and it can be difficult to monitor whether or not it's working - for example, you have to test it while pretending to be each of the three search engines' robots. When things go wrong, it is likely that you're not going to see it right away, and by the time you do, your search engine traffic may already be impacted. Check out this example of when Nike ran into an issue with cloaking.
Crazy Tracking Codes

Many studies on the web show that customers prefer short, understandable URLs over long, complicated ones, and are more likely to click on them in the search results. In addition, users prefer descriptive keywords in URLs. Therefore, it might be worth spending a few extra minutes thinking about the tracking codes you use to see if you can make them friendlier.

Good examples

* ?from=promo
* ?from=developer-video
* ?partner=a768sdf129

Bad examples

* ?i=A768SDF129,re23ADFA,style-23423,date-2008-02-01&page=2
* ?IAmSpyingOnYou=a768sdf129&YouAreASucker=re23adfd

Testing Your Implementation

So you've implemented your new favorite method, it compiles on your dev box, and now it's time to roll it into production, right? Maybe not! The initial goal of referrer URL-based tracking was to understand where your traffic is coming from so you can use that information to optimize your business. To ensure the data you're collecting is actually useful, we highly recommend that you do some testing to confirm that all the common scenarios are working the way you expect and that you know where the holes are in your measurement capabilities. As with all metrics on the web, there will be holes in your data, so you need to know what they are and account for them.

The first step in testing the implementation is to try it with a test parameter, walking through the full scenario from start to finish.

1. Create several phony promotional links that reflect the actual types of links you expect. These could point to your home page, product pages, or URLs with many additional query parameters that you might encounter.
2. Place these fake promotional links in a location that won't confuse your customers but are likely to get indexed by search engines. Using a social networking site or a blog might serve this well.
3. Click through those links as a customer and verify that you get to the correct page with a good user experience. Be sure to take these into account as well:
* Redirects operating properly (if you're using them) - use the Live HTTP Headers extension for Firefox to ensure the application is providing the correct headers (the 301 redirect and caching); a scripted check like the sketch after this list also works.
* Major browsers all work - if you're using cookies, you should test all the major browsers to ensure that they support cookies and that your scenario works the way you expect. Don't forget to try common mobile browsers if your customers access your site this way.
4. Check out the search engine experience to ensure that you're not running into the duplicate content or ranking issues.
* Major engines submit URL - if you place the test URLs in the right social network or spot on your blog, they should get indexed within a week or so. If they don't, you can also try the "submit a URL" feature from Google, Yahoo, and Microsoft, though these are not guaranteed to work. Essentially you want to make sure the search engines have had the opportunity to see these URLs.
* Use 'site:' command to ensure tracking URLs are not indexed - here's an example query in Google, Yahoo, and Microsoft showing that our Jane and Robot example promotional URLs are not indexed.
5. Take a look at your metrics and ensure the numbers you're recording correlate to the testing you are doing. Some additional things to consider:
* Internal referrals - you might also want to add some logic to your application to filter out (or exclude) all referrals from the development team and your own employees. This is often done by checking requests against a list of known employee or company IP addresses and scrubbing those from your tracking data.
* Caching Issues - you might also want to try out several scenarios with multiple subsequent requests. You'll want to ensure that every request is going to your server and not getting cached somewhere along the way.
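
If you prefer to script the header check mentioned above rather than eyeballing it in the browser, a quick sketch like the following can confirm the 301, the canonical Location, and the Cache-Control header. It assumes Node 18+ with the built-in fetch; the URLs are just the article's example values.

// Verify that a promo URL 301-redirects to the canonical URL with the expected caching header.
async function checkRedirect(promoUrl: string, canonicalUrl: string): Promise<void> {
  const res = await fetch(promoUrl, { redirect: "manual" });

  console.log("status:", res.status);                              // expect 301
  console.log("location:", res.headers.get("location"));           // expect the canonical URL
  console.log("cache-control:", res.headers.get("cache-control")); // expect max-age=0

  if (res.status !== 301 || res.headers.get("location") !== canonicalUrl) {
    throw new Error("redirect is not configured as expected");
  }
}

checkRedirect("http://janeandrobot.com/?from=promo-seminar-1", "http://janeandrobot.com/")
  .catch((err) => console.error(err));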
