Below, I’ve compiled a list of techniques I personally know of being utilized in the industry. It is by no means comprehensive, but is one of the most complete lists I’m aware of. I’ve tried to keep each technique as mutually exclusive as possible.
I would like this list to become definitive, so I propose this:
if you can provide 5 techniques not already covered (at the sole discretion of SEP), we will:
a. incorporate your ideas, and give you proper attribution for all to see
b. send you a $50 gift certificate
and … the person who submits the most new link building techniques by February 1st 2009, that ultimately get incorporated into this document (again, at the sole discretion of SEP), will be awarded $250.
Some of the techniques mentioned in this document are not condoned by SEP or the search engines, but are listed only to add to the comprehensiveness of the document. Only you know your own risk tolerance, and so must make the decision regarding tactics yourself.
Each of the techniques below assumes a basic knowledge of the use of the nofollow attribute, robots.txt exclusions, and the identification of bad neighbourhoods.
The Definitive List of Link Building Techniques:
1. build useful tools that others will talk about and link to. These may be WordPress plugins (eg. a 301 Redirect Plugin for WordPress), Firefox extensions (eg. Dave Naylor’s multi-browser SEO Tool, and 97th Floor’s Social Media Plugin), and the like
2. build fun or helpful widgets relevant to your business or industry (ie. mortgage calculators, body mass index calculators, how much is my blog worth, etc.).
3. offer badges
4. offer awards
5. issue online press releases
6. build definitive lists or resources via your blog eg. 75+ Link Building Techniques of 2008
7. write and release an ebook eg. How to SEO Your Site in 60 Minutes by Matt McGee, The SEO Handbook - 2008 by Dave Harry.
8. reciprocal linking with suppliers, clients, and complementary businesses
9. hold a contest. Sometimes even the smallest of contests can result in large numbers of links
10. guest blog on relevant sites in your industry
Image courtesy SEOResearcher.com
11. submit to quality directories
12. submit your blog postings to search engine friendly social media
13. sponsor an event, or a cause
14. provide testimonials for industry products you use
15. vanity bait others - in your blog, include positive mentions of the blog posts of others, and link freely to their work. Often, they’ll feel compelled to acknowledge you
16. comment on the blogs of others in your industry, or those of related sites.
17. send samples of your product to industry bloggers and ask them to try the products. If they like them, they may write about them. Of course the risk is, if they don’t like them, they may write about them too.
18. participate in surveys or group projects.
19. write “how to” guides or posts
20. release unique research results
21. be the first to uncover a significant news story
22. post job postings in colleges and universities
23. offer to speak at a college or university
24. help write curriculum for a college or university
25. provide a bursary or endowment to the school
26. make a financial contribution to your college or university. Often this results in a dofollow link acknowledging the contribution.
27. study your local government site to look for opportunities to get listed (ie. news, directory, etc.)
28. use ppc to drive qualified searchers to your quality content, knowing a % will link.
29. syndicate articles
30. trade articles with other webmasters
31. search for sites with dofollow ‘trackbacks’ and link to some of their blog posts.
32. carefully select your friends on social media (eg. Digg, Stumbleupon, Delicious). If they work for media (eg. newspapers, magazines, radio stations, etc.) cater the content to their apparent interests. I’ve had content mentioned on radio shows before (and linked to from the radio’s website), merely by knowing to put a ‘conspiracy’ spin on the science content.
33. internal links are crucial. Ensure you use a proper navigation structure
34. submit to nofollow social media too, if the content is really good. If it goes hot, it will potentially be seen by large numbers of people, and will typically result in a number of good links.
35. study competitors’ backlinks via Yahoo Site Explorer, and request backlinks from many of the same sources
36. maintain profiles on RELEVANT social media sites with DoFollow profile pages, and heavily link those pages with other pages from the social media site (ie. lots of friends, lots of activity).
37. buy another domain with links pointing to it, and 301 redirect it
38. always carry a camera, capture interesting photos, and post them to the site and Flickr using Creative Commons licensing (requiring links back to your site).
39. include links in the footers of your RSS feed, so content scraped from your site contains links back to you.
40. rent some links from companies like Text Link Brokers. Be careful though. Google will penalize both the selling and buying party for paid links.
41. strike a deal to purchase a text link ad on relevant pages of a related site. Best if the link is from within a site’s actual content.
42. offer to make a contribution to charities
43. as Aaron Wall and Andy Hagans have suggested from their post 101 Link Building Tips to Market Your Website (points 62 and 63), sue a large company (get the David vs Goliath element working in your favour) or get sued by a company that people hate. Genius!
44. interview industry personalities, and post video and text versions to your blog. Others are always interested in what they have to say, and often will find reason to link to something they’ve said.
45. speak at industry events, especially if you know they offer dofollow profiles and linkbacks for such speakers.
46. apply for awards. Even if these sites do not link back to your site, news of winning awards can often be good linkable content for local media.
47. join a number of local organizations such as the local Board of Trade. Often, these sites provide dofollow member directory links.
48. offer a discount or free product samples to others who blog about your product.
49. submit your blog to RSS feed directories
50. do something remarkable such as running a marathon or triathlon
51. do something controversial, or to infuriate the industry (eg. say SEO is dying to a group of SEOs … good one Jason Calacanis, ShoeMoney). Most will link to you, not knowing they’re helping to further promote the issue
52. create niche sites for specific topic segments of your industry, and link back to your site
53. create an industry group or panel, complete with its own website. Often competitors will not link directly to you, but they will link to sites that link to you!
54. find pages that link to you that are not in the indexes of search engines, and are not noindexed or blocked by robots.txt, and link to those pages so spiders will follow them and find the links pointing back to your site.
55. buy sponsored reviews
56. there are a number of wiki-type sites on the internet that will provide linkbacks if you add good quality content to their sites.
57. start a search engine friendly affiliate program
58. provide answers via some ‘Answers’ type sites, and link to your content on the matter as proof of your answer
59. provide a link (from a specific content piece) to a particular site that you wish to receive a link from, and send them large volumes of traffic. Often, this will be enough to capture their attention, and link back to your post on the subject.
60. attend related conferences and network, network, network. People are camera crazy these days, and will post pictures of most of the people they meet, often complete with links back to your site.
61. develop relationships with local media personalities, such that you are their ‘Goto’ person when they need information on a given subject.
62. some sites have a “See Us In The Press” page. When you find such sites relating to your industry, write something about them. Often, they’ll pick it up and post it to their site … everyone likes to show off their kudos.
63. forum commenting
64. search keyword “add url”, keyword “add site”, and add your site to those sites
65. be a ‘top commentator’ on search engine friendly blogs offering the plugin
66. submit videos to tv channel websites (thank you for this idea Melanie Nathan) and be sure to include relevant text links.
67. point out errors (spelling, grammatical, coding) on sites where you’d like a link from. Often, thanks is provided in the form of a link.
68. develop a WordPress plugin that automatically generates links back to the author’s site from the blog where the plugin is ‘plugged in’ (this is extremely black hat!!!)
69. list yourself with the BBB (Better Business Bureau) and other local organizations. It should be noted however, that not all BBB sites offer DoFollow links
70. submit the widgets you create to widget directories
71. create customized WordPress themes, complete with properly worded attribution links
72. search for articles, sites, or posts that mention your personal or business name, though do not link to it, and then contact them and ask them to link to your site. This is courtesy of Wiep
73. if your company owns numerous web sites, have the other websites interlink. Again courtesy of Wiep
74. find companies offering the same products or services, except in different countries, regions, or even languages, and approach them to exchange links (Wiep … you’re on a roll)
75. submit your RSS feed to RSS feed directories
76. create an industry niche directory, and link back to your site
The Definitive List (75+) of Link Building Techniques in 2008
Tuesday, December 16, 2008 at 8:19 PM Posted by Vasu
Labels: Link Building
How to make sure your affiliate program passes PageRank & SEO benefits
Sunday, December 14, 2008 at 7:49 PM Posted by Vasu
Search engines are not quite decided on whether they class affiliate links as paid links or not. If you take the time to set up an affiliate program why not use it to generate thousands of high value links to your product pages?
This post will tell you everything you need to know about maximising the SEO value of your affiliate links.
Easy: Don’t go through a 3rd party
Search engines won’t count your affiliate links if they go via a third party affiliate network. Either go with a network that allows you to use your own links or run the program in house.
Easy: Allow deep links
Most people do this already but it’s important to make sure your affiliates are linking to your product pages, not just the homepage.
Harder: Consolidate your links
Most affiliate programs have links like http://www.site.com/category/product.html?aff=123
This causes duplicate content problems - the way to fix the issue is to set an affiliate cookie and then redirect to the normal product page http://www.site.com/category/product.html
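The cookie-and-redirect consolidation described above can be sketched server-side. A minimal sketch, assuming an aff query parameter; the function name is illustrative, and a real handler would set the cookie and issue the 301 itself:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter name - substitute whatever your program uses.
AFFILIATE_PARAM = "aff"

def consolidate_affiliate_url(url):
    """Split an incoming affiliate URL into (affiliate_id, clean_url).

    The server would store affiliate_id in a cookie and then
    301-redirect the visitor to clean_url, so search engines only
    ever see one canonical product URL.
    """
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    affiliate_id = params.pop(AFFILIATE_PARAM, None)
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path,
                        urlencode(params), parts.fragment))
    return affiliate_id, clean

aff, clean = consolidate_affiliate_url(
    "http://www.site.com/category/product.html?aff=123")
# aff == "123", clean == "http://www.site.com/category/product.html"
```

Any other query parameters on the URL are preserved; only the affiliate parameter is stripped before the redirect.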
Really clever: Don’t make it look like an affiliate program
Any URL with the parameter aff=123 clearly looks like an affiliate link. Amazon is smart and uses tag= as their parameter. Why not try some of the following as affiliate links?
http://www.site.com/page/123/
http://www.site.com/product-name/page123/
http://www.site.com/blogpost/123/product-name.html
Confuse Google by using a non-standard nomenclature for your parameters.
Really clever: Intelligent use of cookies
Do you name your affiliate cookies affid? Just because Google doesn’t accept cookies doesn’t mean it doesn’t see what cookies are being sent in the header information.
When Google sees an affid cookie being set followed by a 301 redirect to strip out parameters, it’s a fair assumption the link is an affiliate link.
Try calling your cookie something random like “visitor” or even cloaking the cookie so that it isn’t sent to search engines.
Super clever: Don’t use URL parameters
A few sites have started tracking based on referrer headers. This gives a clean link, and search engines have no way of knowing the links are affiliate links, unless you are stupidly telling everybody about it.
My favourite trick is to use links in the following format:
http://www.site.com/#john
http://www.site.com/product-name.html#steve
Search engines view URLs with different # fragments as the same page, so you can have as many of these as you like without running into duplicate content issues. The way to handle tracking is to use JavaScript to parse the fragment and use it to populate a hidden form field which is posted to your shopping cart when the “add to cart” button is pressed.
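In production this parsing runs client-side in JavaScript (roughly location.hash.slice(1)), since the fragment never reaches the server or the crawler. The logic itself is trivial; a Python sketch, with a hypothetical function name:

```python
from urllib.parse import urlsplit

def referrer_from_fragment(url):
    """Return the tracking name carried in a #fragment link, or None.

    On a live site this would run in the browser and write the value
    into a hidden form field submitted with the "add to cart" button.
    """
    return urlsplit(url).fragment or None

print(referrer_from_fragment("http://www.site.com/product-name.html#steve"))  # steve
```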
Google’s Page Update Life Cycle
Tuesday, December 9, 2008 at 8:28 PM Posted by Vasu
People know (or at least they should do) that implementing a number of SEO techniques and methods on any given page can influence the search rankings in a positive way. There are plenty of resources to help explain how you can create the “perfect page” in regards to SEO but are there any clear metrics for success? What can you expect if you change or alter page content, or perhaps the Meta data?
One thing we do know is that if a good SEO gets their hands on your website or a specific page, you will see positive results. What I’ve been doing is benchmarking when the changes take place in Google and whether the changes are positive or negative.
The Google update test
I optimised around 50 pages of a website I own that I initially set up around 3 months ago; the Meta data, page tags and content were not optimised at all. I created a strategy to optimise these pages, which were product pages, i.e. one product per page. I changed the following:
1. Optimised the meta data
2. Included keywords and alternative keyword phrases on page
3. Optimised the images on the page
I had read somewhere that updating large numbers of pages on a website all at once could lead to a possible penalty. Although I have never seen this myself, I thought this experiment would help test that theory.
I benchmarked data over a six week period on Google, based on individual pages and their targeted keywords, which had been optimised.
Week 1
Around 80% of the pages actually increased rankings in the first week with around 15% remaining the same and only 5% dropping rank
Week 2
In the second week there were some more keyword increases and very few positions dropped - a good week all round.
Week 3
In the third week it was the complete opposite, just over 85% of the keywords dropped below their original ranking with 5% remaining the same and 10% increasing
Week 4
Huge increase of positions, now around 70% of the pages I originally optimised are ranking well above their previous position with many on page 1 or 2. Very few position drops from original positions but there were some.
Week 5
Not much movement between keyword positions but 30% of keywords have improved from week 4, 80% remain the same with around 10% dropping slightly.
Week 6
Final week and only one page has increased from week 5 while 2 pages dropped slightly, the rest remained the same.
Time for some graphs:
The first graph has been taken as an average from over 40 optimised pages over the period of 6 weeks, so visually you can see the update life cycle.
The next graph shows five randomly selected keyword behaviours over the 6 week period
The final graph shows another 10 randomly selected keywords and their position changes
Google’s Page Update Life Cycle
Yep, think that’s what I’m going to call it! Anyway I’m aware that this lifecycle of position changes probably goes on for a bit longer but the data over the 6 week period was the most active. This is something that I’ve seen many times before but have never benchmarked for such a test. It’s also worth mentioning that you do see position changes before Google has re-indexed the optimised page.
So if you go about updating pages of your site don’t worry if they go all over the place for the first month or so, if they have been optimised correctly then you should see some kind of improvement.
Labels: SEO
Advanced Website Diagnostics with Google Webmaster Tools
at 8:26 PM Posted by Vasu
Running a website can be complicated—so we've provided Google Webmaster Tools to help webmasters to recognize potential issues before they become real problems. Some of the issues that you can spot there are relatively small (such as having duplicate titles and descriptions), other issues can be bigger (such as your website not being reachable). While Google Webmaster Tools can't tell you exactly what you need to change, it can help you to recognize that there could be a problem that needs to be addressed.
Let's take a look at a few examples that we ran across in the Google Webmaster Help Groups:
Is your server treating Googlebot like a normal visitor?
While Googlebot tries to act like a normal user, some servers may get confused and react in strange ways. For example, although your server may work flawlessly most of the time, some servers running IIS may react with a server error (or some other action that is tied to a server error occurring) when visited by a user with Googlebot's user-agent. In the Webmaster Help Group, we've seen IIS servers return result code 500 (Server error) and result code 404 (File not found) in the "Web crawl" diagnostics section, as well as result code 302 when submitting Sitemap files. If your server is redirecting to an error page, you should make sure that we can crawl the error page and that it returns the proper result code. Once you've done that, we'll be able to show you these errors in Webmaster Tools as well. For more information about this issue and possible resolutions, please see http://todotnet.com/archive/0001/01/01/7472.aspx and http://www.kowitz.net/archive/2006/12/11/asp.net-2.0-mozilla-browser-detection-hole.aspx.
If your website is hosted on a Microsoft IIS server, also keep in mind that URLs are case-sensitive by definition (and that's how we treat them). This includes URLs in the robots.txt file, which is something that you should be careful with if your server is using URLs in a non-case-sensitive way. For example, "disallow: /paris" will block /paris but not /Paris.
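This behaviour is easy to confirm with Python’s standard robots.txt parser, which matches paths case-sensitively; the rules and example.com URLs below are illustrative:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# Hypothetical robots.txt for an IIS site that serves /paris and
# /Paris as the same page - the parser still matches case-sensitively.
rp.parse([
    "User-agent: *",
    "Disallow: /paris",
])

print(rp.can_fetch("*", "http://www.example.com/paris"))  # False: blocked
print(rp.can_fetch("*", "http://www.example.com/Paris"))  # True: not blocked
```

If your server treats URLs case-insensitively, you would need Disallow lines for every casing visitors might link to.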
Does your website have systematically broken links somewhere?
Modern content management systems (CMS) can make it easy to create issues that affect a large number of pages. Sometimes these issues are straightforward and visible when you view the pages; sometimes they're a bit harder to spot on your own. If an issue like this creates a large number of broken links, they will generally show up in the "Web crawl" diagnostics section in your Webmaster Tools account (provided those broken URLs return a proper 404 result code). In one recent case, a site had a small encoding issue in its RSS feed, resulting in over 60,000 bad URLs being found and listed in their Webmaster Tools account. As you can imagine, we would have preferred to spend time crawling content instead of these 404 errors :).
Is your website redirecting some users elsewhere?
For some websites, it can make sense to concentrate on a group of users in a certain geographic location. One method of doing that can be to redirect users located elsewhere to a different page. However, keep in mind that Googlebot might not be crawling from within your target area, so it might be redirected as well. This could mean that Googlebot will not be able to access your home page. If that happens, it's likely that Webmaster Tools will run into problems when it tries to confirm the verification code on your site, resulting in your site becoming unverified. This is not the only reason for a site becoming unverified, but if you notice this on a regular basis, it would be a good idea to investigate. On this subject, always make sure that Googlebot is treated the same way as other users from that location, otherwise that might be seen as cloaking.
Is your server unreachable when we try to crawl?
It can happen to the best of sites—servers can go down and firewalls can be overly protective. If that happens when Googlebot tries to access your site, we won't be able crawl the website and you might not even know that we tried. Luckily, we keep track of these issues and you can spot "Network unreachable" and "robots.txt unreachable" errors in your Webmaster Tools account when we can't reach your site.
Has your website been hacked?
Hackers sometimes add strange, off-topic hidden content and links to questionable pages. If it's hidden, you might not even notice it right away; but nonetheless, it can be a big problem. While the Message Center may be able to give you a warning about some kinds of hidden text, it's best if you also keep an eye out yourself. Google Webmaster Tools can show you keywords from your pages in the "What Googlebot sees" section, so you can often spot a hack there. If you see totally irrelevant keywords, it would be a good idea to investigate what's going on. You might also try setting up Google Alerts or doing queries such as [site:example.com spammy words], where "spammy words" might be words like porn, viagra, tramadol, sex or other words that your site wouldn't normally show. If you find that your site actually was hacked, I'd recommend going through our blog post about things to do after being hacked.
There are a lot of issues that can be recognized with Webmaster Tools; these are just some of the more common ones that we've seen lately. Because it can be really difficult to recognize some of these problems, it's a great idea to check your Webmaster Tools account to make sure that you catch any issues before they become real problems. If you spot something that you absolutely can't pin down, why not post in the discussion group and ask the experts there for help?
Domain Canonicalization
Monday, December 8, 2008 at 9:08 PM Posted by Vasu
Pop quiz: what's the difference between the following URLs:
* http://website.com
* http://www.website.com
* http://website.com/default.php
* http://www.website.com/default.php
Give up? If you're a user, then chances are you expect all of those URLs to lead you to the same page. Robots, however, are not as good at determining if pages are the same, so they often store each separately. A big part of how search engines rank pages is based on how many external links those pages have. If other sites on the web link to the different versions of your home page, then search engines may calculate the value of each URL separately, based on the number of links to each version. This can effectively diminish the potential rank your page would have if it were found (and linked to) by only one URL.
The practice of consolidating all versions of a page under one URL is referred to as "canonicalization" (because you collapse all versions under the "canonical" or true version). The four examples listed above are the most common, but there are potentially many, many URLs that lead you to the same page. By adhering to several best practices, you should be able to address 90% of common site-wide canonicalization issues on your site and consequently increase how your site ranks.
Recommendation
The solution is to be explicit about the canonical form of your URLs. Following are four best practices to achieve this, with specific code and configuration examples.
1. Select WWW or Non-WWW, then redirect the other option to your preferred version.
The hard part is choosing if you want your site to be "www.website.com" or simply "website.com". There is no right answer for every company so you'll have to figure this out on your own (but, removing the "www." saves your customers 4 keystrokes, which really add up on a mobile device, and it makes your brand the first thing your customers see).
Once you've selected, you then need to find a way to trap all requests to your application, check which form is being used, and if it is not the correct form, initiate a 301 Redirect to the correct form. For example, if the user types in wikipedia.org, they will automatically get redirected to www.wikipedia.org.
2. Remove the default filename from the end of your URLs.
All web servers allow you to select one or more default filenames to serve when the browser requests a directory. For example, this website is run on IIS, so when the user requests "http://janeandrobot.com" we really serve "http://janeandrobot.com/default.aspx".
In the same code you use to enforce www vs. non-www, you should also check and see if the default filename is at the end of the URL and then trim it off. So, "http://janeandrobot.com/default.aspx" would be converted to "http://janeandrobot.com".
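Steps 1 and 2 can share one request filter. A minimal sketch, assuming a preference for the non-www host and an IIS-style default.aspx filename; both values, and the janeandrobot.com host, are placeholders for your own choices:

```python
from urllib.parse import urlsplit, urlunsplit

# Example policy: prefer the non-www host and no default filename.
PREFERRED_HOST = "janeandrobot.com"
DEFAULT_FILENAME = "/default.aspx"

def canonical_redirect(url):
    """Return the canonical URL to 301-redirect to, or None if the
    requested URL is already in canonical form."""
    parts = urlsplit(url)
    host = parts.netloc
    path = parts.path
    if host == "www." + PREFERRED_HOST:
        host = PREFERRED_HOST
    if path.endswith(DEFAULT_FILENAME):
        # Trim the default filename, keeping at least "/".
        path = path[:-len(DEFAULT_FILENAME)] or "/"
    canonical = urlunsplit((parts.scheme, host, path,
                            parts.query, parts.fragment))
    return canonical if canonical != url else None

print(canonical_redirect("http://www.janeandrobot.com/default.aspx"))
# http://janeandrobot.com/
```

Wire this into whatever layer traps incoming requests; when it returns a URL, respond with a 301 to that URL instead of serving the page.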
3. Link internally to the canonical form of your URL.
Make sure you always link to the proper canonical form of your URLs from within your site. This practice helps encourage external sites to link to the site using the correct version as well (since those linking to you often cut and paste from your pages or RSS feed.) Note there is a degree of diminishing returns here, so you don't need to spend the whole weekend hunting down every last URL. Just make sure to review your site's primary navigation, top landing pages and blog.
4. Use Google Webmaster Tools to tell Google the correct form.
Implementing these best practices on your site is ideal, since they address the problem for all search engines and give your customers a consistent, properly branded navigation experience. But what can you do if you reviewed steps 1-3 and found that it would take six months to implement on your production site? There is something that you can do today: using Google's Webmaster Tools, you can navigate to the "Tools" section and select "Set preferred domain." Here you can specify if you'd like Google to use "www.website.com" or "website.com" in their index and search results, as well as consolidate links to both versions. Note that while this will provide you short-term benefit from Google, it does not help you in Yahoo! or Live Search.
Checking Your Website
To check your website to see if you're handling domain canonicalization correctly, you can use the Live HTTP Headers add-on for Firefox.
Open the Live HTTP Headers tool, then try all the variations of the URL at several different levels to ensure they all redirect back to the appropriate canonical form. As you're checking each variation, look at the HTTP headers using the Firefox plug-in to ensure they are all 301 redirects (and not, for instance, 302 redirects).
Here's an example test case:
Canonical URL form: http://janeandrobot.com
  Test case: janeandrobot.com - Success
  Test case: janeandrobot.com/default.aspx - Success
  Test case: www.janeandrobot.com - Success
  Test case: www.janeandrobot.com/default.aspx - Success
Canonical URL form: http://janeandrobot.com/about.aspx
  Test case: janeandrobot.com/about.aspx - Success
  Test case: www.janeandrobot.com/about.aspx - Success
Canonical URL form: http://janeandrobot.com/folder
  Test case: janeandrobot.com/folder - Success
  Test case: janeandrobot.com/folder/default.aspx - Success
  Test case: www.janeandrobot.com/folder - Success
  Test case: www.janeandrobot.com/folder/default.aspx - Success
Canonical URL form: http://janeandrobot.com/folder/test.aspx
  Test case: janeandrobot.com/folder/test.aspx - Success
  Test case: www.janeandrobot.com/folder/test.aspx - Success
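If you'd rather script the check than click through each variation, the same test can be run with a short program that requests each URL without following redirects and reports the status code and Location header; the function name is illustrative:

```python
import http.client
from urllib.parse import urlsplit

def check_redirect(url):
    """Request a URL WITHOUT following redirects and return the HTTP
    status code plus the Location header - the same information the
    Live HTTP Headers add-on shows. Non-canonical forms should give
    a 301 (not a 302) pointing at the canonical URL."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    conn.request("GET", parts.path or "/")
    resp = conn.getresponse()
    status, location = resp.status, resp.getheader("Location")
    conn.close()
    return status, location

# e.g. check_redirect("http://www.janeandrobot.com/default.aspx") should
# give (301, "http://janeandrobot.com/") if canonicalization is set up
# as described above.
```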
Examples
Canonicalization issues are very common and, being a Microsoft employee, I don't have to go far to find an example. Check out the website for Microsoft's annual Mix conference for web developers.
I was able to generate the table below by plugging the common URL variations into Yahoo's Site Explorer to find a list of links to each variation.
URL Variation - Links from within website / Links from outside websites
http://visitmix.com - 17,663 / 59,498
http://www.visitmix.com - 9,074 / 22,179
http://visitmix.com/default.aspx - 0 / 22
http://www.visitmix.com/default.aspx - 0 / 12
Looking through these numbers yields some interesting insights:
* Not canonicalizing "www" vs "non-www" is definitely hurting their ranking - you can tell because they have a similar number of inlinks for each version. Ranking is done on a logarithmic scale, so every additional link is more valuable than the one before. If they redirected all versions to one canonical form, search engines would see their home page as having 81,711 external links, which would be a substantial boost.
* They are not good about using the same version of the URL within their site. If you're not cognizant of this on your site, others won't be either. It looks like they use visitmix.com about 75% of the time internally, and www.visitmix.com the other 25%.
Labels: Website Navigation
URL Referrer Tracking
at 9:07 PM Posted by Vasu
There may be instances when you want to track the source of a request, and a common way of doing so is by using tracking parameters in URLs. Unfortunately, implementing referrer tracking in this way can result in significant issues with search engines. In particular, it can cause duplicate content issues (since the search engine bot finds multiple valid URLs that point to the same page) and ranking issues (since all the links to the page aren't to the same URL).
Let's say that Jane and Robot uploaded two different online training seminars to YouTube as part of a viral marketing effort to drive more traffic to our site. To gauge our return on investment from each of these seminars, we've added a tracking parameter to the link within each YouTube description that a customer can click on to learn more. Here are the two URLs: http://janeandrobot.com/?from=promo-seminar-1 and http://janeandrobot.com/?from=promo-seminar-2. Each would bring the customer to our home page (the same page served by http://janeandrobot.com) and we would track the conversions based on the from parameter in the URL.
While this solution may seem to work well initially, it can result in low quality tracking data and impact our search acquisition. Here's a summary of the major problems:
1. Duplicate content - search engines sometimes have difficulty determining if two URLs contain the exact same page (see canonicalization for more information). In this case, we're creating this problem because we've created multiple URLs for the same page. Search engines are likely to find all three URLs for the home page and store/rank them as separate content within their index. This could cause the search engine robots to crawl the page three times instead of just once (which may not be a big deal if we are only tracking two promotions, but could become a big problem if we used similar tracking parameters for many other campaigns and URLs). Not only are the robots using more bandwidth than is necessary, but since they don't crawl a site infinitely, they could spend all the allotted time crawling duplicate pages and never get to some of the good unique pages on the site.
2. Ranking - search engines use the number of quality links pointing to a URL as a major signal in determining the authority and usefulness of that content. Because we now have three different URLs pointing to the same page, people have three choices when linking to it. The result is a lower rank for all of the variations of the URL. Search engines generally filter out duplicates, so for instance, if the original (canonical) home page has 100 incoming links and each URL with a tracking parameter has 25 links, then search engines might filter out the two URLs with fewer links and show only the canonical URL, ranking it at position eight for a particular query based on those 100 incoming links. If all incoming links were to the same URL, then search engines would count 150 links to the home page and might rank it at position three for that same query.
Another danger is that if one of the YouTube promo videos becomes exceptionally popular, its promo URL might gain more links than the original home page URL. Using this same example, if one of the promo URLs gained 200 links, search engines might choose to display it in the search results over the original home page. This could cause a confusing experience for potential customers who are looking for your home page (http://janeandrobot.com/?from=promo-seminar-1 doesn't look like a home page and searchers might be less likely to click on it, thinking it's not the page they're looking for). It's also not ideal from a branding perspective.
3. Reporting quality - as social networking sites become more popular, we become more of a sharing culture online. Many people use browser bookmarks, online bookmarking sites such as Delicious, email, and other sharing sites such as Facebook, Twitter, and FriendFeed to save and share URLs. They'll click on a URL, and if they like it, copy and paste it from the browser's address bar. If the link they're saving/sharing happens to be one of our promotional links, then they have preserved this link for all time, and everyone who clicks through it will look identical to someone coming through the promo. This skews the reporting numbers of who went to the site after viewing the video -- which was why we set up the tracking parameters in the first place!
Implementation Options
Unfortunately there is no perfect solution for this scenario, and what works best for you depends on your infrastructure and situation. Here we've listed several common solutions that you can choose from to improve your own implementation. We generally recommend the first solution (Redirects), but there are pros and cons to each option that you should review carefully before making your decision.
Redirects (and Cookies)
The first option strives to solve the problem by trapping all of the promotional requests, recording the tracking information, and then removing the tracking parameter from the URL. This can be time-consuming to implement, but it is the best all-round solution to the three major issues listed above.
If you want to get fancy and track a user's entire session based on your referral parameter, you can use this method as well: simply set a cookie on the client machine at the same time you trap the request. This is recommended for understanding the value of traffic from different sources. In either case, here are the steps you'll need to undertake:
1. Trap the incoming request - find where your web site application's logic processes the HTTP request for your page. Trap each request at that point and check if it has a tracking parameter. If it does, record this in your internal referral tracking system. You can record this either in your server logs or in a custom referral tracking database you maintain on your own.
* If you also would like to track the entire user's session, then you should also use this opportunity to set a cookie on the client.
2. Implement the redirect - the next step is to implement a 301 redirect from the current URL to the same page without the tracking parameter (the canonical URL). Don't forget to use the Cache-Control attribute in the HTTP header to ensure that all the requests come to your server and don't get handled automatically by some network-based cache. Here's what a sample redirect header might look like:
HTTP/1.1 301 Moved Permanently
Location: http://janeandrobot.com/
Cache-Control: max-age=0
Note that ASP.NET and IIS both use 302 redirects by default, so you may need to manually create the 301 response code.
The way this works is that when a search engine encounters a promotional URL (http://janeandrobot.com/?from=promo-seminar-1) it issues an HTTP GET request to the URL. The HTTP response tells the search engine that this page has been permanently moved (301 Redirect) and provides the new address (the same as the old address but without the tracking parameter). The search engine then discards the first URL (with the tracking code) and only stores the second URL (without the tracking code). And everything is right in the world.
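The trap-and-redirect steps above can be sketched in a few lines of JavaScript. This is a minimal illustration, not a production handler: the function name is made up for this sketch, and the single `from` parameter follows the examples in this article.

```javascript
// Sketch: given a requested URL, pull out the tracking parameter
// (so it can be recorded) and compute the canonical URL to 301 to.
const TRACKING_PARAM = "from";

function canonicalize(requestUrl) {
  const url = new URL(requestUrl);
  const tracking = url.searchParams.get(TRACKING_PARAM);
  if (tracking === null) {
    // No tracking code: serve the page normally, no redirect needed.
    return { redirect: false, location: requestUrl, tracking: null };
  }
  // Strip the tracking parameter to produce the canonical URL.
  url.searchParams.delete(TRACKING_PARAM);
  return { redirect: true, location: url.toString(), tracking };
}
```

In a real handler you would log `tracking` (and optionally set a session cookie) before responding with `301 Moved Permanently`, a `Location` header set to the canonical URL, and `Cache-Control: max-age=0`.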
This implementation is one of the best options, but it does have some limitations:
* One downside of this method is that it requires you to manage your own referral tracking system. Because it traps the referral parameters and removes them from the URL before the page actually loads, 3rd party referral tracking applications like Google Analytics, Omniture, WebTrends or Microsoft adCenter Analytics will not be able to track these referrals.
URL Fragment
A simple and elegant option is to place the tracking parameter behind a hash mark in the URL, creating a URL fragment. Traditionally, fragments are used to denote anchors within a page, and search engines ignore them completely; in fact, they simply truncate the fragment from the URL.
Old URL
* http://janeandrobot.com/?from=promo-seminar-1
* http://janeandrobot.com/?from=promo-seminar-2
New URL with URL Fragment
* http://janeandrobot.com/#from=promo-seminar-1
* http://janeandrobot.com/#from=promo-seminar-2
By default Google Analytics will ignore the fragment as well; however, there is a simple workaround that was provided to us by Avinash Kaushik, Google's web metrics evangelist, using the following JavaScript:
var pageTracker = _gat._getTracker("UA-12345-1");
// Solution for domain level only
pageTracker._trackPageview(document.location.pathname + "/" + document.location.hash);
// If you have a path included in the URL as well
pageTracker._trackPageview(document.location.pathname + document.location.search +
"/" + document.location.hash);
You can create a few additional variations of this if you also have additional queries in the URL you would like to track. Check with your web analytics provider to find out if you need to customize your implementation to account for using URL fragments for tracking.
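If your analytics package can't be configured to read fragments, pulling the tracking code out of the hash yourself is straightforward. Here is a minimal client-side sketch; the `from` parameter name follows the examples above, and the function name is illustrative:

```javascript
// Sketch: extract a tracking code from a URL fragment such as
// "#from=promo-seminar-1" (in a browser this would be document.location.hash).
function trackingFromHash(hash) {
  // Drop the leading "#" and parse the remainder as query-style pairs.
  const params = new URLSearchParams(hash.replace(/^#/, ""));
  return params.get("from"); // null when no tracking code is present
}
```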
Does this sound too simple and easy to be true? There are a couple downsides to this approach:
* This option fixes issues 1 (duplicate content) & 2 (ranking) listed above, but it will not address the 3rd issue of reporting. You could still encounter some reporting issues using this method if people are bookmarking or emailing around the URL.
* Typically you'll have to write some custom code to parse the URL fragment. Since it's a non-standard implementation, standard methods may not support it.
Robots Exclusion Protocol
Another relatively simple solution is to use robots.txt to ensure that search engines are not indexing URLs that contain tracking parameters. This method enables you to ensure that the original (canonical) version of the URL is always the one indexed and avoids the duplicate content issues involving indexing and bandwidth.
Assuming that all of our tracking parameters will follow a similar pattern to this:
http://janeandrobot.com/?from=
we can easily create a pattern that will match for this. Below is a robots.txt file that implements the pattern:
# Sample Robots.txt file, single query parameter
User-agent: *
Disallow: /?from=
The first line means that this rule should apply to all search engines (or robots crawling your site), and the second line tells them that they can't index any URLs that start with 'janeandrobot.com/?from=' and some type of promotional code of any length. See complete information on using the Robots Exclusion Protocol. Use this pattern if you will have multiple query parameters:
# Sample Robots.txt file, multiple query parameters
User-agent: *
Disallow: /*from=
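To see why both patterns work, here is a simplified sketch of the prefix-plus-wildcard matching that crawlers apply to Disallow rules. This is an illustration only; the real engines implement a richer rule set (including "$" end anchors and longest-match precedence):

```javascript
// Sketch: test a URL path against a robots.txt Disallow pattern.
// The pattern matches as a prefix, with "*" standing for any characters.
function isBlocked(path, disallowPattern) {
  const regex = new RegExp(
    "^" +
      disallowPattern
        .split("*")
        // Escape regex metacharacters in the literal parts of the pattern.
        .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
        .join(".*")
  );
  return regex.test(path);
}
```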
Once you've implemented the pattern appropriate for your site, you can easily check to see if it is working correctly by using the Google Webmaster Tools robots.txt analysis tool. It enables you to test specific URLs against a test robots.txt file. Note that although this tool tests GoogleBot specifically, all the major search engines support the same pattern matching rules. In Google Webmaster Tools:
1. Add the site, then click Tools > Analyze robots.txt. (Unlike most features in Google Webmaster Tools, you don't need to verify ownership of the site to use the robots.txt analysis tool). The tool displays the current robots.txt file.
2. Modify this file with the Disallow line for the tracking parameter. (If the site doesn't yet have a robots.txt file, you'll need to copy in both the User-agent and Disallow lines.)
3. In the Test URLs box, add a couple of the URLs you want to block. Also add a few URLs you do want indexed (such as the original version of the URL that you're adding tracking parameters to).
4. Click Check. The tool displays how Googlebot would interpret the robots.txt file and if each URL you are testing would be blocked or allowed.
At this point you may be thinking, wow, I can do all this and not have to write any new code? Unfortunately, there are even more downsides to this approach than the others:
* This option will fix issue 1 (duplicate content), but not issues 2 (ranking) and 3 (reporting). This can be a good interim solution while you're implementing the more complete redirects solution, but it often isn't useful enough on its own.
* This will likely take a little bit of extra testing to ensure you get the patterns correct in your robots.txt file and don't inadvertently block content you want indexed.
Yahoo Site Explorer
Yahoo provides an online tool designed to solve this scenario. However, the solution only helps with Yahoo search traffic. To use the Yahoo fix, simply go to http://siteexplorer.search.yahoo.com and create an account for your web site in the Yahoo Site Explorer tool. Once you've verified ownership of your web site, you can use their Dynamic URL Rewriting tool to indicate which parameters in your URLs Yahoo should ignore.
Simply specify the name of the parameter you use for referral tracking (in our example it is 'from'), and set the action to 'Remove from URLs'. Yahoo will then remove that parameter from all of your URLs while processing them and give you a handy little report about how many URLs were impacted.
Again, this is another solution that seems too easy to be true, but again, there are some significant limitations with this approach:
* At the end of the day this is still a Yahoo-only solution. With approximately 20% market share, it is likely this will not meet all of your needs. However, if you do get some percentage of your traffic from Yahoo, there is no harm in doing this in the short term while you implement another method in the longer term.
* The other problem with this solution is that it doesn't solve issue #3 (reporting), so you are still susceptible to reporting errors due to folks bookmarking and emailing your URLs with tracking codes.
Common Pitfalls
Cloaking & Conditional Redirects
Some web sites and SEO consultants attempt to solve this with a technique called cloaking, or conditional redirects. Essentially, these methods check if the HTTP GET request is coming from a search engine and then show it something different than normal users see. This something different could be a simple 301 redirect back to the page without the tracking parameter, similar to our first solution above. The difference is that our solution implemented the redirect for all requesters, while cloaking/conditional redirects implement it only for search engines.
The big problem with this implementation method is that cloaking and conditional redirects are explicitly prohibited in the webmaster guidelines for Google, Yahoo and Live Search. If you use this method, you risk your pages being penalized or banned by the search engines. The primary reason they prohibit this behavior is that they want to know exactly what content they are presenting to searchers using their service. When a web site shows something different to a search engine robot than to a general user, a search engine can never be sure what the user will see when they go to the web site. So, even if you're thinking of implementing cloaking for what seems to be a valid, and not deceptive, reason, it's still a technique search engines strongly discourage.
This leads to the second major problem with this implementation method - it adds significant complication and can be difficult to monitor whether or not it's working - e.g. you have to test it pretending to be each of the three search engines' robots. When things go wrong, it is likely that you're not going to see it right away, and by the time you do, your search engine traffic may already be impacted. Check out this example of when Nike ran into an issue with cloaking.
Crazy Tracking Codes
Many studies on the web show that customers prefer short, understandable URLs over long, complicated ones, and are more likely to click on them in the search results. In addition, users prefer descriptive keywords in URLs. Therefore, it might be worth spending a few extra minutes thinking about the tracking codes you use to see if you can make them friendlier.
Good examples
* ?from=promo
* ?from=developer-video
* ?partner=a768sdf129
Bad examples
* ?i=A768SDF129,re23ADFA,style-23423,date-2008-02-01&page=2
* ?IAmSpyingOnYou=a768sdf129&YouAreASucker=re23adfd
Testing Your Implementation
So you've implemented your new favorite method, it compiles on your dev box, and now it's time to roll it into production, right? Maybe not! The initial goal of referrer URL-based tracking was to understand where your traffic was coming from so you can use that information to optimize your business. To ensure the data you're collecting is actually useful, we highly recommend that you do some testing to verify that all the common scenarios work the way you expect, and that you know where the holes are in your measurement capabilities. As with all metrics on the web, there will be holes in your data, so you need to know what they are and account for them.
The first step in testing the implementation is to try it with a test parameter, walking the full scenario through start to finish.
1. Create several phoney promotional links that reflect the actual types of links you expect. These could point to your home page, to product pages, or to URLs with many additional query parameters that you might encounter.
2. Place these fake promotional links in a location that won't confuse your customers but is likely to get indexed by search engines. Using a social networking site or a blog might serve this well.
3. Click through those links as a customer and verify that you get to the correct page with a good user experience. Be sure to take these into account as well:
* Redirects operating properly (if you're using them) - use the Live HTTP Headers extension in Firefox to ensure the application is providing the correct headers (301 redirect and caching).
* Major browsers all work - if you're using cookies, you should test all the major browsers to ensure that they support cookies and that your scenario works the way you expect. Don't forget to try common mobile browsers if your customers access your site this way.
4. Check out the search engine experience to ensure that you're not running into the duplicate content or ranking issues.
* Major engines: submit URL - if you place the test URLs in the right social network or place on your blog, they should get indexed within a week or so. If they don't, you can also try the "submit a URL" feature from Google, Yahoo and Microsoft, though they are not guaranteed to work. Essentially you want to make sure the search engines have had the opportunity to see these URLs.
* Use 'site:' command to ensure tracking URLs are not indexed - here's an example query in Google, Yahoo, and Microsoft showing that our Jane and Robot example promotional URLs are not indexed.
5. Take a look at your metrics and ensure the numbers you're recording correlate to the testing you are doing. Some additional things to consider:
* Internal referrals - you might also want to add some logic to your application to filter out (or exclude) all referrals from the development team and your own employees. This is often done by checking requests against a list of known employee or company IP addresses and scrubbing those from your tracking data.
* Caching Issues - you might also want to try out several scenarios with multiple subsequent requests. You'll want to ensure that every request is going to your server and not getting cached somewhere along the way.
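As a concrete example of the internal-referral point above, here is a sketch of scrubbing known employee IPs from recorded tracking data. The IP addresses and the record shape are hypothetical; in practice the list would come from your network team:

```javascript
// Sketch: filter internal (employee/company) traffic out of referral data.
// The IPs below are placeholders for your own known internal addresses.
const INTERNAL_IPS = new Set(["10.0.0.5", "192.168.1.20"]);

function externalReferralsOnly(hits) {
  // hits: [{ ip, trackingCode }, ...] as recorded by the tracking system
  return hits.filter((hit) => !INTERNAL_IPS.has(hit.ip));
}
```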
Labels: Website Navigation
Best Robots.txt Tools: Generators and Analyzers
at 9:06 PM Posted by Vasu
While I do not encourage anyone to rely too much on robots.txt tools (you should either do your best to understand the syntax yourself or turn to an experienced consultant to avoid any issues), the robots.txt generators and checkers I am listing below will hopefully be of additional help:
Robots.txt generators:
Common procedure:
1. choose default / global commands (e.g. allow/disallow all robots);
2. choose files or directories blocked for all robots;
3. choose user-agent specific commands:
1. choose action;
2. choose a specific robot to be blocked.
As a general rule of thumb, I don't recommend using robots.txt generators, for a simple reason: you shouldn't create any advanced (i.e. non-default) robots.txt file until you are 100% sure you understand what you are blocking with it. But still, I am listing the two most trustworthy generators to check:
* Google Webmaster Tools: the Robots.txt generator allows you to create simple robots.txt files. What I like most about this tool is that it automatically adds all global commands to each specific user agent's commands (thus helping to avoid one of the most common mistakes):
Google Robots.txt generator
* The SEObook Robots.txt generator unfortunately lacks the above feature, but it is really easy (and fun) to use:
SEObook Robots.txt generator
Robots.txt checkers:
* Google Webmaster tools: Robots.txt analyzer “translates” what your Robots.txt dictates to the Googlebot:
Google Robots.txt analyzer
* Robots.txt Syntax Checker finds some common errors within your file by checking for whitespace separated lists, not widely supported standards, wildcard usage, etc.
* A Validator for Robots.txt Files also checks for syntax errors and confirms correct directory paths.
Leveraging Webmaster tools for SEO Success
at 9:06 PM Posted by Vasu
The Google webmaster tools panel has been around for quite some time. The Live webmaster tools joined a bit later. I get a lot of questions from colleagues and clients about whether these are really helpful, and lately I've found myself spending more and more time within these places, finding useful information and amending my SEO strategy accordingly.
Webmaster tools 101
So why do we need it and what's in it for me? Well, in general, the webmaster tools panel is the only legitimate and official feedback platform which the engines supply us. That on its own is more than enough for me to say "hey, let's take a look at what these guys have to say for themselves".
The very basic features are:
* One place to submit my standard XML site maps directly to the engines
* Feedback on crawling errors and problems
* Feedback on duplicate content issues (Google only)
* Feedback and settings for regional and domain name issues
* Control over “sitelinks” feature (Google only)
Leveraging Webmaster tools for SEO Success
While these tools are mostly 'nice to have', most are not very applicable. In the following section I will give a few pointers as to how webmaster tools research became my primary place to look for the root cause of indexing and ranking issues, and how those issues can be solved, using a few examples.
* Feed the crawlers with your content
Create XML site maps, use proper formatting, and include only relevant content. Make sure your sitemaps update automatically so Google doesn't have to crawl the same content again and again.
* Find & fix broken links: lists of broken links from the webmaster tools panel will help us fix links which can easily disturb our users. Moreover, broken links cause the crawlers extra work, resulting in reduced indexing and credibility for the entire web site.
* Find Server Problems
The "crawl stats" panel shows the last 90 days of Google's crawling history on your site: how many pages were downloaded every day, how much data was downloaded, and the bandwidth it took. Optimizing server performance and code standards can improve these parameters significantly.
* Fix Duplicate Content issues
Google webmaster tools reports duplicate descriptions and titles under the diagnostics > content area. Google shows samples of duplications they found including the actual duplicate links. All you need to do is go back to the programmers and fix the problems and you have more spider food.
* Improve crawling Efficiency
In the past, it was a well-known habit to bombard Google with as much content and as many pages as possible. This was, to say the least, a bad concept. While we do want Google to take as much content from our website as possible, having them index our "terms of use" over and over again is really a waste of resources. Based on the assumption that every website receives a certain data and bandwidth quota from the crawlers, we want them to maximize the effectiveness of their visit.
Therefore the proper use of robots.txt directives and nofollow links (internal ones too) may help guide Google to our core content and the important pages of the web site.
* Remove unwanted content
Using the manual removal tool you can remove certain pages, and even complete websites, from the Google index quickly and easily. The requirement is that the pages either have a block directive in the robots.txt file, a noindex meta tag, or return a 404 - then you can simply enter the pages from your site you want removed and they will be removed almost immediately.
This also helps improve crawling and ranking efficiency.
* Find bad outgoing links
A new feature in the Live webmaster tools panel enables reporting of pages suspected of containing malware, as well as outgoing links pointing at malware pages. This is really cool and important and may solve a lot of problems.
* Pump up your link development strategy
Both engines show you exactly which pages are linking to your pages, from within your domain and from outside it. Analyzing that list can help a lot in understanding how engines see your incoming links, where you need to improve, and how to do it.
* Improve your regional rankings
Google webmaster tools and Live webmaster tools both offer the option to set your geo-target, which may contradict your actual server or IP address location. Moreover, Live webmaster tools offers a very cool feature which presents the incoming links to the domain with their regional settings, so you can easily see where most of your links come from and optimize your link strategy accordingly.
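As a footnote to the sitemap tip at the top of this list: keeping sitemaps updated automatically usually means generating them from your own data. A minimal sketch of such a generator follows (the URLs and dates are placeholders):

```javascript
// Sketch: build a sitemaps.org-format XML sitemap from a list of URLs,
// so it can be regenerated automatically whenever content changes.
function buildSitemap(entries) {
  const urls = entries
    .map(
      (e) =>
        `  <url>\n    <loc>${e.loc}</loc>\n    <lastmod>${e.lastmod}</lastmod>\n  </url>`
    )
    .join("\n");
  return (
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    urls +
    "\n</urlset>"
  );
}
```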
Labels: SEO
Do Search Engines Use Bounce Rate As A Ranking Factor?
at 8:11 PM Posted by Vasu
Your web site’s bounce rate may be a significant factor in your search engine rankings. If the bounce rate on your site is high, you could end up with lower rankings in the search engines. Correspondingly, lower bounce rates may actually offer meaningful ranking boosts. (Don’t know what a bounce rate is? Hang on—definitions below.)
Don't believe that bounce rate is a serious ranking factor? You should. A new study by SEO Black Hat shows some significant impact on a web site's rankings as a result of significant changes in bounce rate. It is, of course, possible that the data in the study have been affected by other factors taking place at the same time, so this one study is not by any means conclusive.
I speculated on bounce rate as a ranking factor back in August of 2007. The underlying fact is that search engines want quality sites in their search results. High bounce rates may be a very good indicator of a poor site experience, or worse still, a complete mismatch between the content of the site and the search query entered by the user. This provides heavy motivation to look at bounce rates as a meaningful SEO factor, and to minimize bounce rates in the hope of increasing your rankings.
Bounce rate defined
What's bounce rate? Google Analytics defines a bounce as any visit where the visitor views only one page on the site, and then does something else. What happens next? There are several possibilities, including the user clicking on a link to a page on a different web site, closing an open window or tab, typing a new URL, clicking the "back" button to leave the site, or perhaps the user doesn't do anything and a session timeout occurs. This is still a bit fuzzy because of the nature of how "sessions" are defined in analytics packages. Analytics software that relies on JavaScript tags only knows when someone loads a page of your web site (so that the JavaScript runs).
As a result, these analytics packages have difficulty determining what happens in the meantime. In our first scenario, user A comes to your site, views one page, and then goes to a three-martini lunch (well, hopefully not). They then come back and visit 10 other pages on your site. Because of the way that analytics packages work, this will be seen as two different visits, and the first one will be recorded as a bounce (a single page visit).
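The single-page-visit definition used by most analytics packages reduces to a one-line calculation. A sketch (representing each visit simply by its page-view count is an assumption for illustration):

```javascript
// Sketch: bounce rate as most analytics packages compute it -
// the fraction of visits that viewed exactly one page.
function bounceRate(sessions) {
  // sessions: array of page-view counts, one entry per visit
  const bounces = sessions.filter((pageviews) => pageviews === 1).length;
  return sessions.length === 0 ? 0 : bounces / sessions.length;
}
```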
For search engines there are other possibilities. For example, a bounce could be defined as user A entering a search query, going to your web site, returning to the search engine, and clicking on another result. Another possible definition involves user A entering a search query, going to your site, and returning to the search results page in less than “x” seconds. So there are a few possibilities of these kinds for the search engines to experiment with.
Of course, the major search engines have other data available as well. For example, they have toolbars which can be a rich treasure trove of data for tracking user actions. In addition, search engines license data from major ISPs and collect additional data to track where a user goes. The possibilities go well beyond what an analytics package can do.
There are some issues, of course. What happens if the user is looking for a single bit of information, such as Abe Lincoln's birthplace? If a search result leads to a good reference site, the user gets the answer and is done. They may still click back to the search results and search on something else, or click on another result from the original search. Even though this is a satisfactory outcome, it's recorded as a bounce by most analytics packages.
This type of scenario will be prevalent with users who are searching for a simple answer and get their responses from almanac-like information sites. The key here is to factor in comparative data when using bounce rate to influence rankings. For example, the bounce rate of almanac sites may be higher than the bounce rate of an e-commerce site, which will likely be higher than the bounce rate of a directory site (where users have a high probability of going on to another site).
My guess is that search engines look (or will look) at how a site's bounce rate compares to other comparable sites, or to other sites they are considering returning for a user's search query. In this latter scenario, you could imagine a run-time adjustment where the search engine comes back with the "traditional" results for a user's query, and then makes a bounce rate adjustment.
The bottom line
There is a real possibility that bounce rate is a significant ranking factor right now. Even if it isn't, it is my opinion that it will be made a factor in the near future. And even if it is not a factor, and never becomes one, there are plenty of reasons to look closely at bounce rate anyway.
It speaks to the conversion potential of your site, and this already gives you plenty of reason to look at it. You can read more about this aspect of bounce rate in this excellent post by analytics guru Avinash Kaushik, titled Standard Metrics Revisited: #3: Bounce Rate.
Labels: SEO
Yahoo! Search BOSS
at 8:10 PM Posted by Vasu
BOSS (Build your Own Search Service) is Yahoo!'s open search web services platform. The goal of BOSS is simple: to foster innovation in the search industry. Developers, start-ups, and large Internet companies can use BOSS to build and launch web-scale search products that utilize the entire Yahoo! Search index. BOSS gives you access to Yahoo!'s investments in crawling and indexing, ranking and relevancy algorithms, and powerful infrastructure. By combining your unique assets and ideas with our search technology assets, BOSS is a platform for the next generation of search innovation, serving hundreds of millions of users across the Web.
How Do I Get Started?
1. Check out BOSS specs and mash-up examples below
2. Review the documentation
3. Get a BOSS Application ID
Using the API or Web Service
Overview
Search APIs are nothing new, but typically they've included rate limits, strict terms of service regarding the re-ordering and presentation of results, and provided little or no opportunity for monetization. These constraints have limited the innovation and commercial viability of new search solutions.
BOSS (Build your Own Search Service) is different – it's a truly open API with as few rules and limitations as possible. With BOSS, developers and start-ups now have the technology and infrastructure to build next generation search solutions that can compete head-to-head with the principals in the search industry. BOSS will grow and evolve with a focus on providing additional functionality, tools, and data for developers.
| | Previously Available with Yahoo! Search API | Available with BOSS |
| Queries Per Day | 5,000 | Unlimited* |
| No Restrictions on Presentation | no | yes |
| Re-Ordering Allowed | no | yes |
| Blending of Proprietary and Yahoo! Search Content Allowed | no | yes |
| Monetization | no | Coming Soon! |
| White-Label Attribution Required | yes | |
* BOSS offers developers unlimited daily queries, though Yahoo! reserves the right to limit unintended usage, such as automated querying by bots.
Examples
hakia
hakia, a leading semantic search engine, uses Yahoo! Search BOSS to accelerate its semantic analysis of the Web by accessing the vast amounts of web documents in the Yahoo! Search index.
Me.dium Search
Me.dium combined the BOSS API with its insight into the real time surfing activity of the crowds to build a unique "Crowd-Powered" social search engine prototype.
Daylife
Daylife To-Go is a new self-service, hosted publishing platform from Daylife. Anyone can use this platform to automatically generate 100% customizable pages and widgets. Daylife To-Go uses the BOSS API platform to power its Web search module.
Cluuz
Cluuz generates easier to understand search results through patent pending semantic cluster graphs, image extraction, and tag clouds. The Cluuz analysis is performed in real-time on results returned from BOSS API.
Revenue Sharing
In the near future, we will launch a monetization platform enabling Yahoo! and partners to jointly participate in the economics of BOSS-powered search products. Either Yahoo! sponsored search integration, with certain implementation and exclusivity requirements, or potentially a payment model, will be required above a specified query threshold.
Terms of Use
Use of this service is subject to the BOSS API Terms of Use.
Learn More
BOSS Mashup Framework
The BOSS Mashup Framework is an experimental python library that provides developers with tools for mashing up the BOSS API with other third-party data sources.
BOSS Custom
BOSS Custom is an invite-only program focused on building highly scalable next generation search products. This program is designed for consumer web businesses with unique assets (such as extensive user data or novel technologies) that want to develop truly innovative search products using Yahoo!'s core search technology.
Pagination and Duplicate Content Issues
Sunday, December 7, 2008 at 9:22 PM Posted by Vasu
Very often with large dynamic sites, webmasters just can't do without pagination. Apart from usability, here they come across another serious issue: duplicate content.
The thing is that all pages normally have identical titles and meta descriptions, and while some SEOs think it's not an issue at all (as search engines "don't see identical titles as duplicate content"), I do believe that this may at least result in an inadequate site crawling rate and depth.
The problem may have a number of solutions none of which is unfortunately perfect:
| Solution | Drawback |
| Add a different portion of the title/description to the beginning, e.g. a title like "A-G Blue Widgets". | Not in each case can this unique element be sorted out with pagination (most often it will be just a page # variable). |
| Put all the content in one HTML document, then use JavaScript to create pagination without reloading the page. | Can be done only if there are not too many results to list; otherwise the page will be enormous and search engines won't be able to crawl all of it. |
| Add a NoIndex meta tag to each page except the first one to keep search engines from indexing the pages but still allow them to crawl and follow the links: <meta name="robots" content="noindex,follow"> | Naturally, none of the pages except the first one will ever be ranked, and this still doesn't solve the "PageRank leak" problem (i.e. too many links to follow and give weight to). |
- Do you try to avoid pagination at all?
- Do you believe pagination doesn’t create duplicate content?
- Can you think of any other solution not listed here?
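The first solution in the table can be sketched in a few lines. This is only an illustration, assuming a simple title-template function; the function name and title format are my own, not taken from any particular CMS:

```python
def paginated_title(base_title, page_num, total_pages):
    """Build a unique <title> for each page of a paginated series.

    Page 1 keeps the clean base title; later pages get a
    distinguishing prefix so no two pages in the series share
    an identical title.
    """
    if page_num <= 1:
        return base_title
    return "Page %d of %d - %s" % (page_num, total_pages, base_title)

# A three-page "Blue Widgets" listing gets three distinct titles
titles = [paginated_title("Blue Widgets", p, 3) for p in range(1, 4)]
```

The same idea applies to meta descriptions: work the page-number variable into each one so no two paginated pages carry identical metadata.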
Labels: SEO 0 comments
7 Ways to Tame Duplicate Content
at 9:17 PM Posted by Vasu
Whether or not you believe that Google has an official duplicate content penalty, there's no question that duplicate content hurts. Letting your indexed pages run wild means losing control of your content, and that leaves search engines guessing about your site and its purpose.
Imagine that you have 25,000 pages in the Google index but only 1,000 pages of that site actually have unique content that you want visitors to see. By losing control of your duplicate content, you've essentially diluted all of those important pages by a factor of 25; for each important page, you have 24 other pages that the spiders have to sort through and prioritize. One way or another, your 25,000 pages are all competing against each other.
Of course, removing duplicate content, especially for large, dynamic sites, isn't easy, and figuring out where to focus your efforts can be frustrating at best. Having fought this battle more than once (including some penalty situations), I'd like to offer a few suggestions:
1. Rewrite Title Tags
They may not be glamorous, but HTML title tags are still a major cue for search engines, not to mention visitors, who see them on-page, in SERPs, and in bookmarks. Even in 2008, I too often see sites with one title, usually something like "Bob's Company" or, even worse, something like "Welcome." SEOs may argue about what makes a good page title, but I think most of us would agree on this: if a page is important enough to exist, it's important enough to have its own title.
2. Rewrite META Descriptions
Whether or not they impact rankings, META descriptions are a strong cue for search spiders, especially when it comes to duplication. Take the time to write decent descriptions, or find a way to generate descriptions if your site is dynamic (grab a database field and shorten it if you have to). If you absolutely can't create unique descriptions, consider dropping the META description field altogether. In some cases, it will be better to have the search engines auto-generate the description than duplicate one description across your entire site.
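As a sketch of the "grab a database field and shorten it" idea, here is one hypothetical way to generate a description from existing page copy. The length limit and helper name are my assumptions, not a search-engine requirement:

```python
def make_meta_description(page_copy, max_len=155):
    """Shorten a block of page copy into a meta description.

    Collapses whitespace, then cuts at a word boundary so the
    snippet never ends mid-word.
    """
    text = " ".join(page_copy.split())
    if len(text) <= max_len:
        return text
    # drop the trailing partial word and signal the truncation
    return text[:max_len].rsplit(" ", 1)[0] + "..."
```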
3. Rewrite Page Copy
This one may seem obvious, but if content isn't different, then it's not going to look different. Copy duplication often occurs when people include the same block of text on many pages or copy-and-paste to create content. If you're repeating text everywhere, consider whether it's really important enough to be repeated. If you copy-and-pasted your entire site, it's time to buckle down and write some content.
4. Lighten Your Code
Although search spiders have gotten a lot better about digesting large amounts of content, many sites still have trouble when unique content ends up pushed deep into the source code. This is especially a problem for older, non-CSS sites or for sites with a large amount of header content. Streamlining your code can be a big help; even if you can't make the jump to "pure" CSS, consider moving core elements into a style sheet. If you're repeating a lot of header content on every page, consider whether that could be reduced or if some of it could exist in one place (such as the home page). Often, large blocks of repeated content are as bad for visitors as they are for spiders.
5. Emphasize Unique Content
Consider using emphasis tags (<h1>, <strong>, <em>, etc.) in your unique content, and use them sparingly in your page header. This will help spiders isolate page-specific content and tell pages apart more easily. Using headers and emphasis consistently will also help your visitors and make you a better copywriter, in my experience.
6. Control Duplicate URLs
This subject is a blog post or 10 all by itself, but I'll try to cover the basics. Dynamic sites frequently suffer from content duplicates created by multiple URLs pointing to the same page. For example, you may have 1 page with the following 3 URLs:
- www.mysite.com/product/super-widget
- www.mysite.com/product/12345
- www.mysite.com/product.php?id=12345
Ideally, only one of those URLs would be visible and the others would operate as hidden redirects, but that isn't always feasible. If you can't use consistent URLs, pick the most descriptive format, and block the rest (nofollow or robots.txt). If one format is older and being phased out, make sure you redirect (301) the old versions properly until you can remove them.
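For example, on an Apache server the 301 from the old numeric format to the descriptive one might look like this in an .htaccess file. The paths and parameter names are the hypothetical ones from the list above; adjust them to your own URL scheme:

```apache
RewriteEngine On
# Permanently (301) redirect the old query-string URL
# to the descriptive format
RewriteCond %{QUERY_STRING} ^id=12345$
RewriteRule ^product\.php$ /product/super-widget? [R=301,L]
```

The trailing ? on the target strips the old query string from the redirected URL, so the old parameters don't tag along.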
7. Block Functional URLs
This is essentially a subset of #6, but concerns URLs that aren't really duplicates but have a functional purpose. For example, you may have something like:
- www.mysite.com/product.php?id=12345
- www.mysite.com/product.php?id=12345&search=super%20widget
- www.mysite.com/product.php?id=12345&print=yes
These extended URLs are essentially functional directives, telling the page to take an action (like displaying a printable version) or passing along information (like a search string). Removing these completely gets pretty elaborate and isn't always possible, but these directives should definitely be blocked to search spiders. They are essentially hidden instructions that have no value to the spiders or SERPs.
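One sketch of blocking these functional variants is a pattern rule in robots.txt. Note that the * wildcard is an extension honored by the major engines rather than part of the original robots.txt standard, and the parameter names below are the hypothetical ones from the example:

```
User-agent: *
# block the print and search-string variants
Disallow: /*&print=
Disallow: /*&search=
```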
Labels: SEO 0 comments
Keyword Cannibalization and How to Handle It
at 9:09 PM Posted by Vasu
Keyword cannibalization (no matter how awfully terrifying it may sound) is a widespread problem with a site's internal information structure that occurs when multiple subpages (heavily) target one and the same key term.
Very often webmasters are unaware of any keyword cannibalization problems throughout the site; for example, (partial) internal duplicate content issues caused by the site CMS (e.g. pagination creating multiple pages with one and the same title) can be the cause.
Others intentionally optimize several pages for one key term “to strengthen” the effect: they think the more a keyword is used throughout the site, the more important it seems for a search engine.
Keyword cannibalization is an issue not to be taken lightly because:
- It causes inadequate indexing and crawl depth by forcing Google to choose between many pages, picking the one it feels best fits the query and "filtering" the rest of the relevant pages;
- It lowers SEO effectiveness because your efforts are not focused: you are spreading link power, keyword targeting, and anchor text across multiple pages;
- It causes internal site competition: your own pages compete with each other for a position in Google.
To handle keyword cannibalization:
- Get rid of internal duplicate content issues;
- Organize your keyword lists to thoroughly think through the internal information structure;
- Carefully think over your website architecture to make the most of internal anchor text and keyword prominence.
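A quick way to spot candidate cannibalization is to look for URLs that share an identical title. A minimal sketch, assuming you already have a url-to-title mapping from a crawl (the function name and sample data are illustrative):

```python
from collections import defaultdict

def find_cannibalized_titles(pages):
    """Group URLs that share an identical <title>.

    `pages` maps url -> title; returns only the titles used by
    more than one URL -- the likely cannibalization suspects.
    """
    by_title = defaultdict(list)
    for url, title in pages.items():
        by_title[title.strip().lower()].append(url)
    return {t: urls for t, urls in by_title.items() if len(urls) > 1}

# Two category pages targeting the same key term
pages = {
    "/widgets": "Blue Widgets",
    "/widgets?page=2": "Blue Widgets",
    "/about": "About Us",
}
duplicates = find_cannibalized_titles(pages)
```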
We often discuss the search network and content network distribution within Google AdWords. However, we have not explored AdWords' third distribution channel: Google Search Partners. Today, we will discuss what these Google Search Partners are, how they affect your PPC campaign, and how you can monitor and manage them effectively.
at 9:05 PM Posted by Vasu
We often discuss the search network and content network distribution within Google AdWords. However, we have not explored AdWords' third distribution channel: Google Search Partners. Today, we will discuss what these Google Search Partners are, how they affect your PPC campaign, and how you can monitor and manage them effectively.
First, let’s figure out what these Search Partners are. The Search Partners consist of sites that are within the Google Search Network. Your ads may appear alongside or above search results, as part of a results page as a user navigates through a site’s directory, or on other relevant search pages. Google’s global search network includes Google Product Search and Google Groups.
A quick note: keep in mind that your click-through rate on the search and content networks doesn't affect your ad's Quality Score on Google search.
You can choose whether or not you want your ads to appear on the Search partner network. Within your campaign settings tab, under the “Networks and bidding” section, you will find this information. As you can see here, you can opt-in to the Google search, Search Partner and Content networks:
If you choose to display your ads through the Search Partner network, you will be able to monitor your performance directly within the AdWords interface. At the campaign and ad group level you can now parse out your performance for each individual distribution channel. When you are running ads on all three networks, this is what your ad group interface can look like:
You can monitor your performance for all three channels here. I mention this because within the AdWords reporting function you cannot separate Google Search from Search Partners, so this is the only place within your account where you can gain this visibility.
How do you effectively manage your performance within the Google Search Partners? First, you can opt in and out of this option at the campaign level, not at the ad group level. This can be irksome because within a given campaign, I have had ad groups that do well and some that don’t do well in the Search Partner network. When this occurs, you can separate your ad groups into individual campaigns. You can turn off the Search Partner network where it is not effective. And you can leave the Search Partner network on for those ad groups that do well. Basically, this is an all-or-nothing solution.
I did some research to learn how best to manage your CPC and ad position in the Search Partner network. Unfortunately, there is no direct way to adjust your bid solely on the Search Partner network without also adjusting your Google Search bid; if you alter a keyword bid, it will affect both search networks. My suggestion is to manage your bids for the Google Search network and let the Search Partner network follow.
There is a great amount of opportunity with the Google Search Partner network. However, you have to monitor your performance closely, and you need to manage your bids in both Google search channels smartly.
Labels: SEO 0 comments
Why User Experience Is A Crucial Part Of Good SEO
at 8:57 PM Posted by Vasu
Have you ever heard a search engine optimization (SEO) professional use the term user experience during a presentation, in an article or as part of a sales pitch? On the web, user experience (commonly abbreviated as UX or UE) is a term used to describe the overall perception, experience, and satisfaction that users have as a result of their interactions with a website.
Search engine optimization is all about the user experience, because the idea behind SEO is to get users to their desired information and destination(s) as quickly and easily as possible by using the users’ language (keywords). Searchers type in keywords at a commercial web search engine. Searchers’ expectations are validated in search results pages and, hopefully, after they click on links within those search results…a perfect, seamless user experience.
On the surface, an SEO professional’s presumed knowledge of user experience might sound impressive. However, if you do a little digging, you might discover that search professionals have their own preconceived notions as to what constitutes a positive user experience, notions that have little or nothing to do with users at all.
Users + experience = user experience
At the core of user experience is, you guessed it, users. I know this seems blatantly obvious and a little bit stupid for me to write. Nevertheless, user experience is a concept that seems to be lost on many search professionals. Here is why.
There are many different ways that both search professionals and usability professionals gather information about users. Focus groups, Web analytics data, keyword research, field interviews, and usability tests are all ways that these professionals can gather information about searcher behavior and interaction with a website. Search professionals rely heavily on keyword research tools and Web analytics data to determine how users, and search engines, interact with a website. However, as I outlined in a previous article, When Keyword Research and Search Data Deceives, keyword data can lead search marketers down the wrong path.
For example, a keyword phrase might be popular. And a site might rank well for this particular keyword phrase and its long-tail variations. But if the searchers who use this keyword phrase are not among your site’s target audience, then all of the time and expense put into the optimization and advertising for this keyword phrase is wasted. I often hear the legitimate-sounding excuse of the “positive brand experience” for appearing at the top of search results for various keyword phrases. How is appearing at the top of search results to the wrong target audience a positive brand (user) experience?
When I test search results pages for usability, I do not hear test participants say that they view these websites in a positive manner. Rather, they are quite irritated when search listings (and the corresponding web pages) do not meet their expectations. And they become increasingly irritated when the same site appears over and over again for multiple searches. They are not only irritated with the website—they are also irritated with the search engine that keeps delivering listings from the same site over and over again. Test participants usually do not blame themselves for formulating poor search queries. They often blame the website owner and the search engine.
The user experience does not come from a brand manager’s perspective, a marketing manager’s perspective, or even a search engine’s perspective. The user experience comes from the users’ perspective. Search marketers would do well to keep this point in mind when doing SEO.
User experience and interaction
Also at the core of the user experience is interaction. How do site visitors interact with a website? Are web pages sticky, encouraging site visitors to continue viewing other pages within a website? Is a page’s bounce rate extremely high? Do all web pages need to be sticky?
Web analytics data can certainly tell search marketers how site visitors use a website. But this data doesn’t tell us why site visitors do the things they do on a website, what motivates them. In my opinion, it is the combination of the how and the why that delivers the best ROI (return on investment). All too often, search marketers completely miss the boat on the why part because they limit their knowledge of site interaction to web analytics data and general website usability guidelines. They never truly interact with users.
In order to truly evaluate the how and the why of website interaction, you need to:
- Put the interface in front of members of your primary target audience,
- Objectively observe their behaviors, and
- Ask questions about their behaviors without leading them into giving you the answers that you want to hear.
The next time a search professional claims user-experience expertise, ask:
- How many usability tests have you done in the past three months?
- What are the names of these usability tests?
- Without violating client confidentiality, what did you learn from these tests?
- What other types of client interaction do you measure, and how?
One of my favorite usability best practices quotations came from Susan Weinschenk from Human Factors International:
“You can apply all usability guidelines to a website and have a completely unusable interface.”
The search experience is a large part of the overall user experience. I laud all search marketers who follow usability best practices during the optimization process, because usability certainly helps with a site's link development and conversions. But I also know that reading guidelines and following them without actually observing human interaction with optimized websites can lead search marketers and website owners down the wrong path. As part of the SEO and usability testing process, be sure not to neglect one-on-one interaction with your target audience. You won't regret it.
Labels: SEO 1 comments
Traffic Distribution by Google Ranking
Friday, December 5, 2008 at 4:58 AM Posted by Vasu
Labels: Google 0 comments
SEO Sales Process: Overcoming Common SEO Objections
at 4:55 AM Posted by Vasu
1. Search Engines Will Find Us/We Already Rank
Sure. Under what keyword terms? How much of the site are the spiders missing?
There is a big difference between ranking arbitrarily in search engine listings and ranking for focused keyword terms. Demonstrate to the client the value of appearing under a wide variety of targeted keyword terms, as opposed to this being a random process. It is like the difference between advertising where few people are looking and appearing on a string of billboards in prominent locations.
You could do a side by side comparison between the client and a more established competitor using Compete.com graphs. If they already rank for valuable terms, try to get them to track the business derived from those rankings, and show them the upside potential of increasing rank.
2. We'll Have To Redesign Our Site. That Costs Money
Quite possibly.
Try to demonstrate to the client that the potential benefits outweigh the costs. One way to price organic search traffic is to use the PPC prices as a guide. It could also be argued that organic listings have a higher trust level amongst users, making the traffic potentially even more valuable.
So how much is that poor design costing them in terms of lost opportunity?
3. SEO is Expensive
A common objection, usually made because the client can't determine the amount of work required, or the value added.
Break down the work into separate tasks, and outline how long each task is likely to take. If the client knows your rate per hour, they will be better able to determine whether the cost is fair.
For example:
- Industry analysis - research industry sector, marketing and sales trends.
- Competition analysis - conduct review of competitor sites
- Keyword research - research keyword terms
- Site optimization, including title tags, meta tags, copy and internal linking
- Link building/directory submission/social media promotion
- Monitoring and reporting
4. Upper management Won't Support It
Perhaps you need to be talking to the decision maker ;)
Ask what upper management's objections would be. Sometimes this objection is legitimate, but it is often used to avoid having to tell you "no, thanks". The client cites an authority who isn't present, implying that any further negotiation with the client will prove fruitless.
5. Why Should We Change The Way We Write Just For Search Engines?
This objection is commonly used by copywriters and journalists.
Established writers often use methodologies that don't take SEO into account. One way to get around this objection is to request a trial run on a few test pages. Once you've demonstrated that writing effective copy can result in an increase in visitors and conversions, you'll have more sway when it comes to changing the rest of the site.
Also, appeal to the copywriter's vanity. If more people see their work, isn't that a good thing?
Cite "This Boring Headline Is Written for Google", an article about how The New York Times changed their writing practices to accommodate SEO.
"We're all struggling and experimenting with how news is presented in the future," said Larry Kramer, president of CBS Digital Media. "And there's nothing wrong with search engine optimization as long as it doesn't interfere with news judgment. It shouldn't, and it's up to us to make sure it doesn't. But it is a tool that is part of being effective in this medium."
6. SEO Doesn't Work. It's A Scam!
Ask the client why they feel this way. Has the client had dealings with SEOs in the past? Seen some bad press?
Have case studies on hand that demonstrate how you've solved search marketing problems in the past. Also provide recommendations from previous clients who were happy with your work.
Reframe the debate in terms of problems and solutions.
7. We Have A Strong Brand, So We Don't Need SEO
This is true, so long as people only search on the brand.
But what about those searchers who are searching for generic product/service names?
I once had this objection from a well-known children's clothing retailer. I ran a few search reports on generic searches, such as "kids t-shirt", "babywear", etc., and showed the client the traffic numbers. I then showed the client that their site wasn't appearing under any of those terms.
But her competitors were.
Why choose one or the other when you could easily have both?
8. We Like Flash. It's Cool!
Run away. Run fast..... ;)
Seriously though, such objections usually come from designers who place a lot of emphasis on site appearance, or want to play with the latest toys.
In the past, I've approached this in one of two ways. If they want to keep designing in Flash, or other technologies that make crawling and linking difficult, then suggest workarounds that don't affect the design. For example, create a print-friendly version of the site. This is the part of the site that gets crawled and seen by search engines and search visitors, while the designers can still focus on their elaborate designs. Essentially, you create a site within a site.
Show them that their competitors outrank them, in part, by using different technology. Is Flash really worth that competitive disadvantage?
From Google AdWords Blog:
"Did you know that 20% of the queries Google receives each day are ones we haven’t seen in at least 90 days, if at all? With that kind of unpredictable search behavior, it's extremely difficult to create a keyword list that covers all relevant queries using only exact match."
It's even harder to capture that traffic using Flash.
BTW: Check out this example. Here is the spider's view of McDonalds.com.
9. Are SEO Services Really That Important?
Compared to.....?
It's an effort vs. reward question. Again, if you can demonstrate clear commercial benefits over and above the cost, then "hell yes!". Try to focus on the client's business problems, and be prepared to demonstrate how the SEO spend will solve those problems in cost-effective ways.
Those are a few common objections. I'm sure you've heard others. What is important to understand is that not all objections are legitimate. Most are stalling tactics used to delay making a decision. That decision is difficult to make because the client will expose themselves to risk.
Labels: SEO 0 comments