How Do I Spot Cloaked Sites?


Forget the debate about cloaking; I'm a bit tired of it anyway. How does one detect some of the cloaking going on around the Web? Follow these instructions:

(1) Download the Firefox Browser
(2) Install it
(3) While using Firefox, download the User Agent Switcher extension for Firefox/Mozilla
(4) Restart the browser
(5) Under Tools --> User Agent Switcher --> Options --> Options (that will open a dialog box)
(6) Click Add under the User Agents section
(7) In the description add "Googlebot" and in the user agent add "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
(8) Repeat this process for every spider you want to test; updated, comprehensive lists of user agents are available online.
(9) Under Tools --> User Agent Switcher --> select the user agent
(10) Then navigate to the pages that you want to test for cloaking.
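For quick spot checks outside the browser, the same idea can be scripted: fetch a page twice with different User-Agent headers and compare what comes back. Below is a minimal sketch of the comparison step; the browser User-Agent string and the hash-based heuristic are my own illustrative choices, not part of the original instructions.

```python
import hashlib

# Pretend-browser and crawler User-Agent strings. The Googlebot string
# matches the one added to User Agent Switcher in step (7); the Firefox
# string is an illustrative example.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/115.0"
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

def looks_cloaked(html_as_browser: str, html_as_bot: str) -> bool:
    """Crude cloaking check: normalise whitespace, then compare hashes.

    Dynamic pages (timestamps, session IDs, rotating ads) will produce
    false positives, so treat a mismatch as a cue for manual review,
    not as proof of cloaking."""
    digest = lambda s: hashlib.sha256(" ".join(s.split()).encode()).hexdigest()
    return digest(html_as_browser) != digest(html_as_bot)
```

To feed it, fetch the page twice (for example with urllib.request), setting the User-Agent header to each constant in turn, and pass the two HTML bodies in.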

Hope this helps some people be Googlebot. :)

Looking ahead: Google announces technology that searches tomorrow's web, today


SYDNEY, AUSTRALIA, 1 April 2008 - Google Australia today announced the launch of gDay™, a new beta search technology that will search web pages 24 hours before they are created.

View the gDay™ page for more information, user testimonials and Q&A.

gDay was developed in Google's Sydney engineering centre and can accurately predict future events and internet content. It does this by using machine learning and artificial intelligence techniques from a system called MATE™ (Machine Automated Temporal Extrapolation).

Using Google's index of historic, cached web content and a mashup of numerous factors including recurrence plots and fuzzy measure analysis, gDay creates a sophisticated model of what the internet will look like 24 hours from now - including share price movements, sports results and news events. Plus, using language regression analysis, Google can even predict the actual wording of tomorrow's blogs and newspaper columns.

Then, to rank these future webpages in order of relevance, gDay uses a statistical extrapolation of a page's PageRank, called SageRank.

Only Australian websites are included in the beta.

“Google's Australian engineers have a history of major technological innovations, from Google Maps™ to Mapplets™ to Traffic for Google Maps. Giving humankind the ability to see 24 hours into the future is just a natural progression – of sorts,” said Alan Noble, Head of Engineering for Google Australia & New Zealand.

“Users – particularly those who like a casual flutter – will really benefit from this feature. Maybe you want to see tomorrow's rugby scores. Maybe you want to see tomorrow's lotto numbers. Maybe this is the greatest product since sliced bread.”

See today's post on the Google Australia blog

gDay, MATE, SageRank, PageRank, Google Maps and Google Mapplets are trademarks of Google Inc.

Media contact
Rob Shilkin
Google Australia & NZ
rshilkin@google.com

Dear Google Analytics users,



We are writing to let you know about a change in our service offerings. If you have logged into your account recently, you may have noticed that you can now choose to share your Google Analytics data. By providing data sharing options, we hope to provide you with transparency, control, and new services based on your preferences.

To learn more about data sharing settings, visit our FAQs: http://www.google.com/support/googleanalytics/bin/answer.py?answer=87515

We're also happy to announce industry benchmarking as the first new feature available to those who opt to share their data. Benchmarking lets you compare your metrics against industry verticals.

To enable this optional new feature, an administrator on your account will need to make the following selections on the Google Analytics data sharing settings page:

1. Log into your account. You'll see the yellow data sharing settings box on the Analytics Settings page.

2. Click the "More data sharing options" link within the yellow box.

3. Select the second checkbox to specify that you want to share your data "Anonymously with Google products and the benchmarking service". You can also choose to share your data "With Google products only" to take advantage of advanced Google advertising products and services as they become available.

The industry benchmarking feature is currently in beta. Once you have enabled benchmarking, it may take up to two weeks before the categorized, aggregated and anonymized benchmarking data shows up in your reports.

For more information on the benchmarking service, visit our FAQs: http://www.google.com/support/googleanalytics/bin/topic.py?topic=13909

In addition to the new benchmarking service, opting to share your data will also enable you to take advantage of new advanced Google products and services as they become available. We think these services will offer greater insight and sophistication to users who have opted to share their data. However, if you would prefer not to use these services, simply specify on the settings page that you don't want to share your data.


Sincerely,

The Google Analytics Team

Google Crawls HTML Forms to Index the Deep Web

Crawling through HTML forms

Friday, April 11, 2008 at 10:50 AM



Google is constantly trying new ideas to improve our coverage of the web. We already do some pretty smart things like scanning JavaScript and Flash to discover links to new web pages, and today, we would like to talk about another new technology we've started experimenting with recently.

In the past few months we have been exploring some HTML forms to try to discover new web pages and URLs that we otherwise couldn't find and index for users who search on Google. Specifically, when we encounter a <FORM> element on a high-quality site, we might choose to do a small number of queries using the form. For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values specified in the HTML. Having chosen the values for each input, we generate and then try to crawl URLs that correspond to a possible query a user may have made. If we ascertain that the web page resulting from our query is valid, interesting, and includes content not in our index, we may include it in our index much as we would include any other web page.
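The generation step described above — pick a value for each input, then build a GET URL per combination — can be sketched as a small URL generator. The field names, candidate values, and the cap on the number of URLs are illustrative assumptions, not Google's actual parameters:

```python
import itertools
from urllib.parse import urlencode

def candidate_urls(action, inputs, max_urls=5):
    """Generate a handful of GET URLs from a form.

    action  -- the form's action URL
    inputs  -- dict mapping field name -> list of candidate values
               (words sampled from the site for text boxes; the declared
               option values for select menus, check boxes, radio buttons)
    max_urls -- cap, mirroring the "small number of fetches" policy
    """
    combos = itertools.product(*inputs.values())
    urls = []
    for combo in itertools.islice(combos, max_urls):
        query = urlencode(dict(zip(inputs.keys(), combo)))
        urls.append(f"{action}?{query}")
    return urls
```

Each generated URL is then a candidate for an ordinary crawl fetch, subject to the robots.txt and privacy restrictions described below.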

Needless to say, this experiment follows good Internet citizenry practices. Only a small number of particularly useful sites receive this treatment, and our crawl agent, the ever-friendly Googlebot, always adheres to robots.txt, nofollow, and noindex directives. That means that if a search form is forbidden in robots.txt, we won't crawl any of the URLs that a form would generate. Similarly, we only retrieve GET forms and avoid forms that require any kind of user information. For example, we omit any forms that have a password input or that use terms commonly associated with personal information such as logins, userids, contacts, etc. We are also mindful of the impact we can have on web sites and limit ourselves to a very small number of fetches for a given site.
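The robots.txt compliance described here is easy to reproduce with Python's standard-library robot parser: if the form's action path is disallowed for the crawler, none of the generated URLs get fetched. The rules and URLs below are a made-up example:

```python
from urllib.robotparser import RobotFileParser

def form_action_allowed(robots_lines, user_agent, action_url):
    """Return True if the crawler may fetch URLs generated from a form
    whose action is action_url, per the given robots.txt rules."""
    rp = RobotFileParser()
    rp.parse(robots_lines)  # accepts robots.txt content as a list of lines
    return rp.can_fetch(user_agent, action_url)

# Hypothetical robots.txt forbidding the site's search form for Googlebot.
rules = [
    "User-agent: Googlebot",
    "Disallow: /search",
]
```

With these rules, every URL generated from a form whose action starts with /search is skipped, while the rest of the site remains crawlable.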

The web pages we discover in our enhanced crawl do not come at the expense of regular web pages that are already part of the crawl, so this change doesn't reduce PageRank for your other pages. As such, it should only increase the exposure of your site in Google. This change also does not affect the crawling, ranking, or selection of other web pages in any significant way.

This experiment is part of Google's broader effort to increase its coverage of the web. In fact, HTML forms have long been thought to be the gateway to large volumes of data beyond the normal scope of search engines. The terms Deep Web, Hidden Web, or Invisible Web have been used collectively to refer to such content that has so far been invisible to search engine users. By crawling using HTML forms (and abiding by robots.txt), we are able to lead search engine users to documents that would otherwise not be easily found in search engines, and provide webmasters and users alike with a better and more comprehensive search experience.

List of high PR sites for bookmarking

