100% ajax websites with 100% search engine compliance can now be created!

In action: site:gwt.google.com showcase text-entry

HTML or Ajax content that is generated dynamically by JavaScript is not indexed by search engines (they only see what View > Source shows). Until now. Google has created a new standard for indexing dynamic ajax content: the HTML snapshot. An HTML snapshot is all the content that appears on the page after JavaScript has been executed.

Current practice

Websites create a "parallel universe" of content: JavaScript-enabled browsers see dynamic content, while search engines (and non-JavaScript browsers) see static content created separately. "Progressive enhancement" using Hijax links is often used.

Hijax

* Gives search engines static links
* Gives users (with JavaScript enabled) ajax links

<a href="ajax.html?foo=32" onClick="navigate('ajax.html#foo=32'); return false">foo 32</a>

Static URL: ajax.html?foo=32 (with URL parameter)
Fragment URL: ajax.html#foo=32 (used by ajax)

Search engines understand URL parameters but ignore fragments, since # really means "a bookmark in a long page that, when clicked, just scrolls down (or up) to that spot". Web developer Jeremy Keith labeled this technique Hijax.

Source: http://googlewebmastercentral.blogspot.com/2007/11/spiders-view-of-web-20.html

Issues

* Must maintain two links for every clickable element
* Search results link to the static URLs, not to the fast ajax links

Solution: New ajax crawling scheme (by Google)

Twitter and Facebook use it!

http://twitter.com/#!/georgevanous
http://www.facebook.com/#!/georgevanous

* Use a specific syntax for ajax URLs (called "pretty URLs")
* Search engines will ask for an "ugly URL" when they see your "pretty URL"
* An HTML snapshot for that pretty URL is sent
* The search engine indexes the content from the HTML snapshot, but displays the pretty URL in search results
* This way, users always see the pretty URL (with the #! hash fragment) and search engines always see the static content

Source: http://code.google.com/web/ajaxcrawling/docs/learn-more.html

Step-by-step guide

1. Tell search engines your site supports the new ajax crawling scheme

Convert all your hash fragments to #!

Before: www.example.com/ajax.html#key=value
After: www.example.com/ajax.html#!key=value

2. Return HTML snapshots for URLs with "_escaped_fragment_"

To index www.example.com/ajax.html#!key=value
the search engine will ask for www.example.com/ajax.html?_escaped_fragment_=key=value

Why?

* Hash fragments are never (by design) sent to the server

If you enter www.example.com/ajax.html#!key=value
the server just sees www.example.com/ajax.html

The fragment after # is a bookmark within the page. Since it is only useful to the browser and meaningless to the server, it is not sent to the server.

Note: Crawlers escape certain characters in the fragment. Unescape all %xx characters in the fragment (%26 -> &, %20 -> space, %23 -> #, and so on).

Create an HTML snapshot

* If a lot of your content is produced with JavaScript, you may want to use a headless browser such as HtmlUnit to obtain the HTML snapshot. Alternatively, you can use a different tool such as crawljax or watij.com.
* If much of your content is produced with a server-side technology such as PHP or ASP.NET, you can use your existing code and only replace the JavaScript portions of your web page with static or server-side created HTML.
* You can create a static version of your pages offline, as is the current practice. For example, many applications draw content from a database that is then rendered by the browser. Instead, you may create a separate HTML page for each AJAX URL.

To see what the crawler sees, write a small test application and see the output, or use a tool like "Fetch as Googlebot".
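The mapping between pretty and ugly URLs described in step 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pretty_to_ugly` and `ugly_to_pretty` are hypothetical, and the escaping done by `quote` is an approximation of what a crawler does (it may escape more characters than strictly required, which is harmless because the server unescapes every %xx sequence anyway).

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

PARAM = "_escaped_fragment_"


def pretty_to_ugly(url: str) -> str:
    """Roughly what a crawler does: move the #! fragment into the query string."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # no #! fragment, so nothing to rewrite
    # Approximate crawler escaping: '&', '#', '%', space, etc. become %xx.
    escaped = quote(parts.fragment[1:], safe="=/")
    pair = f"{PARAM}={escaped}"
    query = f"{parts.query}&{pair}" if parts.query else pair
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def ugly_to_pretty(url: str) -> str:
    """What the server must undo: rebuild the #! URL from the ugly one."""
    parts = urlsplit(url)
    kept, fragment = [], None
    for pair in parts.query.split("&") if parts.query else []:
        name, _, value = pair.partition("=")
        if name == PARAM:
            fragment = unquote(value)  # unescape all %xx characters
        else:
            kept.append(pair)  # leave ordinary query parameters untouched
    if fragment is None:
        return url  # not a crawler request
    new_fragment = "!" + fragment if fragment else ""
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, "&".join(kept), new_fragment)
    )


print(pretty_to_ugly("http://www.example.com/ajax.html#!key=value"))
# -> http://www.example.com/ajax.html?_escaped_fragment_=key=value
print(ugly_to_pretty("http://www.example.com/ajax.html?_escaped_fragment_=a=1%26b=2"))
# -> http://www.example.com/ajax.html#!a=1&b=2
```

Note that only the first `=` after `_escaped_fragment_` separates the parameter name from its value, so a fragment such as `key=value` survives intact.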
To summarize, make sure the following happens on your server:

* A request URL like www.example.com/ajax.html?_escaped_fragment_=key=value is mapped back to www.example.com/ajax.html#!key=value
* The token is URL unescaped. The easiest way to do this is to use standard URL decoding. For example, in Java (using java.net.URLDecoder) you would do this:
  mydecodedfragment = URLDecoder.decode(myencodedfragment, "UTF-8");
* An HTML snapshot is returned, ideally along with a prominent link at the top of the page, letting end users know that they have reached the _escaped_fragment_ URL in error. (_escaped_fragment_ URLs are meant to be used only by crawlers.)
* For all requests that do not have an _escaped_fragment_, the server returns content as before.

3. Handle pages without hash fragments

Some of your pages may not have hash fragments. For example, you might want your home page to be www.example.com, rather than www.example.com#!home. For this reason, there is a special provision for pages without hash fragments.

To make pages without hash fragments crawlable, include a special meta tag in the head of the HTML of your page. The meta tag takes the following form:

<meta name="fragment" content="!">

This indicates to the crawler that it should crawl the ugly version of this URL. Following the scheme above, the crawler will temporarily map the pretty URL to the corresponding ugly URL. In other words, if you place <meta name="fragment" content="!"> into the page www.example.com, the crawler will temporarily map this URL to www.example.com?_escaped_fragment_= and will request this from your server. Your server should then return the HTML snapshot corresponding to www.example.com.

Please note that one important restriction applies to this meta tag: the only valid content is "!". In other words, the meta tag always takes the exact form <meta name="fragment" content="!">, which indicates an empty hash fragment, but a page with AJAX content.

4. Consider updating your Sitemap to list the new AJAX URLs

Crawlers use Sitemaps to complement their discovery crawl. Your Sitemap should include the version of your URLs that you'd prefer to have displayed in search results, so in most cases it would be http://example.com/ajax.html#!key=value. Do not include links such as http://example.com/ajax.html?_escaped_fragment_=key=value in the Sitemap. Googlebot does not follow links that contain _escaped_fragment_.

If you have an entry page to your site, such as your homepage, that you would like displayed in search results without the #!, then add this URL to the Sitemap as is. For instance, if you want http://example.com/ displayed in search results, include http://example.com/ in your Sitemap and make sure that <meta name="fragment" content="!"> is included in the head of the HTML document. For more information, see Google's additional articles on Sitemaps.

5. Test the crawlability of your app: see what the crawler sees with "Fetch as Googlebot"

Google provides a tool, Fetch as Googlebot, that gives you an idea of what the crawler sees. Use this tool to check whether your implementation is correct and whether the bot can now see all the content you want a user to see. It is also important to use this tool to ensure that your site is not cloaking.

Source: http://code.google.com/web/ajaxcrawling/docs/getting-started.html

Fetch as Googlebot

Part of Google Webmaster Tools.

Fetch as Googlebot reflects what Googlebot sees, but it does not follow redirects (Googlebot does).

Source: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=158587
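As a closing illustration, the server-side rules from steps 2 and 3 can be sketched as a single decision function. This is a minimal sketch, not a real web framework handler: `SNAPSHOTS`, `escaped_fragment`, and `respond` are hypothetical names, the snapshot store is a plain dictionary standing in for whatever produces your HTML snapshots, and the "prominent link" for stray human visitors is left out for brevity.

```python
from urllib.parse import unquote, urlsplit

PARAM = "_escaped_fragment_"

# Hypothetical snapshot store: decoded fragment -> pre-rendered HTML.
# The empty key covers pages that use <meta name="fragment" content="!">.
SNAPSHOTS = {
    "": "<html><body>Home page snapshot</body></html>",
    "key=value": "<html><body>Snapshot for key=value</body></html>",
}

NORMAL_PAGE = "<html><!-- normal AJAX application --></html>"


def escaped_fragment(url):
    """Return the decoded fragment if this is a crawler request, else None."""
    query = urlsplit(url).query
    for pair in query.split("&"):
        name, _, value = pair.partition("=")
        if name == PARAM:
            return unquote(value)  # unescape all %xx characters
    return None


def respond(url):
    """Return (status, body) for a requested URL."""
    fragment = escaped_fragment(url)
    if fragment is None:
        # No _escaped_fragment_: serve content as before.
        return 200, NORMAL_PAGE
    snapshot = SNAPSHOTS.get(fragment)
    if snapshot is None:
        return 404, "No snapshot for this state"
    return 200, snapshot


print(respond("http://www.example.com/ajax.html?_escaped_fragment_=key=value")[1])
# -> <html><body>Snapshot for key=value</body></html>
```

Running a handful of ugly URLs through a function like this is one way to build the "small test application" mentioned in step 2 before checking the live site with Fetch as Googlebot.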