Earlier this week I noticed that our friendly Googlebot web crawler had decided to index my home page without the www prefix. I wasn’t all that concerned, as Google Analytics was showing that everything was fine. The only symptom that annoyed me was IE7 failing to pick up the heading font on the home page when the page was loaded from that domain (clicking on the home page link would resolve this). As always (things have changed now..) IE7 flaws were getting ignored, and the focus remained on gathering and producing content. After analysing Google’s webmaster stats, though, I noticed that all was not well. According to this data source I was losing clicks. It occurred to me that losing my www was doing me great SEO harm.
Sub-domains are important
In the world of domains and SEO, a site can be hosted at one physical location but be reachable through many logical URLs. The www prefix is a sub-domain of the domain and may serve different content from it. Working on that assumption, Google indexes both versions as separate sites and uses one of its algorithms to decide which one is best. In my case, it judged the bare domain to be a better choice for my home page than the www sub-domain. This had a big effect on my SEO rankings, because rankings behave very much like a water pipe: if there are any leaks, your site's score will drop.
Let’s add that DNS record
There are a number of approaches to resolve this. The most direct is via your ANAME records, provided your registrar is properly equipped to give you access to them. You can point mysite123.com to www.mysite123.com, and this way the problem is solved permanently. Secondly, if you run your own Apache-based server, a simple container within your configuration files can hold the non-www version. In a folder with the same name, place a .htaccess file that issues a 301 permanent redirect to the www version. And lastly, adding a link rel="canonical" tag to the head of each affected page will indicate to crawlers which version they should index.
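To make the 301 approach concrete, here is a minimal Python sketch of the decision a redirect rule has to make. It is not the Apache configuration itself, only an illustration of the logic; the function names `redirect_target` and `respond` are made up for this example, and the host names are the placeholder ones used above.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

# Placeholder hosts taken from the example above.
BARE_HOST = "mysite123.com"
CANONICAL_HOST = "www.mysite123.com"


def redirect_target(url: str) -> Optional[str]:
    """Return the www URL to redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    # .hostname is already lower-cased and has any port stripped.
    host = parts.hostname or ""
    if host != BARE_HOST:
        return None
    # Keep scheme, path and query; only the host changes.
    # Any explicit port is dropped in this simplified sketch.
    path = parts.path or "/"
    return urlunsplit((parts.scheme, CANONICAL_HOST, path, parts.query, parts.fragment))


def respond(url: str) -> Optional[Tuple[int, str]]:
    """Hypothetical helper: the (status, Location) pair a server would send."""
    target = redirect_target(url)
    return None if target is None else (301, target)
```

Calling `respond("http://mysite123.com/about?x=1")` gives a 301 pointing at the same path on the www host, while a request that is already on the www host returns `None` and is served normally. Using 301 rather than 302 matters here because it tells crawlers the move is permanent.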
For once, it wasn’t of any benefit losing my www!
Other sources of canonical URL issues
1. Different domain names serving the same content (302 redirects can make this kind of mess)
2. Different hostnames within one domain, such as “with-www” and “no-www” versions
3. With and without “index.html” for the domain root or a subdirectory root
4. Different protocols – https and http
5. Trailing period on the domain name
6. Double forward slash in the filepath – http://example.com//page.html
7. Swapping the order of query string parameters
8. URL rewrite that allows typos for the “keyworded” virtual directory name
9. Any forum software or CMS that generates alternate URLs for the same content
10. URLs that include session parameters, clickpath tracking, etc.
11. Adding a port number to the domain name: example.com:443
12. URLs with unneeded query strings or extra parameters in the query string
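Several of the items above are mechanical enough to be handled by a URL normaliser. The sketch below is one possible way to do it in Python, not a complete solution: the function name `canonicalize`, the choice of https and www as the preferred form, and the list of tracking parameters are all assumptions made for illustration. Items 8 and 9 depend on the rewrite rules or CMS in use and are not covered.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}
# Assumed examples of session/tracking parameters; adjust for your own site.
TRACKING_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium", "utm_campaign"}


def canonicalize(url: str, scheme: str = "https", www: bool = True) -> str:
    """Map the common duplicate forms of an absolute URL onto one canonical form."""
    parts = urlsplit(url)

    # Items 2 and 5: one hostname form, no trailing period (hostname is lower-cased).
    host = (parts.hostname or "").rstrip(".")
    if www and not host.startswith("www."):
        host = "www." + host

    # Item 11: drop the port when it is the default for the original scheme.
    port = parts.port
    if port is not None and port != DEFAULT_PORTS.get(parts.scheme):
        host = f"{host}:{port}"

    # Items 6 and 3: collapse repeated slashes, strip a trailing index.html.
    path = re.sub(r"/{2,}", "/", parts.path) or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]

    # Items 7, 10 and 12: drop tracking parameters and sort the rest.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = sorted((k, v) for k, v in pairs if k.lower() not in TRACKING_PARAMS)
    query = urlencode(kept)

    # Item 4: force a single protocol; the fragment is never sent to the server.
    return urlunsplit((scheme, host, path, query, ""))
```

For example, `canonicalize("HTTP://Example.com.:80//index.html?b=2&a=1&sessionid=xyz")` returns `https://www.example.com/?a=1&b=2`. The result could be used either as the target of a 301 redirect or as the value of a rel="canonical" link.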