When Should You Use the Rel Canonical Tag, and When Should You Do Nothing?

Website owners, webmasters and fellow SEO consultants share many concerns about how their websites behave and how search engines deal with them.

“Are spambots frequently accessing my site and creating performance issues?”
“Are the search engine bots I’m targeting not only visiting my site, but also accessing my top pages?”

Sure, there’s robots.txt, which we can all use to give directives to search engine robots, but site structure issues can also complicate how search engine crawlers scan pages and access content. On a site that is huge, has canonical issues, or serves only selected pages over HTTPS, crawlers can get lost along the way.

We are all aware that before a page can rank in search results, it has to be indexed, and before it can be indexed, it needs to be crawled and properly processed. If that order is disrupted, or if crawlers retrieve the less desirable URL, your ranking hopes may, at the very least, produce unwanted results.

While several tags are designed to give you more control over how crawlers process web pages, simple miscues can be costly.

Dealing with similar content on product pages can be challenging; for example, products that vary only by color or size can have exactly the same description. Should I treat this multitude of pages the same way as search results pages and apply a canonical tag? If not all pages are served over HTTPS, should I focus only on the secure ones, given that Google rankings tend to favor more secure, user-friendly web properties?

So let’s find out how we can make use of some tags to address issues such as similar content, duplicate pages, and the like.

Use of canonical tag
Canonical tags are placed in the <head> section of an HTML page and tell search engines which URL version is the “official” one. By setting a canonical tag on your page, you declare the default URL for that content, giving other URLs with similar content (HTTPS vs HTTP, www vs non-www, with parameters vs without parameters) lower priority. Search engines such as Google treat the tag as a strong hint rather than a strict directive.

With a canonical tag, it becomes clear which URL crawlers should prioritize, so they can move on to the next page instead of spending effort on URLs whose canonical tags point to that “official” one.

Without a canonical tag, search engines are left to decide on their own which URL version to crawl and index when a website has duplicate content. Is it the first URL version they encounter? The one published first? The one with the highest link equity or page authority? Furthermore, without a canonical tag, a search engine may waste part of its crawl budget for your website on near-identical URLs, leaving other important pages unattended.
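To make the mechanics a little more concrete, here is a minimal Python sketch of how a crawler-like tool might read the canonical tag from a page’s HTML, using only the standard library’s html.parser. The names CanonicalFinder and find_canonicals are hypothetical, real search engines weigh many more signals than this, and for simplicity the sketch scans the whole document rather than only the <head> section.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonicals(html):
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals


page = """<html><head>
<title>Info</title>
<link rel="canonical" href="https://www.website.com/info.html">
</head><body>...</body></html>"""

print(find_canonicals(page))  # ['https://www.website.com/info.html']
```

Returning a list rather than a single value makes it easy to spot pages that carry no canonical tag or more than one.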

URL canonical best practices
Even without potential duplicate content issues, certain applications of rel canonical are still helpful.

a. A page’s canonical tag can reference the page itself
For example, within the header of the URL https://www.website.com/info.html, a self-referencing canonical tag can appear as:
<link rel="canonical" href="https://www.website.com/info.html">
The same tag can also be placed on duplicate copies of that content at other URL versions, such as:

http://www.website.com/info.html
http://website.com/info.html
https://www.website.com/info.html?css=false
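As a rough illustration, the Python sketch below maps each URL variant listed above to one preferred version and prints the canonical tag every variant would carry. It assumes, hypothetically, that the preferred version uses HTTPS, the www host and no query string; stripping every query string is only safe when parameters such as css=false do not change the page content.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch: the preferred version uses HTTPS and the
# "www." host, and query strings never change the content.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.website.com"


def canonical_url(url):
    parts = urlsplit(url)
    host = parts.hostname or ""  # .hostname is already lowercased
    if host == "website.com":
        host = PREFERRED_HOST
    # Force the preferred scheme, keep the path, drop query string and fragment.
    return urlunsplit((PREFERRED_SCHEME, host, parts.path, "", ""))


def canonical_tag(url):
    return '<link rel="canonical" href="%s">' % canonical_url(url)


variants = [
    "https://www.website.com/info.html",
    "http://www.website.com/info.html",
    "http://website.com/info.html",
    "https://www.website.com/info.html?css=false",
]
for v in variants:
    # each line prints <link rel="canonical" href="https://www.website.com/info.html">
    print(canonical_tag(v))
```

Because every variant, including the preferred URL itself, produces the same tag, the preferred page ends up self-referencing.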

b. Check canonical tags on CMS-generated or other dynamic pages
Pages generated on the fly by content management systems may churn out inconsistent canonical tags. Take a close look, especially at pages on e-commerce, news and other CMS-powered sites.
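One way to take that close look is to crawl the CMS-generated pages, collect the canonical hrefs found on each, and flag anything suspicious. The sketch below assumes the extraction step has already been done (the crawl dictionary and its URLs are hypothetical) and only checks for three common problems: a missing tag, several conflicting tags, and a relative rather than absolute canonical URL.

```python
from urllib.parse import urlsplit


def audit_canonicals(pages):
    """Return (url, problem) pairs for pages with suspicious canonical tags.

    `pages` maps each crawled URL to the list of canonical hrefs found on it.
    """
    problems = []
    for url, hrefs in pages.items():
        unique = sorted(set(hrefs))
        if not unique:
            problems.append((url, "missing canonical tag"))
        elif len(unique) > 1:
            problems.append((url, "conflicting canonical tags: " + ", ".join(unique)))
        else:
            parts = urlsplit(unique[0])
            # Absolute URLs are the safer choice for canonical hrefs.
            if not (parts.scheme and parts.netloc):
                problems.append((url, "relative canonical URL: " + unique[0]))
    return problems


# Hypothetical crawl results for a CMS-driven shop.
crawl = {
    "https://www.website.com/shirts/blue": ["https://www.website.com/shirts/blue"],
    "https://www.website.com/shirts/red": [],
    "https://www.website.com/pants": [
        "https://www.website.com/pants",
        "https://www.website.com/pants?page=1",
    ],
    "https://www.website.com/hats": ["/hats"],
}

for url, problem in audit_canonicals(crawl):
    print(url, "->", problem)
```

Running the same audit twice on dynamic pages and comparing the results is a cheap way to catch canonical tags that change from one request to the next.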

c. Pre-empt multiple variations of homepage
By setting a canonical URL on your homepage, you help search engines understand which version you prefer, even when other websites link to different URL variations. This is very common: people link to seo-hongkong.com, or to www.seo-hongkong.com/index.php, as well.
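A quick sanity check along these lines is to request each homepage variation and confirm that they all declare the same canonical URL. In the sketch below, the preferred homepage URL and the declared values are hypothetical examples, not data from a real crawl; check_homepage_variants simply lists the variants whose declared canonical differs from the preferred one, including variants with no tag at all.

```python
def check_homepage_variants(declared, preferred):
    """List homepage variants whose declared canonical is not `preferred`.

    `declared` maps each variant URL to the canonical href found on it,
    or None when the page carries no canonical tag.
    """
    return sorted(url for url, href in declared.items() if href != preferred)


preferred = "https://www.seo-hongkong.com/"  # assumed preferred homepage URL
declared = {
    "https://www.seo-hongkong.com/": "https://www.seo-hongkong.com/",
    "http://seo-hongkong.com/": "https://www.seo-hongkong.com/",
    "https://www.seo-hongkong.com/index.php": None,  # tag missing
}

print(check_homepage_variants(declared, preferred))
# ['https://www.seo-hongkong.com/index.php']
```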

d. Make sure you apply rel=canonical consistently
To avoid complications, ensure that you apply rel canonical tags in a consistent manner.

For example, assuming correctURL.html and rightURL.html have the same content:
For the URL http://www.website.com/correctURL.html you apply the following code:
<link rel="canonical" href="http://www.website.com/rightURL.html">

But for the URL http://www.website.com/rightURL.html you apply the following code:
<link rel="canonical" href="http://www.website.com/correctURL.html">

Because the two pages point at each other, neither is established as the canonical or “default” URL; both are on an equal footing. In such a case, a search engine like Google may have to decide on its own, sometimes against your desired results. To fix the issue above, pick one URL and declare it as the canonical across all URL versions with the same content. Here, rightURL.html is chosen as the canonical:

For the URL http://www.website.com/rightURL.html you apply the following code:
<link rel="canonical" href="http://www.website.com/rightURL.html">

And for the URL http://www.website.com/correctURL.html you apply the following code:
<link rel="canonical" href="http://www.website.com/rightURL.html">
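Canonical tags that point at each other, as in the first example above, can be caught by following each page’s canonical link until it reaches a page that references itself. The sketch below is a minimal, hypothetical helper (resolve_canonical) that reports loops, missing tags and overly long chains; the mapping it takes would come from your own crawl.

```python
def resolve_canonical(start, canonical_of, max_hops=10):
    """Follow declared canonical links from `start`.

    `canonical_of` maps a page URL to the canonical href declared on that page.
    Returns (canonical_url, None) when the chain ends at a self-referencing
    page, or (None, reason) when it loops, breaks, or runs too long.
    """
    seen = [start]
    current = start
    for _ in range(max_hops):
        target = canonical_of.get(current)
        if target is None:
            return None, "no canonical tag on " + current
        if target == current:
            return current, None  # self-referencing page: the chain is resolved
        if target in seen:
            return None, "loop: " + " -> ".join(seen + [target])
        seen.append(target)
        current = target
    return None, "chain longer than %d hops" % max_hops


CORRECT = "http://www.website.com/correctURL.html"
RIGHT = "http://www.website.com/rightURL.html"

inconsistent = {CORRECT: RIGHT, RIGHT: CORRECT}  # the pages point at each other
fixed = {CORRECT: RIGHT, RIGHT: RIGHT}           # rightURL.html self-references

print(resolve_canonical(CORRECT, inconsistent))
print(resolve_canonical(CORRECT, fixed))
```

The first call reports a loop between the two pages; the second resolves to rightURL.html, the single URL both pages agree on.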

Here are a few scenarios that illustrate when the canonical tag is useful, and when it is better left out.

Issue 1: Your website can be accessed in both HTTP and HTTPS versions.
The duplicate content issue is evident: simply switching the protocol from secure to non-secure (or vice versa) serves the same content.

Solution: It might be tempting to adopt HTTPS as the default canonical URL on the assumption that secure pages will eventually outrank their non-secure counterparts. However, if your existing ranking pages use HTTP, it would be prudent to adopt HTTP as the rel canonical URL, especially if those pages have the advantage in inbound links and HTTPS has only just been implemented.

Issue 2: Your website’s CMS generates case-sensitive URLs.
Some CMSs are notorious for churning out case-sensitive URLs. Even though visitors reach the same page whatever the letter case, as long as the spelling is correct, search engines treat those URLs as different, which causes problems especially when inbound links point to a variety of URL versions.

Solution: Adopt a standard URL format and use it in your rel canonical tags
A commonly used format is all lowercase. Applying it in your tags helps establish “official” URLs that search engines are likely to honor, bypassing the old-fashioned and, in my opinion, dirty-looking combinations of upper- and lowercase letters.
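A minimal Python sketch of the lowercase convention: build the canonical href by lowercasing the scheme, host and path, while leaving the query string untouched because parameter values may be case-sensitive. It assumes, as described above, that your CMS serves the same page whatever the letter case; the function name and example URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_canonical(url):
    """Return `url` with a lowercase scheme, host and path.

    The query string is kept as-is because parameter values may be
    case-sensitive; any #fragment is dropped.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.lower(),
        parts.query,
        "",
    ))


# Hypothetical mixed-case URL produced by a CMS.
messy = "https://www.website.com/Products/Blue-Shirt.html?Ref=Home"
print('<link rel="canonical" href="%s">' % lowercase_canonical(messy))
# <link rel="canonical" href="https://www.website.com/products/blue-shirt.html?Ref=Home">
```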

Issue 3: Your website has multiple pages that share common content
For instance, your e-commerce site sells clothing such as shirts and pants. Each product has its own description, but its sub pages repeat that same description, varying only in parameters like color, price, availability and size. Should you use a canonical URL that points to the default/parent product page?

Solution: These pages are unique enough that they don’t warrant a canonical tag pointing to the parent page; adding one would defeat the very reason such sub pages exist. Note that some of the long-tail keywords people search for online include modifiers such as color, size or price.

Issue 4: Your website contains pages with multiple content sorting options.

For example, a category page lists hotel rooms sorted by price by default. Other variations of the page display hotels sorted by rating, by star class, and so on. Should I implement rel canonical?

Solution: The content remains the same regardless of the sort order, so it makes sense to pick the version people are most likely to access (and the one that lets them evaluate hotels most easily) and set it as the default canonical URL.
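If the sorted views differ only by a query parameter, each view can declare the default view as its canonical by dropping that parameter. The sketch below assumes, hypothetically, that the sort order is passed as sort= (or order=) and that the URL without it shows the default price-sorted view; the hotel URLs are made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of the parameters that only change the sort order.
SORT_PARAMS = {"sort", "order"}


def canonical_for_sorted_view(url):
    """Drop sort-order parameters so every sorted view shares one canonical URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SORT_PARAMS
    ]
    # Other parameters (here, city) still change the content, so they stay.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


views = [
    "https://www.website.com/hotels?city=hongkong",
    "https://www.website.com/hotels?city=hongkong&sort=rating",
    "https://www.website.com/hotels?sort=stars&city=hongkong",
]
for view in views:
    # each line prints https://www.website.com/hotels?city=hongkong
    print(canonical_for_sorted_view(view))
```

Only the sort parameters are removed, so filters that genuinely change which hotels are listed keep their own URLs.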

Conclusion:
If we have a good understanding of what a duplicate page or URL is, using rel canonical to deal with duplicate content should be pretty straightforward. It should be understood as consolidating different URL versions of the same page to achieve consistency, helping search engines understand our preference while making good use of crawl budget.