Since a page that is not indexed by search engines has no way of ranking, we webmasters do whatever we can to get the pages on our websites indexed.
We build XML sitemaps, HTML sitemaps, inbound links and more, just to get Google's attention and establish authority on certain pages. Yet even after all that effort, not every page ends up indexed, at least according to the numbers we see in Google Search Console.
When a website owner raised this question after discovering that not all the pages he submitted via XML sitemaps were indexed, Google confirmed that it is normal for some pages to remain unindexed.
“Yes, that’s true. In Search Console we give you information on whether or not, on how many URLs within a sitemap are indexed but not which ones specifically. For the most part, that’s not something you need to worry about, it’s completely normal for us not to index all URLs that we find, and that’s not something you need to artificially inflate.”
Several years ago, at a free seminar organized by BeansBox in Central, I shared a slide explaining the difference between crawling and indexing. I hoped the concept was clear: once a page is crawled, it is scanned for signals of value such as keywords and links. As Google has grown more intelligent in how it perceives content, it is better equipped to evaluate pages: the Penguin update tracks down bad link practices, while the Hummingbird update understands the intent behind a query instead of simply taking it literally.
We should understand that there are cases where certain pages are not indexed. For example, suppose you run a WordPress-powered blog where each post carries multiple tags that are seldom used elsewhere. Each of those tags gets its own archive page, which may offer little information because it only links to the posts where the tag appears. Without being indexed, such a page cannot rank in search results. If the page could plausibly deliver significant traffic and offers a good amount of unique, helpful information, then by all means make the effort to get it indexed. But if it holds thin content that is also found elsewhere, we can simply let it go.
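This triage can be sketched in a few lines of Python. Everything here is an assumption for illustration: the tag-to-post-count mapping would come from your own WordPress data (an export or the REST API), and the three-post threshold is an arbitrary example, not a Google rule.

```python
# Sketch: flag thin WordPress tag archives that may not deserve indexing.
# The counts and the min_posts threshold below are hypothetical examples.

def triage_tags(tag_post_counts, min_posts=3):
    """Split tags into keepers (worth indexing) and thin candidates."""
    keep, thin = [], []
    for tag, count in tag_post_counts.items():
        (keep if count >= min_posts else thin).append(tag)
    return sorted(keep), sorted(thin)

if __name__ == "__main__":
    counts = {"seo": 12, "wordpress": 7, "one-off-topic": 1, "misc": 2}
    keep, thin = triage_tags(counts)
    print("worth indexing:", keep)   # tags used across several posts
    print("thin candidates:", thin)  # archives linking to only one or two posts
```

Tags in the second list are the ones to review: merge them, retire them, or mark their archives noindex rather than trying to force them into the index.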
Instead, focus on pages that offer unique content and make sure those are indexed. The percentage of pages indexed against the total number of pages is not a helpful metric if many of your pages offer minimal or no value. That is not to say we should get rid of WordPress tags, but rather that we should assess their potential: are they often featured in your content? Are they similar or duplicate? Do they have high query volume, and are they highly relevant to your website? If not, review how you select and apply these tags, since they have an impact on how your pages are indexed.
Another area to investigate is whether pages are unindexed simply because they were never crawled properly in the first place. Check the crawl errors report in Google Search Console and look for obstacles that keep Googlebot from reaching these pages. Perhaps the page does not load quickly enough, or there is a misconfigured redirect script or another server error.
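A quick self-check along these lines can be scripted before (or alongside) reading the Search Console reports. The sketch below is an assumption-laden approximation, not how Googlebot actually works: it probes a URL with a HEAD request, refuses to follow redirects so 3xx responses surface, and uses an arbitrary three-second threshold for "slow".

```python
import time
import urllib.request
import urllib.error

SLOW_SECONDS = 3.0  # hypothetical threshold; pages slower than this deserve a look


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface 3xx responses instead of silently following them."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def classify(status, elapsed):
    """Map an HTTP status and response time to a rough crawl verdict."""
    if status >= 500:
        return "server error"
    if status in (301, 302, 307, 308):
        return "redirect (check the target)"
    if status >= 400:
        return "not reachable"
    if elapsed > SLOW_SECONDS:
        return "ok but slow"
    return "ok"


def probe(url, timeout=10):
    """HEAD-request a URL without following redirects, timing the response."""
    req = urllib.request.Request(url, method="HEAD")
    opener = urllib.request.build_opener(NoRedirect())
    start = time.monotonic()
    try:
        status = opener.open(req, timeout=timeout).status
    except urllib.error.HTTPError as err:
        status = err.code  # 3xx/4xx/5xx land here because we block redirects
    return classify(status, time.monotonic() - start)
```

Running `probe` over the URLs in your sitemap gives a first-pass list of pages whose slowness, redirects, or server errors might be getting in Googlebot's way.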
For larger sites, an XML sitemap can be split into a few smaller files, which helps Googlebot crawl them more efficiently and isolates issues that are hard to track down in one large sitemap file.
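As a minimal sketch of that split, the snippet below chunks a URL list into per-file slices and emits a sitemap index pointing at them. The 50,000-URL ceiling comes from the sitemaps.org protocol; the file names and the tiny example limit of 2 are hypothetical.

```python
# Sketch: split a long URL list into several sitemap files plus a sitemap
# index, staying under the protocol's 50,000-URLs-per-file limit.

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk(urls, limit=50000):
    """Yield successive slices of at most `limit` URLs."""
    for i in range(0, len(urls), limit):
        yield urls[i:i + limit]


def render_sitemap(urls):
    """Render one <urlset> sitemap file for a slice of URLs."""
    entries = "\n".join(f"  <url><loc>{u}</loc></url>" for u in urls)
    return (f'<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<urlset xmlns="{SITEMAP_NS}">\n{entries}\n</urlset>\n')


def render_index(sitemap_urls):
    """Render the <sitemapindex> that ties the smaller files together."""
    entries = "\n".join(f"  <sitemap><loc>{u}</loc></sitemap>"
                        for u in sitemap_urls)
    return (f'<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}\n</sitemapindex>\n')


if __name__ == "__main__":
    urls = [f"https://example.com/post-{n}" for n in range(1, 6)]
    parts = list(chunk(urls, limit=2))  # three files: 2 + 2 + 1 URLs
    names = [f"https://example.com/sitemap-{i}.xml"
             for i in range(1, len(parts) + 1)]
    print(render_index(names))
```

Splitting by section (posts, tags, products) rather than by raw count works too, and makes it easier to see which part of the site Search Console reports as under-indexed.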