In search engine optimization lingo, there are different terms we need to understand. If we know what these terms mean, we can easily tell whether someone trying to preach the gospel of search knows what they are talking about, at least on a basic level.
Two of the most frequently used terms in the business are crawling and indexing. The two words are often used interchangeably, and because of that many of us dismiss the difference and assume that crawling and indexing mean the same thing.
So we ask: what is crawling, what is indexing, and what is the difference between the two?
Crawling takes place when a search engine robot successfully fetches a unique URI that it discovered by following valid links from other web pages. It’s like Pac-Man following all those dots and eating them, except that in crawling it’s the search engine robots that follow the links.
I say crawling takes place on a successful fetch because not all links we see on the Web are crawlable. A link may fail to be crawled for any of the following reasons:
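To make the link-following idea concrete, here is a minimal sketch of a crawler in Python. It is only an illustration and not how any real search engine works: the `crawl` function, the `LinkExtractor` class, and the tiny in-memory site in `PAGES` are all hypothetical, and the "fetch" is a dictionary lookup rather than a real HTTP request.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl(start_url, fetch):
    """Breadth-first crawl; `fetch` returns HTML, or None if the fetch fails."""
    seen = {start_url}
    queue = deque([start_url])
    crawled = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # unsuccessful fetch, so the URI is not crawled
            continue
        crawled.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.hrefs:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled


# Hypothetical site held in memory instead of on a real server.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/missing">gone</a>',
    "http://example.test/a": '<a href="/">home</a> <a href="/b">B</a>',
    "http://example.test/b": "no links here",
    "http://example.test/orphan": "nobody links to me",
}

print(crawl("http://example.test/", PAGES.get))
```

In this toy site the `/missing` link cannot be fetched and `/orphan` is never linked from anywhere, so neither ends up in the crawled list.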
2. Link was marked for exclusion via the Disallow directive in robots.txt
3. Orphaned link (no page links to it and no sitemap.xml includes it)
4. Link is found within a page that contains the nofollow directive
5. Server was down when the link was supposed to be crawled
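The robots.txt case above can be checked with Python's standard `urllib.robotparser` module. The sketch below is a rough illustration: the robots.txt content, the `example.test` URLs, the `ExampleBot` user agent, and the helper name `is_crawl_allowed` are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt, url, user_agent="ExampleBot"):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawl_allowed(ROBOTS_TXT, "http://example.test/public.html"))
print(is_crawl_allowed(ROBOTS_TXT, "http://example.test/private/page.html"))
```

A well-behaved robot would run a check like this before fetching a URI and skip anything the Disallow rule excludes.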
Indexing takes place after crawled URIs are processed. Note that many URIs may be crawled, but fewer of them may have their content processed for indexing. A previously crawled page may not be indexed for the following reasons:
1. A noindex directive in the page (<meta name="robots" content="noindex" />)
2. Duplicate content: a page that has the same content as an already indexed page may not be indexed.
Other factors such as link age and link popularity may also play a role, but I am less inclined to rely on them.
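As a rough illustration of the noindex case listed above, the following sketch reads the robots meta tag out of a page with Python's standard `html.parser`. The class `RobotsMetaParser` and the function `robots_directives` are hypothetical names for this example; a real indexer weighs far more signals than this one tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # The content attribute may list several comma-separated directives.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots meta directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex" /></head></html>'
print("noindex" in robots_directives(page))
```

The same helper also picks up nofollow, since both directives live in the same meta tag.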
To check whether a page has been indexed, we can use the “site:” operator. For example, to find out whether seo-hongkong.com is indexed, I can search for site:seo-hongkong.com and Google will show which of its pages are currently in the index.
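If you check several domains regularly, the search URL for a “site:” query can be built with a few lines of Python and opened in a browser. This is only a convenience sketch: the helper name `site_query_url` is made up for the example, and it builds a URL without sending any request.

```python
from urllib.parse import urlencode


def site_query_url(domain):
    """Build a Google search URL for a site: query on the given domain."""
    return "https://www.google.com/search?" + urlencode({"q": f"site:{domain}"})


print(site_query_url("seo-hongkong.com"))
# prints https://www.google.com/search?q=site%3Aseo-hongkong.com
```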