When working on SEO for a client’s website for the first time, it often becomes apparent that not all web pages are indexed in Google.
As a refresher, if the page isn’t indexed in Google (or another search engine), then it can’t show up in the search results.
Addressing a lack of indexed pages is one of the most basic and critical tasks of SEO.
On several occasions, I’ve experienced the same issue when taking on new SEO work for a client.
The website below, which is more than 10 months old, was professionally built. It has a valid sitemap and robots.txt, yet it was still only partially indexed in Google.
There are various reasons why not all pages are indexed in Google. These reasons include, but are not limited to:
- The robots.txt file is inadvertently blocking Googlebot's access to some of your pages
- Meta tags such as "nofollow" instruct Google not to follow links to other pages
- An incomplete sitemap, or orphaned pages, means Googlebot can't reach all pages
- Insufficient crawl budget due to the site's relatively low PageRank
- No incoming links to the pages, indicating low importance
- Duplicate content that Google thinks isn't worth crawling
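The first cause is easy to rule out programmatically. Python's standard-library `urllib.robotparser` can tell you whether a given user agent is allowed to fetch a URL under your robots.txt rules. This is just a minimal sketch; the robots.txt rules and URLs below are made-up examples, not taken from any real site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a /private/ section for all crawlers.
robots_txt = """User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# True: /products is not disallowed, so Googlebot may crawl it.
print(rp.can_fetch("Googlebot", "https://www.example.com/products"))

# False: anything under /private/ is blocked for all user agents.
print(rp.can_fetch("Googlebot", "https://www.example.com/private/page"))
```

Running this against your own robots.txt and a list of the missing URLs quickly shows whether the file is the culprit.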
As a result of one, or a combination, of the above factors, Google may simply not have fully indexed your site. That was the case with this 10-month-old website.
I also have a suspicion (but can't prove it!) that sites whose ownership hasn't been claimed aren't fully indexed. Claiming ownership by adding the site to Google Webmaster Tools seems to be a factor in crawling, or the lack thereof.
How to investigate the cause of missing pages
To investigate this issue, I first check Google Webmaster Tools to ensure none of the issues mentioned above are stopping Google from indexing the website, including any manual actions or crawl errors identified by Google.
In this particular case, there were no crawl errors detected.
The difficulty most SEOs face here is that Google only supplies a count of the pages submitted and the pages indexed. Finding out which of your web pages are indexed in Google, and which aren't, is difficult.
You could run the site:www.example.com command in Google search and get a list of those pages indexed by Google.
From past experience, I find the results aren't very reliable: it often misses pages that I know are indexed and getting hits, and it also omits duplicate/similar results.
I also find that varying the search terms slightly, e.g. site:www.example.com products, returns different indexed results than the same query without the products keyword.
I’ve always thought: there has to be a better way!
Then recently everything changed when I stumbled upon the Greenlane Google Indexation Tester! This tool is what I have been searching for!
The tool lets you enter up to 100 URLs and systematically checks each one to see whether it is indeed stored in Google’s index.
Using the tool, which is a Google Docs spreadsheet, is extremely simple. You just copy the URLs into the spreadsheet and click run. The output is very clear: a simple Yes or No as to whether each page is indexed in Google.
The tool can also import/read your XML sitemap file, which makes things even easier.
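If you'd rather gather the URL list yourself before pasting it into a checker, a sitemap is straightforward to parse with Python's standard library. A minimal sketch, using a hypothetical two-page sitemap as the input:

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol for sitemap.xml files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text):
    """Return the list of <loc> URLs found in a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc")]

# Hypothetical sitemap content; in practice you'd fetch /sitemap.xml.
example = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/products</loc></url>
</urlset>"""

for url in sitemap_urls(example):
    print(url)
```

The resulting list can be copied straight into the spreadsheet, or diffed against whatever index data you have.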
For the URLs that weren’t indexed, I simply copied the pages from the Google Indexation Tester and submitted them in Google’s webmaster tools.
This was done by clicking the “Crawl” menu and then “Fetch as Google”. Simply enter the URL of the page you want crawled, choose Desktop or Smartphone, and then click “Fetch and Render”.
We can see from the Crawl Stats in Google Webmaster Tools that before April, there was very little crawl activity on this website:
The crawl activity increased after April because I manually submitted the pages that weren’t indexed to Google, as just described.
If you’ve done everything properly, you’ll start to see the number of pages indexed gradually increase over the next few days. It can take a few days (or even weeks) for Google to index the missing pages that were originally identified.
The speed depends on your “crawl budget” and how many pages Google needs to crawl. You should check that your robots.txt is correct and set up canonicalisation on your website to reduce Google’s crawl effort.
What I’ve realised is that un-indexed web pages are a surprisingly common occurrence. Checking for them should be one of the first things done when beginning an SEO campaign on a website, as overlooking this check can be disastrous.
Whilst on the topic, there may be a reason your web pages are indexed and then de-indexed again. Ensure your content is meaningful, unique and linked (internally and externally) so that your pages remain indexed in Google.
Thanks to Greenlane for creating this tool! I plan to use it periodically to identify which pages have been dropped from the index, and when taking on new SEO work. It’s an extremely useful tool and I’m glad I found it!
Do you agree with my approach for identifying any pages not contained in Google’s index? Leave a comment below if you have a better method.
Click the link to read more about common on-page SEO mistakes.
For any assistance with SEO for your website, contact Evolocity.