Why are my blog posts not indexed properly?
Posted by John Mueller on June 17th, 2007
Often a blog will not be indexed properly. Sometimes only the front page is indexed, sometimes only a handful of the postings are indexed. Sometimes, visitors are sent to the front page for a posting which is no longer visible there.
There are a few issues with blogs which can sometimes be rectified with amazing results.
Sometimes a blog will be set up to display the full blog posting on the front page. Usually, if this is the case, the same posting will be shown in the category pages and in the archives and here and there and … This will result in two problems:
- The search engines will find the full content in several places and will not really know where to send a visitor. Additionally, the content will eventually disappear from the front page and the category pages and only remain on the actual posting. Usually, the front page and the category pages have more “value” because they are linked directly, so they get indexed and rank for the content of the blog posting. The search engines later have to reconsider that once they notice that the content is no longer there, and then try to go back to the actual blog page. This makes things more complicated than they need to be.
- Users will also find the full content in several places. This will lead to them linking to the “wrong” place (by accident), since the content is visible there as well. However, the only place where the content is sure to remain is the actual posting: once it has moved off the other pages, the links going to those pages will not be relevant anymore.
If possible: make sure to use summaries or snippets of your postings everywhere except on the actual posting.
Note: this does not apply to the RSS/Atom feed, since that is usually not indexed (at least it shouldn’t be).
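As a rough illustration of the summary approach, here is a minimal Python sketch that cuts a posting down to its first few words for use on the front page, category pages and archives. The function name make_summary and the 40-word default are assumptions made for this example; most blog systems have their own excerpt feature, which is usually the better choice.

```python
def make_summary(text: str, max_words: int = 40) -> str:
    """Return the first max_words words of text, marking truncation with ' ...'."""
    words = text.split()
    if len(words) <= max_words:
        # Short postings are returned whole (whitespace is collapsed).
        return " ".join(words)
    return " ".join(words[:max_words]) + " ..."
```

The full text would then only be shown on the posting's own URL, with the summary linking to it everywhere else.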
Duplicate content
Most blog systems are amazing with regard to the number of different ways that content can be accessed. As mentioned above already, you can find a posting on the front page, in the category pages, in the archive, several pages deep in all of those sections AND on the actual posting. This really makes it hard for a search engine to crawl through and to find the real, valuable URLs that should be indexed.
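To get a feeling for how many URLs serve the same posting, a small script can group URLs by their content. The following Python sketch is only an illustration under strong assumptions: it expects that you have already extracted the posting text found at each URL (the pages dictionary is hypothetical), and it only catches exact copies after whitespace and case are normalized. It says nothing about how a search engine actually detects duplicates.

```python
import hashlib
from collections import defaultdict


def group_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose extracted posting text is identical after normalization."""
    by_digest = defaultdict(list)
    for url, body in pages.items():
        # Collapse whitespace and ignore case so trivial differences do not matter.
        normalized = " ".join(body.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        by_digest[digest].append(url)
    # Only groups with more than one URL are duplicates.
    return [sorted(urls) for urls in by_digest.values() if len(urls) > 1]
```

Each group in the result is a set of URLs that would ideally be reduced to a single one, the actual posting.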
There are several ways to limit the duplicate content to a minimum:
- Change your blog’s URL structure to keep duplicates to a minimum. This undoubtedly is the best policy, but also usually takes the most work. How do you keep all URLs listed and interlinked without sacrificing usability?
- Keep unnecessary URLs from being indexed with either the robots.txt file or the robots meta-tag. If you do this, make sure that the actual blog postings are still accessible through a normal crawl - for example, it does not help to keep all category pages out of the index if those category pages are the only places where links to the postings can be found.
Some blog systems automatically do this. Some can be manually adjusted to help you with this or allow you to use plugins to do the same.
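The warning about blocking the only pages that link to your postings can be checked with a short script. The Python sketch below uses the standard library's urllib.robotparser to see which listing pages a crawler may fetch, and reports postings that are only linked from blocked pages. The robots.txt rules, the URL paths and the LINKS map are all made up for this example, and the check only looks one link deep, so treat it as a starting point rather than a full crawl simulation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: adjust the paths to your own blog's URL structure.
ROBOTS_TXT = """\
User-agent: *
Disallow: /category/
Disallow: /archive/
"""

# Hypothetical link map: listing page -> postings it links to.
LINKS = {
    "/": ["/2007/06/post-a/"],
    "/category/seo/": ["/2007/06/post-a/", "/2007/05/post-b/"],
    "/archive/2007/05/": ["/2007/05/post-b/"],
}


def unreachable_postings(robots_txt, links, agent="*"):
    """Return postings that are only linked from pages blocked by robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    reachable = set()
    for page, targets in links.items():
        # A crawler can only follow links on pages it is allowed to fetch.
        if parser.can_fetch(agent, page):
            reachable.update(targets)
    all_postings = {t for targets in links.values() for t in targets}
    return sorted(all_postings - reachable)


print(unreachable_postings(ROBOTS_TXT, LINKS))  # prints ['/2007/05/post-b/']
```

In this made-up setup the older posting has dropped off the front page and is only linked from the blocked category and archive pages, so a crawler would no longer find a path to it.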
Everything else
The items listed here are of course only those which are specific to the blog format. Everything else that is important for indexing still matters, especially having enough good, strong links.
For more information:
- SEO_Wordpress plugin for WordPress
- Permalink Redirect WordPress Plugin for WordPress
- Duplicate Content Cure Plugin for WordPress
- Enforce www. Preference for WordPress
August 22nd, 2007 at 9:25 pm
Great post! Like your blog. I’m covering a similar topic regarding why Technorati drops posts from its index. I’m going to have to give the Duplicate Content Cure plugin a shot, since I did not silo posts when I started my blog. In fact, some posts reside in multiple categories.
August 27th, 2007 at 3:08 pm
Thanks for the post. Very helpful. I think only one page should have the complete article; having the rest of the pages show a snippet of the article with a ‘continue reading the post’ link should be enough to remove duplicate articles.