How should I mark a removed page?
Posted by John Mueller on April 14th, 2007
If a page is missing, the server needs to return status code 404 (“not found”). Along with that code, the server can return any kind of content, and the browser will display it. The user will still see that content, but a crawler will stop at the 404 and not index the rest. There are some really good ideas on the web about how to make a user-friendly 404 page; it makes sense to read up on them.
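To make the "404 plus friendly content" idea concrete, here is a minimal sketch as a plain WSGI application in Python. The page table, the HTML, and the name `app` are all made up for illustration; a real site would do this in whatever server or framework it already uses.

```python
# Hypothetical table of pages that still exist on the site.
EXISTING_PAGES = {"/": b"<h1>Home</h1>"}

# Friendly content for visitors; crawlers only care about the status line.
NOT_FOUND_BODY = (
    b"<h1>Sorry, that page is gone</h1>"
    b"<p>Try the <a href='/'>home page</a> instead.</p>"
)


def app(environ, start_response):
    """Serve known pages with 200 and everything else with a real 404."""
    path = environ.get("PATH_INFO", "/")
    body = EXISTING_PAGES.get(path)
    if body is not None:
        status = "200 OK"
    else:
        # The status code says "not found"; the body is still shown to users.
        status = "404 Not Found"
        body = NOT_FOUND_BODY
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

For local testing, this could be served with `wsgiref.simple_server.make_server` from the standard library. The important part is that the friendly page goes out with a 404 status line and not a 200.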
The 301 redirect serves a different purpose: it moves the visitor and the search engine to a related page. The technical reason to consider it here is that search engines generally take less time to process a 301 than to recognize a 404.
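A 301 for a removed page might look roughly like the sketch below, again as a small WSGI application in Python. The `REDIRECTS` mapping and both paths are hypothetical; the point is only that the response carries a `301` status and a `Location` header pointing at the related page.

```python
# Hypothetical mapping from removed URLs to the closest related page.
REDIRECTS = {"/old-page": "/new-page"}


def redirect_app(environ, start_response):
    """Send a permanent redirect for removed pages that have a replacement."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # Visitors and crawlers are both sent on to the related page.
        start_response(
            "301 Moved Permanently",
            [("Location", target), ("Content-Length", "0")],
        )
        return [b""]
    # No related page known: fall back to a plain 404.
    body = b"Not found"
    start_response(
        "404 Not Found",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]
```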
A 404 does not say whether the error is temporary or permanent, so it is treated as possibly temporary: the content might return in a minute or never come back. The proper code for content that is gone for good is 410; however, the search engines process this like a 404 anyway. The problem with the 404 is that the search engines (Google in particular) like to keep content in the index as long as it has “value”. This can take a really, really long time. It might change in the future, who knows.
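If you want to send the more precise code anyway, the choice between 404 and 410 can be as simple as the sketch below. The set of permanently removed paths and the function name are assumptions for the example, not something a server provides on its own.

```python
from http import HTTPStatus

# Hypothetical list of paths that were removed on purpose and will not return.
PERMANENTLY_REMOVED = {"/old-campaign", "/discontinued-product"}


def status_for_missing(path):
    """Pick 410 for content known to be gone for good, 404 otherwise."""
    if path in PERMANENTLY_REMOVED:
        return HTTPStatus.GONE  # 410: gone for good
    return HTTPStatus.NOT_FOUND  # 404: missing, may or may not come back
```

For example, `status_for_missing("/old-campaign")` gives `HTTPStatus.GONE`, while any path not in the set gives `HTTPStatus.NOT_FOUND`.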
If you need the old page out of the index as fast as possible, it makes sense to use a 301.
Things that should not be done:
- Block the URL with the robots.txt: if the URL is blocked from being re-crawled, the search engine will not be able to see the server result code and will not know whether the URL is missing (404) or being redirected.
- Block the URL with a robots meta-tag (e.g. “nofollow, noindex”): if the URL is returning the correct result code (404 or 301), then the search engine will not even see the meta-tag. If the URL is returning an incorrect result code (e.g. 200), then that needs to be fixed first.
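To check that a removed URL is not accidentally blocked from crawling, something like the following sketch could be used. It relies on Python's standard `urllib.robotparser`; the robots.txt text, the example URLs, and the helper name `is_blocked` are made up for illustration, and real crawlers may interpret robots.txt rules somewhat differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content that (wrongly) blocks a removed page.
EXAMPLE_ROBOTS_TXT = """User-agent: *
Disallow: /removed-page
"""


def is_blocked(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text disallows crawling the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)
```

Here `is_blocked(EXAMPLE_ROBOTS_TXT, "https://example.com/removed-page")` is true, which means a crawler following these rules would never get to see the 404 or 301 for that URL.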
References:
- Server header result code checker (test to see if it returns a code 404)
- More on why you need to return 404 for missing pages
- Google: We’ve detected that your server returns a status of 200 (found successfully) for pages that don’t exist.
- Google: How to remove an outdated link
- Google: Code 404 / Code 410
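If you would rather check the result code yourself than use an online header checker, a rough sketch in Python follows. It sends a `HEAD` request without following redirects, so a 301 shows up as a 301. The function names and the wording of the descriptions are my own; some servers answer `HEAD` differently from `GET`, so treat the result as a first hint only.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10.0):
    """Return (status_code, Location header) for a HEAD request to url."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # http.client does not follow redirects, so a 301 is reported as-is.
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe_removed_page(status):
    """Describe what a status code means for a page that should be gone."""
    if status in (404, 410):
        return "missing: correct for a removed page"
    if status == 301:
        return "redirected: correct if it points to a related page"
    if status == 200:
        return "found: wrong for a removed page, fix the result code first"
    return "other: check the server configuration"
```

A call such as `fetch_status("https://example.com/removed-page")` (a placeholder URL) returns the status code and any redirect target, and `describe_removed_page` turns the code into a short verdict.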