Google Webmasters Help FAQ

Weblog for the Google Webmasters Forum

Where did Google find that wrong URL for my site?

Posted by John Mueller on April 4th, 2007

Google has started to list URLs that it could not find while crawling your site. Your webserver log files or statistics will show further URLs that were requested but not found. If a broken URL is requested often, it makes sense to track down its source and correct it (or have it corrected). A URL that cannot be found on your site discourages visitors from looking at more of it.
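To see which missing URLs are requested often, you can count the 404 responses in your access log. The following is only a minimal sketch: it assumes the log uses the common "combined" format, and the sample lines, the function name countNotFound and the example.org referrer are made up for illustration.

```javascript
// Sketch: count 404 responses per URL in combined-format access log lines.
// The log format and the sample lines below are assumptions for illustration.
const LOG_LINE = /^\S+ \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" (\d{3}) \S+(?: "([^"]*)")?/;

function countNotFound(lines) {
  const hits = new Map(); // path -> { count, referrers }
  for (const line of lines) {
    const match = LOG_LINE.exec(line);
    if (!match || match[2] !== "404") continue; // only "not found" responses
    const path = match[1];
    const referrer = match[3];
    const entry = hits.get(path) || { count: 0, referrers: new Set() };
    entry.count += 1;
    // "-" means the client sent no referrer, so the source stays unknown.
    if (referrer && referrer !== "-") entry.referrers.add(referrer);
    hits.set(path, entry);
  }
  // Most frequently requested missing URLs first.
  return [...hits.entries()].sort((a, b) => b[1].count - a[1].count);
}

const sampleLog = [
  '10.0.0.1 - - [04/Apr/2007:10:00:00 +0000] "GET /paeg.htm HTTP/1.1" 404 209 "http://www.domain.com/" "Mozilla/5.0"',
  '10.0.0.2 - - [04/Apr/2007:10:01:00 +0000] "GET /page.htm HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
  '10.0.0.3 - - [04/Apr/2007:10:02:00 +0000] "GET /paeg.htm HTTP/1.1" 404 209 "http://www.example.org/links.htm" "Mozilla/5.0"',
  '10.0.0.4 - - [04/Apr/2007:10:03:00 +0000] "GET /obsoletepage.htm HTTP/1.1" 404 209 "-" "Mozilla/5.0"',
];

for (const [path, { count, referrers }] of countNotFound(sampleLog)) {
  const sources = [...referrers].join(", ") || "unknown";
  console.log(`${path}: ${count} hit(s), linked from: ${sources}`);
}
```

The referrer column is what tells you where to look: your own pages, another website, or nothing at all.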


Finding broken links in your site

The most common source of bad URLs is your own website. These are the links that you should search for and correct regularly, so that your visitors only reach parts of your website that are actually online.

Common broken URLs include:

  • Local file names, e.g. F:\Websites\domain\page.htm - yes, this is very common, and embarrassing.
  • Misspelled URLs, e.g. http://www.domain.com/paeg.htm; similarly, typos in the domain name or protocol, e.g. http//www.domain.com/page.htm or http://www.domian.com/page.htm
  • Old pages, e.g. http://www.domain.com/obsoletepage.htm - these links need to be changed to a current, existing page or removed
  • Incorrect canonicalization, e.g. http://domain.com/page.htm (though technically not a broken link, it makes sense to fix it)

These broken links are very easy to find with a tool like Xenu’s Link Sleuth (freeware). It crawls your website like a search engine crawler and lists all the URLs it finds.
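Some of the patterns listed above can be recognized from the link text alone, before anything is requested. The sketch below illustrates that idea; the function name classifyHref, the category labels and the CANONICAL_HOST value are assumptions for this example. Misspelled paths and obsolete pages cannot be detected this way, since they only show up when the URL is actually requested.

```javascript
// Sketch: flag link targets that look wrong without requesting them.
// CANONICAL_HOST is an assumption for this example; use your own host name.
const CANONICAL_HOST = "www.domain.com";

function classifyHref(href) {
  // Windows drive paths (F:\...) and file: URLs point at the author's own disk.
  if (/^[a-zA-Z]:[\\/]/.test(href) || /^file:/i.test(href)) return "local-file";
  // "http//host/..." is missing the colon, so it is treated as a relative path.
  if (/^https?\/\//i.test(href)) return "malformed-protocol";
  let url;
  try {
    // Relative links are resolved against the canonical site root.
    url = new URL(href, `http://${CANONICAL_HOST}/`);
  } catch {
    return "unparseable";
  }
  // Same site, but without the "www." prefix.
  if (url.hostname === CANONICAL_HOST.replace(/^www\./, "")) return "non-canonical-host";
  // A different host: either a genuine external link or a typo in the domain.
  if (url.hostname !== CANONICAL_HOST) return "other-host";
  return "ok";
}

const examples = [
  "F:\\Websites\\domain\\page.htm",
  "http//www.domain.com/page.htm",
  "http://www.domian.com/page.htm",
  "http://domain.com/page.htm",
  "http://www.domain.com/page.htm",
];
for (const href of examples) {
  console.log(`${classifyHref(href)}: ${href}`);
}
```

Links reported as "other-host" still need a human look, because the check cannot tell a mistyped domain from an intentional external link.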

Links within JavaScript on your site

Many sites use JavaScript; some even use it for navigation (though that is a very bad idea!). Google and some other search engines will try to recognize URLs in JavaScript and might try to crawl them. Sometimes it works, sometimes it doesn’t.

Example (on www.domain.com/):
<script type="text/javascript">
var some_url = "http://www.domain.com/folder/page.htm";
var other_url = "/folder/page2.htm";
var not_url = "me" + "/you";
</script>

In this case, assuming that the search engine tries to extract URLs from a piece of JavaScript text, it might try the following URLs:

  • http://www.domain.com/folder/page.htm
  • http://www.domain.com/folder/page2.htm
  • http://www.domain.com/you

Of course the last URL doesn’t make much sense, but to a crawler it looks like a URL just like the others. Google will try it at least once, but may discard it soon afterwards. Sometimes it makes sense to hide these URLs better (for example in an external JavaScript file); usually they can just be ignored once you’ve determined that the JavaScript is the source.
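To see why a fragment like "/you" can end up being requested, consider a deliberately naive extractor that treats every string literal starting with a scheme or a slash as a URL. This is purely a hypothetical sketch (the function guessUrlsFromScript is made up here) and says nothing about how Google or any other search engine actually processes scripts.

```javascript
// Sketch of a naive extractor; NOT how any real search engine is known to work.
function guessUrlsFromScript(scriptText, pageUrl) {
  const found = [];
  // Match the contents of double- or single-quoted string literals.
  const literal = /"([^"\\]*)"|'([^'\\]*)'/g;
  for (const match of scriptText.matchAll(literal)) {
    const text = match[1] ?? match[2];
    // Anything starting with http(s):// or a slash "looks like" a URL.
    if (!/^(https?:\/\/|\/)/.test(text)) continue;
    try {
      // Resolve relative candidates against the page the script was found on.
      found.push(new URL(text, pageUrl).href);
    } catch {
      // Ignore candidates that cannot be parsed as a URL at all.
    }
  }
  return found;
}

const script = `
var some_url = "http://www.domain.com/folder/page.htm";
var other_url = "/folder/page2.htm";
var not_url = "me" + "/you";
`;

console.log(guessUrlsFromScript(script, "http://www.domain.com/"));
```

Run against the example script, the sketch returns the same three URLs as the list above, including the meaningless http://www.domain.com/you.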

Bad links from other websites

Finding broken links from other websites is a hard process, and sometimes not worth the trouble. Sometimes you can find the broken links through the links listing that Google provides, sometimes through Yahoo’s Site Explorer, other times through MSN/Live. In practice it does not make much sense to chase broken links and request changes (though the owner will often be interested in linking correctly). In many cases the broken link can be fixed on your side by creating a 301 redirect from the broken URL to the correct one.
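How you create a 301 redirect depends on your webserver. As one possible illustration, here is a minimal sketch of a request handler for Node's built-in http module; the REDIRECTS table, its entries and the handler name handleRequest are assumptions for this example, not part of any particular server setup.

```javascript
// Hypothetical mapping from broken URLs (e.g. found in your logs) to correct pages.
const REDIRECTS = new Map([
  ["/paeg.htm", "/page.htm"],
  ["/obsoletepage.htm", "/page.htm"],
]);

function handleRequest(req, res) {
  // Ignore the query string when looking up the broken URL.
  const path = req.url.split("?")[0];
  const target = REDIRECTS.get(path);
  if (target) {
    // 301 tells browsers and crawlers that the move is permanent.
    res.writeHead(301, { Location: target });
    res.end();
    return;
  }
  // Everything else in this sketch is simply reported as not found.
  res.writeHead(404, { "Content-Type": "text/plain" });
  res.end("Not found");
}

// Quick check with a stub response object instead of a live server.
const stubResponse = {
  writeHead(status, headers) { console.log(status, headers); },
  end() {},
};
handleRequest({ url: "/paeg.htm?ref=old" }, stubResponse);
```

In a real setup the handler would be passed to http.createServer(), and URLs without a redirect entry would be served normally rather than answered with a 404.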

Comment from Jonathan Simon, Google, regarding broken links showing up

I wouldn’t let these 404 errors bother you since they will not affect your site negatively. If you want to, you can set up 301 redirects to route anyone following these external 404 links to an appropriate page on your site.

From Google Groups: Google webmaster tools (Apr 18, 2007)
