May 27

3 Steps to Remove Pages and Cached Content from Google Search

Google indexes websites quickly and thoroughly. Sometimes, however, pages on a site should not be indexed, such as admin sections or include files. Asking Google to remove content is easy, and Google’s response is reasonably fast.

If you don’t own the website that has the material that needs to be removed, don’t bother reading any further. Contact the website responsible for posting it and ask them to remove it.

Cached content is handled through Webmaster Tools. To remove cached content all the steps below are required.

Note that you could instead add a meta tag to pages that should not be indexed:

<meta name="robots" content="noindex">

or for Google only

<meta name="googlebot" content="noindex">

And for cached pages:

<meta name="robots" content="noarchive">

or for Google only

<meta name="googlebot" content="noarchive">

The meta tag method can be tedious for a large number of files, and Google gives no direct feedback on the removal. For expedited removal, and for confirmation from Google that the content has been removed from the search results, follow the steps below.
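Because checking these tags by hand gets tedious across many files, a short script can report which directives a page carries. The following is a minimal sketch using Python's standard html.parser module. The names RobotsMetaParser and robots_directives are illustrative, not part of any Google tool, and the script only reads your own markup; it says nothing about what Google has actually indexed.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        # Maps the meta name to its directives, e.g. {"robots": {"noindex"}}
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for valueless attributes
        attr_map = {key.lower(): (value or "") for key, value in attrs}
        name = attr_map.get("name", "").strip().lower()
        if name in ("robots", "googlebot"):
            content = attr_map.get("content", "")
            values = {v.strip().lower() for v in content.split(",") if v.strip()}
            self.directives.setdefault(name, set()).update(values)


def robots_directives(html_text):
    """Return the robots/googlebot directives found in one HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, noarchive"></head></html>'
    print(robots_directives(page))
```

To audit a whole site, the same function could be called on the text of each file in a directory and the pages lacking the expected directive listed.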

Step 1: Webmaster Tools

Sign up the site for Webmaster Tools. Webmaster Tools is easy to set up and provides valuable information on Google’s indexing of your site. There are also tools to help you create the robots.txt needed for step 2.

Step 2: Robots.txt

Create a robots.txt file and place it in the root of the website. Two directives are used in the robots.txt:

User-agent: the robot the following rule applies to
Disallow: the URL you want to block

User-Agent

User-agent allows for the specific selection of search engine robots.

For all search engine bots, the user-agent will be:
User-agent: *

For Google bots use:
User-agent: Googlebot

All rules listed under a User-agent line apply to that robot. You can specify rules for individual robots and list them all in the same robots.txt.
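As a rough illustration of how several groups of rules sit together in one file, the sketch below renders robots.txt text from a list of (robot, paths) pairs. The function name render_robots_txt and the sample paths are hypothetical; Webmaster Tools offers its own generator, and this is not a reproduction of it.

```python
def render_robots_txt(groups):
    """Build robots.txt text from (user_agent, [disallowed paths]) pairs."""
    blocks = []
    for user_agent, paths in groups:
        # Each group starts with the robot it applies to...
        lines = [f"User-agent: {user_agent}"]
        # ...followed by one Disallow line per blocked path
        lines += [f"Disallow: {path}" for path in paths]
        blocks.append("\n".join(lines))
    # Groups are separated by a blank line
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(render_robots_txt([
        ("*", ["/includes/"]),
        ("Googlebot", ["/admin.html"]),
    ]))
```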

Disallow

Disallow blocks crawling of an entire site, or of specific directories, pages, images, or file types.

Block entire site:
Disallow: /

Block directory:
Disallow: /junk-directory/

Block page:
Disallow: /private_file.html

Block specific image from Google image search:
User-agent: Googlebot-Image
Disallow: /images/dogs.jpg

Block all images on your site from Google image search:
User-agent: Googlebot-Image
Disallow: /

Block files of a specific file type (for example, .gif):
User-agent: *
Disallow: /*.gif$
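The examples above rely on two matching behaviors: a plain path blocks everything that starts with it, while the last example uses * (any run of characters) and a trailing $ (end of URL), which are pattern extensions that Google honors but that not every crawler may support. The sketch below is one way to reason about which paths a Disallow pattern would cover. The helper names rule_to_regex and rule_matches are illustrative, and the logic is an approximation, not Google's actual matcher.

```python
import re


def rule_to_regex(pattern):
    """Translate a Disallow pattern into a compiled regular expression."""
    # A trailing "$" means the pattern must match to the end of the path
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # "*" stands for any run of characters; everything else is literal
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def rule_matches(pattern, path):
    """Return True if the path would be covered by the Disallow pattern."""
    return rule_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    for pattern, path in [
        ("/junk-directory/", "/junk-directory/old.html"),
        ("/private_file.html", "/public_file.html"),
        ("/*.gif$", "/images/cat.gif"),
        ("/*.gif$", "/images/cat.gif?size=2"),
    ]:
        print(pattern, path, rule_matches(pattern, path))
```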

A sample robots.txt to block all files in an include directory might look like this:
User-agent: *
Disallow: /includes/

Rules in the robots.txt can be stacked and can be as specific as you want. For example, to exclude a directory and a file from all search engine bots, and all images from Google image search:
User-agent: *
Disallow: /includes/
Disallow: /admin.html

User-agent: Googlebot-Image
Disallow: /

All rules below a User-agent declaration apply only to the robot it names.
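Before uploading a robots.txt, it can help to check that it blocks what you expect. The sketch below feeds the stacked example to Python's standard urllib.robotparser module and asks which paths each robot may fetch. The robot name SomeBot and the test paths are made up for illustration. Note that this module matches rules by simple path prefix, so it will not interpret wildcard rules such as /*.gif$, and its answers are only an approximation of how any particular search engine behaves.

```python
from urllib.robotparser import RobotFileParser

# The stacked example: one group for all robots, one for Google's image crawler
ROBOTS_TXT = """\
User-agent: *
Disallow: /includes/
Disallow: /admin.html

User-agent: Googlebot-Image
Disallow: /
"""


def build_parser(robots_text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    checks = [
        ("SomeBot", "/includes/header.php"),
        ("SomeBot", "/admin.html"),
        ("SomeBot", "/index.html"),
        ("Googlebot-Image", "/images/dogs.jpg"),
    ]
    for robot, path in checks:
        allowed = parser.can_fetch(robot, path)
        print(f"{robot} {path}: {'allowed' if allowed else 'blocked'}")
```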

Step 3: Remove URLs/Cached Pages

In Webmaster Tools, request the removal of the URL from Google’s index. In the new Webmaster Tools interface, Remove URLs is under Site Configuration, Crawler Access. In the old interface, it’s under Tools, Remove URLs.

Select whether individual URLs, directories and subdirectories, the entire site, or cached copies should be removed. Click “Next” and enter the URL. If you are removing a cached copy and the page still exists, you will need to add a noarchive meta tag:

<meta name="robots" content="noarchive">

If the page no longer exists, select “This page has been modified so that it no longer contains the information that is being cached.”

The removal will be listed as pending until Google successfully removes it. If for some reason it can’t remove the URLs, Google will list the error on this page. If everything is done correctly, Google can remove the URLs in a matter of hours, although its documentation states that removal can take 3-5 days.

Other search engines also respect the robots.txt, but to varying degrees. No two search engines are the same, and neither is their handling of the robots.txt.

1 Comment for 3 Steps to Remove Pages and Cached Content from Google Search


taniya kapoor
December 19, 2009

i want to remove my previous cached list from google search.
