3 Steps to Remove Pages and Cached Content from Google Search

Google indexes websites quickly and thoroughly. There are times, however, when pages on a site should not be indexed, such as admin sections or include files. Asking Google to remove content is easy, and Google’s response is reasonably fast.

If you don’t own the website that has the material that needs to be removed, don’t bother reading any further. Contact the website responsible for posting it and ask them to remove it.

Cached content is handled through Webmaster Tools. To remove cached content, all of the steps below are required.

Note that you can add a meta tag to pages that should not be indexed:

<meta name="robots" content="noindex">

or for Google only

<meta name="googlebot" content="noindex">

And for cached pages:

<meta name="robots" content="noarchive">

or for Google only

<meta name="googlebot" content="noarchive">

The meta tag method can be tedious for a large number of files, and Google gives no direct feedback on the removal. For expedited removal and confirmation from Google that the content has been removed from the search results, follow the steps below.
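If you are tagging many pages, it can help to confirm that each one actually carries the directives you intended. The following is a minimal sketch using Python's standard html.parser module; the class name RobotsMetaParser and the function robots_directives are hypothetical names chosen for this example, and it only inspects the robots and googlebot meta tags shown above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        # Maps the meta name ("robots" or "googlebot") to a set of directives.
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attributes.get("content") or ""
            # Directives are comma separated, e.g. "noindex, noarchive".
            values = {part.strip().lower() for part in content.split(",") if part.strip()}
            self.directives.setdefault(name, set()).update(values)


def robots_directives(html):
    """Return the robots-related meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page used only for illustration.
sample = '<html><head><meta name="robots" content="noindex, noarchive"></head><body></body></html>'
found = robots_directives(sample)
print("noindex" in found.get("robots", set()))
print("noarchive" in found.get("robots", set()))
```

Feeding it the saved HTML of each page gives a quick way to spot pages where the tag was forgotten.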

A note about removing non-HTML content from Google:

If the changed content is not in (X)HTML (for example if an image, a Flash file or a PDF file has been changed), you won’t be able to use the cache removal tool. So if it’s important that the old content no longer be visible in search results, the fastest solution would be to change the URL of the file so that the old URL returns a 404 HTTP result code and use the URL removal tool to remove the old URL. Otherwise, if you choose to allow Google to naturally refresh your information, know that previews of non-HTML content (such as Quick View links for PDF files) can take longer to update after recrawling than normal HTML pages would.
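Before submitting the old URL for removal, it is worth confirming that it really does return a 404. Here is a small sketch using Python's standard urllib; the function name http_status is a hypothetical helper for this example, not part of any Google tool.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code a HEAD request to url produces."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx and 5xx responses; the code is what we want.
        return err.code
```

Calling http_status with the old URL of the renamed file should return 404 once the change is live. Note that urlopen follows redirects, so a redirected URL reports the status of its final destination.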

Step 1: Webmaster Tools

Sign up the site for Webmaster Tools. Webmaster Tools is easy to set up and provides valuable information on Google’s indexing of your site. There are also tools to help you create the robots.txt needed for step 2.

Step 2: Robots.txt

Create a robots.txt file and place it in the root of the website. There are two rules that should be applied in the robots.txt:

User-agent: the robot the following rule applies to
Disallow: the URL you want to block

User-Agent

User-agent allows for the specific selection of search engine robots.

For all search engine bots, the user-agent will be:
User-agent: *

For Google’s crawler, use:
User-agent: Googlebot

All rules under the User-Agent apply to that User-Agent. You can specify rules for individual User-Agents and list them on the same robots.txt.

Disallow

Disallow blocks an entire site or specific directories, pages, images, or file types.

Block entire site:
Disallow: /

Block directory:
Disallow: /junk-directory/

Block page:
Disallow: /private_file.html

Block specific image from Google image search:
User-agent: Googlebot-Image
Disallow: /images/dogs.jpg

Block all images on your site from Google image search:
User-agent: Googlebot-Image
Disallow: /

Block files of a specific file type (for example, .gif):
User-agent: *
Disallow: /*.gif$
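To get a feel for how the * and $ characters in a Disallow path behave, here is a rough sketch that translates a pattern into a regular expression. It is only an approximation for illustration, not Google's actual matcher, and the function names rule_to_regex and is_blocked are hypothetical. It assumes * matches any run of characters, a trailing $ anchors the end of the URL, and a pattern without $ matches as a prefix.

```python
import re


def rule_to_regex(rule_path):
    """Translate a Disallow path containing * and a trailing $ into a regex."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape each literal piece, then join the pieces with "match anything".
    pattern = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(rule_path, url_path):
    """Return True if url_path matches the Disallow pattern from its start."""
    return rule_to_regex(rule_path).match(url_path) is not None


print(is_blocked("/*.gif$", "/images/dogs.gif"))
print(is_blocked("/*.gif$", "/images/dogs.jpg"))
print(is_blocked("/junk-directory/", "/junk-directory/page.html"))
```

Under these assumptions, the .gif pattern matches any path ending in .gif, while a path with something after the extension (such as a query string) would not match because of the $ anchor.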

A sample robots.txt to block all files in an include directory might look like this:
User-agent: *
Disallow: /includes/

Rules in the robots.txt can be stacked and made as specific as you want. For example, to exclude a directory and a file from all search engine bots, and all images from Google image search:
User-agent: *
Disallow: /includes/
Disallow: /admin.html

User-agent: Googlebot-Image
Disallow: /

All rules below a User-agent declaration apply only to the User-agent specified.
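You can sanity-check a robots.txt before uploading it. The sketch below feeds the stacked example above to Python's standard urllib.robotparser and asks which URLs each bot may fetch. The domain www.example.com and the bot name ExampleBot are placeholders for this example. Keep in mind that this parser does simple prefix matching and does not understand the * and $ path wildcards, so it is only suited to plain rules like these.

```python
from urllib.robotparser import RobotFileParser

# The stacked example: one group for all bots, one for Google's image bot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /includes/
Disallow: /admin.html

User-agent: Googlebot-Image
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
base = "http://www.example.com"  # placeholder domain

checks = [
    ("ExampleBot", "/includes/header.html"),
    ("ExampleBot", "/admin.html"),
    ("ExampleBot", "/index.html"),
    ("Googlebot-Image", "/images/dogs.jpg"),
]
for agent, path in checks:
    allowed = parser.can_fetch(agent, base + path)
    print(agent, path, "allowed" if allowed else "blocked")
```

Only /index.html should come back as allowed here: the first group blocks the include directory and admin page for every bot, and the second group blocks everything for Googlebot-Image.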

Step 3: Remove URLs/Cached Pages

In Webmaster Tools, request the removal of the URL from Google’s index. In the new Webmaster Tools interface, Remove URLs is under Site Configuration, Crawler Access. In the old interface, it’s under Tools, Remove URLs.

Select whether URLs, directories/subdirectories, entire site, or cached copies should be removed. Hit “Next” and put in the URL. If you are removing a cached copy and the page still exists, you will need to add a noarchive meta tag.

<meta name="robots" content="noarchive">

If the page no longer exists, select “This page has been modified so that it no longer contains the information that is being cached.”

The removal will be listed as pending until Google successfully removes it. If for some reason it can’t remove the URLs, Google will list the error on this page. Google can remove the URLs in a matter of hours if done correctly. Keep in mind that Google’s documentation states that it can take 3-5 days.

Other search engines will respect the robots.txt, but to varying degrees. No two search engines are the same, and neither is their handling of the robots.txt.

11 Comments

  • December 19, 2009 - 5:02 am | Permalink

    i want to remove my previous cached list from google search.

  • March 26, 2010 - 3:34 am | Permalink

    I have been trying to remove cached pages for days, but nothing happens. Google Webmaster Tools tell me the pages have been processed but they are still there when I use the search engine. I have tried pretty much anything, using robots, tools, meta noarchive, but removing cached pages doesn't seem to be so easy.

  • March 26, 2010 - 12:36 pm | Permalink

    @Giuliastro
    Make sure Google is picking up your robots.txt. You can get confirmation in Webmaster Tools. Also, check the robots and URL removal paths and verify that they are correct.
    You can review instructions directly from Google at:
    http://bit.ly/dopCZ9

  • August 12, 2010 - 7:12 pm | Permalink

    Thank you for the info. Does anyone know what the following code means and how it relates to clearing the Google cache?

    http://services.google.com:8882/urlc…&lastcmd=login

    Thank you

  • December 21, 2010 - 9:18 pm | Permalink

    My problem is slightly different. I have been patiently waiting for google to UPDATE my cached reviews so that my most recent good reviews overshadow a bad one made months ago. It doesn't happen. The old review is like the Energizer bunny, it keeps coming and coming and coming, no matter how many good reviews I get. How can I get google to refresh my reviews? I can't wait months for this as the one bad review made months ago is killing me.

  • February 6, 2011 - 10:58 pm | Permalink

    I am very new to blogging and created http://www.homevoideobiz.blogspot.com in Blogger. I purposely went to Webmaster Tools to reach more people.
    Now I have encountered a problem with the robots.txt file and got the messages below from Google.

    "Issues Google encountered when crawling your site."

    URL restricted by robots.txt Nov 9, 2010
    http://homebizvideo.blogspot.com/search/label/networkmarketing
    URL restricted by robots.txt Feb 2, 2011
    http://homebizvideo.blogspot.com/search/label/news
    URL restricted by robots.txt Feb 1, 2011
    http://homebizvideo.blogspot.com/search/label/phillipines
    URL restricted by robots.txt Nov 14, 2010
    http://homebizvideo.blogspot.com/search/label/phillipines
    URL restricted by robots.txt Feb 3, 2011
    http://homebizvideo.blogspot.com/search/label/prospecting
    URL restricted by robots.txt Jan 28, 2011
    http://homebizvideo.blogspot.com/search/label/sulit.com
    URL restricted by robots.txt Nov 14, 2010
    http://homebizvideo.blogspot.com/search/label/sulit.com
    URL restricted by robots.txt Jan 22, 2011
    http://homebizvideo.blogspot.com/search/label/technology
    URL restricted by robots.txt Nov 19, 2010
    http://homebizvideo.blogspot.com/search/label/technology
    URL restricted by robots.txt Jan 28, 2011
    http://homebizvideo.blogspot.com/search/label/video conferencing
    URL restricted by robots.txt Jan 25, 2011
    http://homebizvideo.blogspot.com/search/label/video webcast
    URL restricted by robots.txt Nov 17, 2010
    http://homebizvideo.blogspot.com/search/label/video webcast
    URL restricted by robots.txt Feb 3, 2011
    http://homebizvideo.blogspot.com/search/label/videomials
    URL restricted by robots.txt Feb 2, 2011
    http://homebizvideo.blogspot.com/search/label/videomials
    URL restricted by robots.txt Nov 11, 2010
    http://homebizvideo.blogspot.com/search/label/videowebcast
    URL restricted by robots.txt Feb 1, 2011
    http://homebizvideo.blogspot.com/search/label/videowebdesigner
    URL restricted by robots.txt Jan 27, 2011
    http://homebizvideo.blogspot.com/search/label/videowebdesigner
    URL restricted by robots.txt Nov 18, 2010

    AND MY SETTING AT CRAWLER ACCESS SHOWS AS UNDER:

    User-agent: Mediapartners-Google
    Disallow:

    User-agent: *
    Disallow: /search

    Sitemap: http://homebizvideo.blogspot.com/feeds/posts/default?orderby=updated

    PLEASE HELP ME TO UNDERSTAND THAT WHETHER MY BLOG IS BEING RESTRICTED BY SUCH ROBOT.TXT AND IF YES HOW CAN I REMOVE IT , THOUGH IN BLOGGER I HAVE SET MY BLOG AS AVAILABLE TO PUBLIC AND NON PRIVATE.
    PLEASE GUIDE ME HOW TO GO ABOUT?
    REGARDS

  • Jack
    July 12, 2011 - 8:32 pm | Permalink

    It looks like your article was submitted to ‘Open source web magazine” with credit going to a different person: http://portfolio.alteredpixels.net/oswm-issue-2.pdf

  • July 12, 2011 - 9:40 pm | Permalink

    @Jack

    Thank you for the heads up. Very unprofessional on their part. Unfortunately, a common practice among losers and cowards.

    And, they seem to be defunct. I really appreciate you letting me know.

  • Jack
    July 12, 2011 - 9:49 pm | Permalink

    No problem, you might want to check his blog to see if there is anymore of your content on his site.

  • July 13, 2011 - 8:01 pm | Permalink

    @Jack

    Thanks. Did that and I am following up. Hopefully, James Harvey will come clean and just remove the references to my article from his resume and LinkedIn site. And, from his web site:
    http://alteredpixels.net/post.cfm/3-steps-to-remove-pages-and-cached-content-from-google-search

  • December 24, 2011 - 7:24 am | Permalink

    I can’t remove my restricted robots.txt. What do I do?
