23 June 2009

Blocking Google

If you have pages containing secret, private or confidential information that you do not want indexed by Google, Yahoo, MSN and other major search engines, you have a number of options:
  • If you want to keep confidential data on your server, place it in a password-protected directory. Googlebot and other spiders cannot access files in protected directories. For example, if you're using Apache Web Server, you can edit your .htaccess file to password-protect a directory on your server. There are many tools on the web that make this easy.
    This is one of the simplest and most effective ways to prevent bots from crawling and indexing secret and confidential information.
  • Use a robots.txt file to control access to files and directories on your server. The robots.txt file tells bots and other crawlers which files and directories on the server should not be crawled. The file has to sit in the root of your domain, so you need access to that location on your host in order to use it.
  • Use a noindex meta tag to keep individual pages of your site from being indexed by bots and other spiders. The tag, <meta name="robots" content="noindex">, has to be placed in the head section of the HTML page.
  • To block the images on a page from being indexed, use <meta name="robots" content="noimageindex"> in the head of the page.
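To see how a crawler reads the robots.txt rules mentioned in the list above, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the /private/ path and the example.com URLs are assumptions made up for illustration; they are not from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: asks every crawler to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative URLs only.
    for url in (
        "http://example.com/private/report.html",
        "http://example.com/index.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

A check like this only shows what a well-behaved crawler would do with the rules; it does not stop anyone from requesting the files directly.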
Even if we use a robots.txt file and meta tags to keep pages out of search results, bots can still find information about them from directories and other sites that link to ours, so our page title may get indexed and show up in search results. Hence we strongly recommend storing private and confidential information in password-protected directories on your server. That way it cannot be indexed by bots and search engines.
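If you rely on the robots meta tags described in the list above, it can help to audit which directives a page actually carries. The following is a rough sketch using Python's standard html.parser module; the class name, the function name and the sample HTML are hypothetical and only serve as an illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, noimageindex".
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives present in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical page used only for this example.
SAMPLE = (
    '<html><head><meta name="robots" content="noindex, noimageindex">'
    "</head><body></body></html>"
)

if __name__ == "__main__":
    print(robots_directives(SAMPLE))
```

An empty result means the page carries no robots meta tag at all, so nothing on the page asks search engines to leave it out of their index.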
