A robots.txt file is a plain-text file of instructions that tells a bot not to crawl particular pages or directories listed in it. Well-behaved robots respect these instructions, but spammers and other bad bots may ignore them and access the restricted documents anyway. Hence it is advisable to keep your confidential information password protected.
A robots.txt file has a standard format like this:
User-agent: *
Disallow: /readme.html
Always add a separate Disallow line for each page you want to exclude. You cannot exclude multiple pages with a single Disallow line. For example:
Disallow: /wp-admin, /readme.html is WRONG.
Put each path on its own Disallow line instead.
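Why the comma-separated form fails can be checked with a short Python sketch. It uses the standard library's urllib.robotparser as a stand-in for a real crawler (actual crawlers may parse slightly differently), and the helper name is_allowed and the trailing slash on /wp-admin/ are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# WRONG: two paths crammed into one Disallow line.
wrong = "User-agent: *\nDisallow: /wp-admin, /readme.html\n"

# RIGHT: one Disallow line per path.
right = "User-agent: *\nDisallow: /wp-admin/\nDisallow: /readme.html\n"

for label, text in (("wrong", wrong), ("right", right)):
    blocked = not is_allowed(text, "/readme.html")
    print(f"{label} form blocks /readme.html: {blocked}")
```

With this parser the whole string after "Disallow:" is treated as a single path, so the comma-separated line matches neither page.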
How to create a robots.txt file
A robots.txt file can be made using just the two rules shown above:
User-agent: names the search engine robot the rules apply to. The asterisk (*) in this field is a special value meaning any robot. If you want to stop a specific robot from crawling your pages, e.g. Google's, use User-agent: Googlebot.
Disallow: tells the robot which pages on the site should be excluded from crawling. Add the path, starting with a forward slash (/), and make sure you do not leave empty lines between the lines of a rule group, because some parsers treat a blank line as the end of the group.
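How the two rules combine can be sketched in Python with the standard library's urllib.robotparser. The file contents, the /private/ directory and the bot name SomeOtherBot below are made up for illustration; the sketch only shows that a robot follows the group naming it and otherwise falls back to the * group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot is kept out of one directory,
# every other robot is kept out of the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(agent: str, path: str) -> bool:
    """Return True if `agent` is allowed to fetch `path`."""
    return parser.can_fetch(agent, path)


for agent in ("Googlebot", "SomeOtherBot"):
    for path in ("/index.html", "/private/notes.html"):
        print(agent, path, may_crawl(agent, path))
```

Note that the blank line here separates two groups; it does not sit inside one.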
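If you want to block the entire site, use just a forward slash:
Disallow: /
If you want to block a specific directory, put the directory name followed by a forward slash:
Disallow: /directory-name/
If you want to block a page, list the page after the first forward slash:
Disallow: /private-page.html
If you want to block a specific image from Google Images, type the following:
User-agent: Googlebot-Image
Disallow: /images/photo.jpg
If you want to block all images on your site from Google Images, type the following:
User-agent: Googlebot-Image
Disallow: /
If you want to block a specific type of file (for example, .gif), type the following:
User-agent: Googlebot
Disallow: /*.gif$
(directory-name, private-page.html and photo.jpg are placeholder names; substitute your own paths.)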
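Blocking a file type relies on pattern wildcards: Google documents * (any run of characters) and $ (end of the URL) for Googlebot, so a rule such as Disallow: /*.gif$ targets every URL ending in .gif. The Python sketch below shows one way such a pattern might be matched against a URL path. It is not Google's implementation, and the function name rule_matches and the sample paths are hypothetical.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Check a robots.txt path pattern against a URL path.

    Plain patterns match as prefixes, '*' matches any run of
    characters, and a trailing '$' anchors the match to the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        return re.fullmatch(regex, path) is not None
    return re.match(regex, path) is not None


print(rule_matches("/", "/any/page.html"))                # whole site
print(rule_matches("/directory-name/", "/directory-name/a.html"))
print(rule_matches("/*.gif$", "/images/banner.gif"))      # file type
print(rule_matches("/*.gif$", "/images/banner.gif?v=2"))  # not at end of URL
```

Python's own urllib.robotparser does plain prefix matching only, which is why a separate matcher is sketched here.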
Where to put the robots.txt file
The robots.txt file has to be put in the root of the domain, i.e. www.example.com/robots.txt, and nowhere else, because that is the only place a robot looks for a robots.txt file before it proceeds to crawl pages.
So, as a web site owner you need to put it in the right place on your web server for that resulting URL to work. Usually that is the same place where you put your web site’s main “index.html” welcome page. Where exactly that is, and how to put the file there, depends on your web server software.
Remember to use all lower case for the filename: “robots.txt”, not “Robots.TXT”.
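To see which address a robot would request for a given page, the root location can be derived from the page URL. This is a minimal Python sketch using the standard library's urllib.parse. The function name robots_txt_url and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/blog/post.html?id=3"))
print(robots_txt_url("https://shop.example.com/a/b/"))
```

Because the host is kept as-is, a subdomain such as shop.example.com gets its own robots.txt address, separate from www.example.com.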
How to check Blocked URLs on your site using Google Webmasters
Go to your Google Webmaster Tools account. In the left panel, under the Crawl tab, you will see the link for Blocked URLs. Click it and you will find your robots.txt file there.