robots.txt- Robots.txt is a file that is used to exclude content from the crawling process of search engine spiders/crawlers. Robots.txt is also known as the Robots Exclusion Protocol.
Why to use robots.txt- In general, we want our websites to be indexed by Google and the other search engines. But there may be some content that we do not want to be crawled and indexed: a personal pictures directory, the site administration directory, a web designer's client test directory, files with no search value such as cgi-bin, and many more. The basic idea is that we do not want them indexed.
Is robots.txt file a certain solution- No. Standards-compliant crawlers like Google's, Yahoo!'s and the other big search engines' spiders obey your robots.txt file. This is because they are programmed to. A bot that is written to ignore the robots.txt file can simply do so. Result: there is no guarantee.
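To see what "programmed to obey" means in practice, here is a minimal sketch using Python's standard urllib.robotparser module; the example.com URLs and paths are placeholders, not a real site:

    from urllib import robotparser

    # Fetch and parse the site's robots.txt (example.com is a placeholder).
    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    # A compliant crawler asks before fetching each URL;
    # a rogue bot simply never performs this check.
    print(rp.can_fetch("*", "https://example.com/cgi-bin/search.cgi"))  # False if disallowed
    print(rp.can_fetch("*", "https://example.com/index.html"))          # True if allowed

Nothing forces a crawler to run a check like this, which is exactly why robots.txt is a request, not an access control mechanism.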
How to use the robots.txt file-
The robots.txt file has some simple directives which control the crawlers. These are:
User-agent: this parameter defines which crawlers the following rules apply to. * is a wildcard which means all crawlers; a specific name such as Googlebot targets Google's crawler.
Disallow: defines which files or folders will be excluded. Leaving it empty means nothing is excluded, / means everything is excluded, and /foldername/ or /filename can be used to exclude a specific folder or file. A name between slashes like /foldername/ excludes only the paths inside that folder; using one slash like /foldername excludes every path that begins with /foldername, including files such as /foldername.html. A minimal example follows.
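As a concrete illustration, a robots.txt using only these two directives might look like this (the folder names are placeholders):

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /admin/

    User-agent: Googlebot
    Disallow:

Here every crawler is asked to stay out of /cgi-bin/ and /admin/, while the empty Disallow line in the Googlebot group lets Google crawl everything, because a crawler follows the group with the most specific matching User-agent rather than the * group.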
There are also some other directives which are not supported by all crawlers. These are:
Allow: this parameter does just the opposite of Disallow. You can specify which content is allowed to be crawled and indexed here. * is a wildcard. See the example after this list.
Request-rate: defines the crawl rate in pages/seconds. 1/20 means 1 page in every 20 seconds.
Crawl-delay: defines how many seconds to wait after each successful fetch.
Visit-time: you can define the time window in which you want your pages to be crawled. An example value is 0100-0330, which means that pages will be crawled between 01:00 AM and 03:30 AM GMT.
Sitemap: this is the parameter with which you can point out where your sitemap file is. You must use the full URL address of the file.
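Putting the extended directives together, a sketch of a robots.txt that uses them might look like the following; the domain and paths are placeholders, and keep in mind that Request-rate, Crawl-delay and Visit-time are non-standard and ignored by many crawlers (Google, for example, does not honour Crawl-delay):

    User-agent: *
    Disallow: /private/
    Allow: /private/public-report.html
    Request-rate: 1/20
    Crawl-delay: 10
    Visit-time: 0100-0330

    Sitemap: https://example.com/sitemap.xml

Note that the Allow line carves a single file out of the disallowed /private/ folder, and the Sitemap line uses the full URL as required; it can stand anywhere in the file, outside any User-agent group.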