Companies Are Having Fun with Robots.txt - Usefullytips
Robots.txt is often seen as the most boring subject in SEO. There are few interesting problems to solve in this file, and most errors come from misunderstanding the directives or from typos. The main purpose of robots.txt is simply to tell crawlers where they can and cannot go.
Some basic things you need to know in robots.txt
User-agent - specifies which robot the rules apply to.
Disallow - asks robots not to crawl this area.
Allow - allows robots to crawl this area.
Crawl-delay - tells robots to wait a certain number of seconds before continuing the crawl.
Sitemap - specifies the sitemap location.
Noindex - an unofficial directive that asked Google to remove pages from the index. Google no longer supports it in robots.txt.
# - comments out a line so it will not be read.
* - matches any text.
$ - the URL must end here.
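To make the directives concrete, here is a minimal sketch in Python that feeds a made-up robots.txt to the standard library's urllib.robotparser. The example.com URLs, the ExampleBot name and the rules themselves are assumptions for illustration only. This parser applies the first rule that matches, in file order, and does not interpret * or $ inside paths, so the Allow line sits before the broader Disallow. The small rule_matches helper is hypothetical and only shows how the two wildcards are commonly read.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
# Keep crawlers out of the admin area, except the public help page
User-agent: *
Allow: /admin/help
Disallow: /admin/
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is an assumed crawler name; it falls under the "*" group.
blocked = parser.can_fetch("ExampleBot", "https://example.com/admin/settings")
allowed = parser.can_fetch("ExampleBot", "https://example.com/admin/help")
public = parser.can_fetch("ExampleBot", "https://example.com/blog/post")
delay = parser.crawl_delay("ExampleBot")
sitemaps = parser.site_maps()  # available in Python 3.8+

print(blocked, allowed, public)  # False True True
print(delay)                     # 10
print(sitemaps)


def rule_matches(pattern: str, path: str) -> bool:
    """Hypothetical helper: '*' matches any text, a trailing '$' ends the URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each "*".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, so unanchored rules act as prefixes.
    return re.match(regex, path) is not None


print(rule_matches("/*.pdf$", "/files/report.pdf"))             # True
print(rule_matches("/*.pdf$", "/files/report.pdf?download=1"))  # False
```

Real crawlers differ in how they resolve conflicts between Allow and Disallow, so treat the output here as the behaviour of this one parser rather than of every search engine.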
Some of the things you need to know about robots.txt
- The robots.txt file lives at the root of the host: domain.com/robots.txt
- Each subdomain needs its own robots.txt. www.domain.com/robots.txt is not the same as domain.com/robots.txt
- Crawlers and spiders can ignore robots.txt, since following it is voluntary
- Both URLs and the robots.txt filename are case sensitive
- Google does not honor Crawl-delay, but you can manage crawl settings in Google Search Console
- Allow crawling of CSS and JS, according to Google's Gary Illyes
- Validate your robots.txt in both Google Search Console and Bing Webmaster Tools
- Eric Enge of Stone Temple Consulting has said that Noindex works, whereas John Mueller (Google Webmaster Trends Analyst) recommends against using it. He says it is better to noindex via meta robots or x-robots
- The maximum size of a robots.txt file is 500 KB
- Don't block crawlers as a way of avoiding duplicate content
- Never disallow pages that are redirected, because spiders will not be able to follow the redirect
- Disallowing pages prevents previous versions from being shown in archive.org
- Go to archive.org and search for domain.com/robots.txt to see older versions of a robots.txt file
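Two of the points above, one robots.txt per host and the size cap, are easy to check in code. The following is a rough Python sketch under stated assumptions: robots_url and within_size_limit are hypothetical helper names, and the 500 KB limit from the list is read here as 500 * 1024 bytes.

```python
from urllib.parse import urlsplit

# Assumption: the 500 KB limit is interpreted as 500 * 1024 bytes.
MAX_ROBOTS_BYTES = 500 * 1024


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (one file per host)."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def within_size_limit(robots_body: bytes) -> bool:
    """Check a downloaded robots.txt body against the assumed size limit."""
    return len(robots_body) <= MAX_ROBOTS_BYTES


# The www subdomain and the bare domain resolve to different files.
print(robots_url("https://www.domain.com/shop/item?id=1"))  # https://www.domain.com/robots.txt
print(robots_url("https://domain.com/blog/"))               # https://domain.com/robots.txt
print(within_size_limit(b"User-agent: *\nDisallow:\n"))     # True
```

The helpers only build the URL and measure bytes that you pass in. Fetching the file is left out on purpose so the sketch stays free of network calls.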
Many companies put their brand logo or a joke in their robots.txt file. Here are some examples of fun robots.txt files.
ASCII art and job openings
Nike.com: as we all know, Nike's slogan is "Just Do It". In the same spirit, their robots.txt file says "just crawl it" and also includes their logo.
Seer robots.txt file
Yelp includes Asimov's Three Laws in its robots.txt file:
# As always, Asimov's Three Laws are in effect:
# 1. A robot may not injure a human being or, through inaction, allow a human
# being to come to harm.
# 2. A robot must obey orders given it by human beings except where such
# orders would conflict with the First Law.
# 3. A robot must protect its own existence as long as such protection does
# not conflict with the First or Second Law.
YouTube's robots.txt file:
# robots.txt file for YouTube
# Created in the distant future (the year 2000) after
# the robotic uprising of the mid 90's which wiped out all humans.
Page One Power's robots.txt file:
#This is not the droid you're looking for.
#but we are the link builders you've been looking for.
In Google's killer-robots.txt file, both Larry Page and Sergey Brin are safe from Terminators.
Reddit's robots.txt file
What is humans.txt?
The humans.txt project describes itself this way: "It's an initiative for knowing the people behind a website. It's a TXT file that contains information about the different people who have contributed to building the website."
Check out: https://www.google.com/humans.txt
Reference link: http://searchengineland.com/fun-robots-txt-263796