
Robotstxt | Clear your site entrance for search engines.

Note: I write robotstxt instead of robots.txt throughout this post, because a dot (.) is used as a separator in the context of SEO. The actual file is named robots.txt.

Is your post not ranking on the search engine results page (SERP) even though you have written unique content?

We have a tactic that may solve your problem, and it involves the robotstxt file of your site.

Only a few people know about this file. If you are a new blogger, I am sure you have not heard of it.

That is because the term never comes up during WordPress installation, customization, or blog post writing.

Well, you do not need to panic. We are going to explain it in detail, so stay with us.

What is a robotstxt file?

Let’s understand it with an example.

Have you ever gone to a big shopping mall or a large building?

If yes, then you may have noticed that a map at the entrance shows all the shops and the restricted areas.

Have you ever wondered why this is done?

If you think it is done to help customers find their way, you are right.

Now let me explain the role of each entity. The customer plays the role of the search-bot, the building owner is the webmaster, and the map on the wall is the robotstxt file.

Let’s define the robotstxt file.

It is a small .txt file that webmasters prepare to instruct the crawler (search-bot). It is placed in the root folder of the site.

By the way, this file is part of the Robots Exclusion Protocol (REP), a group of web standards.

It tells a search-bot which instructions to follow while crawling a site.

A well-behaved search-bot respects these instructions at every step, from accessing the site content to serving it to the user.

How does this file work?

The most important thing for a search engine is a link. It follows links to crawl any site/blog. This process is called spidering.

A search-bot does not start spidering immediately after reaching a site/blog. It looks for the robotstxt file first.

The search engine knows that there may be a .txt file that explains how the site should be crawled.

All of that information is written in the robotstxt file. The search-bot moves forward only after reading it.

If the search-bot does not find a robotstxt file in the root directory, it crawls the entire site/blog.
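The "read the rules first, then crawl" behaviour can be sketched with Python's standard urllib.robotparser module. This is only an illustration, not how any particular search engine is implemented. The robots.txt content, the bot name ExampleBot, and the URLs are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would download it
# from the root of the site before requesting any other page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_urls(robots_txt, user_agent, urls):
    """Return only the URLs that robots_txt lets user_agent crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


candidates = [
    "https://www.mysite.com/",
    "https://www.mysite.com/blog/first-post",
    "https://www.mysite.com/private/notes.html",
]
crawlable = allowed_urls(ROBOTS_TXT, "ExampleBot", candidates)
print(crawlable)  # the /private/ URL is filtered out
```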

Where is it stored?

As I mentioned above, the search-bot looks for the robotstxt file as soon as it reaches a site/blog.

A search-bot is instructed to look for the robotstxt file in one particular place, and it looks only there.

That place is the root directory of the site. You can check the file by opening a URL like www.mysite.com/robots.txt.

Replace mysite with your site name.

Suppose you have placed the robotstxt file somewhere else, for example at www.mysite.com/document/robots.txt. The search-bot then assumes that your site has no robotstxt file and starts spidering the entire site/blog.

So always keep the robotstxt file in the root directory of your site, so that the search-bot can find it.
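Because the location is fixed, the address of the file can be worked out from any page URL of a site. Here is a minimal sketch in Python; the function name robots_txt_url and the sample URL are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Build the root-level robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, drop the page path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.mysite.com/document/page.html"))
# https://www.mysite.com/robots.txt
```

Whatever page you start from, the result points to the root directory, never to a subfolder such as /document/.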

Is this file necessary on every site/blog?

A robotstxt file can keep a user agent (search-bot) from accessing specific parts of a site/blog.

This will become clear in the next few minutes if you read this post to the end.

Here are some situations where a robotstxt file is useful.

  • To specify the location of the Sitemap.
  • To keep internal search result pages from showing on the SERP.
  • To prevent search engines from crawling certain files (PDFs, images).
  • To keep search-bots away from a particular section of the website.

If there is no part of your site/blog that you want to keep search-bots away from, your site/blog may not need a robotstxt file.
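As a rough illustration of these uses, the sketch below parses a small robotstxt file with Python's standard urllib.robotparser module. The paths /search/ and /private/, the sitemap URL, and the bot name ExampleBot are assumptions for the example, not values from a real site. The site_maps() method needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: hides internal search pages and one section,
# and tells search-bots where the sitemap is.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /private/

Sitemap: https://www.mysite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

sitemaps = parser.site_maps()
search_ok = parser.can_fetch("ExampleBot", "https://www.mysite.com/search/?q=robots")
post_ok = parser.can_fetch("ExampleBot", "https://www.mysite.com/blog/first-post")

print(sitemaps)   # ['https://www.mysite.com/sitemap.xml']
print(search_ok)  # False
print(post_ok)    # True
```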

Some terms used in this .txt file.

User-agent:
This term specifies which search-bot the rules are meant for, such as Googlebot, Msnbot, etc.
Let’s understand this with some examples.

User-agent: *
Disallow:
All the directions written below the User-agent: * line apply to all search-bots.

User-agent: Googlebot
All the directions written below this line apply only to Googlebot.

Disallow:
This term tells a search-bot not to crawl the whole site or a particular directory of it.

Disallow:
With an empty value, all the directories of this site are accessible.

Disallow: /
No directory on this site is accessible.

Disallow: /image/
The image directory of this site is not accessible.
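These rules can be tried out with Python's standard urllib.robotparser module. The following is a minimal sketch; the helper can_crawl, the bot name ExampleBot, and the page paths are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(rules, path, user_agent="ExampleBot"):
    """Check whether user_agent may crawl path under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, "https://www.mysite.com" + path)


allow_everything = "User-agent: *\nDisallow:"
block_everything = "User-agent: *\nDisallow: /"
block_images = "User-agent: *\nDisallow: /image/"
googlebot_only = "User-agent: Googlebot\nDisallow: /image/"

print(can_crawl(allow_everything, "/image/pink.jpg"))             # True
print(can_crawl(block_everything, "/about.html"))                 # False
print(can_crawl(block_images, "/image/pink.jpg"))                 # False
print(can_crawl(block_images, "/about.html"))                     # True
# A group named after one bot does not bind the others.
print(can_crawl(googlebot_only, "/image/pink.jpg", "Googlebot"))  # False
print(can_crawl(googlebot_only, "/image/pink.jpg"))               # True
```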

Allow:

This term makes an exception to a Disallow rule. It is not limited to Googlebot: Allow is part of the Robots Exclusion Protocol standard (RFC 9309), and the major search-bots, including Googlebot and Bingbot, honor it.

This term is used when a directory has been disallowed, but the search-bot should still be able to access a particular file inside that directory.

User-agent: *
Disallow: /image/
Allow: /image/pink.jpg

Here the image directory has been disallowed for all search-bots, so they cannot access its files. The only exception is the pink.jpg file, which every search-bot that honors Allow can still access, because the more specific rule wins.
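Search-bots that follow the current standard (RFC 9309) settle a conflict between Allow and Disallow by picking the most specific rule, that is, the one with the longest matching path. The sketch below is a simplified, hand-written version of that idea, not the code of any real crawler. It ignores wildcards, and the function name is_allowed is made up for the example. Python's standard urllib.robotparser is not used here because it applies rules in the order they are written.

```python
def is_allowed(rules, path):
    """Decide access for path using longest-match precedence.

    rules is a list of (directive, prefix) pairs from one user-agent group.
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            is_allow = directive.lower() == "allow"
            # A longer prefix wins; on a tie, Allow wins over Disallow.
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len = len(prefix)
                allowed = is_allow
    return allowed


rules = [("Disallow", "/image/"), ("Allow", "/image/pink.jpg")]
print(is_allowed(rules, "/image/pink.jpg"))  # True
print(is_allowed(rules, "/image/blue.jpg"))  # False
print(is_allowed(rules, "/about.html"))      # True
```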



How to allow crawling of a complete site?

Most webmasters want every part of their site to be crawled.

A few webmasters may want to keep some parts of their site from being crawled.

Here are the situations in which the search-bot crawls the entire site.

(i). The robotstxt file is absent.

As I said before, the search-bot starts looking for the robotstxt file as soon as it reaches a site.

It looks for this file in the root directory first. If it does not find the file there, it concludes that the site has no such file.

That means there is no place on the site that it must not crawl, so it starts crawling the entire site.

(ii). The file is empty.

The search-bot looks for the robotstxt file in the root directory as soon as it reaches the site.

It finds the .txt file in the root directory, but no statement is written in it. In such a situation, it concludes that no part of the site is restricted.

It may follow all the links of the site, so it starts crawling the entire site.

(iii). The file contains this code.

The search-bot looks for the .txt file in the root directory as soon as it reaches the site. It finds the .txt file, and something like this is written in it.

User-agent: *
Disallow:

It interprets this statement as follows: there is no part of this site that I am forbidden to crawl. So it starts crawling the entire site.
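The three situations can be compared side by side with Python's standard urllib.robotparser module. This is a small sketch under some assumptions: the helper crawl_allowed and the bot name ExampleBot are made up, and a missing file is modelled as None and treated like an empty file.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def crawl_allowed(robots_txt: Optional[str], path: str) -> bool:
    """Check a path against robots.txt text, where None means no file."""
    parser = RobotFileParser()
    # A missing file (None) is handled here like an empty file.
    parser.parse((robots_txt or "").splitlines())
    return parser.can_fetch("ExampleBot", "https://www.mysite.com" + path)


situations = {
    "missing file": None,
    "empty file": "",
    "empty Disallow": "User-agent: *\nDisallow:",
}
results = {
    name: crawl_allowed(text, "/any/page.html")
    for name, text in situations.items()
}
print(results)
# {'missing file': True, 'empty file': True, 'empty Disallow': True}
```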

How to prepare this file.

It is a plain text file, as its .txt extension shows, and the instructions inside it are not written in any complex programming language. So you can easily make it.

For this, you can use Notepad or any other plain text editor. You can use a code editor as well.

Note: Do not include this code in your robotstxt file.

User-agent: *
Disallow: /

This code does not allow any search-bot to crawl any link of the site.
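Before uploading a new robotstxt file, a quick check like the one below can catch this mistake. It is only a sketch built on Python's standard urllib.robotparser module. The function name blocks_whole_site and the bot name ExampleBot are assumptions for the example.

```python
from urllib.robotparser import RobotFileParser


def blocks_whole_site(robots_txt, user_agent="ExampleBot"):
    """Return True if the rules keep user_agent out of the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # This parser matches rules by path prefix, so the root path "/"
    # is blocked only by a rule such as "Disallow: /".
    return not parser.can_fetch(user_agent, "/")


dangerous = "User-agent: *\nDisallow: /"
safe = "User-agent: *\nDisallow: /private/"

print(blocks_whole_site(dangerous))  # True
print(blocks_whole_site(safe))       # False
```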

Well, if you want to see your post at the top of the SERP, then you should read this post.

Final Words:

Before adding or removing anything in your robotstxt file, be completely sure about what its code means. That is my advice.

Otherwise, it can badly affect the ranking of your site.

A robotstxt file plays a crucial role in presenting your site to search engines like Google. If you do not present your site well to the search engine, its ranking can suffer.

If there is still any doubt in your mind, feel free to leave me a comment. I will definitely reply.

If you find this post informative, then share it with your loved ones.

Founder, WebtechThoughts

Barun Chandra is a technology enthusiast and a blogger. He likes to explore technology in depth and writes posts in simple words to make them easy to understand.
