Robots.txt: How to optimize this file on a WordPress website

If I ask you to define what a robots.txt file is on WordPress, can you give me a straight answer?

Not easy, is it? Besides, without knowing it, you probably already have one on your website.

The thing is, we don’t always understand this famous file. What is it used for? What do you put in it? Why does its code look hard to understand?

If you’ve ever looked into the subject, I bet you’ve asked yourself these questions.

A bit like dynamite, this file must be handled with great care.

If you don’t set it up properly, you risk damaging your site’s SEO. So beware of the explosion!

In this post, I will show you how to avoid the disaster, and how to optimize your WordPress robots.txt file. You will discover what it is used for, how it works, two ways to create it, and what to put inside.

What is the WordPress robots.txt file?

Presentation

A WordPress robots.txt file is a text file located at the root of your site that “tells search engine crawlers which URLs the crawler can access on your site” according to the definition given by Google on its webmaster help site.

Also referred to as the “Robots Exclusion Standard/Protocol”, it lets you keep search engines from crawling certain useless and/or private content (e.g. your login page, sensitive folders and files).

In short, this protocol tells the robots of a search engine what they can or cannot do on your site.

Here is how it works. When a robot is about to crawl a URL of your site (i.e. it is going to explore and retrieve information to be able to index it), it will first look at your robots.txt file.

If it finds it, it will read it, then follow the directives you have given it (it will not crawl a given file if you have forbidden it).

If it doesn’t find it, it will crawl your site in a normal way, without excluding any content.

Look at this example of a WordPress robots.txt file to see what it looks like:

Don’t dwell on its contents. As you’ll see later, there is no standard file that suits every site, and copying one as-is is not recommended.

If you only remember four more things about today’s topic, make it these:

  1. As Google explains, the information you provide in your robots.txt file “can’t force the crawler to follow your site’s rules”. While the “serious” crawlers (Google, Bing, Yahoo, Yandex, Baidu, etc.) will respect them, malicious robots that seek to undermine the security of your site will not.
    Moreover, not all robots interpret instructions in the same way, so be sure to respect the syntax indicated by Google.
  2. The robots.txt file is a public file. Anyone can read it by typing a URL of the form yoursite.com/robots.txt. Therefore, do not use it to hide content: anyone can quickly find out where it is hidden. If you want some content to remain private, don’t list it in this file, but protect it with a password, for example.
  3. If you do not want certain pages to appear in search results, “do not use the robots.txt file to hide your web page” says Google. Indeed, if a number of links point to this page, Google may index it and display it in its search results without knowing what it contains, even if you have blocked it in your robots.txt file.
    To prevent a page from appearing in search results, Google recommends using what is called a noindex tag (it can be easily activated in Yoast SEO by unchecking the box “Allow search engines to show this Post in search results?” located under each post/page in the settings tab).
  4. The robots.txt file has a cousin called humans.txt.
    This is a TXT file, also located at the root of your site, which contains information about the different people who contributed to its design.
    For example, developers, web designers, editors, etc. It is not mandatory, but if you think it is useful to integrate it on your WordPress site, you will have to add it to the root of your site, next to the robots.txt file (look at the one from WPMarmite for example).

Do you really need a robots.txt file?

By default, a website will be crawled and indexed normally by a search engine, even without the presence of a robots.txt file.

The latter is therefore not mandatory. As Daniel Roch, a WordPress SEO specialist, explains, “if you want to index all your pages, content and media, don’t use the robots.txt file: it won’t do you any good”.

But then, what use can this file be, the rest of the time?

The main benefit is to be found on the side of your SEO. In fact, a robots.txt file allows you to save what is called the crawl budget, says this post from the Yoast SEO blog.

It’s pretty technical, but simply put, by keeping crawlers away from the pages on your site that are of no SEO interest, you’ll leave more time and energy for Google to crawl the others.

If you want to dig deeper into the subject, Brian Dean, from Backlinko, talks about it here.

Now it’s time to move on to the configuration of your file. And this is important, believe me. If it’s not properly optimized, you risk seriously penalizing your presence on search engines.

How to create a WordPress robots.txt file?

By default, WordPress creates a virtual robots.txt file. It does not exist as a physical file on your server, but you can view it online.

Take the one on the site of Usain Bolt, the former Jamaican sprint star.

Yes, even Usain Bolt’s website is built on WordPress.

To see it, you just have to type in your browser http://usainbolt.com/robots.txt.

Here is what you will get:
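
As a rough sketch, the virtual file that WordPress generates by default typically contains the three lines below. This is an assumption about a standard installation, not a copy of any particular site: the output varies with the WordPress version (recent versions may also add a Sitemap line) and with installed plugins.

```text
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```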

This virtual file works. But how do you modify this robots.txt on your WordPress website?

Well, you will have to create your own file to replace it.

There are two ways to do this: with a plugin such as Yoast SEO, or manually over FTP.

I’ll show you how to do both in detail.

How to create a robots.txt file on WordPress with Yoast SEO

I’m willing to bet you know Yoast SEO, right? You know, it’s an SEO plugin, one of the most downloaded of all time.

WPMarmite uses it, and I’m also going to use it to show you how it can help you create a WordPress robots.txt file.

Of course, the prerequisite is that you have installed and activated this plugin.

Start by going to your WordPress Dashboard, and select Yoast SEO > Tools.

Continue by clicking on “File editor”.

If you don’t have a dedicated file yet, click on the button to create one. I already had one on my site, so I could only edit it. And don’t forget to save, once you’re done.

And there you go.

Don’t worry, I’ll explain at the end of this part what information to put in this file.

For the moment, let’s move on to the second method: you will have to use your little hands.

The manual method

Whether you use a dedicated plugin or not, it is also possible to add a robots.txt file on your WordPress website manually. It’s very simple, you’ll see.

First, you will need a text editor. Any plain-text editor will do, and your good old Notepad will do very well.

Create a new document, and save it on your computer with the name robots.txt.

Its name must always be in lower case, and don’t forget to put an “s” in the word robots (don’t write robot.txt).

Next, open your FTP client. This is a piece of software that allows you to communicate with your server.

Personally, I use FileZilla. But you can also use Cyberduck. For more info on how to use FTP, check out our post: How to use FTP to access your WordPress files.

FTP will also be useful when you install WordPress. Read our guide about it: How to install WordPress: a step-by-step guide.

Third and last step: add your file to the root of your site. I repeat, in the root of your site, and not in a subdirectory. Otherwise, search engines will not take it into account.

For example, if your site is accessible via https://www.yoursite.com/, the robots.txt file should be located at https://www.yoursite.com/robots.txt.

This location (the root) may vary from one host to another. At Bluehost (affiliate link), it is called public_html. At OVH, you will find it under the name www.

Its final implementation should look like this, on your site:

The essential rules to know

Congratulations, your robots.txt file is now on your server. For the moment, it is empty, but you can edit it whenever you want.

Logically, you need to ask yourself what kind of instructions to put in there.

Before we get to that, it is necessary to understand the particular syntax of this file.

“Each rule blocks or allows access for a given crawler to a specified file path in that website” as Google explains on its Search Console help.

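The two main rules are called User-agent, which names the robot a group of rules applies to, and Disallow, which gives a path that robot must not crawl.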

Let’s study a simple example so that you understand.
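
Here is a minimal sketch of such a file, assuming just the two lines discussed next:

```text
User-agent: *
Disallow: /
```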

On the first line, the asterisk * is what we call a wildcard. It stands for all search engine robots (user-agents).

On the second line, you deny these robots access to all directories and pages of your site, via the slash /.

You don’t need to enter your domain name (e.g. mysite.com/) before the slash, because the robots.txt file uses relative URLs. Simply put, it knows that the slash refers to the root of your domain name.

Obviously, the above code is of little use if you want your site to be crawled and indexed. But it can be useful when you are in the creation phase of your site.

If you don’t want a particular type of robot to crawl your site, for example Yahoo’s (Slurp is the name associated with Yahoo’s robot), you will have to do this:
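
A sketch of that rule, assuming you want to keep Slurp away from the whole site while leaving other robots unaffected:

```text
User-agent: Slurp
Disallow: /
```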

For more info on robot names, I refer you to this screenshot from the Yoast SEO site.

Some additional rules

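I told you about User-agent and Disallow, which are the most used. You should know that there are other syntax rules, but not all robots take them into account (Google’s do). Among them are Allow, which reopens access to a path inside a blocked directory, and Sitemap, which tells robots where to find your XML sitemap.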
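
As a sketch of one such additional rule, a Sitemap line (a directive Google documents) takes the full URL of your XML sitemap. The URL below is a placeholder assumption; replace it with the address of your own sitemap:

```text
Sitemap: https://www.yoursite.com/sitemap.xml
```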

To make sure you understand, let’s go a little further with a few new examples.

How to block access to a directory

Here, I ask all robots not to explore any of the contents of the wp-admin directory.
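
Written out, that instruction can be sketched as follows (the trailing slash limits the rule to what sits inside the wp-admin directory):

```text
User-agent: *
Disallow: /wp-admin/
```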

How to block access to a page or a file

In this example, I ask all robots not to crawl the WordPress login page, as well as a photo.
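
A sketch of these two rules, assuming the photo is the myphoto.jpg file used in the example further down and that it sits at the root of the site:

```text
User-agent: *
# Block the WordPress login page
Disallow: /wp-login.php
# Block a single image
Disallow: /myphoto.jpg
```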

You can also see the # symbol appear. It introduces a comment. The text behind it will not be taken into account.

Also keep in mind that the rules are case sensitive.

For example, Disallow: /myphoto.jpg matches http://www.mysite.com/myphoto.jpg, but not http://www.mysite.com/Myphoto.jpg.

How to create different rules for different robots

The file is read from top to bottom, and each group of rules starts with a User-agent statement, which indicates the robot to which the group applies. A crawler follows the group that matches it most specifically, so Googlebot obeys its own group rather than the one addressed to *.

In the first group, I ask all robots not to crawl the login page (wp-login.php).

In the second one, I specifically ask Google’s crawler (Googlebot) not to crawl any of my site.
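
The two groups described above can be sketched like this, with a blank line separating them:

```text
# Group 1: all robots
User-agent: *
Disallow: /wp-login.php

# Group 2: Google's crawler only
User-agent: Googlebot
Disallow: /
```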

How to allow access to a file in a blocked directory

We use the Allow statement. In this example, the whole wp-admin directory is blocked, except the widgets.php file.
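
As a sketch, such a file could read as follows. Google applies the most specific matching rule, so the longer Allow path takes precedence over the shorter Disallow for that one file; other robots may resolve the conflict differently.

```text
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/widgets.php
```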

How to check that your robots.txt file is working properly?

To be sure that your file is correctly set up, you can check and validate it on Google Search Console, a free and essential tool to manage the SEO of your site (among others).

Open the robots.txt file testing tool (you need to register your website there first).

Once you have entered the instructions of your choice in the editor provided, you can test your file.

If all is well, you should have the following message at the bottom of the editor.

If not, your file contains logic errors or syntax warnings. Finally, remember to submit the file, by clicking on the “Submit” button.

How to optimize your robots.txt file on WordPress?

What should you put, or not put in your robots.txt file?

Is there a predefined template that can be adapted to each site?

The answer: both yes and no.

Indeed, each site is different and it would be difficult to copy and paste what Peter, Paul or James propose on their sites. Their problems will most likely be different from the ones you have on yours.

Nevertheless, we can give you a basic robots.txt file that will suit most sites:
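
One conservative sketch, assuming you want little more than the WordPress default plus a pointer to your sitemap. The sitemap URL is a placeholder, and the whole file is an illustrative starting point to adapt rather than a universal template:

```text
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://www.yoursite.com/sitemap.xml
```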

To tell you the truth, even within the WordPress community, it’s impossible to get everyone to agree. Opinions differ.

Some, like Joost de Valk, the founder of Yoast, advocate minimalism. This is actually the current trend.

In essence, they believe that since Google is able to interpret your site in its entirety (including the CSS and JavaScript code, and no longer only the HTML), you should not block access to CSS and JavaScript files, so that Google can see your pages as your visitors do. Otherwise, it could affect your SEO.

To verify that Google has access to all the resources it needs to display your page properly, you can go back to Google Search Console. Go to the “URL Inspection” tab, click on “View Tested Page” and then click on “Screenshot”.

If your site doesn’t look like it should (e.g. some styles are not applied), it’s probably because some of the rules in your robots.txt file need to be reviewed.

But back to Yoast. Look at their robots.txt file:
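
Their live file may change over time, so take the following only as a sketch of what a file that blocks nothing can look like. An empty Disallow value means that no path is excluded:

```text
User-agent: *
Disallow:
```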

As you can see, nothing is blocked!

Others advocate a broader, “safe” approach for your site. They advise, among other things to:

In short, it’s not easy to find your way through all these recommendations!

To sum up, I advise you to:

Conclusion

As you have seen, the robots.txt file is an interesting tool for your SEO. It allows you to tell the search engine robots what they should and should not crawl.

But it must be handled with care. A bad configuration can lead to your entire site being de-indexed (e.g. if you use Disallow: /). So, be careful!

To finish this post, let’s make a summary. Throughout these lines, I have detailed what the robots.txt file is used for, how it works, two ways to create it, and what to put inside.

Now it’s your turn. Tell me if you use this type of file and how you set it up.

Share your thoughts and feedback in the comments.
