If you’re interested in SEO and constantly optimizing your website, you must have heard of index, noindex, follow, nofollow… and wondered what the heck people are talking about. In my SEO audits, for example, I look extensively at the website’s linking structure and analyse which internal links are followed vs. nofollowed and which pages are indexed vs. noindexed. In this article, I explain exactly what these tags are, how to use them, and why they are important for SEO. Keep on reading to learn more…
Why do we use them?
Both “nofollow” and “noindex” are settings you can add to your robots meta tag to control how search engine bots crawl and index your website. In order to understand how and when to use them, you should first understand how search engine bots crawl and index your website pages.
Usually, the bot’s job is to crawl and index as many of your website URLs as possible. Once it has discovered a page on your website, Googlebot will crawl this page and follow all the links on it. It will consequently crawl these newly discovered pages and will follow all the links on them and so on. The crawling process will continue until the search engine’s bot has followed all links and crawled all pages or until it has exceeded the crawl budget reserved for your website. Then the pages that were crawled and analysed by the search engine bot will appear on Search Engine Result Pages (SERPs) for specific search queries and depending on the search engine’s algorithm.
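The crawl-and-follow process described above can be sketched as a simple breadth-first traversal. This is a toy model, not Googlebot’s actual algorithm: the `links` map stands in for real pages, and `crawl_budget` caps the number of pages visited, mimicking a crawl budget:

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "/": ["/blog", "/about"],
    "/blog": ["/blog/post-1", "/blog/post-2"],
    "/about": ["/"],
    "/blog/post-1": ["/blog/post-2"],
    "/blog/post-2": [],
}

def crawl(start, crawl_budget):
    """Breadth-first crawl: follow every link until all reachable
    pages are visited or the crawl budget is exhausted."""
    queue = deque([start])
    crawled = []
    while queue and len(crawled) < crawl_budget:
        page = queue.popleft()
        if page in crawled:
            continue
        crawled.append(page)  # "crawl" the page
        for target in links.get(page, []):
            if target not in crawled:
                queue.append(target)  # follow its links
    return crawled

print(crawl("/", crawl_budget=10))
# ['/', '/blog', '/about', '/blog/post-1', '/blog/post-2']
```

With a budget of 2, the same crawl stops after the home page and the blog index, which is exactly why deep pages on large sites sometimes go undiscovered.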
But what if not all of your website pages have valuable information for users, or if you have some pages that you don’t want to show to readers? That’s when the meta robots tags come in handy.
You can control and change the way search engine bots crawl and index your website by adding directives like noindex, nofollow, disallow, etc. to your website pages. In SEO, these tags are used as methods to optimize crawl efficiency and crawl budget.
What is a NOINDEX Tag?
Usually, Googlebot will crawl all links found on your website and will index as many URLs as possible. If you don’t give any specific directives to the search engine bots, they will simply crawl and index every page they can reach.
But not all of your website pages have value to search engines or to readers. For example, you have:
- taxonomy pages that are automatically created by your CMS,
- back-end code that is only used for the running of the site,
- pages with duplicate content that were created for users’ purposes,
- pages with duplicate content that were automatically created,
- or simply pages that you don’t want to show any more.
Use the noindex tag when you don’t want a specific page to be indexed.
You can use the NoIndex tag to instruct a search engine not to index a particular web page. By adding a noindex meta tag in the page’s HTML code you prevent this page from appearing in the SERPs (Search Engine Results Pages), but the page is still visible on your website and is still crawled by search engine bots. In other words, your users see the page on your website but can’t find it using Google search.
Two ways to implement the NOINDEX tag:
- You can add it as a part of the HTML code of an individual page (in the meta robots tag)
- You can add it as an element of the HTTP header (in the x-robots-tag).
Both tags have the same effect, and you can choose the method of implementation depending on your website needs and convenience.
Let’s learn more about the different ways to implement the noindex tag.
How to apply NOINDEX in the Robots Meta Tag?
The meta robots tag, a.k.a. “meta robots” is an element of the HTML code of a given page. It appears in the <head> section of each web page. It is used to give directives to search engine crawlers and bots. And “noindex” is one of the many directives you can add to the tag in order to instruct search engine bots.
By adding “noindex” to the meta robots tag, you instruct search engine crawlers not to index this web page. “To not index a page” means that they won’t include this page (distant URL) in their list of search results. You can give this instruction to all web crawlers or only to specific crawlers.
If you don’t want this page to be indexed by any search engine, the meta robots tag of the page should look like this:
<meta name="robots" content="noindex">
The value "robots" specifies that this directive applies to all web crawlers.
If you want to prevent only Google web crawlers from indexing this page, you will need to modify the name attribute with the name of the crawler that you are addressing:
<meta name="googlebot" content="noindex">
Because search engines have different crawlers for different purposes, I advise you to use meta name="robots" instead of naming specific crawlers.
By default, webpages are created to be indexed. Therefore, when you create a new page or post with a content management system like WordPress, you will not see a meta robots tag in the HTML code of the page. As soon as you add a directive to the tag, such as “nofollow”, “noindex”, or “noarchive”, the robots tag will appear in the <head> section of the HTML code.
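If you audit pages at scale, you can also check for this tag programmatically. Here’s a minimal sketch using Python’s standard-library `html.parser`; the `robots_directives` helper name is my own:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" ...> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if attrs.get("name", "").lower() == "robots":
            content = attrs.get("content", "")
            # Directives are comma-separated, e.g. "noindex, nofollow"
            self.directives += [d.strip().lower() for d in content.split(",")]

def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

page = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
print(robots_directives(page))  # ['noindex']
```

A page with no robots meta tag at all returns an empty list, which matches the default “index, follow” behaviour described above.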
How to apply NOINDEX in the HTTP header (X-Robots tag)?
Another way to instruct a search engine how to treat your website pages is by using the X-Robots-Tag.
This tag is implemented in the HTTP header of a page. HTTP headers are part of HTTP requests and responses and ensure communication between the server and client in both directions. In simple words, this is the code that transfers data between a web server and a browser. HTTP headers are name–value pairs that appear in the request and response messages of the Hypertext Transfer Protocol (HTTP). Usually, the header name and the header value are separated by a single colon.
X-Robots-Tag is one of the many instructions you can include in the HTTP header response for a given page (URL).
To add “noindex” in the X-Robots-Tag, you’ll need access to your website’s header.php file, .htaccess file, or server configuration files. You should also have a good knowledge of your specific server configuration and its X-Robots-Tag markup. If you want to instruct all crawlers not to index a page, you’ll need to add this line to the HTTP header:
HTTP/1.1 200 OK
Date: Tue, 04 Mar 2020 21:42:43 GMT
(…)
X-Robots-Tag: noindex
(…)
If you want to prevent only Google crawlers from indexing this page, you’ll need to modify it to:
(…)
X-Robots-Tag: googlebot: noindex
(…)
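On an Apache server, for example, such a header is typically set with mod_headers. Here is a hypothetical .htaccess snippet that applies noindex to all PDF files, a common use case for the X-Robots-Tag since PDF files have no HTML <head> to hold a meta tag (adjust the file pattern to your needs):

```apache
<Files ~ "\.pdf$">
  Header set X-Robots-Tag "noindex"
</Files>
```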
In fact, any directive that can be used in a robots meta tag can also be specified as an X-Robots-Tag. You can read more about all possible directives in Google’s developer documentation.
The X-Robots-Tag gives webmasters more flexibility and indexing control. For example, this tag can be used to block the indexing of a particular element of a page (such as a video) but not of the entire page itself. It can also be used to give more precise instructions to bots.
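When auditing response headers, you’ll need to interpret values like the ones above, including the optional user-agent prefix. Here’s a small sketch (the function name is mine, and it deliberately ignores multi-part directives like unavailable_after, which also contain colons):

```python
def parse_x_robots_tag(value):
    """Split an X-Robots-Tag header value into (user_agent, directives).
    A leading token followed by ':' that is not itself a directive is
    treated as a user-agent restriction, e.g. 'googlebot: noindex'."""
    known = {"noindex", "nofollow", "none", "noarchive", "nosnippet"}
    parts = [p.strip() for p in value.split(":")]
    if len(parts) == 2 and parts[0].lower() not in known:
        agent, rest = parts[0].lower(), parts[1]
    else:
        agent, rest = "*", value  # no prefix: applies to all crawlers
    directives = [d.strip().lower() for d in rest.split(",") if d.strip()]
    return agent, directives

print(parse_x_robots_tag("noindex"))              # ('*', ['noindex'])
print(parse_x_robots_tag("googlebot: noindex"))   # ('googlebot', ['noindex'])
```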
What is a NOFOLLOW Tag?
Links are an important part of search engine optimization. It is common knowledge that links from external websites will help increase your domain authority, credibility, and website rankings. But internal links are very important too! They help crawlers navigate through your website, discover webpages and transfer link juice (ranking power) between your pages.
The “nofollow” tag is another directive you can give to a search engine crawler. It instructs search engines not to follow the links on a specific page and not to pass link value to the pages these links are pointing to.
Use the nofollow tag to prevent search engines from following links and passing link juice through these links.
It is also used when you do not trust or cannot vouch for the content of the page being linked to.
Keep in mind that while “nofollow” is intended not to provide an SEO boost to the linked content, it will also affect the way search engine bots crawl and index your pages. For example, in the illustration below, Googlebot couldn’t crawl Product Pages E, F, I and G because of the nofollow tag on Category Pages B and D. If Product Pages E, F, I and G don’t have any other incoming links pointing to them (internal or external), the bot may never discover and crawl these pages. Therefore, these product pages won’t be indexed because they cannot be visited and crawled by Googlebot.
Three ways to implement the NOFOLLOW tag:
- You can add it to the meta robots tag (as a part of the HTML code of a page),
- You can add it to the X-robots-tag (as an element of the HTTP header), or
- You can add it as an attribute to an individual link.
While the first two will give the exact same instructions to search engines and will affect all links on a given page, the third method of implementing the tag is meant to only affect selected links.
Let’s learn more about the different ways to implement the nofollow tag.
How to apply NoFollow in the Robots Meta Tag?
Exactly as with noindex, nofollow can be added as a directive in the robots meta tag.
When you choose this implementation, you instruct search engine bots (most notably Googlebot) to refrain from crawling the links on the page and not to pass link equity through any links on the given webpage. The implementation in a meta tag is exactly the same as explained above; only the directive “noindex” is changed to “nofollow”.
If you want the nofollow rule to apply to all search engine crawlers, you can add:
<meta name="robots" content="nofollow">
If you want to apply the directive only to Google web crawlers:
<meta name="googlebot" content="nofollow">
If you want to use both nofollow and noindex, together on the same page:
<meta name="robots" content="noindex, nofollow">
How to apply NOFOLLOW in the HTTP header (X-Robots-Tag)?
Another way to tell the search engine not to follow links and pass link juice is by adding “nofollow” as an additional element of the HTTP header response for a given URL.
Here’s an example of an X-Robots-Tag instructing Googlebot not to follow the links on a given page:
X-Robots-Tag: googlebot: nofollow
How to apply NOFOLLOW as a link attribute?
By default, all links on a webpage are set to be followed by crawlers, and link juice is transferred between the pages. You set a selected link to “nofollow” if you want to suggest to Google that the hyperlink should not pass any link equity/SEO value to the link target. Unlike the two previous implementations, if you add “nofollow” as an attribute to a hyperlink, the directive will only affect the link it is applied to, not all the links on the page.
Here’s what the hyperlink code will look like:
<a href="http://www.example.com/" rel="nofollow">Anchor Text</a>
Keep in mind that while “nofollow” links are intended not to provide an SEO boost to the linked content, the links are still valuable for user experience and referral traffic.
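For the kind of link audit mentioned in the introduction, you can scan a page’s HTML and flag external links that are not nofollowed. A minimal sketch with Python’s standard library; the function names and example domains are purely illustrative:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkAuditor(HTMLParser):
    """Collects (href, is_nofollow) pairs for every <a> tag."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href", "")
        rel = attrs.get("rel", "")
        # rel may hold several space-separated values, e.g. "nofollow noopener"
        nofollow = "nofollow" in rel.lower().split()
        self.links.append((href, nofollow))

def followed_external_links(html, own_domain):
    """Return external links that are NOT nofollowed (i.e. pass link equity)."""
    auditor = LinkAuditor()
    auditor.feed(html)
    result = []
    for href, nofollow in auditor.links:
        host = urlparse(href).netloc
        if host and host != own_domain and not nofollow:
            result.append(href)
    return result

html = (
    '<a href="https://partner.example/" rel="nofollow">Partner</a>'
    '<a href="https://other.example/">Other</a>'
    '<a href="/about">About</a>'
)
print(followed_external_links(html, "mysite.example"))
# ['https://other.example/']
```

Internal links (no host in the href) and nofollowed external links are skipped, so the output is exactly the list of links you may want to review.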
When to use NOFOLLOW and NOINDEX?
What pages should be set to “noindex”?
Thank you pages
E-commerce and lead generation websites usually direct users to a confirmation page (also called a thank-you page) once a user completes a transaction or submits a form. Counting visits to these unique thank-you pages is the easiest way to track goals and conversions on a website. Therefore, your visitors should arrive on a thank-you page only after they’ve made a successful transaction or completed the form. Applying a “noindex” tag to your thank-you pages will prevent them from being indexed by search engines and avoid stray visits from SERPs.
Members only pages
It’s a common practice to “noindex” sections of your website that are accessible only to employees, website members, or clients. In general, all pages that shouldn’t be seen by the general public should be set to “noindex” to keep them from being found in SERPs.
Admin and login pages
Your back-end and admin login pages should also be set to “noindex”. Similar to the members-only pages, they should not be findable in search engines.
Internal search results
If your website has an internal site search box, you should make sure that search engines don’t index the search result pages. Internal search engines can create a lot of pages that are of little to no value to your visitors. If these pages can be found in Google’s index, you risk providing a bad user experience to anyone who discovers your website via a search result page.
Usually, the internal search pages are excluded (disallowed) via the robots.txt file.
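For example, if your internal search results live under a path like /search, the robots.txt rule could look like this (the /search path is an assumption; use whatever path your site search actually generates):

```txt
User-agent: *
Disallow: /search
```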
Which links should be set to “nofollow”?
Links in blog comments
Because of the large number of link spammers, it’s standard practice to automatically nofollow links in blog comments and forums. You’ve probably noticed that a big part of the blog comments you receive are unrelated to the topic, unhelpful, and contain a link to another website. These comments are made purely for link-building purposes. Today, most blogs automatically add the nofollow attribute to links in comments, so spammers can’t gain SEO value from them.
Paid links in sponsored articles
Nowadays, link building is a full-time job, and people spend hours contacting websites and offering them money in exchange for backlinks. But while blogger outreach and writing sponsored articles are legitimate marketing techniques, they can quickly become risky at scale. Google has specifically advised applying the nofollow attribute to external links in “Advertorials or native advertising where payment is received for articles that include links that pass PageRank.” That’s why many websites choose to nofollow all links in sponsored content. Of course, if you do that, you should inform your link-exchange partners beforehand.
Paid links in banners and display ads
Another popular black-hat SEO tactic is the mass purchasing of links across the web. To discourage advertisers from purchasing ads for their PageRank value, you should add nofollow to the links going to your advertisers’ sites.
Also, as mentioned above, Google’s guidelines specifically state that native advertising and sponsored links should be set to “nofollow”.
Links to clients and partners logos
Websites that want to be cautious about their rankings and PageRank values should also add “nofollow” to the links behind their partners’ and clients’ logos. In this way, you indicate to Google that you cannot vouch for every organization’s website that you are linking to. More importantly, you optimize the link juice flow between your own website pages and avoid “losing” linking power by transferring it to other websites. At the same time, your users can still see and visit the websites of your partners.