If you are delving into the world of SEO and constantly optimizing your website, you have likely come across terms like index, noindex, follow, nofollow, and wondered about their significance. In this article, I will explain what these tags are, how to use them, and why they are crucial in terms of SEO.
In my SEO audits, for example, I look closely at the website’s linking structure and analyse which internal links are followed vs. nofollowed and which pages are indexed vs. noindexed. Keep on reading to learn more …
Why do we use them?
Both “nofollow” and “noindex” are settings you can add to your robots meta tag to control how search engine bots crawl and index your website. To understand how and when to use them, it’s important to first comprehend how search engine bots crawl and index your website pages.
The bot’s primary job is to crawl and index as many of your website URLs as possible. Once a page is discovered, the bot will crawl it and follow all the links on it, subsequently crawling the newly discovered pages and following all the links on them, and so on. This process continues until all links have been followed and pages crawled or until the crawl budget reserved for your website is exceeded. The pages that were crawled and analyzed by the search engine bot will then appear on Search Engine Result Pages (SERPs) for specific search queries.
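To make the crawl loop more concrete, here is a minimal sketch in Python. It is not how any real search engine is implemented: the site map `SITE_LINKS`, the function name `crawl`, and the idea of counting the crawl budget in pages are all assumptions I made for illustration.

```python
from collections import deque

# Hypothetical site: each URL maps to the URLs it links to.
SITE_LINKS = {
    "/": ["/category-a", "/category-b"],
    "/category-a": ["/product-1", "/product-2"],
    "/category-b": ["/product-2", "/product-3"],
    "/product-1": [],
    "/product-2": ["/"],
    "/product-3": [],
}


def crawl(start, links, crawl_budget):
    """Breadth-first crawl that stops once the budget (in pages) is used up."""
    crawled = []
    queue = deque([start])
    seen = {start}
    while queue and len(crawled) < crawl_budget:
        url = queue.popleft()
        crawled.append(url)
        # Follow every link on the page, skipping URLs already discovered.
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return crawled


print(crawl("/", SITE_LINKS, crawl_budget=10))  # all six pages are reached
print(crawl("/", SITE_LINKS, crawl_budget=3))   # only the first three pages
```

With a small budget, the product pages are never reached, which is why it matters which links and pages you let the bot spend its time on.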
However, there may be pages on your website that hold no valuable information for users, or pages that you don’t want to show to readers. This is where the meta robots tags come in handy. By adding directives like noindex and nofollow to your website pages (together with disallow rules in your robots.txt file), you can control and change the way search engine bots crawl and index your website. In SEO, these tags are used to optimize crawl efficiency and crawl budget.
What is a NOINDEX Tag?
Googlebot usually crawls all links found on your website and indexes as many URLs as possible. Without specific directives, the bot treats every page the same way: it crawls each URL it discovers and tries to index it.
However, not all of your website pages hold value for search engines or readers. For instance, you may have:
- taxonomy, tag, or category pages, created by your content management system (CMS) to organize content, which may not add significant value for search engine users,
- back-end code used for running the site,
- pages with duplicate content that is replicated across multiple URLs within the website and needs to be consolidated,
- pages containing outdated information or content that is no longer relevant to users or search engines,
- or pages that you no longer want to show.
In these cases, you can use the Noindex tag to instruct a search engine not to index a specific page. By adding a noindex meta tag in the page’s HTML code, you prevent this page from appearing in the SERPs (Search Engine Results Pages), although it is still visible on your website and is still crawled by search engine bots. This means that users can see the page on your website but cannot find it using Google search.
Use the Noindex tag when you don’t want to index a specific page.
On a website that has a meta robots noindex tag on Category Pages B and D and on Product Pages E and H, the bot still crawls those four pages but leaves them out of its index.
Two ways to implement the NOINDEX tag:
- You can add it as a part of the HTML code of an individual page (in the meta robots tag)
- You can add it as an element of the HTTP header (in the X-Robots-Tag).
Both methods have the same effect, and you can choose the method of implementation depending on your website’s needs and convenience. Let’s learn both ways to implement the noindex tag.
How to apply NOINDEX in the Robots Meta Tag?
The meta robots tag, also known as “meta robots,” is an element of the HTML code of a given page. It appears in the <head> section of each web page and can be used to control the behavior of search engine crawling and indexing. By adding the noindex value to the meta robots tag, you can instruct search engine bots not to index a specific page. “To not index a page” means that they won’t include this page (this distinct URL) in their search results. You can give this instruction to all web crawlers or only to specific crawlers.
If you don’t want this page to be indexed by any search engine, the meta robots tag of the page should look like this:
<meta name="robots" content="noindex">
The value “robots” specifies that this directive applies to all web crawlers.
If you want to prevent only Google web crawlers from indexing this page, you will need to modify the name attribute with the name of the crawler that you are addressing:
<meta name="googlebot" content="noindex">
Because search engines have different crawlers for different purposes, I advise you to use meta name="robots" instead of naming specific crawlers.
By default, web pages are created to be indexed. Therefore, when you create a new page or a blog post with a content management system like WordPress, you may not see a meta robots tag in the HTML code of the page. As soon as you add a directive to the tag, such as “nofollow” or “noindex”, the robots tag should appear in the <head> section of the HTML code.
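If you want to check a page yourself, a short script can read the meta robots tag for you. The following is only a sketch using Python’s standard `html.parser` module; the class name `RobotsMetaParser` and the helper `is_indexable` are names I made up, and the logic covers only the simple comma-separated directives discussed in this article.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> style tags."""

    def __init__(self, crawler="googlebot"):
        super().__init__()
        # Directives addressed to all crawlers or to this specific crawler.
        self.names = {"robots", crawler.lower()}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in self.names:
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def is_indexable(html, crawler="googlebot"):
    """Return False when a noindex directive applies to the given crawler."""
    parser = RobotsMetaParser(crawler)
    parser.feed(html)
    return "noindex" not in parser.directives


page = '<html><head><meta name="robots" content="noindex"></head></html>'
print(is_indexable(page))
```

A page without any meta robots tag is reported as indexable, which matches the default behaviour described above.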
How to apply NOINDEX in the HTTP header (X-Robots-Tag)?
Another way to instruct a search engine how to index website pages is by using the X-Robots-Tag. This tag is implemented in the HTTP header of the page. HTTP headers are part of HTTP requests and responses and carry information between the server and the client in both directions. In simple words, they accompany the data that is transferred between a web server and a browser. Each header is a name-value pair, with the header name and the header value separated by a single colon.
X-Robots-Tag is one of the many instructions you can include in the HTTP header response for a given page (URL).
To add “noindex” in the X-Robots-Tag, you’ll need to have access to either your website’s header.php, .htaccess, or server configuration file. You should also have a good knowledge of the specific server configuration and its X-Robots-Tag markup. If you want to instruct all crawlers not to index a page, you’ll need to add this line to the HTTP header:
HTTP/1.1 200 OK
Date: Wed, 04 Mar 2020 21:42:43 GMT
(…)
X-Robots-Tag: noindex
(…)
If you want to prevent only Google crawlers from indexing this page, you’ll need to modify it to:
(…)
X-Robots-Tag: googlebot: noindex
(…)
In fact, any directive that can be used in a robots meta tag can also be specified as an X-Robots-Tag. Read more about all possible directives here.
The X-Robots-Tag gives the webmaster more flexibility and indexing control. For example, it can be used to block indexing of a non-HTML file such as a video, image, or PDF, which has no <head> section to hold a meta tag. It can also be used to give more precise instructions to bots.
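To see how the two header forms shown above can be read, here is a rough Python sketch. The function name `directives_for` is my own, and the parsing is deliberately simplified: it assumes plain directives such as noindex and nofollow, and it does not handle directives that carry their own colon-separated value (for example unavailable_after).

```python
def directives_for(header_values, crawler):
    """Return the X-Robots-Tag directives that apply to `crawler`.

    `header_values` holds one string per X-Robots-Tag header line.
    A value may start with "<user agent>:" to address a single crawler.
    """
    applicable = set()
    for value in header_values:
        agent, separator, rest = value.partition(":")
        if separator:  # "googlebot: noindex" form
            if agent.strip().lower() != crawler.lower():
                continue  # addressed to a different crawler
            value = rest
        applicable.update(
            part.strip().lower() for part in value.split(",") if part.strip()
        )
    return applicable


print(directives_for(["noindex"], "googlebot"))
print(directives_for(["googlebot: noindex"], "bingbot"))
```

The first call returns the noindex directive because it is addressed to every crawler; the second returns an empty set because the header names a different crawler.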
What is a NOFOLLOW Tag?
Links are an important part of search engine optimization. It is common knowledge that links from external websites will help increase your domain authority, credibility, and website rankings. But internal links are very important too! They help crawlers navigate through your website, discover webpages, and transfer link juice (ranking power) between your pages.
“Nofollow” is another directive you can give to a search engine crawler. The nofollow tag instructs search engines not to follow links on a specific page and not to pass link value to the target pages these links are pointing to. This means that the linked page will not benefit from the linking page’s authority or PageRank.
Use the Nofollow tag to prevent search engines from following links and passing link juice through these links.
It is also used when you do not trust or cannot vouch for the content of the link being linked to.
On a website that has a meta robots nofollow tag on Category Pages B and D, the bot still crawls those two pages but does not follow the links on them.
Three ways to implement the NOFOLLOW tag:
- You can add it to the meta robots tag (as a part of the HTML code of a page),
- You can add it to the X-Robots-Tag (as an element of the HTTP header), or
- You can add it as an attribute to an individual link.
While the first two will give the same instructions to search engines and will affect all links on a given page, the third method of implementing the tag is meant to only affect selected links. Let’s learn more about the different ways to implement the nofollow tag.
How to apply NoFollow in the Robots Meta Tag?
Exactly as with the Noindex, Nofollow can be added as a directive in the robots meta tag.
When you choose this implementation, you instruct search engine bots (most notably Googlebot) to refrain from crawling the links on this page and not to pass link equity through any links on the given webpage. The implementation in a meta tag is exactly the same as explained above, only the directive “noindex” is changed to “nofollow”.
To give the directive “nofollow” to all search engine crawlers, use:
<meta name="robots" content="nofollow">
To give the directive only to Google web crawlers, use:
<meta name="googlebot" content="nofollow">
To use both nofollow and noindex together on the same page:
<meta name="robots" content="noindex, nofollow">
How to apply NOFOLLOW in the HTTP header (X-Robots-Tag)?
Another way to tell the search engine not to follow links and pass link juice is by adding “nofollow” as an additional element of the HTTP header response for a given URL.
Here’s an example of an HTTP response header with an X-Robots-Tag instructing Googlebot not to follow the links on a given page:
X-Robots-Tag: googlebot: nofollow
How to apply NOFOLLOW as a link attribute?
By default, all links to a webpage are set to be followed by crawlers, and link juice is transferred between the pages. You set a selected link to “nofollow” if you want to suggest to Google that the hyperlink should not pass any link equity/SEO value to the link target. Unlike the two previous implementations, if you add “nofollow” as an attribute to the hyperlink, the directive will only affect the link it is applied to and not all the links on the page.
Here’s what the hyperlink code will look like:
<a href="https://www.example.com/" rel="nofollow">Anchor Text</a>
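When I audit a page, I want to know which links are followed and which are nofollowed. The sketch below shows one way to do that with Python’s standard `html.parser` module; the names `LinkAudit` and `audit_links` and the sample markup are assumptions for illustration, and the script only looks at the rel attribute of individual links, not at a page-level meta robots tag.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Sorts the links on a page into followed and nofollowed."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return  # named anchors without a target are not links
        # rel may hold several space-separated values, e.g. "nofollow noopener".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


def audit_links(html):
    audit = LinkAudit()
    audit.feed(html)
    return audit.followed, audit.nofollowed


PAGE = (
    '<p><a href="/pricing">Pricing</a> '
    '<a href="https://www.example.com/" rel="nofollow">Anchor Text</a></p>'
)
followed, nofollowed = audit_links(PAGE)
print("followed:", followed)
print("nofollowed:", nofollowed)
```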
Keep in mind that although “nofollow” links are not intended to provide an SEO boost to the linked content, they are still valuable for user experience and referral traffic.
When Should You Use Meta Robots NoIndex Tags?
While every website is unique, and noindexing pages is very much a “case-by-case” SEO tactic, here are some examples of page types where you should usually use the Meta Robots NoIndex tag:
Thank you pages
E-commerce and lead generation websites usually direct users to a confirmation page (also called a Thank you page) once a user completes a transaction or submits a form. The number of visits and hits to these unique thank you pages is the easiest way to track goals and conversions on websites. Therefore, your visitors should arrive on your thank you page only after they’ve made a successful transaction or completed the form. Applying a “noindex” tag to your thank you pages will prevent these pages from being indexed by search engines and avoid visits to these pages from SERPs.
Taxonomy, Tag or Category Pages
These are pages created by content management systems (CMS) to organize content but may not add significant value for search engine users. You should review the content on these pages and decide if they are of value to search engine users.
Members only pages
It’s a common practice to “noindex” sections of your website that are accessible only to employees, website members, or clients. In general, all pages that shouldn’t be seen by the general public should be set to “noindex” to keep those pages from being found in SERPs.
Your back-end and admin log-in pages should also be “noindex”. Similar to the members-only pages, they should not be found in search engines.
However, be extra careful with your User log-in pages. They should be indexed and visible in the SERPs to everyone who does a search with the following keywords “your website + login” or “your brand name + login”.
Archived and Outdated Content Pages
You should noindex pages containing outdated information or content that is no longer relevant to users or search engines.
Internal search results
If your website has an internal site search box, you should make sure that search engines don’t index the search result pages. Internal search engines can create a lot of pages that are of little to no value to your visitors. If these pages can be found in Google’s index, you risk providing a bad user experience to anyone who discovers your website via a search result page.
By the way, it’s recommended to exclude (disallow) the internal search pages via the robots.txt file.
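To check that such a disallow rule behaves as expected, you can test it with Python’s standard `urllib.robotparser` module. This is a sketch under assumptions: the `/search` path, the example URLs, and the helper name `can_crawl` are hypothetical, so replace them with whatever your own site search uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks internal search result pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def can_crawl(robots_txt, url, user_agent="Googlebot"):
    """Return True if the robots.txt rules allow this crawler to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(can_crawl(ROBOTS_TXT, "https://www.example.com/search?q=shoes"))
print(can_crawl(ROBOTS_TXT, "https://www.example.com/blog/noindex-guide"))
```

Keep in mind that the rule is a simple prefix match, so `Disallow: /search` would also block a URL such as `/search-tips`.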
On the other hand, these pages contain important legal information that users may need to find through search engines.
I’ve seen many webmasters apply noindex to their staging websites too. This is a good place to mention that using “noindex” on staging websites is not a best practice, because it only prevents the pages from appearing in search engine results, but it doesn’t stop search engine bots from crawling those pages. This means that the staging website can still consume crawl budget, and sensitive information or unfinished content can still be fetched by bots.
Instead, it is better to block the staging website from crawling via the robots.txt file. By doing this, you explicitly instruct search engine bots not to crawl any part of the staging website, so that its content and links are not accessed. This approach is a more comprehensive way to keep the staging website hidden from search engines while also conserving crawl budget for your live, production website.
Your website may have some pages with content that is replicated across multiple URLs within the website and needs to be consolidated. Similar to the tags and category pages created by your CMS, these duplicated pages may be of little value to users. The “noindex” directive is an effective tool for preventing duplicate content from being indexed and shown in search engine results, thereby helping to maintain the quality and relevance of a website’s indexed content.
However, be extra careful when addressing duplicate content issues, and keep in mind that the noindex tag is not the best way to handle them. When it comes to preventing duplicate content, the “canonical” tag is typically used. The canonical tag is employed to address duplicate content issues by specifying the preferred version of a web page when multiple versions of the same content exist. This helps search engines understand which page to index and display in search results, thereby consolidating the ranking signals for the specified canonical page and preventing the dilution of ranking authority caused by duplicate content.
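The canonical tag lives in the <head> section as a link element with rel="canonical". To check which preferred URL a page declares, you could use something like the following Python sketch; the names `CanonicalParser` and `canonical_url` and the example URLs are my own, and the script simply returns the first canonical link it finds.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Finds the first <link rel="canonical"> on a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attributes.get("href")


def canonical_url(html):
    """Return the declared canonical URL, or None if the page has none."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


# Hypothetical duplicate page that points to its preferred version.
duplicate_page = (
    '<head><link rel="canonical" href="https://www.example.com/shoes/"></head>'
)
print(canonical_url(duplicate_page))
```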
But in case you have multiple URLs with thin content and no URL copies the exact same content as another, it is possible to use noindex.
When Should You Use Meta Robots NoFollow Tags?
As explained above, the “nofollow” tag is used to prevent the bot from following links and passing link equity from one page to another. Here are some examples of links that are usually nofollowed.
Links in blog comments
Due to the large number of link spammers, it’s normal to automatically nofollow links in blog comments and forums. You’ve probably noticed that a big part of the blog comments you receive are not related to the topic, unhelpful and contain a link to another website. These comments are only made for link-building purposes. Today, most blogs automatically add the nofollow attribute to links in comments and spammers can’t gain SEO value out of them.
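To illustrate what “automatically adding nofollow” can look like, here is a simplified Python sketch that rewrites the links in a comment before it is displayed. It is not how any particular blogging platform does it: the function name `nofollow_links` is made up, and the regular expressions assume simple markup with double-quoted attribute values. A real website should rely on its CMS or a proper HTML sanitizer.

```python
import re

# Assumes well-formed <a ...> tags with double-quoted attribute values.
A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
REL_ATTR = re.compile(r'\brel\s*=\s*"([^"]*)"', re.IGNORECASE)


def nofollow_links(comment_html):
    """Add rel="nofollow" to every link in a piece of comment HTML."""

    def fix(match):
        attrs = match.group(1)
        rel = REL_ATTR.search(attrs)
        if rel is None:
            # No rel attribute yet: append one.
            return f'<a{attrs} rel="nofollow">'
        tokens = rel.group(1).split()
        if "nofollow" not in (token.lower() for token in tokens):
            tokens.append("nofollow")
        new_rel = 'rel="' + " ".join(tokens) + '"'
        # Keep every other attribute exactly as it was.
        return "<a" + attrs[: rel.start()] + new_rel + attrs[rel.end():] + ">"

    return A_TAG.sub(fix, comment_html)


comment = '<p>Nice post! <a href="https://spam.example/">cheap shoes</a></p>'
print(nofollow_links(comment))
```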
Paid links in sponsored articles
Nowadays, link building is a full-time job, and people spend hours contacting websites and offering them money in exchange for backlinks. But while blogger outreach and writing sponsored articles are legit marketing techniques, they can quickly become risky when you do them on a large scale. Google has specifically advised applying the nofollow attribute to external links in “Advertorials or native advertising where payment is received for articles that include links that pass PageRank.” That’s why many websites choose to nofollow all links in sponsored content. Of course, if you do that, you should inform your link-exchange partners beforehand.
Paid links in banners and display ads
Another popular black hat SEO tactic is the mass purchase of links across the web. To discourage advertisers from purchasing ads because of PageRank value, you should add nofollow to the links going to your advertisers’ sites.
Also, as mentioned above, Google’s guidelines specifically state that native advertising and sponsored links should be set to “nofollow”.
Links on clients and partners logos
Websites that want to be cautious about their rankings and PageRank values should also add “nofollow” to the links to their partners and clients. In this way, you indicate to Google that you cannot vouch for each organization’s website that you are linking to. More importantly, you optimize the link juice flow between your own website pages and you avoid “losing” linking power by transferring it to other websites. At the same time, your users can still see and visit the websites of your partners.
In conclusion, understanding and utilizing the noindex and nofollow tags can significantly impact the visibility and indexing of your website’s pages, ultimately influencing your SEO performance. Knowing when and how to make use of the two meta robots tags is pivotal to a robust SEO strategy.
I am a digital marketing consultant and SEO expert based in Hong Kong. With a track record spanning over 11 years, I have helped numerous clients from China, Europe, and around the globe achieve results. In my blog, I share my experience and proven methods in SEO, PPC, and digital marketing strategies.