Do you know how to prevent indexing of pages past the home page in WP? I mean I don't want mysite/page/2, mysite/page/3 to be indexed. This is because I use home.php for my theme, so page/2, page/3 are all the same as the home page. Please give me a hint or a code snippet; I don't want to add another plugin (robots meta).
- @rich - Do you want to limit all search engines, or is just Google (and Bing) good enough? – MikeSchinkel
- All search engines, but why does it matter? – RichZenMaster
- @rich - Well, you can easily "sniff" out a handful of search engines and perform a redirect, but it's a lot harder to sniff out every search engine there is; just see this list: thesearchenginelist. It's like finding a needle in a haystack vs. proving there isn't one. You can use the meta robots trick, but not all spiders behave. What's your reason for doing this? I'm not questioning you; it's just easier to give you a solution that meets your objectives if I know what your objectives are. – MikeSchinkel
- To prevent a duplicate content penalty. If there are many pages with the same content as the homepage, it may be an issue. – RichZenMaster
4 Answers
How exactly are you setting up your home page? I think the problem is that it has unwanted pagination in the first place, not that the pagination is being indexed.
In general, a robots.txt file is a good way to prevent indexing in bulk. I think it would be the following directive in your case (please test it so it doesn't affect pagination in other places):
User-agent: *
Disallow: /page/
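If the site relies on WordPress's virtual robots.txt (i.e. there is no physical robots.txt file in the web root), a sketch of the same rule added through the robots_txt filter, assuming it goes in the theme's functions.php, could look like this:
<?php
// Append a Disallow rule for paginated URLs to WordPress's virtual robots.txt.
// Has no effect if a physical robots.txt file exists in the site root.
add_filter( 'robots_txt', function ( $output, $public ) {
    if ( $public ) {
        $output .= "Disallow: /page/\n";
    }
    return $output;
}, 10, 2 );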
If it is because of SEO and the warnings in Google Search Console, these can be ignored; /page/2 and so on should still be indexed. See this answer and the article with the answer from Google:
For a while, SEOs thought it might be a good idea to add a noindex robots meta tag to page 2 and further of a paginated archive. This would prevent people from finding page 2 and further in the search results. The idea was that the search engine would still follow all these links, so all the linked pages would still be properly indexed.
The problem is that at the end of last year, Google said something that caught our attention: long-term noindex on a page will lead to them not following links on that page. This makes adding noindex to page 2 and further of paginated archives a bad idea, as it might lead to your articles no longer getting the internal links they need.
Because of what Google said about long-term noindex, in Yoast SEO v6.3 we removed the option to add noindex to subpages of archives. Should page 2 and further of an archive have a canonical link to page 1, or to itself? The idea was that you mostly want visitors to end up on page 1 of an archive. That page is usually the most relevant for the majority of users.
Google is very clear now: each page within a paginated series should canonicalize to itself, so /page/2/ has a canonical pointing to /page/2/. This is why you see your paginated archives being indexed.
To learn more about it, you can refer to this article: https://yoast.com/pagination-seo-best-practices/
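If nothing on the site emits canonicals already, here is a minimal sketch of how a self-referencing canonical for paginated pages could be printed from a theme's functions.php (the hook and conditionals are standard WordPress; the function name is just an example, and Yoast SEO or a recent WordPress version will normally output this for you):
<?php
// Print a self-referencing canonical on page 2, 3, ... of the blog home.
// Skip this entirely if an SEO plugin already outputs canonical tags.
add_action( 'wp_head', 'example_paged_canonical' );
function example_paged_canonical() {
    if ( is_home() && is_paged() ) {
        $paged = (int) get_query_var( 'paged' );
        echo '<link rel="canonical" href="' . esc_url( get_pagenum_link( $paged ) ) . '" />' . "\n";
    }
}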
If you're trying to prevent duplicate content, you should look at the root of the problem. You state that your homepage is using a home.php template; does this include some static text which you're passing to all other pages using the home template? If that is the case, either remove it or create a unique home template, which in all honesty should be home.php.
If for whatever reason you want to keep the pages that display the same content as your homepage but use a different URL, you can always resort to canonicals.
If you replace the content of your header.php with the following, you can specify different headers: one that includes a canonical and others that won't.
<?php
// Load a different header file depending on which page is being viewed.
if ( is_page( '1' ) ) {
    include( TEMPLATEPATH . '/header1.php' );
} elseif ( is_page( '2' ) ) {
    include( TEMPLATEPATH . '/header2.php' );
} else {
    include( TEMPLATEPATH . '/headerdefault.php' );
}
?>
And then you just make sure that you include the canonical which refers to your homepage:
<link rel="canonical" href="http://www.yourdomain/" />
This tells Google the appropriate URL for the content it's viewing, without resorting to a plugin.
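If maintaining several header files feels heavy, a minimal alternative sketch (my addition, following this answer's idea of pointing the paginated copies at the homepage; note the previous answer quotes Google as preferring self-referencing canonicals instead) is to print the tag conditionally inside a single header.php:
<?php if ( is_home() && is_paged() ) : ?>
<link rel="canonical" href="<?php echo esc_url( home_url( '/' ) ); ?>" />
<?php endif; ?>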
Either way, this all seems a bit weird, and I fear that I am just misunderstanding your request, as it doesn't seem to make sense. Are you aware of how duplicate content works? Or is it me who should be heading back to bed?
I fail to see the purpose of willingly creating new pages that contain the same content and then looking for a solution to prevent duplicate content.
I think the robots meta tags are what need adjusting. You want the spiders to go to page 2 and follow the links to your articles, but you don't want them actually indexing that page (since it will change). So in your header.php, find the "robots" meta tag and change it to the following (note that get_query_var('paged') returns 0 on the first page, not 1):
<meta name="robots" content="follow, <?php echo ( get_query_var( 'paged' ) <= 1 ) ? 'index' : 'noindex'; ?>" />
Using a blanket robots.txt will unfortunately cause the spider to not follow the links, and not find the articles that are on the other pages.
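If you prefer not to touch the theme's markup at all, a roughly equivalent sketch (my addition, using the standard wp_head hook and is_paged(); the function name is just an example) can live in functions.php:
<?php
// Output "noindex, follow" only on /page/2/ and beyond; page 1 keeps the theme's default robots meta.
add_action( 'wp_head', 'example_noindex_paged' );
function example_noindex_paged() {
    if ( is_paged() ) {
        echo '<meta name="robots" content="noindex, follow" />' . "\n";
    }
}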