Prevent Yoast from removing the canonical tag if robots meta tag is set to noindex

Closed. This question is off-topic. It is not currently accepting answers.

Your question should be specific to WordPress. Generic PHP/JS/SQL/HTML/CSS questions might be better asked at Stack Overflow or another appropriate Stack Exchange network site. Third-party plugins and themes are off-topic for this site; they are better asked about at their developers' support routes.

Closed 2 years ago.

I am using Yoast on a staging site and I want to have a canonical URL that points to the primary domain.

For example, staging.whatever.com should use whatever.com as its canonical.

I am replacing the canonical tag using the wpseo_canonical filter.

add_filter('wpseo_canonical', 'force_canonical_domain_replace');

Here force_canonical_domain_replace() does the replacement, and it works fine.

I also set the meta robots tag to noindex, nofollow, like this:

add_filter( 'wpseo_robots', function( $robots ) {

   return 'noindex, nofollow';

} );

But it seems that Yoast automatically removes the canonical tag when it detects noindex in the meta robots content. How can I prevent that?

This is how I am doing the whole thing:

// Replace domain for any URL
add_filter('wpseo_canonical', 'force_canonical_domain_replace');
function force_canonical_domain_replace($url){

    $current_site_domain = whatever_get_current_domain();
    if('whatever.com' == $current_site_domain){
        return $url;
    }

    // Replace current domain with whatever.com in all urls
    return str_replace($current_site_domain, 'whatever.com', $url);

}

// Make sure that meta robots uses noindex, nofollow if we are not on whatever.com
add_filter( 'wpseo_robots', function( $robots ) {

    if('whatever.com' == whatever_get_current_domain()){
        return $robots;
    }

    // Replace the string entirely to avoid issues
    return 'noindex, nofollow';

} );

// Helper function to safely get the current domain
function whatever_get_current_domain(){
   $parsed = parse_url(home_url());
   return $parsed['host'];
}
Asked Feb 8, 2022 at 9:09 by Álvaro Franz; edited Feb 8, 2022 at 13:12.

  • Cannot reproduce, sorry. WP 5.8.3, Yoast 18.1 and 29 plugins active in total. The second example (add_filter('wpseo_robots', ..) works and I see <meta name='robots' content='noindex, nofollow' /> on all pages. Did you try deactivating other plugins and see if the behavior changes? – kero Commented Feb 8, 2022 at 9:49
  • @kero - Hmm... do you see meta and canonical, both tags, on all pages? – Álvaro Franz Commented Feb 8, 2022 at 10:10
  • Can you add the full code to both? Only tried with the wpseo_robots filter. – kero Commented Feb 8, 2022 at 11:48
  • @kero Updated the question with the whole code. Whatever.com is the primary domain... and xxxxxx.whatever.com would be anything else. – Álvaro Franz Commented Feb 8, 2022 at 11:52

1 Answer

Assuming this is an XY problem and the original question is: "How do I block crawlers from indexing my staging site?"

Staging and production should be as similar as possible. Having differences in code would be a big "no no" for me (except for some environment variables, e.g. setting WP_ENVIRONMENT_TYPE or db credentials).

Instead, I would suggest configuring the web server to send the X-Robots-Tag header with noindex. There has been some discussion about which takes precedence if you have both (X-Robots-Tag and <meta name="robots" />), but in my experience X-Robots-Tag: none is enough to keep the site from being indexed, even if the pages' own <meta name="robots" /> says otherwise.

  • How to set X-Robots-Tag on apache
  • How to set X-Robots-Tag on NGINX
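
As a rough illustration of the web server approach on Apache, a minimal sketch might look like the following. It assumes mod_headers is enabled and that the directive is placed only in the staging virtual host (or the staging site's .htaccess), so production is unaffected; the placement is an assumption, not something stated in the question.

```apache
# Sketch only: put this in the STAGING vhost or .htaccess, never in production.
# Assumes mod_headers is available; the IfModule guard avoids a startup error if it is not.
<IfModule mod_headers.c>
    # "none" is shorthand for "noindex, nofollow"
    Header set X-Robots-Tag "none"
</IfModule>
```

On NGINX the analogous directive would be add_header X-Robots-Tag "none" always; inside the staging server block. Either way the header can be checked with curl -I against a staging URL, and the WordPress code stays identical between environments.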