javascript - How to prevent google from indexing <script type="application/json"> content - Stack Overflow

I have discovered through Google's webmaster tools that google is crawling paths that look like links embedded in json in a <script type="application/json"> tag. This json is later parsed and used on the client side.

The problem is that the json contains paths that are not valid links, and Google is treating them as links, and so it is trying to crawl them and getting a steadily increasing amount of 404s, and thus increasing unnecessary crawler traffic.

What can I do to prevent google from attempting to crawl these paths? I can add some patterns to robots.txt, but I want to ensure that google is ignoring the contents of the script tag entirely, and not trying to parse it for paths that look like links.
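
As a first step along the lines mentioned above, the phantom paths could be blocked in robots.txt. A minimal sketch, assuming the JSON-only URLs share a common prefix such as /json-paths/ (a hypothetical placeholder, not taken from the question):

# robots.txt sketch -- /json-paths/ is a hypothetical prefix for the non-link paths
User-agent: Googlebot
Disallow: /json-paths/

Note that this only blocks crawling of matching URLs; it does not stop Google from parsing the script tag for link-like strings, which is the core of the question.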

asked Nov 9, 2017 at 20:03 by undefined
  • Have you edited/created a robots.txt file to tell the bots to ignore those paths? – Patrick Evans Commented Nov 9, 2017 at 20:06
  • Are they in your robots.txt file? – zero298 Commented Nov 9, 2017 at 20:06
  • @PatrickEvans and zero298, see my third paragraph. – undefined Commented Nov 9, 2017 at 20:29

2 Answers

Try this markup:

<!--googleoff: all-->
<script type="application/json">
  // your json content here
</script>
<!--googleon: all-->

As written in this post.

Plus a few more articles:
Preparing for a Crawl
FAQ - How do I use the googleon/googleoff tags?

PS:

For an even safer approach: when possible, try to use content generated on the fly, e.g. loaded via AJAX, as sketched below.
I'd try something like this:

<script type="application/my-binary-format">
{"urlLikeThing":"//some/path/like/string"}
</script>
<script>
// if (navigator.userAgent !== ...)
document.querySelectorAll('script[type="application/my-binary-format"]').forEach(s => s.setAttribute('type', 'application/json'))
</script>
  • On the server side, the script tag is emitted with a content type that Googlebot ignores: application/my-binary-format, application/octet-stream, or something similar
  • After that, you inline an additional script into your page which looks for scripts with your special content type and changes it back to the normal one
  • This script could check for Googlebot (e.g. via the user agent) and perform its actions only for real users; see the sketch below
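
For the last point, a minimal sketch of such a user-agent check (the answer leaves the exact condition open, so the pattern below is an assumption):

// Sketch: only switch the type back for visitors that do not identify as Googlebot.
// Googlebot includes the token "Googlebot" in its user-agent string.
if (!/Googlebot/i.test(navigator.userAgent)) {
  document.querySelectorAll('script[type="application/my-binary-format"]')
    .forEach(s => s.setAttribute('type', 'application/json'));
}

Keep in mind that user-agent strings can be spoofed or changed, so this check is a heuristic rather than a guarantee.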