I have discovered through Google's webmaster tools that google is crawling paths that look like links embedded in json in a <script type="application/json">
tag. This json is later parsed and used on the client side.
The problem is that the JSON contains paths that are not valid links. Google treats them as links anyway, tries to crawl them, and racks up a steadily increasing number of 404s, which means unnecessary crawler traffic.
What can I do to prevent Google from attempting to crawl these paths? I can add some patterns to robots.txt, but I want to ensure that Google ignores the contents of the script tag entirely and does not try to parse it for path-like strings.
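For example, if the phantom paths all happened to share a common prefix such as /data/ (a hypothetical prefix, purely illustrative), the robots.txt patterns might look like:

```
User-agent: Googlebot
Disallow: /data/
```

But that only blocks crawling of matched URLs after the fact; it doesn't stop Google from extracting the strings from the page in the first place, which is why I'd prefer the script tag to be ignored entirely.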
asked Nov 9, 2017 at 20:03 by undefined

Comments:
- Have you edited/created a robots.txt file to tell the bots to ignore those paths? – Patrick Evans, Nov 9, 2017 at 20:06
- Are they in your robots.txt file? – zero298, Nov 9, 2017 at 20:06
- @PatrickEvans and zero298, see my third paragraph. – undefined, Nov 9, 2017 at 20:29
2 Answers

Try this markup:
<!--googleoff: all-->
<script type="application/json">
// your json content here
</script>
<!--googleon: all-->
As described in this post, plus a few more articles:
- Preparing for a Crawl
- FAQ - How do I use the googleon/googleoff Tags?
PS: For an even more robust approach, when possible, generate the content on the fly, e.g. by loading it via AJAX.
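A minimal sketch of that AJAX approach (the endpoint URL is a hypothetical example, not taken from the question):

```javascript
// Load the JSON after the page loads instead of inlining it in the HTML,
// so the crawler never sees the path-like strings in the page source.
async function loadPageData(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Failed to load ${url}: ${response.status}`);
  }
  return response.json();
}

// Hypothetical usage:
// loadPageData('/assets/page-data.json').then(data => renderPage(data));
```

Note that modern Googlebot renders JavaScript, so it may still fetch the JSON file itself; the point is that the path-like strings inside it are no longer embedded in the HTML for link extraction.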
I'd try something like this:
<script type="application/my-binary-format">
{"urlLikeThing":"//some/path/like/string"}
</script>
<script>
// Optionally skip this for crawlers, e.g.:
// if (navigator.userAgent !== ...)
// Restore the real type so client-side code can find and parse the JSON:
document.querySelectorAll('script[type="application/my-binary-format"]')
  .forEach(s => s.setAttribute('type', 'application/json'));
</script>
- On the server side, the script is emitted with a content type that Googlebot ignores: application/my-binary-format, application/octet-stream, or something similar
- After that, you inline an additional script into your page that searches for scripts with your special content type and changes it to the normal one
- This script could check for Googlebot (e.g. via user agent) and perform its actions only for real users
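The user-agent check in the last step could be sketched like this (isGooglebot is a hypothetical helper name, and UA strings are trivially spoofable, so treat this as best-effort only):

```javascript
// Best-effort check: real Googlebot requests identify themselves
// with "Googlebot" in the user-agent string.
function isGooglebot(userAgent) {
  return /Googlebot/i.test(userAgent);
}

// Only restore the real script type for what looks like a human visitor:
// if (!isGooglebot(navigator.userAgent)) { /* swap the script type here */ }
```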