I have discovered through Google's webmaster tools that google is crawling paths that look like links embedded in json in a <script type="application/json">
tag. This json is later parsed and used on the client side.
The problem is that the JSON contains paths that are not valid links. Google treats them as links anyway, tries to crawl them, and racks up a steadily increasing number of 404s, which means unnecessary crawler traffic.
What can I do to prevent Google from attempting to crawl these paths? I can add some patterns to robots.txt, but I want to ensure that Google ignores the contents of the script tag entirely and does not try to parse it for path-like strings.
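For example, if the phantom paths all happened to share a common prefix such as /data/ (a hypothetical prefix, purely illustrative), the robots.txt patterns might look like:

```
User-agent: Googlebot
Disallow: /data/
```

But that only blocks crawling of matched URLs after the fact; it doesn't stop Google from extracting the strings from the page in the first place, which is why I'd prefer the script tag to be ignored entirely.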
asked Nov 9, 2017 at 20:03 by undefined

Comments:
- Have you edited/created a robots.txt file to tell the bots to ignore those paths? – Patrick Evans, Nov 9, 2017 at 20:06
- Are they in your robots.txt file? – zero298, Nov 9, 2017 at 20:06
- @PatrickEvans and zero298, see my third paragraph. – undefined, Nov 9, 2017 at 20:29
2 Answers

Try this markup:
<!--googleoff: all-->
<script type="application/json">
// your json content here
</script>
<!--googleon: all-->
As described in this post, plus a few more articles:
- Preparing for a Crawl
- FAQ - How do I use the googleon/googleoff Tags?
PS: For an even more robust approach, when possible, generate the content on the fly, e.g. by loading it via AJAX.
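A minimal sketch of that AJAX approach (the endpoint URL is a hypothetical example, not taken from the question):

```javascript
// Load the JSON after the page loads instead of inlining it in the HTML,
// so the crawler never sees the path-like strings in the page source.
async function loadPageData(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Failed to load ${url}: ${response.status}`);
  }
  return response.json();
}

// Hypothetical usage:
// loadPageData('/assets/page-data.json').then(data => renderPage(data));
```

Note that modern Googlebot renders JavaScript, so it may still fetch the JSON file itself; the point is that the path-like strings inside it are no longer embedded in the HTML for link extraction.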
I'd try something like this:
<script type="application/my-binary-format">
{"urlLikeThing":"//some/path/like/string"}
</script>
<script>
// Optionally skip this for crawlers, e.g.:
// if (navigator.userAgent !== ...)
// Restore the real type so client-side code can find and parse the JSON:
document.querySelectorAll('script[type="application/my-binary-format"]')
  .forEach(s => s.setAttribute('type', 'application/json'));
</script>
- On the server side, the script is emitted with a content type that Googlebot ignores: application/my-binary-format, application/octet-stream, or something similar
- After that, you inline an additional script into your page that searches for scripts with your special content type and changes it to the normal one
- This script could check for Googlebot (e.g. via user agent) and perform its actions only for real users
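The user-agent check in the last step could be sketched like this (isGooglebot is a hypothetical helper name, and UA strings are trivially spoofable, so treat this as best-effort only):

```javascript
// Best-effort check: real Googlebot requests identify themselves
// with "Googlebot" in the user-agent string.
function isGooglebot(userAgent) {
  return /Googlebot/i.test(userAgent);
}

// Only restore the real script type for what looks like a human visitor:
// if (!isGooglebot(navigator.userAgent)) { /* swap the script type here */ }
```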