
string - Is Javascript substring virtual? - Stack Overflow


Suppose we have a huge string named str1, say 5 million characters long, and then take str2 = str1.substr(5555, 100), so that str2 is 100 characters long and is a substring of str1 starting at position 5555 (or any other randomly selected position).

How does JavaScript store str2 internally? Are the string contents copied, or is the new string sort of virtual, storing only a reference to the original string plus values for position and size?

I know this is implementation dependent; the ECMAScript standard (probably) does not define what's under the hood of the string implementation. But I would like to hear from an expert who knows the internals of V8 or SpiderMonkey well enough to clarify this.

Thank you


asked Dec 12, 2013 at 6:40 by exebook; edited Dec 12, 2013 at 6:55 by Paul Draper
  • blog.mozilla.org/javascript/2014/07/21/… – Bergi Commented Apr 5, 2017 at 7:06

2 Answers


AFAIK V8 has four string representations:

  1. ASCII
  2. UTF-16
  3. concatenation of multiple strings
  4. slice of another string

Adventures in the land of substrings and RegExps has great explanations and illustrations.

Thus, it does not have to copy the string; it just has to keep beginning and ending markers into the other string.
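To make the "slice of another string" idea concrete, here is a minimal conceptual sketch in plain JavaScript. It is only a model of the technique, not V8's actual internals: the class name `SlicedString` and its `flatten` method are hypothetical names made up for illustration.

```javascript
// Conceptual model only: NOT how V8 is actually implemented.
// `SlicedString` and `flatten` are hypothetical names for illustration.
class SlicedString {
  constructor(parent, offset, length) {
    this.parent = parent; // reference to the original string, no characters copied
    this.offset = offset; // where the slice starts inside the parent
    this.length = length; // how many characters the slice covers
  }

  // Reading a character is redirected into the parent string.
  charAt(i) {
    if (i < 0 || i >= this.length) return "";
    return this.parent.charAt(this.offset + i);
  }

  // Materialize an independent copy of just the slice's characters.
  flatten() {
    return this.parent.slice(this.offset, this.offset + this.length);
  }
}

const str1 = "x".repeat(5555) + "hello" + "y".repeat(100000);
const str2 = new SlicedString(str1, 5555, 100);
console.log(str2.charAt(0), str2.length);
```

Creating the slice costs the same regardless of how long the parent is, which is where the speed-up comes from; the price is that the slice keeps a reference to the whole parent.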

SpiderMonkey does the same thing. (See Large substrings ~9000x faster in Firefox than Chrome: why? ... though the answer for Chrome is outdated.)

This can give real speed boosts, but sometimes it is undesirable, since it can cause small strings to hold onto the memory of the larger parent string (see the V8 bug report).

This old blog post of mine explains it, as well as some other string representation forms: https://web.archive.org/web/20170607033600/http://blog.cdleary.com:80/2012/01/string-representation-in-spidermonkey/

Search for "dependent string". I think I know what you might be getting at with the question: dependent strings can be problematic at times, because even when there are no remaining references to the original, you can end up keeping a giant string around just to support a bitty little substring that's actually semantically reachable. There are things an implementation could do to mitigate that problem, such as recording information on a GC-generation basis to see whether any base string has exactly one dependent string and collapsing such cases to their minimal size, but last I knew that was not being done. (Essentially, with that kind of approach you're recovering runtime_refcount == 1 style information at GC-sweep time.)
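The mitigation described above can be sketched as a toy model. This is not SpiderMonkey code, and as noted it was reportedly not implemented; the function `collapseLoneDependents`, the `{ base, offset, length }` object shape, and the `reachableBases` set are all hypothetical, standing in for information a real collector would gather during a sweep.

```javascript
// Toy model of the idea, NOT engine code. All names here are hypothetical.
// A "base" is { chars }, a "dependent" is { base, offset, length }.
// `reachableBases` stands in for "bases the program can still reach directly".
function collapseLoneDependents(dependents, reachableBases) {
  // Count how many live dependents point at each base.
  const perBase = new Map();
  for (const d of dependents) {
    perBase.set(d.base, (perBase.get(d.base) || 0) + 1);
  }
  for (const d of dependents) {
    // Only the dependent keeps this base alive: copy out the small part
    // and drop the reference to the giant base.
    if (perBase.get(d.base) === 1 && !reachableBases.has(d.base)) {
      const chars = d.base.chars.slice(d.offset, d.offset + d.length);
      d.base = { chars };
      d.offset = 0;
    }
  }
}

const huge = { chars: "a".repeat(5555) + "needle" + "b".repeat(1000000) };
const small = { base: huge, offset: 5555, length: 6 };
collapseLoneDependents([small], new Set());
console.log(small.base.chars.length);
```

After the pass, `small` owns a base holding only its own characters, so in this model the giant string would no longer be kept alive by it.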
