At the moment I am working on text that is broken into floating columns to display it in a magazine-like way.
I asked in a previous question how to split the text into sentences, and it works like a charm:
sentences = text.replace(/\.\s+/g,'.|').replace(/\?\s/g,'?|').replace(/\!\s/g,'!|').split("|");
Now I want to go a step further and split it into words. However, the text also contains some elements that should not be split, such as subheadlines.
An example text would be:
A wonderful serenity has taken possession of my entire soul. <strong>This is a subheadline</strong><br><br>I am alone, and feel the charm of existence in this spot.
My desired result would look like the following:
Array [
"A",
"wonderful",
"serenity",
"has",
"taken",
"possession",
"of",
"my",
"entire",
"soul.",
"<strong>This is a subheadline</strong>",
"<br>",
"<br>",
"I",
"am",
"alone,",
"and",
"feel",
"the",
"charm",
"of",
"existence",
"in",
"this",
"spot."
]
When I split at all whitespace I do get the words, but the "<br>" is not added as a separate array entry. I also don't want to split the subheadline and its markup.
The reason I want to do this is that I add sequence after sequence to a p-tag, and when its height gets bigger than the surrounding element I remove the last added sequence and create a new floating p-tag. When I split the text into sentences, I saw that the breaks were too coarse to ensure a good reading flow.
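To make the filling procedure described above concrete, here is a minimal sketch of it as a pure function. The names `fillColumns` and `fitsTwentyChars` are hypothetical and not part of my actual code; in the browser the `fits` callback would compare the height of the p-tag with the height of its surrounding element, while here a simple character limit stands in for that measurement.

```javascript
// Hypothetical helper: distributes tokens (sentences or words) over columns.
// `fits(tokens)` is an assumed callback that returns true while the tokens
// still fit into one column.
function fillColumns(tokens, fits) {
  var columns = [];
  var current = [];
  tokens.forEach(function (token) {
    current.push(token);
    // Overflow: take the last token back out and start a new column with it.
    // A single token that never fits stays alone in its column.
    if (!fits(current) && current.length > 1) {
      current.pop();
      columns.push(current);
      current = [token];
    }
  });
  if (current.length) columns.push(current);
  return columns;
}

// Stand-in measurement for this sketch: at most 20 characters per column.
function fitsTwentyChars(tokens) {
  return tokens.join(' ').length <= 20;
}

var columns = fillColumns(
  ['A', 'wonderful', 'serenity', 'has', 'taken', 'possession'],
  fitsTwentyChars
);
// columns[0] is ['A', 'wonderful', 'serenity']
// columns[1] is ['has', 'taken', 'possession']
```

The smaller the tokens, the closer each column gets to its limit before breaking, which is why splitting into words should give a better flow than splitting into sentences.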
You can see an example of what I am trying to achieve here.
If you need any further information, I will be glad to provide it.
Thanks in advance,
Tobias
EDIT
The string could contain more HTML tags in the future. Is there a way to leave everything between these tags untouched?
EDIT 2
I created a jsfiddle: http://jsfiddle/m9r9q/1/
EDIT 3
Would it be a good idea to remove all HTML tags together with the text they enclose and replace them with placeholders? I could then split the string into words and put the untouched HTML back whenever a placeholder is reached. What would be the regex to extract all HTML tags?
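A rough sketch of what I mean by placeholders, under a few loud assumptions: the function name `splitWithPlaceholders` is made up, the markup is flat (no nested elements), and the text never contains the NUL character that is used as a placeholder delimiter here.

```javascript
// Sketch of the placeholder idea. Assumes flat (non-nested) markup and that
// the text never contains the NUL character used to mark placeholders.
function splitWithPlaceholders(text) {
  var saved = [];
  // First alternative: an element with plain text content, e.g. <strong>...</strong>.
  // Second alternative: any single tag such as <br>.
  var htmlPattern = /<([a-z][a-z0-9]*)\b[^>]*>[^<]*<\/\1>|<[^>]+>/gi;
  var marked = text.replace(htmlPattern, function (match) {
    saved.push(match);
    // The surrounding spaces make every placeholder its own token.
    return ' \u0000' + (saved.length - 1) + '\u0000 ';
  });
  return marked.trim().split(/\s+/).map(function (token) {
    var hit = /^\u0000(\d+)\u0000$/.exec(token);
    // Put the untouched HTML back where its placeholder was.
    return hit ? saved[Number(hit[1])] : token;
  });
}

var words = splitWithPlaceholders(
  'A wonderful serenity has taken possession of my entire soul. ' +
  '<strong>This is a subheadline</strong><br><br>' +
  'I am alone, and feel the charm of existence in this spot.'
);
// words[10] is '<strong>This is a subheadline</strong>', words[11] is '<br>'
```

For the example text this yields the 25 entries listed in my desired result, but the regex would break as soon as elements are nested inside each other.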
Asked Sep 20, 2013 at 23:20 by Tobias Golbs, edited May 23, 2017 at 12:34
- Can you put together a jsfiddle of the situation? – Jake Commented Sep 20, 2013 at 23:25
- @Jake: Did you see my example? And if not, does that help you understand what I want to achieve? Nevertheless, I will create a jsfiddle :) – Tobias Golbs Commented Sep 20, 2013 at 23:27
- Did see the example, it's just that we can't modify that code :) – Jake Commented Sep 20, 2013 at 23:27
- I might be missing something here, but why not use CSS columns (caniuse.com/#search=column)? Admittedly IE is the main non-conforming browser. – user2417483 Commented Sep 20, 2013 at 23:30
- @Jeff: Please consider that CSS columns are not an option for this example. The application needs to be as backwards compatible as possible! – Tobias Golbs Commented Sep 20, 2013 at 23:34
2 Answers
"Although i want to try to extract the html parts and add them afterwards untouched"
Forget about that, and about my previous post. I just realized that it is much better to let the browser's built-in engine operate on the HTML code.
You can just use this:
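var text = 'A wonderful serenity has taken possession of my entire soul. <strong>This is a subheadline</strong><br><br>I am alone, and feel the charm of existence in this spot.';
// Let the browser parse the markup instead of parsing it by hand.
var elem = document.createElement('div');
elem.innerHTML = text;
var array = [];
for (var i = 0, childs = elem.childNodes; i < childs.length; i++) {
    if (childs[i].nodeType === 3 /* document.TEXT_NODE */) {
        // Text node: split into words. filter(Boolean) drops the empty string
        // that a whitespace-only text node would otherwise produce.
        array = array.concat(childs[i].nodeValue.trim().split(/\s+/).filter(Boolean));
    } else {
        // Any other top-level node is kept whole, including nested markup.
        array.push(childs[i].outerHTML);
    }
}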
It DOES support nested tags this time, and it supports all possible syntax without hard-coded exceptions for non-closable tags :)
As I stated before in a comment: you shouldn't do this. But if you insist, here's a possible answer:
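var text = 'A wonderful serenity has taken possession of my entire soul. <strong>This is a subheadline</strong><br><br>I am alone, and feel the charm of existence in this spot.';
var array = [],
    tagOpened = false,   // true while we are inside an element such as <strong>
    stringBuilder = [];  // collects the pieces of the element currently open
// replace() is only used to iterate over the matches; its result is discarded.
// all = match incl. trailing whitespace, word = tag or word, tag = tag name (if any)
text.replace(/(<([^\s>]*)[^>]*>|\b[^\s<]*)\s*/g, function(all, word, tag) {
    if (tag) {
        var closing = tag[0] === '/';
        if (closing) {
            // Closing tag: emit the whole element as one entry.
            stringBuilder.push(all);
            word = stringBuilder.join('');
            stringBuilder = [];
            tagOpened = false;
        } else {
            // <br> has no closing tag, so it must not open a group.
            tagOpened = tag.toLowerCase() !== 'br';
        }
    }
    if (tagOpened) {
        stringBuilder.push(all);
    } else {
        array.push(word);
    }
    return '';
});
// Unclosed markup at the end of the text is kept as a single entry.
if (stringBuilder.length) array.push(stringBuilder.join(''));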
It doesn't support nested tags. You can add this functionality by implementing a stack for your open tags.
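A minimal sketch of that stack idea could look like the following. It is a variation on the code above rather than a drop-in patch: the function name `splitKeepingElements` and the `VOID_TAGS` list are assumptions made for illustration, the list of tags without a closing tag is not exhaustive, and a stray "<" in the text is simply skipped.

```javascript
// Assumed list of tags that never get a closing tag (not exhaustive).
var VOID_TAGS = ['br', 'hr', 'img', 'input'];

function splitKeepingElements(text) {
  var result = [];
  var openTags = [];  // stack of the tag names that are currently open
  var buffer = '';    // markup collected while inside an element
  // Each match is either a tag or a run of non-space text, plus trailing whitespace.
  var pattern = /(<(\/?)([a-z][a-z0-9]*)[^>]*>|[^\s<]+)(\s*)/gi;
  var match;
  while ((match = pattern.exec(text)) !== null) {
    var token = match[1];
    var closing = match[2] === '/';
    var name = match[3];
    var space = match[4];
    var isVoid = Boolean(name) && VOID_TAGS.indexOf(name.toLowerCase()) !== -1;
    if (name && !isVoid) {
      if (closing) openTags.pop(); else openTags.push(name);
    }
    if (openTags.length > 0) {
      buffer += token + space;      // still inside an element
    } else if (buffer) {
      result.push(buffer + token);  // the outermost element just closed
      buffer = '';
    } else {
      result.push(token);           // plain word or standalone void tag
    }
  }
  if (buffer) result.push(buffer);  // unclosed markup is kept as one entry
  return result;
}

var nested = splitKeepingElements(
  '<strong>Note: <em>nested</em> markup</strong> stays <br>whole.'
);
// nested is ['<strong>Note: <em>nested</em> markup</strong>', 'stays', '<br>', 'whole.']
```

The stack only counts nesting depth here; it does not verify that a closing tag matches the tag on top of the stack, so malformed markup is not detected.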