I need to determine the length of a string that may contain HTML entities.
For example, "&darr;" (↓) would return length 6, which is correct, but I want each entity to be counted as only 1 character.
<div id="foo">&darr;</div>
alert(document.getElementById("foo").innerHTML.length); // alerts 1
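So, based on that rationale: create a div, set your entity-ridden string as its HTML, read the HTML back and check the length. Keep in mind that innerHTML re-escapes &, < and > when read back, so a string containing &amp; is counted as 5; reading textContent instead avoids that.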
var div = document.createElement("div");
div.innerHTML = "&darr;&darr;&darr;&darr;";
alert(div.innerHTML.length); // alerts 4
You might want to put that in a function for convenience, e.g.:
function realLength(str) { // maybe there's a better name?
    var el = document.createElement("div");
    el.innerHTML = str;
    return el.innerHTML.length;
}
Since there's no solution using jQuery yet:
var str = 'lol&amp;';
alert($('<span />').html(str).text().length); // alerts 4
This uses the same approach as karim79's answer, but it never adds the created element to the document.
For most purposes you could assume that an ampersand followed by letters, or by a '#' and digits, and then a semicolon, is one character.
var strlen=string.replace(/&#?[a-zA-Z0-9]+;/g,' ').length;
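The one-liner above can be wrapped in a small helper. This is only a sketch: the name `entityAwareLength` is made up for illustration, and it assumes that anything shaped like `&...;` really is an entity (it does not check names against the HTML entity list, so a stray "&T;" would also be counted as one character).

```javascript
// Hypothetical helper: counts each entity-looking sequence as one character.
function entityAwareLength(str) {
  // Matches named (&darr;), decimal (&#8595;) and hex (&#x2193;) forms.
  var entityPattern = /&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);/g;
  // Each match collapses to a single placeholder character before measuring.
  return str.replace(entityPattern, ' ').length;
}

console.log(entityAwareLength('&darr;'));       // 1
console.log(entityAwareLength('lol&amp;'));     // 4
console.log(entityAwareLength('a &#x2193; b')); // 5
```

Because nothing is actually decoded, this works outside the browser too, at the cost of trusting the shape of the text.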
If you are running the JavaScript in a browser, I would suggest letting the browser help you. You can create an element and set its innerHTML to your string containing HTML entities, then read the contents of that element back as text.
Here is an example (uses Mootools): http://jsfiddle.net/mqchen/H73EV/
Unfortunately, JavaScript does not natively support encoding or decoding of HTML entities, which is what you will need to do to get the 'real' string length. I was able to find this third-party library which is able to decode and encode HTML entities and it appears to work well enough, but there's no guaranteeing how complete it will be.
http://www.strictly-software.com/htmlencode
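If a library is not an option, numeric entities can be decoded by hand and named ones looked up in a table. The following is a rough sketch, not a complete decoder: `decodeEntities` and `NAMED_ENTITIES` are hypothetical names, and the table deliberately holds only a handful of entries, so unknown names are left as written.

```javascript
// Hypothetical, deliberately tiny table; a real decoder needs the full HTML list.
var NAMED_ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"', darr: '\u2193' };

function decodeEntities(str) {
  var pattern = /&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/g;
  return str.replace(pattern, function (match, body) {
    if (body.charAt(0) === '#') {
      var isHex = body.charAt(1) === 'x' || body.charAt(1) === 'X';
      var code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched instead of throwing.
      return code <= 0x10FFFF ? String.fromCodePoint(code) : match;
    }
    // Unknown names are left as written.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

console.log(decodeEntities('&darr;&#8595;&#x2193;').length); // 3
console.log(decodeEntities('lol&amp;').length);              // 4
```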
Using ES6 (which introduces codePointAt()):
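function strlen (str) {
    let sl = str.length
    let chars = sl
    // A code point above 0xFFFF occupies two UTF-16 code units (a surrogate pair).
    for (let i = 0; i < sl; i++) if (str.codePointAt(i) > 65535) {
        chars--; // count the pair once
        i++;     // skip the low surrogate
    }
    return chars
}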
Beware: charCodeAt() does not work the same way. It returns a single UTF-16 code unit, so its result never exceeds 65535.
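To make the difference concrete, here is a small illustration using a character outside the Basic Multilingual Plane. The helper name `codePointLength` is made up for this sketch; it relies on the ES6 string iterator, which walks code points, so it should give the same count as the loop above.

```javascript
const s = 'a\u{1F600}b'; // 'a', an emoji outside the BMP, 'b'

console.log(s.length);         // 4 (UTF-16 code units)
console.log(s.charCodeAt(1));  // 55357 (0xD83D, the high surrogate only)
console.log(s.codePointAt(1)); // 128512 (0x1F600, the full code point)

// Hypothetical alternative: spreading a string yields one item per code point.
function codePointLength(str) {
  return [...str].length;
}

console.log(codePointLength(s)); // 3
```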