Using Prototype, I'm trying to extract a piece of text from the DOM. This would normally be a simple $().innerHTML job, but the HTML is nested slightly:
<td class="time-record">
<script type="text/javascript">
//<![CDATA[
document.write('XXX ago'.gsub('XXX', i18n_time_ago_in_words(1229311439000)));
//]]>
</script>
about 11 months ago by <span class="author"><strong>Justin</strong></span>
</td>
In this case, innerHTML is going to pick up the JavaScript, which will cause all sorts of problems.
What's the best (most efficient, fastest) way to extract about 11 months ago by <span class="author"><strong>Justin</strong></span> without the JavaScript?
Use innerHTML and run it through stripScripts:
var html = $$('td.time-record')[0].innerHTML.stripScripts()
That would be useful for grabbing the HTML of a single cell. A more general solution that does the same for all td.time-record elements would be:
$$('td.time-record').pluck('innerHTML').invoke('stripScripts');
which would return an array of each cell's HTML (with <script> elements removed) that you could then .join('') or iterate over.
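To make the pluck/invoke chain above less magical, here is a rough sketch of what those two Enumerable methods do, written as standalone plain-JavaScript functions. These are illustrative stand-ins, not Prototype's own code, and the cells are plain objects rather than DOM nodes. Since native strings have no stripScripts method, String#trim stands in for it here:

```javascript
// Illustrative equivalent of Prototype's Enumerable#pluck:
// collect one property from every item.
function pluck(items, property) {
  return items.map(function (item) { return item[property]; });
}

// Illustrative equivalent of Prototype's Enumerable#invoke:
// call the named method on every item, forwarding any extra arguments.
function invoke(items, methodName) {
  var args = Array.prototype.slice.call(arguments, 2);
  return items.map(function (item) {
    return item[methodName].apply(item, args);
  });
}

// Stand-ins for two td.time-record cells (plain objects, not DOM nodes).
var cells = [
  { innerHTML: '  about 11 months ago  ' },
  { innerHTML: '  about 2 days ago  ' }
];

// Same shape as $$(...).pluck('innerHTML').invoke('stripScripts'),
// with native String#trim standing in for String#stripScripts.
var pieces = invoke(pluck(cells, 'innerHTML'), 'trim');

// Glue the per-cell results back into one string.
var combined = pieces.join('');
```

Each step produces a new array, so the original cell objects are never modified.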
I don't use Prototype's stripScripts or stripTags, as they're trivial, naïve regex hacks that don't get anywhere near handling all possible markup constructs correctly. For a simple case like this you can probably get away with stripScripts, but using these functions for anything security-sensitive is a mistake.
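A small standalone demonstration of both halves of that claim follows. The regex is modeled on the pattern Prototype uses internally for stripScripts, but treat it as an approximation (the exact pattern differs between Prototype versions), and the function name stripScriptsApprox is made up for this sketch. It handles the markup from the question fine, but a crafted string that nests one script tag inside another's name gets spliced back together into a live tag:

```javascript
// Approximation of Prototype's String#stripScripts (assumption: the pattern
// is close to, but not guaranteed identical to, the library's own).
var SCRIPT_FRAGMENT = /<script[^>]*>([\S\s]*?)<\/script\s*>/gim;

function stripScriptsApprox(html) {
  // A single regex pass deletes every <script>...</script> pair it finds.
  return html.replace(SCRIPT_FRAGMENT, '');
}

// The markup from the question: the simple case works.
var cell =
  '<script type="text/javascript">\n' +
  '//<![CDATA[\n' +
  "document.write('XXX ago'.gsub('XXX', i18n_time_ago_in_words(1229311439000)));\n" +
  '//]]>\n' +
  '</script>\n' +
  'about 11 months ago by <span class="author"><strong>Justin</strong></span>';

var cleaned = stripScriptsApprox(cell);

// A crafted input: removing the inner pair joins "<scr" and "ipt>",
// leaving a brand-new <script> element behind.
var crafted = '<scr<script></script>ipt>alert(1)</script>';
var stillDangerous = stripScriptsApprox(crafted);
```

Running the replacement again would catch this particular input, but deeper nesting defeats any fixed number of passes; a real HTML parser (or the DOM itself) is the dependable tool.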
Personally I'd simply remove the script element from the DOM before taking the innerHTML. Once an inline script has been executed there's no reason you need to keep the HTMLScriptElement in the document.
$$('.time-record script').invoke('remove');
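If you'd rather not modify the live page, a variant of the same idea is to clone the cell and remove the scripts from the copy only. This is a sketch in plain DOM calls rather than Prototype, and the helper name htmlWithoutScripts is hypothetical:

```javascript
// Hypothetical helper: returns the cell's innerHTML without its <script>
// descendants, leaving the live document untouched.
function htmlWithoutScripts(cell) {
  // Deep clone, so the removals below only affect the copy.
  var copy = cell.cloneNode(true);
  // querySelectorAll returns a static NodeList, so removing nodes
  // while iterating over it is safe.
  var scripts = copy.querySelectorAll('script');
  for (var i = 0; i < scripts.length; i++) {
    scripts[i].parentNode.removeChild(scripts[i]);
  }
  return copy.innerHTML;
}
```

In a page with Prototype loaded you could call it as htmlWithoutScripts($$('td.time-record')[0]). The trade-off versus removing the scripts in place is one extra clone per cell in exchange for leaving the document exactly as it was.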