This problem actually hit me recently.
I was tasked with putting people's bios up on the web (I asked for opinions on the approach in a different question). I went with XML and created elements based on which sections were going to be displayed.
Some people had chemical formulas in their bios, and when I copied and pasted the text, the formatting didn't carry over.
Is there an easy way to parse out the formulas and format them accordingly?
One idea I had was to just subscript the numbers, but I would have to implement BBCode tags to do this, as there are numbers all over the place. Hmm, or I could detect whether a number is to the right of a letter and subscript that number. Some of the formulas look like CoO3.
I used PHP to parse the XML.
What are your opinions?
Maybe something like this?
<?php
function formatFormulas($html)
{
    // Match runs of element symbols, each optionally followed by a number,
    // allowing whitespace and HTML tags in between.
    // Caveat: symbols are matched anywhere, including inside ordinary words,
    // so check the output against real bios before relying on it.
    $regex = '/(\\s*(Ac|Ag|Al|Am|Ar|As|At|Au|Ba|Be|Bh|Bi|Bk|Br|B|Ca|Cd|Ce|Cf|Cl|Cm|Co|Cr|Cs|Cu|C|';
    $regex .= 'Db|Ds|Dy|Er|Es|Eu|Fe|Fm|Fr|F|Ga|Gd|Ge|He|Hf|Hg|Ho|Hs|H|In|Ir|I|Kr|K|La|Li|Lr|Lu|Md|';
    $regex .= 'Mg|Mn|Mo|Mt|Na|Nb|Nd|Ne|Ni|No|Np|N|Os|O|Pa|Pb|Pd|Pm|Po|Pr|Pt|Pu|P|Ra|Rb|Re|Rf|Rg|Rh|';
    $regex .= 'Rn|Ru|Sb|Sc|Se|Sg|Si|Sm|Sn|Sr|S|Ta|Tb|Tc|Te|Th|Ti|Tl|Tm|Uub|Uuh|Uuo|Uup|Uuq|Uus|Uut|';
    $regex .= 'U|V|W|Xe|Yb|Y|Zn|Zr)\\s*(<[^>]+>)*\\s*\\d*\\s*(<[^>]+>)*\\s*)+/';
    if ( preg_match_all($regex, $html, $m) ) {
        foreach ($m[0] as $match) {
            // Strip inner whitespace and tags, then wrap each number in <sub>.
            $replace = preg_replace('/\\s+/', "", $match);
            $replace = preg_replace('/<[^>]+>/', "", $replace);
            $replace = preg_replace('/\\d+/', '<sub>$0</sub>', $replace);
            // Keep the whitespace that surrounded the original match.
            $leading = preg_replace('/^(\\s*)[\\S\\s]*/', '$1', $match);
            $trailing = preg_replace('/^[\\S\\s]*?(\\s*)$/', '$1', $match);
            $replace = $leading . $replace . $trailing;
            // Replaces every occurrence of the matched text in the document.
            $html = str_replace($match, $replace, $html);
        }
    }
    return $html;
}
?>
I would lean toward using a regex to parse your chemical notation.
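A minimal sketch of what that could look like, following the "number to the right of a letter" idea from the question. The function name `subscriptFormulaDigits` is made up for illustration, and the rule is an assumption: it subscripts any run of digits that directly follows a letter or a closing parenthesis. That also catches things like "MP3", so it is only reasonable if such tokens don't show up in the bios.

```php
<?php
// Hypothetical helper, not from the original post: wraps digits in <sub>
// when they come right after a letter or ")" (e.g. CoO3, Ca(OH)2).
// Assumption: plain-text input with no product names like "MP3".
function subscriptFormulaDigits(string $text): string
{
    // (?<=[A-Za-z)]) is a lookbehind: the digits must be preceded by a
    // letter or a closing parenthesis, but that character is not consumed.
    $result = preg_replace('/(?<=[A-Za-z)])(\d+)/', '<sub>$1</sub>', $text);

    // preg_replace returns null on a regex error; fall back to the input.
    return $result ?? $text;
}

echo subscriptFormulaDigits('CoO3'), "\n";         // CoO<sub>3</sub>
echo subscriptFormulaDigits('Ca(OH)2'), "\n";      // Ca(OH)<sub>2</sub>
echo subscriptFormulaDigits('born in 1975'), "\n"; // born in 1975 (unchanged)
```

Standalone numbers such as years are left alone because they follow a space rather than a letter. Restricting matches to real element symbols, as in the longer function above, trades this simplicity for fewer false positives.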
Maybe this helps? http://www.pmichaud.com/pipermail/pmwiki-users/2008-October/052692.html