Parsing periodic elements in PHP/HTML

开发者 https://www.devze.com 2022-12-14 04:08 Source: internet

This problem actually hit me recently.

So I was tasked with putting people's bios up on the web (I asked for opinions on that in a different question). I went with XML and just created elements based on which sections were going to be displayed.

Some people had chemical formulas in their bios, and when I copied and pasted the text, the formatting didn't carry over.

My question: is there an easy way to parse out the formulas and format them accordingly?

One idea I had was to just subscript the numbers, but to do that I would have to implement bbcode tags, since there are numbers all over the place. Hmm, or I could detect when a number is directly to the right of a letter and subscript that number.
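Both ideas can be sketched in a few lines of PHP. The `[sub]...[/sub]` tag syntax and the function names below are hypothetical. The second function subscripts any digits that directly follow a letter, which also catches things that are not formulas.

```php
<?php
// Idea 1: authors mark subscripts by hand with a (hypothetical) [sub]...[/sub]
// tag, so only the marked numbers are changed.
function bbcodeSubscripts(string $text): string
{
    return preg_replace('~\[sub\](.*?)\[/sub\]~i', '<sub>$1</sub>', $text);
}

// Idea 2: subscript every run of digits that directly follows a letter.
function subscriptAfterLetters(string $text): string
{
    return preg_replace('/(?<=[A-Za-z])\d+/', '<sub>$0</sub>', $text);
}

echo bbcodeSubscripts('CoO[sub]3[/sub] in 2009'), "\n";  // CoO<sub>3</sub> in 2009
echo subscriptAfterLetters('CoO3 in 2009'), "\n";        // CoO<sub>3</sub> in 2009
echo subscriptAfterLetters('Room B12'), "\n";            // Room B<sub>12</sub> (false positive)
```

The bbcode route is precise but relies on whoever enters the bios adding the tags; the letter-then-digit rule needs no markup but will also touch room numbers and similar codes.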

Some of the formulas look like CoO₃.

I used PHP to parse the XML.
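Since the bios are already read from XML with PHP, the formatting step can sit where the text is written out. A minimal sketch, assuming the SimpleXML extension and a hypothetical `<bios>/<bio>/<name>/<research>` layout; `subscriptCounts()` is a stand-in for whichever formatter you settle on. What it shows is the order of operations: escape the raw text with `htmlspecialchars()` first, then add the `<sub>` tags, so the added tags are not escaped away.

```php
<?php
// Stand-in formatter: subscript digits that directly follow a letter.
function subscriptCounts(string $text): string
{
    return preg_replace('/(?<=[A-Za-z])\d+/', '<sub>$0</sub>', $text);
}

// Read every <bio> and emit HTML (element names are hypothetical).
function renderBios(string $xml): string
{
    $html = '';
    foreach (simplexml_load_string($xml)->bio as $bio) {
        // Escape the raw text first, then add our own <sub> tags,
        // so the tags we add are not escaped themselves.
        $name     = htmlspecialchars((string) $bio->name, ENT_QUOTES, 'UTF-8');
        $research = htmlspecialchars((string) $bio->research, ENT_QUOTES, 'UTF-8');
        $html .= "<h2>$name</h2>\n<p>" . subscriptCounts($research) . "</p>\n";
    }
    return $html;
}

echo renderBios('<bios><bio><name>Jane Doe</name>'
    . '<research>Thin films of CoO3 &amp; TiO2</research></bio></bios>');
// <h2>Jane Doe</h2>
// <p>Thin films of CoO<sub>3</sub> &amp; TiO<sub>2</sub></p>
```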

What are your opinions?

Maybe something like this?

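<?php
/**
 * Wrap the counts in chemical formulas in <sub> tags,
 * e.g. "CoO3" becomes "CoO<sub>3</sub>".
 *
 * A formula is a run of element symbols, each optionally followed by a
 * count. Whitespace and HTML tags between the pieces are tolerated and
 * stripped, so copy/paste leftovers are cleaned up. Caveat: ordinary prose
 * can match too, e.g. "In 2005" (In = indium) becomes "In<sub>2005</sub>".
 */
function formatFormulas($html)
{
    // Longer symbols are listed before the one-letter symbols they start
    // with (e.g. "Co" before "C") so the alternation tries them first.
    $elements  = 'Ac|Ag|Al|Am|Ar|As|At|Au|Ba|Be|Bh|Bi|Bk|Br|B|Ca|Cd|Ce|Cf|Cl|Cm|Co|Cr|Cs|Cu|C|';
    $elements .= 'Db|Ds|Dy|Er|Es|Eu|Fe|Fm|Fr|F|Ga|Gd|Ge|He|Hf|Hg|Ho|Hs|H|In|Ir|I|Kr|K|La|Li|Lr|Lu|Md|';
    $elements .= 'Mg|Mn|Mo|Mt|Na|Nb|Nd|Ne|Ni|No|Np|N|Os|O|Pa|Pb|Pd|Pm|Po|Pr|Pt|Pu|P|Ra|Rb|Re|Rf|Rg|Rh|';
    $elements .= 'Rn|Ru|Sb|Sc|Se|Sg|Si|Sm|Sn|Sr|S|Ta|Tb|Tc|Te|Th|Ti|Tl|Tm|Uub|Uuh|Uuo|Uup|Uuq|Uus|Uut|';
    $elements .= 'U|V|W|Xe|Yb|Y|Zn|Zr';
    $regex = '/(?:\s*(?:' . $elements . ')\s*(?:<[^>]+>)*\s*\d*\s*(?:<[^>]+>)*\s*)+/';

    // Rewrite each match in place with a callback instead of looping over
    // str_replace(), which rescans the whole string for every match.
    return preg_replace_callback($regex, function ($m) {
        // Keep the whitespace around the formula so neighbouring words
        // do not get glued to it.
        preg_match('/^\s*/', $m[0], $lead);
        preg_match('/\s*$/', $m[0], $trail);

        $formula = preg_replace('/\s+/', '', $m[0]);         // drop inner whitespace
        $formula = preg_replace('/<[^>]+>/', '', $formula);  // drop old markup
        $formula = preg_replace('/\d+/', '<sub>$0</sub>', $formula);

        return $lead[0] . $formula . $trail[0];
    }, $html);
}
?>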
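Because formatFormulas() allows whitespace between a symbol and its count, prose in a bio can match as well: in "In 2005", "In" is indium, so the year gets subscripted. One way around that, sketched here as a hypothetical `formatFormulasStrict()` for plain (already escaped) text, is to only rewrite whole words made entirely of element symbols and counts. This gives up the whitespace/markup tolerance, so it assumes each formula is written as one unbroken word.

```php
<?php
// Stricter variant (a sketch): only rewrite whole words made entirely of
// element symbols and counts, e.g. "CoO3" or "Fe2O3".
function formatFormulasStrict(string $text): string
{
    // Same element list as in formatFormulas(); longer symbols first.
    $el = 'Ac|Ag|Al|Am|Ar|As|At|Au|Ba|Be|Bh|Bi|Bk|Br|B|Ca|Cd|Ce|Cf|Cl|Cm|Co|Cr|Cs|Cu|C|'
        . 'Db|Ds|Dy|Er|Es|Eu|Fe|Fm|Fr|F|Ga|Gd|Ge|He|Hf|Hg|Ho|Hs|H|In|Ir|I|Kr|K|La|Li|Lr|Lu|Md|'
        . 'Mg|Mn|Mo|Mt|Na|Nb|Nd|Ne|Ni|No|Np|N|Os|O|Pa|Pb|Pd|Pm|Po|Pr|Pt|Pu|P|Ra|Rb|Re|Rf|Rg|Rh|'
        . 'Rn|Ru|Sb|Sc|Se|Sg|Si|Sm|Sn|Sr|S|Ta|Tb|Tc|Te|Th|Ti|Tl|Tm|Uub|Uuh|Uuo|Uup|Uuq|Uus|Uut|'
        . 'U|V|W|Xe|Yb|Y|Zn|Zr';

    // A "word" here: not preceded or followed by a letter or digit.
    $regex = '/(?<![A-Za-z0-9])(?:(?:' . $el . ')\d*)+(?![A-Za-z0-9])/';

    return preg_replace_callback($regex, function ($m) {
        // Words without digits (e.g. "He", "No", "I") come back unchanged.
        return preg_replace('/\d+/', '<sub>$0</sub>', $m[0]);
    }, $text);
}

echo formatFormulasStrict('In 2005 she studied CoO3 and Fe2O3.'), "\n";
// In 2005 she studied CoO<sub>3</sub> and Fe<sub>2</sub>O<sub>3</sub>.
```

It is still a heuristic: "Vitamin B12" gets a subscript, and grouped formulas such as Ca(OH)2 are not recognised, so the output still needs a look.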

I would lean toward using regular expressions to parse your chemical notation.
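If you want to go beyond formatting and actually parse the notation with a regex, here is a minimal sketch. `parseFormula()` is a hypothetical name; it checks that the whole string is a run of symbol+count pieces and returns the count per symbol. It uses the generic symbol shape (one capital, optional lowercase letter) rather than a full element list, and it does not handle parentheses, charges, or hydrates.

```php
<?php
// Split a simple formula such as "Fe2O3" into symbol => count pairs.
// Returns null if the string is not a plain run of symbols and counts.
function parseFormula(string $formula): ?array
{
    // Whole-string check: one capital, an optional lowercase letter,
    // optional digits, repeated to the end (\z = true end of string).
    if (!preg_match('/^(?:[A-Z][a-z]?\d*)+\z/', $formula)) {
        return null;
    }

    preg_match_all('/([A-Z][a-z]?)(\d*)/', $formula, $pieces, PREG_SET_ORDER);

    $counts = [];
    foreach ($pieces as $piece) {
        $symbol = $piece[1];
        $digits = $piece[2] ?? '';
        // A missing count means 1; a repeated symbol is added up.
        $counts[$symbol] = ($counts[$symbol] ?? 0) + ($digits === '' ? 1 : (int) $digits);
    }
    return $counts;
}

echo json_encode(parseFormula('Fe2O3')), "\n";   // {"Fe":2,"O":3}
echo json_encode(parseFormula('CH3COOH')), "\n"; // {"C":2,"H":4,"O":2}
var_dump(parseFormula('In 2005'));               // NULL
```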

Maybe this helps? http://www.pmichaud.com/pipermail/pmwiki-users/2008-October/052692.html