Is there a way to parse these strings?

If there is, I certainly don't see it. We are reading the magnetic stripes on driver's licenses, and the data does not seem to be consistent. The standard the licenses should follow limits the length of each field, but I can't wrap my head around how to parse this data.

For example, a field may allow 13 characters but use only 8. In that case, a caret (^) delimiter always ends that portion of the string. However, and here is the tricky part, if a field uses exactly 13 of the 13 allowed characters, there is no closing caret and no right padding. All of the data just runs together.

Here are two sample strings.

%CAMISSION HILLSSMITH$JOHN$JIM$JR^1147 SOMESTREET^?
%CALOS ANGELES^DOE$JOHN$CARL^14324 MAIN ST APT 5^?

Using PHP, how might I do this? I'd truly appreciate a hand on this. I'm really stumped.

Okay, here we go. I used the x flag to make the regex more readable and to be able to comment it.

From the spec @EboMike posted, each field has a maximum length and is terminated by ^ if it is shorter than that length. The name is a composite field that uses $ to separate the family name, first name, middle name, and suffix. The same goes for the address, which uses $ to separate multiple address lines.

$licenses = array(
    '%CAMISSION HILLSSMITH$JOHN$JIM$JR^1147 SOMESTREET^?',
    '%CALOS ANGELES^DOE$JOHN$CARL^14324 MAIN ST APT 5^?'
);

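foreach ($licenses as $license) {
    $matched = preg_match(
        '@
            ^%
            (.{2})          # State, 2 chars
            ([^^]{0,12}.)   # City, up to 13 chars, ^-terminated if shorter
            ([^^]{0,34}.)   # Name, up to 35 chars, ^-terminated if shorter
            ([^^]{0,28}.)   # Address, up to 29 chars, ^-terminated if shorter
            \?$
        @x',
        $license,
        $fields
    );

    // Report and skip any track that doesn't fit the expected layout.
    if (!$matched) {
        echo "Could not parse: $license\n\n";
        continue;
    }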

    $state   = $fields[1];
    $city    = rtrim($fields[2], '^');
    $name    = explode('$', rtrim($fields[3], '^'));
    $address = explode('$', rtrim($fields[4], '^'));

    echo "$license\n";
    echo "STATE:   "; print_r($state);   echo "\n";
    echo "CITY:    "; print_r($city);    echo "\n";
    echo "NAME:    "; print_r($name);
    echo "ADDRESS: "; print_r($address);
    echo "\n";
}

Output:

%CAMISSION HILLSSMITH$JOHN$JIM$JR^1147 SOMESTREET^?
STATE:   CA
CITY:    MISSION HILLS
NAME:    Array
(
    [0] => SMITH
    [1] => JOHN
    [2] => JIM
    [3] => JR
)
ADDRESS: Array
(
    [0] => 1147 SOMESTREET
)

%CALOS ANGELES^DOE$JOHN$CARL^14324 MAIN ST APT 5^?
STATE:   CA
CITY:    LOS ANGELES
NAME:    Array
(
    [0] => DOE
    [1] => JOHN
    [2] => CARL
)
ADDRESS: Array
(
    [0] => 14324 MAIN ST APT 5
)
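If more fields need to be handled, the hand-written pattern can instead be generated from a table of maximum widths. This is a sketch of one way to do that, not part of the original answer: `$widths` is a hypothetical name, its values are simply the widths hard-coded in the regex above, and the `x`-flag comments are dropped because the pattern is assembled in code.

```php
<?php
// Hypothetical field table, in stripe order: field name => maximum width.
// The widths are the ones hard-coded in the regex above.
$widths = ['city' => 13, 'name' => 35, 'address' => 29];

// Rebuild the same pattern from the table. Each field is up to (width - 1)
// non-caret characters plus one more character, which is either the last
// character of a full-width field or the terminating '^'.
$pattern = '/^%(.{2})';
foreach ($widths as $max) {
    $pattern .= sprintf('([^^]{0,%d}.)', $max - 1);
}
$pattern .= '\?$/';

$license = '%CALOS ANGELES^DOE$JOHN$CARL^14324 MAIN ST APT 5^?';
$record = null;
if (preg_match($pattern, $license, $m)) {
    // Drop the whole-match entry, strip any '^' terminator, then label the values.
    $values = array_map(function ($v) { return rtrim($v, '^'); }, array_slice($m, 1));
    $record = array_combine(array_merge(['state'], array_keys($widths)), $values);
    print_r($record);
}
```

Adding a field then means adding one entry to `$widths`, in the order the fields appear on the stripe; composite values such as the name can still be split with `explode('$', ...)` afterwards.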

Didn't you ask this question a few hours ago? Someone there posted a regex that splits fields that are either caret-delimited or exactly 13 characters long: Help with a delimited string

Did that not work?

EDIT: The format is explained here: http://en.wikipedia.org/wiki/Magnetic_stripe_card#United_States_driver.27s_licenses

For the city, it says "Field Separator - one character (generally '^') (absent if city reaches max length)". So again, a simple regex can do wonders here. Refer to that example; you can adjust it to match the format detailed in the entry.

EDIT: Okay, I'll give it a shot.

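// Single quotes matter: inside double quotes PHP would try to interpolate $JOHN, $JIM and $JR.
$str = '%CAMISSION HILLSSMITH$JOHN$JIM$JR^1147 SOMESTREET^?';
preg_match('/%(..)' .              // state, 2 chars
           '([^\^]{1,13})\^?' .    // city: up to 13 non-caret chars, then an optional ^
           '([^$]+)\$' .           // last name, up to the first $
           '([^$]+)\$/',           // first name, up to the next $
           $str, $m);
$State = $m[1];
$City = $m[2];
$LastName = $m[3];
$FirstName = $m[4];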

This is just an example of how you could approach it. ([^\^]{1,13}) grabs up to 13 characters that are not '^'. After that, \^? consumes the '^' character itself if it is there.

Work from left to right, dealing with one field at a time.

Strip off the leading %:

CAMISSION HILLSSMITH$JOHN$JIM$JR^1147 SOMESTREET^?

Take the first 15 characters (the first field is at most 15 characters, right?):

CAMISSION HILLS

It doesn't contain a caret, great, so that's our first field. The next field starts at the 16th character:

SMITH$JOHN$JIM$JR^1147 SOMESTREET^? (R1)

I don't know the maximum length of this field; let's assume it's 20. Take the first 20 characters:

SMITH$JOHN$JIM$JR^11

It contains a caret, so we have more than one field here. Take the characters up to the caret:

SMITH$JOHN$JIM$JR

...that's our next field. Now take the string from (R1) above, starting at character number (length of previous field + 2); the +2 skips over the ^:

1147 SOMESTREET^?

etc.
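The same left-to-right procedure can be sketched with `substr()` and `strpos()` and no regex at all. This assumes a first-field width of 15 (state plus city, as in the walkthrough) and borrows 35 and 29 for the name and address from the regex answer above instead of the guessed 20; `splitTrack` is a hypothetical helper name. Because PHP string offsets start at 0, the walkthrough's "(length of previous field + 2)th character" becomes `substr($rest, $caret + 1)`.

```php
<?php
// Left-to-right splitting with substr()/strpos(), following the walkthrough above.
// splitTrack is a hypothetical helper; $maxWidths lists each field's maximum width.
function splitTrack(string $track, array $maxWidths): array
{
    $rest = ltrim($track, '%');             // strip the leading % sentinel
    $fields = [];
    foreach ($maxWidths as $max) {
        $window = substr($rest, 0, $max);   // look at no more than $max characters
        $caret = strpos($window, '^');
        if ($caret === false) {
            // No caret: the field uses its full width and the next one follows directly.
            $fields[] = $window;
            $rest = substr($rest, strlen($window));
        } else {
            // Caret found: the field ends before it, and the next field starts after it.
            $fields[] = substr($window, 0, $caret);
            $rest = substr($rest, $caret + 1);
        }
    }
    return $fields;
}

// Assumed widths: 15 (state + city), 35 (name), 29 (address).
print_r(splitTrack('%CAMISSION HILLSSMITH$JOHN$JIM$JR^1147 SOMESTREET^?', [15, 35, 29]));
```

The first element still holds the state and city together; `substr($fields[0], 0, 2)` and `substr($fields[0], 2)` separate them.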

If this were Java, I'd solve this with regular expressions. PHP must have them too, surely?

All the constraints you mentioned can be translated into regular expressions.

For example:

X{n,m}       X, at least n but not more than m times

can be used with something like:

[^%\$\^]{1,13}[%\$\^]

Which reads as, "1-13 instances of any character not equal to %, $, or ^ followed by one of those very same delimiters"
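As written, `[^%\$\^]{1,13}[%\$\^]` needs a delimiter immediately after the field, so on its own it cannot end a city that uses all 13 characters, such as MISSION HILLS in the first sample, where the next character is the S of SMITH. Here is a sketch in PHP of one way around that, assuming fields are read one at a time from a known offset: make the delimiter optional and anchor each match with `\G`, which PCRE pins to the `offset` argument of `preg_match()`.

```php
<?php
// Same character-class idea, with the delimiter made optional:
// [^%$^]{1,13} is greedy, so a full-width field stops after 13 characters
// even when no delimiter follows, and [%$^]? consumes a delimiter only if present.
// \G pins the match to the offset passed to preg_match, so the field must start
// exactly where the previous one ended.
$pattern = '/\G([^%$^]{1,13})[%$^]?/';

$cities = [];
foreach (['%CAMISSION HILLSSMITH$JOHN$JIM$JR^1147 SOMESTREET^?',
          '%CALOS ANGELES^DOE$JOHN$CARL^14324 MAIN ST APT 5^?'] as $track) {
    // Offset 3 skips the '%' sentinel and the 2-character state code.
    if (preg_match($pattern, $track, $m, 0, 3)) {
        $next = 3 + strlen($m[0]);   // offset where the name field begins
        $cities[] = $m[1];
        echo $m[1], ' -> next field starts with: ', substr($track, $next, 3), "\n";
    }
}
```

Calling `preg_match()` again with `$next` as the offset would read the following field the same way, with whatever width limit that field has.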

When I write regexes, I often refer back to Java's excellent documentation page. You can also do neat tricks such as extracting particular matched portions and pulling out particular words. Again, I'm more familiar with Java, but PHP is too mature a language not to have the same kinds of features.
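In PHP, extracting particular matched portions is done with capture groups, and PCRE also supports named groups, so the results can be read by name instead of by index. A sketch reusing the field widths from the regex answer above; the group names are arbitrary choices for illustration:

```php
<?php
// Named groups make the captured portions self-describing.
// Widths as in the regex answer above: city 13, name 35, address 29.
$pattern = '/^%(?<state>..)(?<city>[^^]{0,12}.)(?<name>[^^]{0,34}.)(?<address>[^^]{0,28}.)\?$/';

$track = '%CAMISSION HILLSSMITH$JOHN$JIM$JR^1147 SOMESTREET^?';
if (preg_match($pattern, $track, $m)) {
    $city = rtrim($m['city'], '^');                     // full-width city: nothing to trim
    $nameParts = explode('$', rtrim($m['name'], '^'));  // family, first, middle, suffix
    echo $m['state'], ' | ', $city, ' | ', implode(' ', $nameParts), "\n";
}
```

The named entries appear in `$m` alongside the usual numeric ones, so existing index-based code keeps working.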

I hope that helps in some way. If no one else answers, I can try to create the regex you need.