How can I break this string for licence plates v/s states?
I have unique VIN N开发者_开发百科umbers in my database for every vehicle.
VIN varchar2(100)
Make varchar2(100)
Model varchar2(100)
Year varchar2(4)
PlateInfo varchar2(1000)
The objective is to take the string in the PlateInfo field and split it to states and the license plates. There are vehicles who have had more than 24 owner / plate changes.
The string in plate info typically looks like this.
MA~CT~DE~NJ~NJ~~~~~~VEG-1825~AX7547~117824~NEG-1012~BEG-1011~~
This needs to split into two columns StateId, PlateId
MA:VEG-1825
CT:AX7547
DE:117824
NJ:NEG-1012
NJ:BEG-1011
I was able to do this making an assumption that the number of occurrences for the tilde "~" character will always be an even number.
However when I ran this against the database, I found there are several vehicles where the information looks like this.
CT~DC~DE~MA~MD~NY~RI~VA~WA~WV~
My client wants me to put this in a state column with a null column for the plate. How can I achieve this? Would it be fair to make the assumption that each 2 character is a state, then validate it against the 50 states?
Assuming that your input string is a list of 10+ items where each item is terminated by a ~
, e.g.
"0~1~2~3~4~5~6~7~8~9~"
"0~1~2~3~4~5~6~7~8~9~10~11~12~13~14~15~16~17~18~19~"
you can remove the last ~
and split the string by ~
:
var parts = input.TrimEnd('~')
.Split('~');
The states seem to be the first 10 non-empty elements (see @Reed Copsey's answer):
var states = parts.Take(10)
.Where(s => s != "");
The plates seem to be the second 10 elements, which need to be padded with nulls if necessary:
var plates = parts.Skip(10)
.Take(10)
.Concat(Enumerable.Repeat<string>(null, 10));
Then zip the states and the plates as follows:
foreach (var item in states.Zip(plates, (state, plate) => new { state, plate }))
{
Console.WriteLine("{0,-10} {1}", item.state, item.plate);
}
Example 1:
MA VEG-1825
CT AX7547
DE 117824
NJ NEG-1012
NJ BEG-1011
Example 2:
CT <null>
DC <null>
DE <null>
MA <null>
MD <null>
NY <null>
RI <null>
VA <null>
WA <null>
WV <null>
Without knowing the specifications of how things are stored, it's difficult (or impossible) to provide accurate guidance.
That being said, it looks like you have 10 state entries, then up to 10 plates per record. If this is the case, you should be able to just use string.Split, not remove empty entries, and treat the first 10 items as states, and any remaining as matching plates.
Assumption:
There are no legal plates that consist of only two characters. If vanity plates like that are a possibility (I am not in the USA), you will have to take appropriate measures.
Pseudocode:
- Split the string at the
~
delimiter - Remove empty entries from the results -- you now have a lot of tokens which are either states or plate numbers
- All the 2-length strings are states
- All the rest are plates
- Combine them
Example code:
public static void Main()
{
var inputs = new[]
{
"MA~CT~DE~NJ~NJ~~~~~~VEG-1825~AX7547~117824~NEG-1012~BEG-1011~~",
"CT~DC~DE~MA~MD~NY~RI~VA~WA~WV~"
};
foreach(var input in inputs)
{
var plates = GetPlates(input);
Console.Out.WriteLine("Input string: " + input);
foreach(var plate in plates)
{
Console.Out.WriteLine(string.Format("{0} : {1}", plate.Key, plate.Value));
}
}
}
static KeyValuePair<string, string>[] GetPlates(string input)
{
var tokens = input.Split(new[] { '~' }, StringSplitOptions.RemoveEmptyEntries);
var states = tokens.Where(t => t.Length == 2).ToArray();
var plates = tokens.Where(t => t.Length != 2)
.Select(s => s.Replace("-", string.Empty));
return states.Zip(plates.Concat(Enumerable.Repeat<string>(null, states.Length)),
(state, plate) => new KeyValuePair<string, string>(state, plate)).ToArray();
}
Output:
Input string: MA~CT~DE~NJ~NJ~~~~~~VEG-1825~AX7547~117824~NEG-1012~BEG-1011~~
MA : VEG1825
CT : AX7547
DE : 117824
NJ : NEG1012
NJ : BEG1011
Input string: CT~DC~DE~MA~MD~NY~RI~VA~WA~WV~
CT :
DC :
DE :
MA :
MD :
NY :
RI :
VA :
WA :
WV :
精彩评论