String Splitter in .NET

Developer https://www.devze.com 2023-01-13 15:40 Source: Internet
I have the below string

P,MV,A1ZWR,MAV#(X,,), PV,MOV#(X,12,33),LO

I need the output as

P

MV

A1ZWR

MAV#(X,,)

PV

MOV#(X,12,33)

LO

As you can see, this could easily be done by splitting on ",", but the problem arises with elements such as MAV#(X,,) or MOV#(X,12,33), which contain commas themselves.

Please help.


You can use a regular expression to match the values between the separators, and specify that everything within parentheses is part of the value. Example:

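// Requires: using System; using System.Text.RegularExpressions;
string data = "P,MV,A1ZWR,MAV#(X,,), PV,MOV#(X,12,33),LO";

// A value is a run of parenthesized groups or non-comma characters,
// ended by a comma or by the end of the input.
foreach (Match m in Regex.Matches(data, @"\s*((\(.*?\)|[^,])*)(,|$)")) {
  if (m.Length == 0) continue; // skip the zero-length match reported at the end of the input
  Console.WriteLine(m.Groups[1].Value);
}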

Output:

P
MV
A1ZWR
MAV#(X,,)
PV
MOV#(X,12,33)
LO


Since no answer used just LINQ, and I was curious how that would look, I came up with this. I wouldn't recommend using it in production code, though. I had hoped it would be nicer, but because nested parentheses need to be handled I had to introduce mutable state variables.

string data = "P,MV,A1ZWR,MAV#(X,,), PV,MOV#(X,12,33),LO";

int depth = 0;
int group = 0;

var result = data
    .GroupBy(x => { 
        if (x == '(') depth++;
        if (x == ')') depth--;
        if (x == ',' && depth == 0) group++; 
        return group; })
    .Select(x => new String(x.ToArray()).Trim(' ', ','));


string input = "P,MV,A1ZWR,MAV#(X,,), PV,MOV#(X,12,33),LO";
IList<string> parts = new List<string>();
int paranthesisCount = 0;
int lastSplitIndex = 0;
for (int i = 0; i < input.Length; i++)
{
    if (input[i] == '(')
    {
        paranthesisCount++;
        continue;
    }
    if (input[i] == ')')
    {
        paranthesisCount--;
        continue;
    }
    if (input[i] == ',' && paranthesisCount == 0)
    {
        parts.Add(input.Substring(lastSplitIndex, i - lastSplitIndex));
        lastSplitIndex = i + 1;
    }
}
if (input.Length - lastSplitIndex > 0)
{
    parts.Add(input.Substring(lastSplitIndex, input.Length - lastSplitIndex));
}


Your best bet is to write a parser for the data. Look up a CSV parsing library; you could probably modify one to support #(...) instead of "..." without too much difficulty.


How about looping and detecting telltale characters such as ( and ):

string[] test = "P,MV,A1ZWR,MAV#(X,,), PV,MOV#(X,12,33),LO".Split(',');

bool insideElement = false;
string insideElementResult = "";
List<string> result = new List<string>();
foreach (string s in test)
{
    //Determine context:
    if (s.IndexOf("(") > -1)
        insideElement = true;

    //Determine where to add my nice string
    if (!insideElement)
        result.Add(s);
    else
        insideElementResult += s;

    //Determine if context has ended:
    if (s.IndexOf(")") > -1)
    {
        insideElement = false;
        result.Add(insideElementResult);
        insideElementResult = "";
    }
    else if (insideElement)
    {
        insideElementResult += ",";
    }

}

results in:

    [0] "P" string
    [1] "MV"    string
    [2] "A1ZWR" string
    [3] "MAV#(X,,)" string
    [4] " PV"   string
    [5] "MOV#(X,12,33)" string
    [6] "LO"    string

Granted, it's not as fancy as a regex, and it will break on nested parentheses, but hey, it works ;)


A parser can also help you interpret the string. The simplest kind of parser is a recursive one. That way you can be sure that

  1. all the parentheses are balanced
  2. no wrong splits occur
  3. all tokens are correct (that might be of help, but depends on the application)

A good parser that has error checking is like having an XSD for your specific language.

I have written a parser with ANTLR. Check it out if it helps you. It might be overkill for this problem. Just think about it.
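
To illustrate the recursive idea without a parser generator, here is a minimal hand-written sketch. The grammar in the comment is an assumption inferred from the sample input only, and the names RecursiveSplitter, Parse, ParseItem and ParseGroup are made up for this example. It covers points 1 and 2 of the list above (balanced parentheses, no wrong splits) but does not validate the tokens themselves.

```csharp
using System;
using System.Collections.Generic;

static class RecursiveSplitter
{
    // Hypothetical grammar, inferred from the sample input only:
    //   list  := item (',' item)*
    //   item  := (char | group)*          char = anything except ',' '(' ')'
    //   group := '(' (char | ',' | group)* ')'
    public static List<string> Parse(string input)
    {
        var items = new List<string>();
        int pos = 0;
        while (true)
        {
            int start = pos;
            ParseItem(input, ref pos);
            items.Add(input.Substring(start, pos - start).Trim());
            if (pos == input.Length) return items;
            pos++; // skip the top-level comma that ended the item
        }
    }

    // Consumes one item, stopping at a top-level comma or the end of the input.
    static void ParseItem(string s, ref int pos)
    {
        while (pos < s.Length && s[pos] != ',')
        {
            if (s[pos] == ')') throw new FormatException("Unexpected ')' at " + pos);
            if (s[pos] == '(') ParseGroup(s, ref pos);
            else pos++;
        }
    }

    // Consumes a balanced "( ... )" group, recursing for nested groups.
    static void ParseGroup(string s, ref int pos)
    {
        int open = pos++; // step over '('
        while (pos < s.Length && s[pos] != ')')
        {
            if (s[pos] == '(') ParseGroup(s, ref pos);
            else pos++;
        }
        if (pos == s.Length) throw new FormatException("Unclosed '(' at " + open);
        pos++; // step over ')'
    }
}
```

Calling RecursiveSplitter.Parse on the string from the question should give the seven trimmed elements, while an input with an unmatched parenthesis such as "A#(B,C" raises a FormatException instead of being split silently.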


Use a divider character that will not be part of your entries.

 P~MV~A1ZWR~MAV#(X,,)~ PV~MOV#(X,12,33)~LO

Or even an invisible character (0x00?)
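
If you do control the format, the splitting becomes a one-liner. A small sketch, assuming '~' never occurs inside an entry (the DelimiterSplit class name is made up for this example):

```csharp
using System;
using System.Linq;

static class DelimiterSplit
{
    // Assumption: '~' never appears inside an entry, so a plain Split is safe.
    public static string[] Split(string data) =>
        data.Split('~').Select(part => part.Trim()).ToArray();
}
```

For example, DelimiterSplit.Split("P~MV~A1ZWR~MAV#(X,,)~ PV~MOV#(X,12,33)~LO") leaves the commas inside MAV#(X,,) and MOV#(X,12,33) untouched.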


If this is a file you're populating, use a delimiter that won't be an issue, such as the | for example. If it's a file that you're scanning and parsing, you could probably use regular expressions to pull out the data you need.

If not, you might have to split the strings and look at the buckets while looking for issues and doing any necessary merging and further splitting.


This function pulls out all the tokens, makes sure there are no empty tokens between commas, and makes sure that all the parentheses are closed. It's a bit long.

IEnumerable<string> Tokenise(string input)
{
    const char tokenlimiter = ',';
    const char funcstart = '#';
    const char funcend = ')';
    StringBuilder token = new StringBuilder(5);
    bool gotfunc = false;
    bool gotone = false;
    int pos = 0;
    int opened = 0;
    foreach(char c in input)
    {
        if (c == funcstart)
        {
            gotfunc = true;
            opened++;
        }
        if(c == funcend)
        {
            gotfunc = false;
            opened--;
        }
        if(!gotfunc && c == tokenlimiter)
        {
            gotone = true;
            if(token.Length == 0)
            {
                throw new ArgumentException("Blank instruction at " + pos, input);
            }
            yield return token.ToString();
        }
        if(gotone)
        {
            token = new StringBuilder(5);
            gotone = false;
        }
        else
        {
            token.Append(c);    
        }
        if(pos == input.Length - 1)
        {
            if (!gotfunc && opened == 0 && c != tokenlimiter)
            {
                yield return token.ToString();
            }
            else if (gotfunc || opened != 0)
            {
                throw new ArgumentException("Broken function", input);
            }
            else
            {
                throw new ArgumentException("Blank instruction at " + pos, input);
            }
        }
        pos++;
    }

}


// Requires: using System.Collections.Generic; using System.Text;
private static List<string> CreateListString(string s)
{
    string[] splits = s.Split(new char[] { ',' });
    List<string> strs = new List<string>();
    bool isLimiterSeen = false;
    StringBuilder str = null;
    for (int i = 0; i < splits.Length; i++)
    {
        // A piece containing "#(" starts an element that may span several pieces.
        if (splits[i].Contains("#("))
        {
            isLimiterSeen = true;
            str = new StringBuilder();
        }
        if (!isLimiterSeen)
            strs.Add(splits[i]);
        else
        {
            // Re-join the pieces with the commas that Split removed.
            str = str.Append("," + splits[i]);
            if (splits[i].EndsWith(")"))
            {
                if (str.ToString().StartsWith(","))
                    strs.Add(str.ToString().Substring(1));
                else
                    strs.Add(str.ToString());
                isLimiterSeen = false;
                str = null;
            }
        }
    }
    return strs; // the original built the list but never returned it
}