Detect particular tokens in a string. C#

Developer https://www.devze.com 2023-01-27 10:44 Source: Internet
I have a very large string (HTML), and in this HTML there are particular tokens, all of which start with "#" and end with "#".

Simple example:

<html>
<body>
      <p>Hi #Name#, You should come and see this #PLACE# - From #SenderName#</p>
</body>
</html>

I need code that will detect these tokens and put them in a list: 0 - #Name#, 1 - #PLACE#, 2 - #SenderName#

I know I could probably use Regex, but do you have any ideas on how to do this?


You can try:

// using System.Text.RegularExpressions;
// pattern = any number of arbitrary characters between #.
var pattern = @"#(.*?)#";
var matches = Regex.Matches(htmlString, pattern);

foreach (Match m in matches) {
    Console.WriteLine(m.Groups[1]);
}

Answer inspired by this SO question.


Yes, you can use regular expressions.

string test = "Hi #Name#, You should come and see this #PLACE# - From #SenderName#";
Regex reg = new Regex(@"#\w+#");
foreach (Match match in reg.Matches(test))
{
    Console.WriteLine(match.Value);
}

As you might have guessed, \w denotes any word character (a letter, a digit, or an underscore). The + means that it must appear one or more times. You can find more info in the MSDN doc (for .NET 4; you'll find other versions there as well).
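
Since real HTML often contains other '#' characters (anchors, colors), it may help to see how the lazy pattern #(.*?)# and the word-character pattern #(\w+)# differ on such input. The following is only a sketch: the class name TokenPatterns, the helper Capture, and the sample string are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenPatterns
{
    // Returns the contents of capture group 1 for every match of the pattern.
    public static List<string> Capture(string input, string pattern)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            found.Add(m.Groups[1].Value);
        }
        return found;
    }

    public static void Main()
    {
        // Hypothetical input: the first '#' belongs to a link, not to a token.
        string html = "<a href=\"#top\">Hi #Name#</a>";

        // The lazy pattern pairs the link's '#' with the token's opening '#'.
        Console.WriteLine(string.Join("|", Capture(html, @"#(.*?)#")));

        // The word-character pattern only accepts letters, digits and '_' between the '#'.
        Console.WriteLine(string.Join("|", Capture(html, @"#(\w+)#")));
    }
}
```

On this input the lazy pattern captures the text between the link's '#' and the token's first '#', while the \w+ pattern captures only Name. If your tokens can contain spaces or punctuation, \w+ would be too strict and the pattern would need adjusting.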


A variant without Regex if you like:

var splitstring = myHtmlString.Split('#');
var tokens = new List<string>();
for( int i = 1; i < splitstring.Length; i+=2){
  tokens.Add(splitstring[i]);
}   


foreach (Match m in Regex.Matches(input, @"#\w+#"))
    Console.WriteLine("'{0}' found at index {1}.",  m.Value, m.Index);


Try this:

var result = html.Split('#')
                    .Select((s, i) => new {s, i})
                    .Where(p => p.i%2 == 1)
                    .Select(t => t.s);

Explanation:

Line 1 - we split the text on the '#' character.

Line 2 - we project each string into an anonymous type that holds the string and its position in the array.

Line 3 - we keep only the anonymous objects with an odd index, effectively picking every other string. Those are the strings that were wrapped in hash characters, rather than the ones outside.

Line 4 - we drop the index and return just the string from the anonymous type.
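
One thing to keep in mind with the split approach: it relies on every '#' in the text being part of a token pair. A rough sketch of what happens with a stray '#' follows; the class SplitTokens, its two helpers, and the sample strings are hypothetical and only meant to illustrate the idea.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitTokens
{
    // Odd-indexed pieces of a split on '#' are the text between each pair of '#'.
    public static List<string> OddPieces(string input)
    {
        return input.Split('#').Where((piece, index) => index % 2 == 1).ToList();
    }

    // An even number of '#' characters yields an odd number of pieces.
    public static bool HasBalancedHashes(string input)
    {
        return input.Split('#').Length % 2 == 1;
    }

    public static void Main()
    {
        string clean = "Hi #Name#, see #PLACE#";
        string stray = "Color #fff; Hi #Name#"; // three '#': one does not belong to a token

        Console.WriteLine(string.Join("|", OddPieces(clean)));
        Console.WriteLine(HasBalancedHashes(clean));

        Console.WriteLine(string.Join("|", OddPieces(stray)));
        Console.WriteLine(HasBalancedHashes(stray));
    }
}
```

With the stray '#', the odd-indexed pieces no longer line up with the tokens. The balance check catches an odd number of '#' but not two stray ones, so a regex with a restricted character class may be the safer choice for arbitrary HTML.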


Use:

MatchCollection matches = Regex.Matches(mytext, @"#(\w+)#");

foreach(Match m in matches)
{
    Console.WriteLine(m.Groups[1].Value);
}


Naive solution:

var result = Regex
    .Matches(html, @"\#([^\#.]*)\#")
    .OfType<Match>()
    .Select(x => x.Groups[1].Value)
    .ToList();


Linq solution:

string s = @"<p>Hi #Name#, 
  You should come and see this #PLACE# - From #SenderName#</p>";

var result = s.Split('#').Where((x, y) => y % 2 != 0);


Use the Regex.Matches method with a pattern such as

#[^#]+#

which is possibly the most naive way.

This might then need to be adjusted if you wish to avoid including the '#' characters in the output match, possibly with a lookaround:

(?<=#)[^#]+(?=#)

(A match value for this would be 'hello' not '#hello#' - so you don't have to do any more trimming)
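
A caveat worth checking before relying on the lookaround version: because lookarounds do not consume the '#' characters, the text sitting between two neighbouring tokens is also surrounded by '#' and can match. The sketch below compares it with a capture group that consumes the delimiters; the class HashTokens, its method names, and the sample string are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HashTokens
{
    // Lookaround version: the '#' delimiters are checked but not consumed.
    public static List<string> WithLookaround(string input)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(input, @"(?<=#)[^#]+(?=#)"))
        {
            found.Add(m.Value);
        }
        return found;
    }

    // Capture-group version: each match consumes both of its '#' delimiters.
    public static List<string> WithCaptureGroup(string input)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(input, @"#([^#]+)#"))
        {
            found.Add(m.Groups[1].Value);
        }
        return found;
    }

    public static void Main()
    {
        string text = "Hi #Name#, see #PLACE#";
        Console.WriteLine(string.Join("|", WithLookaround(text)));
        Console.WriteLine(string.Join("|", WithCaptureGroup(text)));
    }
}
```

On this sample the lookaround version also returns the ", see " fragment between the two tokens, whereas the capture-group version returns only Name and PLACE, still without the '#' characters.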


This gives you a list of the tokens as requested:

var tokens = new List<string>();
var matches = new Regex("(#.*?#)").Matches(html);

foreach (Match m in matches) 
    tokens.Add(m.Groups[1].Value);

Edit: If you don't want the pound characters included, just move them outside the parentheses in the Regex string (see Pablo's answer).
