Is there a method in C# to check if a string is a valid identifier [duplicate]

Developer https://www.devze.com 2022-12-14 04:20 Source: Internet
This question already has answers here: How to determine if a string is a valid variable name? (5 answers) Closed 7 years ago.

In Java, there are methods called isJavaIdentifierStart and isJavaIdentifierPart on the Character class that may be used to tell if a string is a valid Java identifier, like so:

public boolean isJavaIdentifier(String s) {
  int n = s.length();
  if (n==0) return false;
  if (!Character.isJavaIdentifierStart(s.charAt(0)))
      return false;
  for (int i = 1; i < n; i++)
      if (!Character.isJavaIdentifierPart(s.charAt(i)))
          return false;
  return true;
}

Is there something like this for C#?


Yes:

// using System.CodeDom.Compiler;
CodeDomProvider provider = CodeDomProvider.CreateProvider("C#");
if (provider.IsValidIdentifier (YOUR_VARIABLE_NAME)) {
      // Valid
} else {
      // Not valid
}

From here: How to determine if a string is a valid variable name?


I would be wary of the other solutions offered here. Calling CodeDomProvider.CreateProvider requires finding and parsing the machine.config file, as well as your app.config file. That's likely to be several times slower than just checking the string yourself.

Instead I would advocate you make one of the following changes:

  1. Cache the provider in a static variable.

    This will cause you to take the hit of creating it only once, but it will slow down type loading.

  2. Create the provider directly, by creating a Microsoft.CSharp.CSharpCodeProvider instance yourself.

    This will skip the config file parsing altogether.

  3. Write the code to implement the check yourself.

    If you do this, you get the greatest control over how it's implemented, which can help you optimize performance if you need to. See section 2.4.2 (Identifiers) of the C# language spec for the complete lexical grammar for C# identifiers.
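As a rough illustration of option 3, here is a minimal sketch of a hand-written check built on char.GetUnicodeCategory. The class and method names (IdentifierCheck, LooksLikeIdentifier) are made up for this example, and the category lists are my reading of the identifier grammar in the spec. It only validates characters: it does not reject keywords, and it does not handle the @ prefix or Unicode escape sequences.

```csharp
using System.Globalization;

public static class IdentifierCheck
{
    // Hypothetical helper: letter categories allowed at the start of an identifier, plus '_'.
    private static bool IsStart(char c)
    {
        if (c == '_') return true;
        switch (char.GetUnicodeCategory(c))
        {
            case UnicodeCategory.UppercaseLetter:  // Lu
            case UnicodeCategory.LowercaseLetter:  // Ll
            case UnicodeCategory.TitlecaseLetter:  // Lt
            case UnicodeCategory.ModifierLetter:   // Lm
            case UnicodeCategory.OtherLetter:      // Lo
            case UnicodeCategory.LetterNumber:     // Nl
                return true;
            default:
                return false;
        }
    }

    // Hypothetical helper: everything allowed after the first character.
    private static bool IsPart(char c)
    {
        if (IsStart(c)) return true;
        switch (char.GetUnicodeCategory(c))
        {
            case UnicodeCategory.NonSpacingMark:        // Mn
            case UnicodeCategory.SpacingCombiningMark:  // Mc
            case UnicodeCategory.DecimalDigitNumber:    // Nd
            case UnicodeCategory.ConnectorPunctuation:  // Pc
            case UnicodeCategory.Format:                // Cf
                return true;
            default:
                return false;
        }
    }

    // Character-level check only: keywords such as "class" still pass.
    public static bool LooksLikeIdentifier(string s)
    {
        if (string.IsNullOrEmpty(s)) return false;
        if (!IsStart(s[0])) return false;
        for (int i = 1; i < s.Length; i++)
        {
            if (!IsPart(s[i])) return false;
        }
        return true;
    }
}
```

If you also need to reject keywords, you would have to add a keyword lookup on top of this, for example against a static HashSet<string>.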


With Roslyn being open source, code analysis tools are right at your fingertips, and they're written for performance. (Right now they're in pre-release).

However, I can't speak to the performance cost of loading the assembly.

Install the tools using NuGet:

Install-Package Microsoft.CodeAnalysis -Pre

Ask your question:

var isValid = Microsoft.CodeAnalysis.CSharp.SyntaxFacts.IsValidIdentifier("I'mNotValid");
Console.WriteLine(isValid);     // False


Basically something like:

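// using System.Text.RegularExpressions;
static bool IsIdentifier(string s)
{
    // identifier start: any letter category, or an underscore
    const string start = @"(\p{Lu}|\p{Ll}|\p{Lt}|\p{Lm}|\p{Lo}|\p{Nl}|_)";
    const string extend = @"(\p{Mn}|\p{Mc}|\p{Nd}|\p{Pc}|\p{Cf})";
    // anchor the pattern so the whole string has to match, not just a substring
    Regex ident = new Regex(string.Format(@"^{0}({0}|{1})*\z", start, extend));
    return ident.IsMatch(s.Normalize());
}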


Necromancing here.

In .NET Core/DNX, you can do it with Roslyn's SyntaxFacts:

Microsoft.CodeAnalysis.CSharp.SyntaxFacts.IsReservedKeyword(
        Microsoft.CodeAnalysis.CSharp.SyntaxFacts.GetKeywordKind("protected")
);



foreach (ColumnDefinition cl in tableColumns)
{
    sb.Append(@"         public ");
    sb.Append(cl.DOTNET_TYPE);
    sb.Append(" ");

    // for keywords
    //if (!Microsoft.CodeAnalysis.CSharp.SyntaxFacts.IsValidIdentifier(cl.COLUMN_NAME))
    if (Microsoft.CodeAnalysis.CSharp.SyntaxFacts.IsReservedKeyword(
        Microsoft.CodeAnalysis.CSharp.SyntaxFacts.GetKeywordKind(cl.COLUMN_NAME)
        ))
        sb.Append("@");

    sb.Append(cl.COLUMN_NAME);
    sb.Append("; // ");
    sb.AppendLine(cl.SQL_TYPE);
} // Next cl 
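The commented-out line above hints at why both calls matter: as far as I can tell, SyntaxFacts.IsValidIdentifier only looks at the characters, so a reserved keyword such as "class" still passes it. A minimal sketch that combines the two checks could look like this. The class and method names are made up for the example, and it assumes the Microsoft.CodeAnalysis.CSharp package is referenced.

```csharp
using Microsoft.CodeAnalysis.CSharp;

public static class RoslynIdentifierCheck
{
    // Hypothetical helper: true when 'name' can be used as an identifier
    // without needing an '@' prefix.
    public static bool IsUsableIdentifier(string name)
    {
        // Character-level check; returns false for null or empty input.
        if (!SyntaxFacts.IsValidIdentifier(name))
            return false;

        // GetKeywordKind returns SyntaxKind.None for anything that is not a keyword,
        // and IsReservedKeyword(SyntaxKind.None) is false.
        SyntaxKind kind = SyntaxFacts.GetKeywordKind(name);
        return !SyntaxFacts.IsReservedKeyword(kind);
    }
}
```

Contextual keywords (var, yield and so on) are deliberately left alone here, since they remain legal identifiers in most positions.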


Or, the old way, with CodeDOM. After a look at the Mono source code:

CodeDomProvider.cs

public virtual bool IsValidIdentifier (string value)
{
    ICodeGenerator cg = CreateGenerator ();
    if (cg == null)
        throw GetNotImplemented ();
    return cg.IsValidIdentifier (value);
}

Then CSharpCodeProvider.cs

public override ICodeGenerator CreateGenerator()
{
#if NET_2_0
    if (providerOptions != null && providerOptions.Count > 0)
        return new Mono.CSharp.CSharpCodeGenerator (providerOptions);
#endif
    return new Mono.CSharp.CSharpCodeGenerator();
}

Then CSharpCodeGenerator.cs

protected override bool IsValidIdentifier (string identifier)
{
    if (identifier == null || identifier.Length == 0)
        return false;

    if (keywordsTable == null)
        FillKeywordTable ();

    if (keywordsTable.Contains (identifier))
        return false;

    if (!is_identifier_start_character (identifier [0]))
        return false;

    for (int i = 1; i < identifier.Length; i ++)
        if (! is_identifier_part_character (identifier [i]))
            return false;

    return true;
}



private static System.Collections.Hashtable keywordsTable;
private static string[] keywords = new string[] {
    "abstract","event","new","struct","as","explicit","null","switch","base","extern",
    "this","false","operator","throw","break","finally","out","true",
    "fixed","override","try","case","params","typeof","catch","for",
    "private","foreach","protected","checked","goto","public",
    "unchecked","class","if","readonly","unsafe","const","implicit","ref",
    "continue","in","return","using","virtual","default",
    "interface","sealed","volatile","delegate","internal","do","is",
    "sizeof","while","lock","stackalloc","else","static","enum",
    "namespace",
    "object","bool","byte","float","uint","char","ulong","ushort",
    "decimal","int","sbyte","short","double","long","string","void",
    "partial", "yield", "where"
};


static void FillKeywordTable ()
{
    lock (keywords) {
        if (keywordsTable == null) {
            keywordsTable = new Hashtable ();
            foreach (string keyword in keywords) {
                keywordsTable.Add (keyword, keyword);
            }
        }
    }
}



static bool is_identifier_start_character (char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == '@' || Char.IsLetter (c);
}

static bool is_identifier_part_character (char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || (c >= '0' && c <= '9') || Char.IsLetter (c);
}

You get this code:

public static bool IsValidIdentifier (string identifier)
{
    if (identifier == null || identifier.Length == 0)
        return false;

    if (keywordsTable == null)
        FillKeywordTable();

    if (keywordsTable.Contains(identifier))
        return false;

    if (!is_identifier_start_character(identifier[0]))
        return false;

    for (int i = 1; i < identifier.Length; i++)
        if (!is_identifier_part_character(identifier[i]))
            return false;

    return true;
}


internal static bool is_identifier_start_character(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == '@' || char.IsLetter(c);
}

internal static bool is_identifier_part_character(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || (c >= '0' && c <= '9') || char.IsLetter(c);
}


private static System.Collections.Hashtable keywordsTable;
private static string[] keywords = new string[] {
    "abstract","event","new","struct","as","explicit","null","switch","base","extern",
    "this","false","operator","throw","break","finally","out","true",
    "fixed","override","try","case","params","typeof","catch","for",
    "private","foreach","protected","checked","goto","public",
    "unchecked","class","if","readonly","unsafe","const","implicit","ref",
    "continue","in","return","using","virtual","default",
    "interface","sealed","volatile","delegate","internal","do","is",
    "sizeof","while","lock","stackalloc","else","static","enum",
    "namespace",
    "object","bool","byte","float","uint","char","ulong","ushort",
    "decimal","int","sbyte","short","double","long","string","void",
    "partial", "yield", "where"
};

internal static void FillKeywordTable()
{
    lock (keywords)
    {
        if (keywordsTable == null)
        {
            keywordsTable = new System.Collections.Hashtable();
            foreach (string keyword in keywords)
            {
                keywordsTable.Add(keyword, keyword);
            }
        }
    }
}


Recently, I wrote an extension method that validates a string as a valid C# identifier.

You can find a gist with the implementation here: https://gist.github.com/FabienDehopre/5245476

It's based on the MSDN documentation of Identifier (http://msdn.microsoft.com/en-us/library/aa664670(v=vs.71).aspx)

public static bool IsValidIdentifier(this string identifier)
{
    if (String.IsNullOrEmpty(identifier)) return false;

    // C# keywords: http://msdn.microsoft.com/en-us/library/x53a06bb(v=vs.71).aspx
    var keywords = new[]
                       {
                           "abstract",  "event",      "new",        "struct",
                           "as",        "explicit",   "null",       "switch",
                           "base",      "extern",     "object",     "this",
                           "bool",      "false",      "operator",   "throw",
                           "break",     "finally",    "out",        "true",
                           "byte",      "fixed",      "override",   "try",
                           "case",      "float",      "params",     "typeof",
                           "catch",     "for",        "private",    "uint",
                           "char",      "foreach",    "protected",  "ulong",
                           "checked",   "goto",       "public",     "unchecked",
                           "class",     "if",         "readonly",   "unsafe",
                           "const",     "implicit",   "ref",        "ushort",
                           "continue",  "in",         "return",     "using",
                           "decimal",   "int",        "sbyte",      "virtual",
                           "default",   "interface",  "sealed",     "volatile",
                           "delegate",  "internal",   "short",      "void",
                           "do",        "is",         "sizeof",     "while",
                           "double",    "lock",       "stackalloc",
                           "else",      "long",       "static",
                           "enum",      "namespace",  "string"
                       };

    // definition of a valid C# identifier: http://msdn.microsoft.com/en-us/library/aa664670(v=vs.71).aspx
    const string formattingCharacter = @"\p{Cf}";
    const string connectingCharacter = @"\p{Pc}";
    const string decimalDigitCharacter = @"\p{Nd}";
    const string combiningCharacter = @"\p{Mn}|\p{Mc}";
    const string letterCharacter = @"\p{Lu}|\p{Ll}|\p{Lt}|\p{Lm}|\p{Lo}|\p{Nl}";
    const string identifierPartCharacter = letterCharacter + "|" +
                                           decimalDigitCharacter + "|" +
                                           connectingCharacter + "|" +
                                           combiningCharacter + "|" +
                                           formattingCharacter;
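    const string identifierPartCharacters = "(" + identifierPartCharacter + ")*";
    const string identifierStartCharacter = "(" + letterCharacter + "|_)";
    // one start character followed by zero or more part characters
    // (a single quantifier; nesting "+" inside "*" invites heavy backtracking on invalid input)
    const string identifierOrKeyword = identifierStartCharacter + identifierPartCharacters;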
    var validIdentifierRegex = new Regex("^" + identifierOrKeyword + "$", RegexOptions.Compiled);
    var normalizedIdentifier = identifier.Normalize();

    // 1. check that the identifier matches the validIdentifier regex and is not a C# keyword
    if (validIdentifierRegex.IsMatch(normalizedIdentifier) && !keywords.Contains(normalizedIdentifier))
    {
        return true;
    }

    // 2. check if the identifier starts with @
    if (normalizedIdentifier.StartsWith("@") && validIdentifierRegex.IsMatch(normalizedIdentifier.Substring(1)))
    {
        return true;
    }

    // 3. it's not a valid identifier
    return false;
}


The now-released Roslyn project provides Microsoft.CodeAnalysis.CSharp.SyntaxFacts, with SyntaxFacts.IsIdentifierStartCharacter(char) and SyntaxFacts.IsIdentifierPartCharacter(char) methods just like Java.

Here it is in use, in a simple function I use to turn noun phrases (e.g. "Start Date") into C# identifiers (e.g. "StartDate"). N.B. I'm using Humanizer to do the camel-case conversion, and Roslyn to check whether a character is valid.

    public static string Identifier(string name)
    {
        Check.IsNotNullOrWhitespace(name, nameof(name));

        // trim off leading and trailing whitespace
        name = name.Trim();

        // should deal with spaces => camel casing;
        name = name.Dehumanize();

        var sb = new StringBuilder();
        if (!SyntaxFacts.IsIdentifierStartCharacter(name[0]))
        {
            // the first character cannot start an identifier, so prefix an underscore
            sb.Append("_");
        }

        foreach(var ch in name)
        {
            if (SyntaxFacts.IsIdentifierPartCharacter(ch))
            {
                sb.Append(ch);
            }
        }

        var result = sb.ToString();

        if (SyntaxFacts.GetKeywordKind(result) != SyntaxKind.None)
        {
            result = @"@" + result;
        }

        return result;
    }

Tests:

    [TestCase("Start Date", "StartDate")]
    [TestCase("Bad*chars", "BadChars")]
    [TestCase("   leading ws", "LeadingWs")]
    [TestCase("trailing ws   ", "TrailingWs")]
    [TestCase("class", "Class")]
    [TestCase("int", "Int")]
    [Test]
    public void CSharp_GeneratesDecentIdentifiers(string input, string expected)
    {
        Assert.AreEqual(expected, CSharp.Identifier(input));
    }


This can be done using reflection - see How to determine if a string is a valid variable name?
