CamelCase conversion to friendly name, i.e. Enum constants; Problems?_问答_开发者

In my answer to this question, I mentioned that we used UpperCamelCase parsing to get a description of an enum constant not decor开发者_如何学编程ated with a Description attribute, but it was naive, and it didn't work in all cases. I revisited it, and this is what I came up with:

var result = Regex.Replace(camelCasedString, 
                            @"(?<a>(?<!^)[A-Z][a-z])", @" ${a}");
result = Regex.Replace(result,
                            @"(?<a>[a-z])(?<b>[A-Z0-9])", @"${a} ${b}");

The first Replace looks for an uppercase letter, followed by a lowercase letter, EXCEPT where the uppercase letter is the start of the string (to avoid having to go back and trim), and adds a preceding space. It handles your basic UpperCamelCase identifiers, and leading all-upper acronyms like FDICInsured.

The second Replace looks for a lowercase letter followed by an uppercase letter or a number, and inserts a space between the two. This is to handle special but common cases of middle or trailing acronyms, or numbers in an identifier (except leading numbers, which are usually prohibited in C-style languages anyway).

Running some basic unit tests, the combination of these two correctly separated all of the following identifiers: NoDescription, HasLotsOfWords, AAANoDescription, ThisHasTheAcronymABCInTheMiddle, MyTrailingAcronymID, TheNumber3, IDo3Things, IAmAValueWithSingleLetterWords, and Basic (which didn't have any spaces added).

So, I'm posting this first to share it with others who may find it useful, and second to ask two questions:

Anyone see a case that would follow common CamelCase-ish conventions, that WOULDN'T be correctly separated into a friendly string this way? I know it won't separate adjacent acronyms (FDICFCUAInsured), recapitalize "properly" camelCased acronyms like FdicInsured, or capitalize the first letter of a lowerCamelCased identifier (but that one's easy to add - result = Regex.Replace(result, "^[a-z]", m=>m.ToString().ToUpper());). Anything else?
Can anyone see a way to make this one statement, or more elegant? I was looking to combine the Replace calls, but as they do two different things to their matches it can't be done with these two strings. They could be combined into a method chain with a RegexReplace extension method on String, but can anyone think of better?

So while I agree with Hans Passant here, I have to say that I had to try my hand at making it one regex as an armchair regex user.

(?<a>(?<!^)((?:[A-Z][a-z])|(?:(?<!^[A-Z]+)[A-Z0-9]+(?:(?=[A-Z][a-z])|$))|(?:[0-9]+)))

Is what I came up with. It seems to pass all the tests you put forward in the question.

var result = Regex.Replace(camelCasedString, @"(?<a>(?<!^)((?:[A-Z][a-z])|(?:(?<!^[A-Z]+)[A-Z0-9]+(?:(?=[A-Z][a-z])|$))|(?:[0-9]+)))", @" ${a}");

Does it in one pass.

not that this directly answers the question, but why not test by taking the standard C# API and converting each class into a friendly name? It'd take some manual verification, but it'd give you a good list of standard names to test.

Let's say every case you come across works with this (you're asking us for examples that won't and then giving us some, so you don't even have a question left).

This still binds UI to programmatic identifiers in a way that will make both programming and UI changes brittle.

It still assumes your program will only be used in one language. Either your potential market it so small that just indexing an array of names would be scalable enough (e.g. a one-client bespoke or in-house project), or you are assuming you will never be successful enough to need to be available to other languages or other dialects of your first-chosen language.

Does "well, it'll work as long as we're a failure" sound like a passing grade in balancing designs?

Either code it to use resources, or else code it to pass the enum name blindly or use an array of names, as that at least will be modifiable afterwards.