
easy way to determine if a string CAN'T be a valid regex

Developer https://www.devze.com 2023-01-17 04:29 Source: web

I have a config file in which the user can specify sections, and within those sections they can specify regular expressions. I have to parse this config file and separate the regexes into their sections.

Is there an easy way to delimit a regex from a section header? I was thinking of just using the standard

    [section]
    regex1
    regex2

But I just realized that [section] is a valid regex. So I'm wondering if there's a way I can format a section header so that it can ONLY be understood as a section header and not a regex.


There are unlimited ways of making an invalid regexp, but the first thing that comes to mind is

    *section*

You can't have a quantifier (*) at the start of the regexp.

(The other * is there just to satisfy my obsession for symmetry.)
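If you want to check this yourself, here is a minimal sketch, assuming the regexes will be compiled with Python's standard re module (other engines may accept or reject different strings). The helper name is_valid_regex is made up for illustration.

```python
import re

def is_valid_regex(text):
    """Return True if Python's re module can compile text."""
    try:
        re.compile(text)
    except re.error:
        return False
    return True

print(is_valid_regex("*section*"))  # False: a leading quantifier has nothing to repeat
print(is_valid_regex("[section]"))  # True: this is an ordinary character class
```

Any line for which this returns False could safely be treated as a header, since it can never be one of the user's regexes.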


I don't know your problem domain, so I don't know what forms of regex you're expecting, but it seems to me you should keep your section formatting as it is. A regex that starts with [ and ends with ] and has no square brackets in between is quite unusual. It can only match a single character. So leave the section headers as they are. Strictly speaking, they are valid regexes, but they probably aren't interesting regexes.
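A rough sketch of that idea, assuming a header is any line made of [, some characters that are not square brackets, and ]. The names HEADER and parse_sections are hypothetical, and stripping whitespace from each line is my own simplification (it would break regexes that rely on leading or trailing spaces).

```python
import re

# A header is "[", one or more non-bracket characters, then "]", and nothing else.
HEADER = re.compile(r"^\[([^\[\]]+)\]$")

def parse_sections(text):
    """Group regex lines under the most recent [section] header."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = HEADER.match(line)
        if match:
            current = match.group(1)
            sections.setdefault(current, [])
        elif current is None:
            raise ValueError("regex found before any section header: %r" % line)
        else:
            sections[current].append(line)
    return sections
```

A line such as [a-z]+\d is still read as a regex here, because it does not end right after the first closing bracket. Only a bare single character class on its own line is taken as a header.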

Also, why not use ConfigParser from the standard library, and let it do the parsing for you?


There are easy ways, but they all require changing your format:

  1. Use indentation, similar to how Python source is interpreted. Leading spaces would need special handling, e.g. "(?: )abc" instead of " abc".
  2. Use an INI format, where each item in a section requires a name=value pair.
  3. Use some sort of list syntax. ast.literal_eval will be helpful.

    section1 = [
      "regex 1",
      "2",
      "3",
    ]
    section2 = ["..."]
    

Above all, don't invent your own format; if you must, make it as close to a known format as you can. The third is a subset of Python syntax, for example, and you could even use raw string literals naturally.

JSON or YAML may be useful for you.
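As an illustration of the JSON route (YAML would need a third-party library, so I'll stick to the standard json module), here is a small sketch. The layout, with section names mapping to lists of patterns, and the function name load_json_sections are assumptions of mine. One drawback to keep in mind: JSON has no raw strings, so every backslash in a regex has to be doubled in the file.

```python
import json
import re

# Example file contents; note the doubled backslash inside the JSON string.
EXAMPLE = r'''
{
  "section1": ["^\\d+$", "foo|bar"],
  "section2": ["[a-z]+"]
}
'''

def load_json_sections(text):
    """Return {section name: [compiled regex, ...]} from a JSON document."""
    data = json.loads(text)
    return {
        name: [re.compile(pattern) for pattern in patterns]
        for name, patterns in data.items()
    }

sections = load_json_sections(EXAMPLE)
```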


As others have said, please don't invent yet another config format. Use the Python Standard Library's ConfigParser, which will be able to parse the [section] notation exactly as you have shown it.

EDIT: The allow_no_value option allows you to just have a single entry, rather than a key/value pair. And the default dict type is OrderedDict, so it will maintain order.
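Here is roughly what that could look like with the Python 3 configparser module. Treat it as a sketch under some assumptions: the function name read_regex_sections is made up, and the extra constructor arguments are my own choices to keep the parser from mangling regexes (the defaults split on ":" as well as "=", treat ";" as a comment prefix, interpolate "%", and lowercase keys).

```python
import configparser

def read_regex_sections(text):
    """Return {section: [regex line, ...]} using ConfigParser with valueless keys."""
    parser = configparser.ConfigParser(
        allow_no_value=True,      # accept lines that are not "key = value"
        delimiters=("=",),        # do not also split on ":"
        comment_prefixes=("#",),  # do not treat ";" as a comment start
        interpolation=None,       # keep "%" literal
    )
    parser.optionxform = str      # preserve case instead of lowercasing keys
    parser.read_string(text)
    return {name: list(parser[name]) for name in parser.sections()}
```

Some caveats remain even with these settings: a regex containing "=" is still split into a key and a value, a repeated regex within one section raises a duplicate-option error under the default strict mode, and as far as I can tell a regex line that begins with a character class, such as [a-z]+\d, is still picked up as a section header. So this fits best when the regexes are known not to start with "[".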
