Maybe I need a Regex?

Developer https://www.devze.com 2023-03-06 08:21 Source: Internet
I am making a simple console application for a home project. Basically, it monitors a folder for any files being added.

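// Needs: using System; using System.IO;
// fsw_Created and Show(...) are helper methods defined elsewhere (not shown).
FileSystemWatcher fsw = new FileSystemWatcher(@"c:\temp");
fsw.Created += new FileSystemEventHandler(fsw_Created);

bool monitor = true;

while (monitor)
{
    // Wait up to 1000 ms for a file to be created in c:\temp,
    // then stop looping if the user has pressed a key.
    fsw.WaitForChanged(WatcherChangeTypes.Created, 1000);
    if (Console.KeyAvailable)
    {
        monitor = false;
    }
}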

Show("User has quit the process...", ConsoleColor.Yellow);

When a new file arrives, 'WaitForChanged' returns, and I can then start the work.

What I need to do is check the filename for patterns. In real life, I am putting video files into this folder. Based on the filename, I will have rules, which move the files into specific directories. So for now, I'll have a list of KeyValue pairs... holding a RegEx (I think?), and a folder. So, if the filename matches a regex, it moves it into the related folder.
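A minimal sketch of the rule list described above, assuming each rule is a KeyValuePair<Regex, string> (pattern, destination folder), checked in order with the first match winning. The class FileSorter, its methods, and the single example rule are hypothetical; the folder is the one mentioned further down in the question.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper class: an ordered list of (pattern, destination folder) rules.
public static class FileSorter
{
    // Checked in order; the first pattern that matches wins.
    public static readonly List<KeyValuePair<Regex, string>> Rules =
        new List<KeyValuePair<Regex, string>>
        {
            // Illustrative rule: "CSI" followed later by NY, New York or NewYork.
            new KeyValuePair<Regex, string>(
                new Regex("CSI.*(NY|New York|NewYork)"), @"\Series\CSI\NY\"),
        };

    // Returns the destination folder for a file name, or null if no rule matches.
    public static string FindDestination(string fileName)
    {
        foreach (KeyValuePair<Regex, string> rule in Rules)
        {
            if (rule.Key.IsMatch(fileName))
            {
                return rule.Value;
            }
        }
        return null;
    }

    // Moves the file into the folder of the first matching rule.
    // Returns false (and leaves the file alone) when no rule matches.
    public static bool MoveToMatchingFolder(string fullPath)
    {
        string fileName = Path.GetFileName(fullPath);
        string destination = FindDestination(fileName);
        if (destination == null)
        {
            return false;
        }

        Directory.CreateDirectory(destination); // no-op if it already exists
        // File.Move throws if a file with the same name already exists there.
        File.Move(fullPath, Path.Combine(destination, fileName));
        return true;
    }
}
```

For example, the fsw_Created handler could call FileSorter.MoveToMatchingFolder(e.FullPath). Keep in mind that Created can fire while a large video file is still being copied in, so an immediate move may fail with an IOException and need a retry.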

An example of a filename is:

CSI- NY.S07E01.The 34th Floor.avi

So, my Regex needs to look at it, and see if the words CSI "AND" (NY "OR" NewYork "OR" New York) exist. If they do, I will then move them to a \Series\CSI\NY\ folder.

I need the AND, because another file example for a different series is:

CSI- Crime Scene Investigation.S11E16.Turn On, Tune In, Drop Dead

So, for this one, I would need some NOTs: I need to check that the filename has CSI, but NOT ("New York" or "NY" or "NewYork").

Could someone assist me with these RegExs? Or maybe, there's a better method?
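For reference, .NET regular expressions support lookahead assertions: (?=...) requires a pattern to appear somewhere ahead, and (?!...) requires that it does not. Anchored at ^, they let a single regex express both the AND rule and the NOT rule, in any word order. This is a sketch; the class name CsiPatterns is hypothetical, and a bare NY also matches inside other upper-case words, so \bNY\b may be safer in practice.

```csharp
using System.Text.RegularExpressions;

public static class CsiPatterns
{
    // CSI AND (NY OR New York OR NewYork), in any order.
    // Both lookaheads are evaluated from the start of the string.
    public static readonly Regex CsiNewYork =
        new Regex(@"^(?=.*CSI)(?=.*(NY|New York|NewYork))");

    // CSI but NOT (NY OR New York OR NewYork).
    // The negative lookahead fails if any of the three spellings appears anywhere.
    public static readonly Regex CsiOther =
        new Regex(@"^(?=.*CSI)(?!.*(NY|New York|NewYork))");
}
```

Usage: CsiPatterns.CsiOther.IsMatch(fileName). It works, but as the answers point out, it is harder to read and maintain than two simple checks.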


You can try storing the conditions as Func<string, bool> delegates:

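// Needs: using System; using System.Collections.Generic;
// Maps each condition to a destination folder. Dictionary enumeration order
// is not guaranteed, so prefer a List<KeyValuePair<...>> if rule order matters.
Dictionary<Func<string, bool>, string> dic = new Dictionary<Func<string, bool>, string>();
Func<string, bool> f1 = x => x.Contains("CSI") && (x.Contains("NY") || x.Contains("New York"));

dic.Add(f1, "C://CSI/");

foreach (var pair in dic)
{
    // Evaluate the stored condition against the file name.
    if (pair.Key.Invoke("CSI- NY.S07E01.The 34th Floor.avi"))
    {
        // copy the file to pair.Value
        return;
    }
}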
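A self-contained variant of the Func<string, bool> idea, covering both rules from the question (the AND rule and the NOT rule). Assumptions: the rules live in an ordered List rather than a Dictionary, the names FuncRules, MentionsNewYork and FindFolder are hypothetical, and the second destination folder is made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class FuncRules
{
    // Shared sub-condition: does the name mention New York in any of the three spellings?
    private static bool MentionsNewYork(string x) =>
        x.Contains("NY") || x.Contains("New York") || x.Contains("NewYork");

    // A List keeps rules in insertion order, so they are checked predictably.
    public static readonly List<KeyValuePair<Func<string, bool>, string>> Rules =
        new List<KeyValuePair<Func<string, bool>, string>>
        {
            // CSI AND (NY OR New York OR NewYork)
            new KeyValuePair<Func<string, bool>, string>(
                x => x.Contains("CSI") && MentionsNewYork(x), @"\Series\CSI\NY\"),
            // CSI AND NOT (NY OR New York OR NewYork); hypothetical folder
            new KeyValuePair<Func<string, bool>, string>(
                x => x.Contains("CSI") && !MentionsNewYork(x), @"\Series\CSI\"),
        };

    // Returns the folder of the first rule whose condition holds, or null.
    public static string FindFolder(string fileName)
    {
        foreach (KeyValuePair<Func<string, bool>, string> rule in Rules)
        {
            if (rule.Key(fileName))
            {
                return rule.Value;
            }
        }
        return null;
    }
}
```

The trade-off is the one noted elsewhere in this thread: the conditions are compiled C# code, so adding a new show means recompiling.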


I think you have the right idea. The nice thing about this approach is that you can keep the regular expressions in a config file (or some other external store) and add, remove, or edit them there, which means you don't have to recompile the project every time you want to keep track of a new show.
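A minimal sketch of loading rules from a config file, assuming a hypothetical plain-text format with one rule per line written as pattern => folder, with blank lines and # comments ignored. RuleConfig, Parse and Load are illustrative names, not an existing API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical config format, one rule per line:
//   <regex> => <destination folder>
public static class RuleConfig
{
    public static List<KeyValuePair<Regex, string>> Parse(IEnumerable<string> lines)
    {
        var rules = new List<KeyValuePair<Regex, string>>();
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0 || line.StartsWith("#", StringComparison.Ordinal))
            {
                continue; // skip blank lines and comments
            }

            // Split on the last "=>" so the pattern itself may contain "=>".
            int arrow = line.LastIndexOf("=>", StringComparison.Ordinal);
            if (arrow < 0)
            {
                throw new FormatException("Missing '=>' in rule: " + line);
            }

            string pattern = line.Substring(0, arrow).Trim();
            string folder = line.Substring(arrow + 2).Trim();
            rules.Add(new KeyValuePair<Regex, string>(new Regex(pattern), folder));
        }
        return rules;
    }

    // Reads the rules from a text file, so a new show only needs a new line.
    public static List<KeyValuePair<Regex, string>> Load(string path) =>
        Parse(File.ReadAllLines(path));
}
```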

A regular expression for CSI AND NY would look something like this. First, if you want to check whether CSI exists in the filename, the regex is simply "CSI". Keep in mind it's case sensitive by default. If you want to check whether NY, New York or NewYork exists in the filename, the regex is "((NY)|(New York)|(NewYork))". The bars indicate OR and the parentheses are used to designate groups. To combine the two, you could run both regexes separately, and in some cases (where perhaps order is unimportant) this might be easier. However, if you always expect the show type to come after CSI, the syntax would be "(CSI).*((NY)|(New York)|(NewYork))". The period means "any character" and the asterisk means zero or more.
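The patterns above can be tried with Regex.IsMatch. This sketch (RegexExamples is a hypothetical name) shows the ordered pattern, the same pattern with RegexOptions.IgnoreCase, and the run-both-regexes approach for when order does not matter.

```csharp
using System.Text.RegularExpressions;

public static class RegexExamples
{
    // "CSI", then any characters, then NY, New York or NewYork.
    private const string CsiThenNewYork = "(CSI).*((NY)|(New York)|(NewYork))";

    // Case-sensitive (the default).
    public static bool IsCsiNewYork(string fileName) =>
        Regex.IsMatch(fileName, CsiThenNewYork);

    // Same pattern, ignoring case. Caution: case-insensitively, NY also
    // matches "ny" inside ordinary words such as "Tiffany".
    public static bool IsCsiNewYorkIgnoreCase(string fileName) =>
        Regex.IsMatch(fileName, CsiThenNewYork, RegexOptions.IgnoreCase);

    // Order-independent: run the two simpler regexes separately.
    public static bool HasBothInAnyOrder(string fileName) =>
        Regex.IsMatch(fileName, "CSI") &&
        Regex.IsMatch(fileName, "(NY)|(New York)|(NewYork)");
}
```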


This doesn't look like a job for one regex, even if you succeed in cramming the whole thing into one. Regexes that match "anything without a given word" are a pain. I'd rather stick with two regexes for each rule: one which should match, and another which should NOT match, for the rule to be triggered. If you need your "CSI" and "NY" but don't want to fix any particular order within the filename, you may as well switch from a pair of regexes to a pair of lists of regexes. In general, it's better to put this logic into code and configuration and keep the regexes as simple as possible. And yes, you're quite likely to get away with simple substring search, with no real need for regexes, as long as you keep your code smart enough.
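A sketch of the "should match / should NOT match" rule, using the lists-of-regexes variant so that word order in the filename does not matter. SortRule and its members are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical rule type: the rule fires when every MustMatch regex matches
// and no MustNotMatch regex matches, regardless of word order.
public sealed class SortRule
{
    public List<Regex> MustMatch { get; } = new List<Regex>();
    public List<Regex> MustNotMatch { get; } = new List<Regex>();
    public string Folder { get; }

    public SortRule(string folder) => Folder = folder;

    public bool Matches(string fileName) =>
        MustMatch.All(r => r.IsMatch(fileName)) &&
        !MustNotMatch.Any(r => r.IsMatch(fileName));
}
```

For example, the Crime Scene Investigation rule could be written as new SortRule(@"\Series\CSI\") { MustMatch = { new Regex("CSI") }, MustNotMatch = { new Regex("NY|New York|NewYork") } }, where the folder is a made-up example. Swapping Regex for plain strings and Contains gives the substring-search version.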


Well, people have already given you some advice about doing this using:

  • Regular expressions
  • Func delegates, i.e. storing exactly the C# code that will be executed against the file

so I'll just give you a different one.

I disagree with using regular expressions for this purpose. I agree with @Anton S. Kraievoy: I don't like using regexes to match "anything without a given word". It is easier to check !text.Contains(word).

The second option looks perfect if you are looking for a fast solution, but...

If that is a more complex application, and you want to design it correctly, I think you should:

  • Define how you will store those patterns (in a class with members, in a string, etc.). A string example could be:
    • "CSI" & ("NY" || "Las Vegas")
  • Then write a module that will match a filename with that pattern.
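One way to store the patterns "in a class with members" is a tiny expression tree. This is a sketch with hypothetical class names (Pattern, WordPattern, AndPattern, OrPattern, NotPattern), where a word matches when the filename contains it.

```csharp
using System.Linq;

// Hypothetical pattern model: a small tree that can be stored and evaluated.
public abstract class Pattern
{
    public abstract bool IsMatch(string fileName);
}

// A single word: true when the file name contains it (case-sensitive).
public sealed class WordPattern : Pattern
{
    public string Text { get; }
    public WordPattern(string text) => Text = text;
    public override bool IsMatch(string fileName) => fileName.Contains(Text);
}

// AND: every part must match.
public sealed class AndPattern : Pattern
{
    public Pattern[] Parts { get; }
    public AndPattern(params Pattern[] parts) => Parts = parts;
    public override bool IsMatch(string fileName) => Parts.All(p => p.IsMatch(fileName));
}

// OR: at least one part must match.
public sealed class OrPattern : Pattern
{
    public Pattern[] Parts { get; }
    public OrPattern(params Pattern[] parts) => Parts = parts;
    public override bool IsMatch(string fileName) => Parts.Any(p => p.IsMatch(fileName));
}

// NOT: inverts the inner pattern.
public sealed class NotPattern : Pattern
{
    public Pattern Inner { get; }
    public NotPattern(Pattern inner) => Inner = inner;
    public override bool IsMatch(string fileName) => !Inner.IsMatch(fileName);
}
```

With this model, "CSI" & ("NY" || "Las Vegas") corresponds to new AndPattern(new WordPattern("CSI"), new OrPattern(new WordPattern("NY"), new WordPattern("Las Vegas"))).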

You're essentially creating a small DSL (domain-specific language).

Why is this better than pasting the C# code directly?

Well, because:

  • You can easily change the semantics of your patterns
  • You can generate the validation code in any language you want, because you're storing patterns in a generic way.

The remaining question is how to match a pattern against a filename.

You have some options:

  • Write a grammar for your patterns and write a parser for it yourself
  • Generate C# code from the pattern (I'm not 100% sure this is possible; it depends on the grammar), e.g. with a regex that rewrites your pattern into C# code.
    For example, "A" & "B" would become string.Contains("A") && string.Contains("B"), or something like that.
  • Use a tool to do that, like ANTLR.
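For the first option, a hand-written recursive-descent parser is fairly small. The sketch below assumes a hypothetical grammar: & for AND, || for OR, ! for NOT, parentheses, and double-quoted words that match when the filename contains them. It evaluates the pattern directly instead of generating C# code; PatternEvaluator is an illustrative name.

```csharp
using System;

// Hypothetical grammar (lowest to highest precedence):
//   expr   := term ( "||" term )*
//   term   := factor ( "&" factor )*
//   factor := "!" factor | "(" expr ")" | '"' word '"'
// A quoted word evaluates to true when the file name contains it.
public sealed class PatternEvaluator
{
    private readonly string _src;
    private readonly string _fileName;
    private int _pos;

    private PatternEvaluator(string src, string fileName)
    {
        _src = src;
        _fileName = fileName;
    }

    // Parses and evaluates the whole pattern; throws FormatException on bad syntax.
    public static bool Matches(string pattern, string fileName)
    {
        var parser = new PatternEvaluator(pattern, fileName);
        bool result = parser.Expr();
        parser.SkipSpaces();
        if (parser._pos != parser._src.Length)
            throw new FormatException($"Unexpected text at position {parser._pos}");
        return result;
    }

    private bool Expr()
    {
        bool value = Term();
        while (TryConsume("||"))
        {
            bool rhs = Term(); // always parse; short-circuiting would skip input
            value = value || rhs;
        }
        return value;
    }

    private bool Term()
    {
        bool value = Factor();
        while (TryConsume("&"))
        {
            bool rhs = Factor();
            value = value && rhs;
        }
        return value;
    }

    private bool Factor()
    {
        if (TryConsume("!"))
            return !Factor();

        if (TryConsume("("))
        {
            bool value = Expr();
            if (!TryConsume(")"))
                throw new FormatException($"Expected ')' at position {_pos}");
            return value;
        }

        if (TryConsume("\""))
        {
            int end = _src.IndexOf('"', _pos);
            if (end < 0)
                throw new FormatException("Unterminated quoted word");
            string word = _src.Substring(_pos, end - _pos);
            _pos = end + 1;
            return _fileName.Contains(word);
        }

        throw new FormatException($"Unexpected input at position {_pos}");
    }

    private void SkipSpaces()
    {
        while (_pos < _src.Length && char.IsWhiteSpace(_src[_pos]))
            _pos++;
    }

    // Skips whitespace, then consumes the token if it comes next.
    private bool TryConsume(string token)
    {
        SkipSpaces();
        if (_src.Length - _pos >= token.Length &&
            string.CompareOrdinal(_src, _pos, token, 0, token.Length) == 0)
        {
            _pos += token.Length;
            return true;
        }
        return false;
    }
}
```

For example, PatternEvaluator.Matches("\"CSI\" & !(\"NY\" || \"New York\" || \"NewYork\")", fileName) returns true for the Crime Scene Investigation filename and false for the NY one. Both sides of & and || are always parsed before their values are combined, because short-circuiting would skip input and break the parse.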