I'm trying to match and break up a typical TV torrent title:
MyTV.Show.S09E01.HDTV.XviD
MyTV.Show.S10E02.HDTV.XviD
MyTV.Show.901.HDTV.XviD
MyTV.Show.1102.HDTV.XviD
I'm trying to break these strings up into 3 capture groups for each entry: Title, Season, Episode.
I can handle the first two easily enough:
^([a-zA-Z0-9.]*)\.S([0-9]{1,2})E([0-9]{1,2}).*$
However, the third and fourth are difficult: I can't see how to split the season from the episode. It would be easier if I could work backwards. For example, with "901", working backwards I would take the last two digits as the episode number, and anything remaining before that would be the season number.
Does anyone have any tips for how I can break these strings up into those relevant capture groups?
Here's what I would use:
(.*?)\.S?(\d{1,2})E?(\d{2})\.(.*)
Has capture groups:
1: Name
2: Season
3: Episode
4: The Rest
Here's some code in C# (courtesy of this post):
using System;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        // One title per line; the continuation lines stay unindented so the
        // verbatim string holds exactly the file names.
        string s = @"MyTV.Show.S09E01.HDTV.XviD
MyTV.Show.S10E02.HDTV.XviD
MyTV.Show.901.HDTV.XviD
MyTV.Show.1102.HDTV.XviD";
        Extract(s);
    }

    // Groups: 1 = name, 2 = season, 3 = episode, 4 = the rest.
    private static readonly Regex rx = new Regex
        (@"(.*?)\.S?(\d{1,2})E?(\d{2})\.(.*)", RegexOptions.IgnoreCase);

    static void Extract(string text)
    {
        // '.' does not match '\n', so each line yields its own match.
        MatchCollection matches = rx.Matches(text);
        foreach (Match match in matches)
        {
            // Trim() removes any stray '\r' or whitespace captured at line edges.
            Console.WriteLine("Name: {0}, Season: {1}, Ep: {2}, Stuff: {3}\n",
                match.Groups[1].ToString().Trim(), match.Groups[2],
                match.Groups[3], match.Groups[4].ToString().Trim());
        }
    }
}
Produces:
Name: MyTV.Show, Season: 09, Ep: 01, Stuff: HDTV.XviD
Name: MyTV.Show, Season: 10, Ep: 02, Stuff: HDTV.XviD
Name: MyTV.Show, Season: 9, Ep: 01, Stuff: HDTV.XviD
Name: MyTV.Show, Season: 11, Ep: 02, Stuff: HDTV.XviD
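If you want integers rather than strings, here is a minimal sketch of the same pattern with named groups. The class and method names (EpisodeParser, TryParse), the group names, and the ^ and $ anchors are my own additions for illustration, and it assumes one file name per call. The split of "901" into 9 and 01 works because \d{1,2} first grabs "90", the engine then finds that \d{2} cannot match "1.", and it backtracks to give one digit back.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EpisodeParser
{
    // Same shape as the pattern above; named groups and anchors are additions.
    private static readonly Regex Pattern = new Regex(
        @"^(?<title>.*?)\.S?(?<season>\d{1,2})E?(?<episode>\d{2})\.(?<rest>.*)$",
        RegexOptions.IgnoreCase);

    // Returns false when the name does not fit the pattern at all.
    public static bool TryParse(string name, out string title, out int season, out int episode)
    {
        title = string.Empty;
        season = 0;
        episode = 0;

        Match m = Pattern.Match(name);
        if (!m.Success)
        {
            return false;
        }

        title = m.Groups["title"].Value;
        // Both groups contain only digits, so int.Parse cannot throw here.
        season = int.Parse(m.Groups["season"].Value);
        episode = int.Parse(m.Groups["episode"].Value);
        return true;
    }
}
```

Calling TryParse on "MyTV.Show.901.HDTV.XviD" gives the title MyTV.Show, season 9 and episode 1; "MyTV.Show.S10E02.HDTV.XviD" gives season 10 and episode 2.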
Almost every media file I've ever seen that came from a torrent had two-digit episode numbers. With that, you should be able to use E([0-9]{2}) instead and get the expression to match.
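As a rough sketch of that idea in C#: the first pattern below is the question's expression with the episode pinned to exactly two digits. The second pattern, for the untagged "901" and "1102" forms, is my own assumption about how to extend it, as are the class and method names (TwoDigitEpisode, Parse). Trying the tagged form first keeps the two cases from interfering with each other.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoDigitEpisode
{
    // The question's pattern, with the episode fixed at two digits.
    private static readonly Regex Tagged = new Regex(
        @"^([a-zA-Z0-9.]*)\.S([0-9]{1,2})E([0-9]{2}).*$");

    // Hypothetical companion pattern: three or four digits between two dots,
    // where the last two digits are the episode and the rest is the season.
    private static readonly Regex Bare = new Regex(
        @"^([a-zA-Z0-9.]*?)\.([0-9]{1,2})([0-9]{2})\..*$");

    // Groups in either result: 1 = title, 2 = season, 3 = episode.
    public static Match Parse(string name)
    {
        Match m = Tagged.Match(name);
        return m.Success ? m : Bare.Match(name);
    }
}
```

For "MyTV.Show.1102.HDTV.XviD" the groups come out as MyTV.Show, 11 and 02. Check Success on the returned Match before reading the groups, since a name that fits neither pattern yields a failed match.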
I'd estimate 99.9% of shows are marked with two-digit episodes. If you're trying to write a script to easily label your own shows, I'd go with the two-digit episode assumption and manually rename any mistagged files you come across. If you're trying to write something for public consumption, you will probably need to consider many more naming schemes. I've seen this tried by other applications in the past, and all have worked just so-so. It's a hard problem that probably has no single solution.