I've been trying to write a regex that matches the charset of MIME multipart emails so that I can decode them correctly. However, I've found some differences in the format that I can't work out a regex for, as I'm no expert.
Currently I'm using (?<=charset=).*(?=;)
However, the examples I've collected by sending emails from different clients are:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
charset=US-ASCII;
Content-Type: text/plain; charset=iso-8859-1
So my regex works on the first two but not the last. However, if I remove (?=;) then I also match the format=flowed part, which I don't want.
Instead of .* you can use [^;]* which matches any character except ;
So, the pattern becomes:
(?<=charset=)[^;]*
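As a quick check, here is a minimal Python sketch that runs this pattern against the three header lines from the question. The question doesn't say which regex engine is in use, so Python's re module is an assumption here, and the helper name find_charset is made up for illustration.

```python
import re

# Pattern from the answer above: everything after "charset=" up to the next ";".
CHARSET_RE = re.compile(r"(?<=charset=)[^;]*")

# Header lines taken from the question.
headers = [
    "Content-Type: text/plain; charset=ISO-8859-1; format=flowed",
    "charset=US-ASCII;",
    "Content-Type: text/plain; charset=iso-8859-1",
]

def find_charset(line):
    """Return the charset value found in line, or None if there is none."""
    match = CHARSET_RE.search(line)
    return match.group(0) if match else None

charsets = [find_charset(h) for h in headers]
print(charsets)  # ['ISO-8859-1', 'US-ASCII', 'iso-8859-1']
```

All three lines yield just the charset name, including the last one where no ; follows the value.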
References
- regular-expressions.info/Character Classes
Building on this I've found this catches a couple more circumstances:
(?<=charset=)(([^;,\r\n]))*
Hope that helps.
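One case where the extra excluded characters seem to matter is when the regex runs over a whole header block rather than a single line. The sketch below compares the two patterns in Python (an assumed engine); the second header line in the sample text is invented for illustration.

```python
import re

# The two patterns from the answers above.
SEMICOLON_ONLY = re.compile(r"(?<=charset=)[^;]*")
STRICTER = re.compile(r"(?<=charset=)(([^;,\r\n]))*")

# Hypothetical raw header block with CRLF line endings; the second header
# line is made up for illustration.
raw = "Content-Type: text/plain; charset=iso-8859-1\r\nContent-Transfer-Encoding: 7bit"

loose_value = SEMICOLON_ONLY.search(raw).group(0)
strict_value = STRICTER.search(raw).group(0)

print(repr(loose_value))   # 'iso-8859-1\r\nContent-Transfer-Encoding: 7bit'
print(repr(strict_value))  # 'iso-8859-1'
```

Because [^;] also matches line breaks, the first pattern runs on into the next header when no ; follows the charset, while the stricter class stops at the end of the line.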
Match on either ; or the end of the line ($).
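One possible reading of this suggestion, sketched in Python (an assumed engine), is to extend the lookahead from the question to (?=;|$). Note that the .* then has to become lazy (.*?), which is an addition not stated in the answer; a greedy .* would still run to the end of the line.

```python
import re

# Lazy variant: stops at the first ";" or at the end of the line.
LAZY_RE = re.compile(r"(?<=charset=).*?(?=;|$)")
# Greedy variant, shown only for comparison.
GREEDY_RE = re.compile(r"(?<=charset=).*(?=;|$)")

# Header lines taken from the question.
with_format = "Content-Type: text/plain; charset=ISO-8859-1; format=flowed"
no_semicolon = "Content-Type: text/plain; charset=iso-8859-1"

lazy_value = LAZY_RE.search(with_format).group(0)
greedy_value = GREEDY_RE.search(with_format).group(0)
end_of_line_value = LAZY_RE.search(no_semicolon).group(0)

print(lazy_value)         # ISO-8859-1
print(greedy_value)       # ISO-8859-1; format=flowed
print(end_of_line_value)  # iso-8859-1
```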