Email string parsing using regex

Developer https://www.devze.com 2023-03-11 11:04 Source: Internet
I am trying to write a complicated (to me) regex for a multi-line snippet from an e-mail. I have tried hard, with no luck. I want to remove everything from "On " through " wrote:".

It would be nice if the match also required the word "AcmeCompany", so that it doesn't remove every span that runs from "On " to " wrote:".

So far, I have this: /On(.*)AcmeCompany(.*)/im, but it does not work.

say hello, world!

On Tue, Jun 7, 2011 at 6:18 AM, AcmeCompany <
24a95f49f7ce573fds2d+c@AcmeCompany.com> wrote:

Thank you for the responses, but it seems like there's another problem.

EDIT: I found out that this works: /On[\s\S]+?AcmeCompany[\s\S]+?wrote:/m, but it seems to fail when the e-mail contents contain the word "On".

say hello, world!

On a plane!    

On Tue, Jun 7, 2011 at 6:18 AM, AcmeCompany <
24a95f49f7ce573fds2d+c@AcmeCompany.com> wrote:

EDIT2: Every mail client is different. Gmail tends to split the header over 2 lines, while the Mail app on the iPhone puts it on 1 line, so it doesn't always follow a strict format.

One thing is for sure: the header always begins with "On " and ends with " wrote:". It also contains a hash and AcmeCompany, which I can also use to verify.


For the new requirement I am adding another reply. Hope you won't mind.

Can you try something like this?

/On\s(Mon|Tue|Wed|Thu|Fri|Sat|Sun)[\s\S]+?AcmeCompany[\s\S]+?wrote:/
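
To see why anchoring on the weekday helps, here is a minimal PHP sketch run against the "On a plane!" sample from the question. It assumes the header always starts with an abbreviated English weekday (Sun is included here), which the question does not guarantee for every mail client. The variable names are only illustrative.

```php
<?php
// Sample from the question where the body itself contains a line starting with "On".
$mail = "say hello, world!\n\n"
      . "On a plane!\n\n"
      . "On Tue, Jun 7, 2011 at 6:18 AM, AcmeCompany <\n"
      . "24a95f49f7ce573fds2d+c@AcmeCompany.com> wrote:\n";

// Unanchored pattern from the question: matching starts at the first "On" it can use.
$loose = '/On[\s\S]+?AcmeCompany[\s\S]+?wrote:/';

// Weekday-anchored pattern: "On" must be followed by whitespace and a weekday abbreviation.
$strict = '/On\s(Mon|Tue|Wed|Thu|Fri|Sat|Sun)[\s\S]+?AcmeCompany[\s\S]+?wrote:/';

// Remove the matched header, then trim leftover blank lines.
$looseBody  = trim(preg_replace($loose, '', $mail));
$strictBody = trim(preg_replace($strict, '', $mail));

echo $looseBody, "\n---\n", $strictBody, "\n";
```

The unanchored pattern also swallows "On a plane!", while the anchored one keeps it and removes only the quoted-reply header.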

Trying again: how about using this?

/On.+?AcmeCompany[\s\S]+?wrote:/


Hope this helps:

/On[\s\S]+?AcmeCompany[\s\S]+?wrote:/

The regular expression above first matches On. The character class [\s\S] (whitespace or non-whitespace) then matches any character at all, including newlines, and the lazy quantifier +? consumes as few characters as possible until AcmeCompany is found. A second [\s\S]+? does the same until it reaches wrote:.
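
As a rough illustration of the pattern in use, here is a small PHP sketch that strips the header from the first sample in the question with preg_replace. The $mail and $body names are just placeholders for this example.

```php
<?php
// Sample reply from the question (Gmail-style, header split over two lines).
$mail = "say hello, world!\n\n"
      . "On Tue, Jun 7, 2011 at 6:18 AM, AcmeCompany <\n"
      . "24a95f49f7ce573fds2d+c@AcmeCompany.com> wrote:\n";

// [\s\S] matches any character including newlines; +? keeps each run as short as possible.
$pattern = '/On[\s\S]+?AcmeCompany[\s\S]+?wrote:/';

// Replace the quoted-reply header with nothing, then trim the leftover blank lines.
$body = trim(preg_replace($pattern, '', $mail));

echo $body, "\n"; // say hello, world!
```

Note that no m flag is needed here: m only changes how ^ and $ behave, and this pattern uses neither.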


This will work when the whole header is on one line (without the s flag, . does not match a newline):

On.*AcmeCompany.*

Maybe off-topic, but if you want to learn regex you should try Expresso.


To get the string before "On Tue, Jun...":

// Split on the literal 'On'; the last piece is everything after the final 'On'
$str = explode('On', $yourstring);
$oldstr = array_pop($str); // Remove the last value of the $str array and keep it
echo trim(implode('On', $str)); // Rejoin the rest and trim any unnecessary line breaks

To find if the hidden message contains AcmeCompany:

// strpos returns false when the needle is absent, so compare strictly
if (strpos($oldstr, 'AcmeCompany') !== false) {
    echo "I found AcmeCompany!";
} else {
    echo "I didn't find AcmeCompany!";
}

Hope my answer is useful, even though I didn't use regex.


Try this: /On.*AcmeCompany <$[^:]+:/im. The m flag is important because it lets $ match at the end of each line, not only at the end of the whole string.
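
A quick PHP check of that pattern shows what the m flag changes. The two-line header is the sample from the question. The single-line header is a hypothetical variant made up here to mimic the one-line format the question mentions.

```php
<?php
// Two-line header as in the question's sample.
$twoLine = "On Tue, Jun 7, 2011 at 6:18 AM, AcmeCompany <\n"
         . "24a95f49f7ce573fds2d+c@AcmeCompany.com> wrote:";

// Hypothetical one-line variant (assumed format, not taken from a real client).
$oneLine = "On Tue, Jun 7, 2011 at 6:18 AM, AcmeCompany "
         . "<24a95f49f7ce573fds2d+c@AcmeCompany.com> wrote:";

$withM    = '/On.*AcmeCompany <$[^:]+:/im'; // $ may match at every line end
$withoutM = '/On.*AcmeCompany <$[^:]+:/i';  // $ matches only at the end of the string

// preg_match returns 1 on a match and 0 otherwise.
$results = [
    'two-line, with m'    => preg_match($withM, $twoLine),
    'two-line, without m' => preg_match($withoutM, $twoLine),
    'one-line, with m'    => preg_match($withM, $oneLine),
];

print_r($results);
```

Only the first case matches. The pattern requires "AcmeCompany <" to sit at the end of a line, so it fits the two-line layout but not a header written on a single line.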
