I want to retrieve all hashtags from a tweet using a PHP function.
I know someone asked a similar question here, but there is no hint about how exactly to implement this in PHP. Since I'm not very familiar with regular expressions, I don't know how to write a function that returns an array of all hashtags in a tweet.
So how do I do this using the following regular expression:
#\S*\w
I created my own solution. It:
- Finds all hashtags in a string
- Removes duplicates
- Sorts hashtags by how often they occur in the text
- Supports Unicode characters
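function getHashtags($string) {
    $hashtags = FALSE;
    // /u: treat pattern and subject as UTF-8
    preg_match_all("/(#\w+)/u", $string, $matches);
    if (!empty($matches[0])) {
        // Count occurrences (drops duplicates), most frequent first
        $hashtagsArray = array_count_values($matches[0]);
        arsort($hashtagsArray);
        $hashtags = array_keys($hashtagsArray);
    }
    return $hashtags;
}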
Output is like this:
(
[0] => #_ƒOllOw_
[1] => #FF
[2] => #neslitükendi
[3] => #F_0_L_L_O_W_
[4] => #takipedeğerdost
[5] => #GönüldenTakipleşiyorum
)
$tweet = "this has a #hashtag a #badhash-tag and a #goodhash_tag";
preg_match_all("/(#\w+)/", $tweet, $matches);
var_dump( $matches );
*Dashes are not allowed in hashtags, so #badhash-tag is matched only as #badhash. Underscores are allowed.
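To make the structure of $matches concrete, here is a minimal sketch that runs the same pattern on the same sample tweet and prints only the full matches. Nothing here is new apart from the comments.

```php
<?php
// Sample tweet taken from the answer above.
$tweet = "this has a #hashtag a #badhash-tag and a #goodhash_tag";

// \w matches letters, digits and underscores, so matching stops at the dash.
preg_match_all('/(#\w+)/', $tweet, $matches);

// $matches[0] holds the full matches; $matches[1] holds capture group 1,
// which is identical here because the group wraps the whole pattern.
print_r($matches[0]);
// Array
// (
//     [0] => #hashtag
//     [1] => #badhash
//     [2] => #goodhash_tag
// )
```

If only the list of hashtags is needed, the parentheses can be dropped and $matches[0] used directly.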
Don't forget about hashtags that contain unicode, numeric values and underscores:
$tweet = "Valid hashtags include: #hashtag #NYC2016 #NYC_2016 #gøypålandet!";
preg_match_all('/#([\p{Pc}\p{N}\p{L}\p{Mn}]+)/u', $tweet, $matches);
print_r( $matches );
\p{Pc} - to match underscore
\p{N} - numeric character in any script
\p{L} - letter from any language
\p{Mn} - any non-spacing mark (combining accents, umlauts, etc.)
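As a rough illustration of what that pattern captures, the sketch below wraps it in a small helper. The function name extract_unicode_hashtags is made up for this example, and it assumes the source file and the input string are UTF-8 encoded.

```php
<?php
// Hypothetical helper name; the pattern is the one from the answer above.
function extract_unicode_hashtags(string $tweet): array
{
    // /u makes PCRE treat both the pattern and the subject as UTF-8.
    preg_match_all('/#([\p{Pc}\p{N}\p{L}\p{Mn}]+)/u', $tweet, $matches);

    // Index 1 is the capture group, i.e. the tag text without the leading '#'.
    return $matches[1];
}

$tweet = "Valid hashtags include: #hashtag #NYC2016 #NYC_2016 #gøypålandet!";
$tags = extract_unicode_hashtags($tweet);

print_r($tags);
// Array
// (
//     [0] => hashtag
//     [1] => NYC2016
//     [2] => NYC_2016
//     [3] => gøypålandet
// )
```

The trailing "!" is left out because it belongs to none of the four character classes. Use $matches[0] instead if the "#" should be kept.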
Try this regular expression:
/#[^\s]*/i
Or use this if multiple hashtags may be joined together (e.g. #foo#bar):
/#[^\s#]*/i
Running it in PHP would look like:
preg_match_all('/#[^\s#]*/i', $tweet_string, $result);
The matches are saved in $result (the third argument): $result[0] is an array containing all the hashtags in the tweet.
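The difference between the two patterns is easiest to see side by side. This is a small sketch with a made-up sample string; the variable names $joined and $split are only for illustration.

```php
<?php
// Made-up sample input containing two hashtags with no space between them.
$tweet_string = "Joined tags #foo#bar and a normal #baz tag";

// First pattern: everything up to the next whitespace counts as one hashtag.
preg_match_all('/#[^\s]*/', $tweet_string, $joined);

// Second pattern: a new '#' also ends the current hashtag.
preg_match_all('/#[^\s#]*/', $tweet_string, $split);

print_r($joined[0]); // #foo#bar, #baz
print_r($split[0]);  // #foo, #bar, #baz
```

Note that both patterns accept any non-whitespace character, so trailing punctuation (for example the "!" in "#tag!") ends up inside the match.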
Lastly, check out this site. I've found it really handy for testing regular expressions. http://regex.larsolavtorvik.com/
EDIT: I tried your regular expression and it worked great too!
EDIT 2: Added another regex to extract hash tags, even if they're consecutive.
Use the preg_match_all() function:
function get_hashtags($tweet)
{
    $matches = array();
    preg_match_all('/#\S*\w/i', $tweet, $matches);
    return $matches[0];
}
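A quick usage sketch for this function, with a made-up input string, shows how the pattern from the question behaves: \S* takes every non-whitespace character, and the final \w forces the match to end on a word character.

```php
<?php
// Same function as above, repeated so the example runs on its own.
function get_hashtags($tweet)
{
    $matches = array();
    preg_match_all('/#\S*\w/', $tweet, $matches);
    return $matches[0];
}

// Made-up sample input for illustration.
$tags = get_hashtags("this has a #hashtag a #badhash-tag and a #goodhash_tag, right?");

print_r($tags);
// Array
// (
//     [0] => #hashtag
//     [1] => #badhash-tag
//     [2] => #goodhash_tag
// )
```

The trailing comma is dropped because the match must end with \w, but the dash inside #badhash-tag is kept. If dashes should end a hashtag, a \w-based pattern such as /#\w+/ is the stricter choice.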