Most used words in text with PHP

Developer https://www.devze.com 2023-01-06 03:51 Source: Internet

I found the code below on Stack Overflow and it works well for finding the most common words in a string. But can I exclude common words like "a, if, you, have, etc." from the count? Or would I have to remove those elements after counting? How would I do this? Thanks in advance.

<?php

$text = "A very nice to tot to text. Something nice to think about if you're into text.";


$words = str_word_count($text, 1); 

$frequency = array_count_values($words);

arsort($frequency);

echo '<pre>';
print_r($frequency);
echo '</pre>';
?>

This is a function that extracts common words from a string. It takes three parameters: the string, an array of stop words, and the number of keywords to return. Load the stop words from a text file with PHP's file() function, which reads a file into an array:

$stop_words = file('stop_words.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$keywords = $this->extract_common_words($text, $stop_words); // drop "$this->" if the function is not a class method

You can use this file stop_words.txt as your primary stop words file, or create your own file.
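If you create your own file, the loading step could be sketched roughly as follows. The temporary file and its three sample words are assumptions made only so the snippet runs on its own; in practice you would point file() at your real stop_words.txt. Trimming and lowercasing each entry is also an assumption, added because the function below lowercases the text before comparing.

```php
<?php
// Hypothetical sample data: a throwaway file standing in for stop_words.txt.
$path = tempnam(sys_get_temp_dir(), 'stop');
file_put_contents($path, "A\nif \n\nYou\n");

// Read one stop word per line, dropping line endings and empty lines.
$stop_words = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

// Normalize each entry so that "You" in the file matches "you" in the text.
$stop_words = array_map('strtolower', array_map('trim', $stop_words));

unlink($path); // clean up the temporary file

print_r($stop_words);
```

Normalizing once at load time keeps the comparison inside the extraction loop simple.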

function extract_common_words($string, $stop_words, $max_count = 5) {
      $string = preg_replace('/\s\s+/', ' ', $string); // collapse runs of whitespace into a single space
      $string = trim($string); // trim the string
      $string = preg_replace('/[^a-zA-Z -]/', '', $string); // only take alphabet characters, but keep the spaces and dashes too
      $string = strtolower($string); // make it lowercase

      preg_match_all('/\b.*?\b/i', $string, $match_words);
      $match_words = $match_words[0];

      foreach ( $match_words as $key => $item ) {
          // drop empty matches, stop words, and words of three characters or fewer
          if ( $item == '' || in_array(strtolower($item), $stop_words) || strlen($item) <= 3 ) {
              unset($match_words[$key]);
          }
      }

      $word_count = str_word_count( implode(" ", $match_words) , 1);
      $frequency = array_count_values($word_count);
      arsort($frequency); // most frequent words first

      $keywords = array_slice($frequency, 0, $max_count);
      return $keywords;
}

Here is my solution by using the built-in PHP functions:

most_frequent_words — Find the most frequent word(s) in a string

function most_frequent_words($string, $stop_words = [], $limit = 5) {
    $string = strtolower($string); // Make string lowercase

    $words = str_word_count($string, 1); // Returns an array containing all the words found inside the string
    $words = array_diff($words, $stop_words); // Remove black-list words from the array
    $words = array_count_values($words); // Count the number of occurrence

    arsort($words); // Sort based on count

    return array_slice($words, 0, $limit); // Limit the number of words and returns the word array
}

Returns an array containing the word(s) that appear most frequently in the string, keyed by word with the count as the value.

Parameters:

string $string - The input string.

array $stop_words (optional) - List of words to filter out of the result. Defaults to an empty array.

int $limit (optional) - Maximum number of words returned. Defaults to 5.
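As a usage sketch, here is the function applied to the text from the question. The function body is repeated in compact form so the snippet runs on its own, and the stop-word list is a hypothetical one chosen only for this example.

```php
<?php
// Repeated from the answer above (condensed) so the snippet is self-contained.
function most_frequent_words($string, $stop_words = [], $limit = 5) {
    $words = str_word_count(strtolower($string), 1); // lowercase, then split into words
    $words = array_diff($words, $stop_words);        // drop stop words
    $words = array_count_values($words);             // word => number of occurrences
    arsort($words);                                  // highest count first
    return array_slice($words, 0, $limit);
}

$text = "A very nice to tot to text. Something nice to think about if you're into text.";

// Hypothetical stop-word list, for illustration only.
$stop_words = ['a', 'to', 'if', 'about', 'into', 'very', "you're"];

$top = most_frequent_words($text, $stop_words, 2);
print_r($top);
```

With this list, "nice" and "text" each occur twice and every other remaining word occurs once, so those two words are the ones returned. Because the text is lowercased first, the stop words must be lowercase as well.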

str_word_count has no additional parameter for words to exclude, and there is no native PHP function that does this for you. As such, I would just use what you have and filter a custom set of words out of the array returned by str_word_count.

You can do this easily by using array_diff():

$words = array("if", "you", "do", "this", 'I', 'do', 'that');
$stopwords = array("a", "you", "if");

print_r(array_diff($words, $stopwords));

gives

 Array
(
    [2] => do
    [3] => this
    [4] => I
    [5] => do
    [6] => that
)

But you have to take care of lower and upper case yourself. The easiest way here would be to convert the text to lowercase beforehand.
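If lowercasing the input is not an option, for example because the original spelling should survive in the result, one alternative is array_udiff() with strcasecmp() as the comparison callback. This is a sketch of a different technique from the array_diff() call above, reusing the same sample words with the stop words deliberately written in mixed case. Note that strcasecmp() only folds case for ASCII letters.

```php
<?php
$words = array("if", "you", "do", "this", 'I', 'do', 'that');
$stopwords = array("A", "You", "IF"); // mixed case on purpose

// array_udiff() compares values with the given callback;
// strcasecmp() returns 0 when two strings are equal ignoring case.
$filtered = array_udiff($words, $stopwords, 'strcasecmp');

print_r($filtered);
```

As with array_diff(), the original keys are preserved, so wrap the result in array_values() if you need a reindexed list.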
