How do you check whether every word in one string is found in another string?

Developer https://www.devze.com 2023-01-20 13:39 Source: Internet

Let's say I have a book title and I search for it in a database. The database produces matches, some of which are full matches and some of which are partial matches.

A full match is when every word in the search result is represented by a word in the search terms (i.e. the overlap does not have to be complete on both sides: the search terms may contain words that the result lacks).

I am only concerned with finding the full matches.

So if I type a search for "Ernest Hemingway - The Old Man and the Sea" and the results return the following:

Charles Nordhoff - Men Against The Sea
Rodman Philbrick - The Young Man and the Sea
Ernest Hemingway - The Old Man and the Sea
Ernest Hemingway - The Sun Also Rises
Ernest Hemingway - A Farewell to Arms
Ernest Hemingway - For Whom the Bell Tolls
Ernest Hemingway - A Moveable Feast
Ernest Hemingway - True at First Light
Men Against The Sea
The Old Man and the Sea
The Old Man and the Sea Dog

According to the definition above, there are TWO full matches in this list:

Ernest Hemingway - The Old Man and the Sea 
The Old Man and the Sea 

To do this in Java, assume I have two variables:

String searchTerms;
List<String> searchResults;

searchTerms in the example above represents what I typed in: Ernest Hemingway - The Old Man and the Sea

searchResults represents the list of strings I got back from the database above.

for (String result : searchResults) {
  // How to check for a full match? 
  // (each word in `result` is found in `searchTerms`)
}

My question is: in this for-loop, how do I check whether every word in the result String has a corresponding word in the searchTerms String?


To find the full match as you have defined it, you want to test that a set of tokens contains a particular subset. You can do this easily using a Set, which you get for free in the Java collections library. One way to do this would be (the expense of regexes aside):

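   // Requires java.util.Arrays, java.util.HashSet and java.util.Set.
   // searchString holds the text that was typed in (searchTerms in the question).
   Set<String> searchTerms = new HashSet<String>();
   Set<String> resultTokens = new HashSet<String>();

   searchTerms.addAll( Arrays.asList( searchString.split( "\\s+" ) ) );

   for ( String result : searchResults )
   {
      resultTokens.clear();
      resultTokens.addAll( Arrays.asList( result.split( "\\s+" ) ) );
      // Full match: every word of the result also appears in the search terms.
      if ( searchTerms.containsAll( resultTokens ) )
      {
         // Perform match code
      }
   }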

Alternatively, if you want to be stricter about it, you can test for set equality using resultTokens.equals( searchTerms ). In your example, this would narrow the result set to "Ernest Hemingway - The Old Man and the Sea".
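
For reference, here is a minimal self-contained sketch of the set-based check, following the question's definition (every word of the result must appear in the search terms). The class name FullMatchDemo and the helpers tokens, isFullMatch, isExactMatch and fullMatches are hypothetical names chosen for illustration. The comparison is case-sensitive and splits on whitespace only, so a standalone "-" counts as a word; both are assumptions you may want to change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FullMatchDemo {

    // Hypothetical helper: splits a title on whitespace into a set of words.
    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.trim().split("\\s+")));
    }

    // Full match as defined in the question: every word of the result
    // also occurs in the search terms (case-sensitive).
    static boolean isFullMatch(String searchTerms, String result) {
        return tokens(searchTerms).containsAll(tokens(result));
    }

    // Stricter variant: both strings contain exactly the same set of words.
    static boolean isExactMatch(String searchTerms, String result) {
        return tokens(searchTerms).equals(tokens(result));
    }

    // Collects the results that are full matches, preserving their order.
    static List<String> fullMatches(String searchTerms, List<String> searchResults) {
        List<String> matches = new ArrayList<>();
        for (String result : searchResults) {
            if (isFullMatch(searchTerms, result)) {
                matches.add(result);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String searchTerms = "Ernest Hemingway - The Old Man and the Sea";
        List<String> searchResults = Arrays.asList(
            "Rodman Philbrick - The Young Man and the Sea",
            "Ernest Hemingway - The Old Man and the Sea",
            "The Old Man and the Sea",
            "The Old Man and the Sea Dog");
        // Prints the two full matches from the question's example.
        for (String match : fullMatches(searchTerms, searchResults)) {
            System.out.println(match);
        }
    }
}
```

If titles can differ in capitalization, lower-casing both strings before tokenizing would be one way to make the check case-insensitive.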


Assuming your database result is accurate, split each result into tokens (words) using String.split(String regex) and check whether each token is found in searchTerms (a token is missing when searchTerms.indexOf(String word) == -1).

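for (String result : searchResults) {
    boolean fullMatch = true;
    for (String word : result.split("\\s+")) {
        // indexOf searches for a substring, so "Sea" would also be found inside "Seaside".
        if (searchTerms.indexOf(word) == -1) {
            // result is not a full match
            fullMatch = false;
            break;
        }
    }

    if (fullMatch) {
        // Every word in result was found in searchTerms: result is a full match.
    }
}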
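
One caveat with the indexOf approach: String.indexOf looks for a substring, not a whole word, so a result word can be "found" inside a longer search word. The sketch below contrasts the two behaviors. The class WordMatchDemo and the methods substringMatch and wholeWordMatch are hypothetical names, and the input "He Man" is a made-up result chosen only to trigger the difference.

```java
import java.util.Arrays;
import java.util.List;

class WordMatchDemo {

    // Substring-based check, as in the loop above: a word counts as found
    // if it appears anywhere inside searchTerms.
    static boolean substringMatch(String searchTerms, String result) {
        for (String word : result.trim().split("\\s+")) {
            if (searchTerms.indexOf(word) == -1) {
                return false;
            }
        }
        return true;
    }

    // Whole-word check: a word counts as found only if it equals
    // one of the whitespace-separated words of searchTerms.
    static boolean wholeWordMatch(String searchTerms, String result) {
        List<String> searchWords = Arrays.asList(searchTerms.trim().split("\\s+"));
        for (String word : result.trim().split("\\s+")) {
            if (!searchWords.contains(word)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String searchTerms = "Ernest Hemingway - The Old Man and the Sea";
        // "He" is a substring of "Hemingway" but not a word of the search terms.
        System.out.println(substringMatch(searchTerms, "He Man"));
        System.out.println(wholeWordMatch(searchTerms, "He Man"));
    }
}
```

For the titles in the question both checks agree; they only diverge when a result word happens to be contained in a longer search word.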