I have a text document that will be loaded into a string, and I need to search its content for keywords that match entries in a Keyword table in MySQL.
Would it be better to load the keywords from MySQL into a PHP array (using the keywords as keys) and then search that array while iterating through the n-grams of the text? Or would it be better to iterate through the n-grams of the string and search for each one in the MySQL DB (which would lead to many DB queries)?
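Either way, both options share the same n-gram iteration over the text. A minimal sketch, assuming word-level n-grams and simple whitespace tokenization (wordNgrams and $documentText are illustrative names, not anything from an existing library):

    <?php
    // Sketch: generate word-level n-grams from the document text.
    // Assumes whitespace tokenization; adjust for punctuation as needed.
    function wordNgrams(string $text, int $n): array
    {
        $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
        $ngrams = [];
        for ($i = 0; $i <= count($words) - $n; $i++) {
            $ngrams[] = implode(' ', array_slice($words, $i, $n));
        }
        return $ngrams;
    }

    $ngrams = wordNgrams($documentText, 2); // e.g. bigrams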
That depends on how big your index is versus how many words you are checking. For example, is it worth loading 1 GB of MySQL index into PHP memory just to iterate over 10 words? No.
This shouldn't be hard to implement both ways. Benchmark and find out. (Make sure your database is properly indexed.)
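For example, a rough microtime-based timing harness (lookupInPhpArray() and lookupViaQueries() are hypothetical wrappers around the two approaches, not real functions):

    <?php
    // Rough timing of both approaches over the same n-gram list.
    $start = microtime(true);
    $hitsArray = lookupInPhpArray($ngrams);  // approach 1: in-memory PHP array
    $arrayTime = microtime(true) - $start;

    $start = microtime(true);
    $hitsDb = lookupViaQueries($ngrams);     // approach 2: query the DB per n-gram
    $dbTime = microtime(true) - $start;

    printf("array: %.4fs, db: %.4fs\n", $arrayTime, $dbTime);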
Not sure how many keywords you are going to have, but in either case there is always overhead involved in:
connecting to the database
sending queries over the network
receiving results over the network
One way to cut down the per-query part of that overhead is to batch all the lookups into a single query, as sketched below.
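A sketch using PDO (the keywords table and keyword column names are assumptions), sending one round trip for the whole n-gram list:

    <?php
    // One round trip for all n-grams instead of one query each.
    // Assumes $ngrams is non-empty and a `keywords` table with a `keyword` column.
    $placeholders = implode(',', array_fill(0, count($ngrams), '?'));
    $stmt = $pdo->prepare("SELECT keyword FROM keywords WHERE keyword IN ($placeholders)");
    $stmt->execute(array_values($ngrams));
    $matches = $stmt->fetchAll(PDO::FETCH_COLUMN);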
Not sure how PHP works when connecting to the DB, but Java uses reflection (e.g., to load JDBC drivers), which is not one of the fastest technologies around.
Even if you index the database, lookups will not run in constant time. But if you use a data structure like a hashmap, each lookup takes constant time on average. That means if your document has n words and you check each one against the keyword hashmap, the overall time complexity is just O(n).
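In PHP, that hashmap is just an associative array with the keywords as keys, so each isset() check is an average O(1) hash lookup. A sketch, assuming the same keywords table as above:

    <?php
    // Load the keywords once and use them as array keys (a hash set).
    $keywordSet = array_fill_keys(
        $pdo->query('SELECT keyword FROM keywords')->fetchAll(PDO::FETCH_COLUMN),
        true
    );

    // One pass over the document's n-grams: O(n) overall.
    $found = [];
    foreach ($ngrams as $ngram) {
        if (isset($keywordSet[$ngram])) {
            $found[] = $ngram;
        }
    }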
But again, as everyone else said, you have to run your own benchmarks; it all depends on the size of the keyword table and of the document you are analyzing.