Performance trade-offs for loading DB table into array vs searching within the DB table

Developer (开发者) https://www.devze.com 2023-03-10 20:58 Source: the web

I have a text document that will be loaded into a string, and I need to search its content for keywords that match entries in a Keyword table in MySQL.

Would it be better to load the keywords from MySQL into a PHP array (using the keywords as the keys) and then search that array while iterating through the n-grams of the text? Or would it be better to iterate through the n-grams of the string and search for each one in the MySQL DB (which would lead to many DB queries)?
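For concreteness, the first option can be sketched as below. This is a minimal sketch in Python rather than PHP, with a `set` playing the role of a PHP array keyed by keyword. The names `ngrams` and `match_in_memory`, the `max_n` limit, and the sample keywords are all hypothetical, and the tokenizer simply splits on whitespace without stripping punctuation.

```python
def ngrams(text, max_n=3):
    """Yield every word n-gram of length 1..max_n as a space-joined string."""
    words = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def match_in_memory(text, keywords, max_n=3):
    """Return the distinct keywords that occur as n-grams of the text."""
    return {gram for gram in ngrams(text, max_n) if gram in keywords}


# Hypothetical keyword set; in practice it would be filled by a single
# SELECT over the Keyword table.
keywords = {"mysql", "php array", "full text search"}
text = "Loading a PHP array from MySQL is cheap for small tables"
found = match_in_memory(text, keywords)
```

The second option would keep the same `ngrams` loop but replace the `gram in keywords` test with one query per n-gram.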

It depends on how big your index is versus how many words you are checking. For example, is it worth loading 1 GB of MySQL index into PHP memory to check 10 words? No.

This shouldn't be hard to implement both ways. Benchmark and find out. (Make sure your database is properly indexed.)
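A benchmark along these lines might look like the following sketch. It is in Python, and it assumes an in-memory `sqlite3` database as a stand-in for MySQL, so there is no network round trip and the per-query cost is understated compared with a real MySQL server. The table name `keyword`, the data sizes, and the function names are hypothetical, and single words are used instead of n-grams to keep it short.

```python
import sqlite3
import timeit

# Hypothetical data: 50,000 keywords and a 2,000-word document.
keyword_list = [f"kw{i}" for i in range(50_000)]
doc_words = [f"kw{i * 37}" for i in range(2_000)]

# sqlite3 in memory stands in for MySQL (assumption: no network latency).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keyword (word TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO keyword VALUES (?)", ((w,) for w in keyword_list))
conn.commit()


def via_array():
    # Option 1: load the whole table once, then look up in memory.
    loaded = {row[0] for row in conn.execute("SELECT word FROM keyword")}
    return sum(1 for w in doc_words if w in loaded)


def via_queries():
    # Option 2: one indexed query per word of the document.
    sql = "SELECT 1 FROM keyword WHERE word = ?"
    return sum(1 for w in doc_words if conn.execute(sql, (w,)).fetchone())


array_hits, query_hits = via_array(), via_queries()
array_time = timeit.timeit(via_array, number=5)
query_time = timeit.timeit(via_queries, number=5)
print(f"array: {array_time:.3f}s  queries: {query_time:.3f}s  hits: {array_hits}")
```

The timings only mean something when measured against the real database, with realistic table and document sizes. Both functions should report the same number of hits, which is a useful sanity check before comparing times.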

I'm not sure how many keywords you are going to have, but in either case there is always overhead involved in:

connecting to the database

sending queries through network

receiving results through network

I'm not sure how PHP connects to the DB, but Java uses "reflection", which is not one of the fastest techniques known.

Even with an index in the database, you are not going to get results in constant time: a typical B-tree index lookup takes time that grows logarithmically with the number of keywords, on top of the query overhead. With a data structure like a hash map, each lookup takes constant time on average. So if your document has n words and you check each one against the keyword hash map, the time complexity of the program is just O(n).
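The difference in lookup cost can be illustrated with a small Python sketch. Binary search over a sorted list is used here only as an analogy for an ordered index lookup. It is not how MySQL implements its indexes, and all names and data below are hypothetical.

```python
import bisect


def sorted_lookup(sorted_keys, key):
    """Binary search: about log2(m) comparisons for m keys."""
    i = bisect.bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key


def count_matches_hashed(words, keyword_set):
    """One average constant-time membership test per word: O(n) overall."""
    return sum(1 for w in words if w in keyword_set)


def count_matches_sorted(words, sorted_keys):
    """One binary search per word: O(n log m) overall."""
    return sum(1 for w in words if sorted_lookup(sorted_keys, w))


# Hypothetical data, for illustration only.
keys = sorted(f"kw{i:06d}" for i in range(100_000))
key_set = set(keys)
words = ["kw000042", "missing", "kw099999", "kw100000"]
hashed = count_matches_hashed(words, key_set)
ordered = count_matches_sorted(words, keys)
```

Both functions find the same matches. Only the cost per lookup differs, and in a real setup the network round trip per query is likely to outweigh either.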

But again, as everyone else said, you have to run your own benchmarks; it all depends on the size of the keywords table and of the document you are analyzing.