I have a feed that I need to parse into an Android application. The data needs to be stored in a database, and I'm currently having problems with the performance.
I need to sort the items into Categories and Sub Categories.
I do have IDs for the Categories, so that is cool. But for the Sub Categories I don't. This causes a lot of string comparisons to make sure that no duplicates are added to the db.
Would it be good practice to generate an ID from the name of the Sub Category? Or is this just as painful to compute?
EDIT:
Category A (ID 1)
  Sub Category C (no ID)
  Sub Category Z (no ID)
  Sub Category V (no ID)
Category B (ID 7)
  Sub Category O (no ID)
  Sub Category C (no ID) (this is not the same Sub Category 'C' as under Category 'A')
The data looks something like the above. I store Categories in one table and Sub Categories in another, and I don't want to add duplicates. So in order to avoid duplicate records I need to check what already exists. But I don't have any IDs for the Sub Categories.
There are multiple ways to solve this problem, and the right one really depends on how many inserts you're actually doing. If there aren't too many, it would be sufficient to index the sub-category column and then simply run this before each insert:
select count(*) from sub_category_table where sub_category_field = 'subCategory'
If the query returns a count greater than 0, you can skip the insert.
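As a rough sketch of that check-then-insert approach: the example below uses Python's sqlite3 module only so that it runs standalone, and the SQL itself should carry over to SQLite on Android. The `category_id` column, the index name and the `insert_if_missing` helper are assumptions made for illustration. Because the question says the same sub category name can appear under different categories, the lookup here also filters on the parent category ID, which goes beyond the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sub_category_table ("
    " id INTEGER PRIMARY KEY,"
    " category_id INTEGER NOT NULL,"       # assumed column: parent category ID
    " sub_category_field TEXT NOT NULL)"
)
# Index the columns used by the lookup so the check does not scan the table.
conn.execute(
    "CREATE INDEX idx_sub_category"
    " ON sub_category_table (category_id, sub_category_field)"
)


def insert_if_missing(conn, category_id, name):
    """Insert the sub category unless it already exists; return True if inserted."""
    (count,) = conn.execute(
        "SELECT count(*) FROM sub_category_table"
        " WHERE category_id = ? AND sub_category_field = ?",
        (category_id, name),
    ).fetchone()
    if count > 0:
        return False  # already stored under this category, skip the insert
    conn.execute(
        "INSERT INTO sub_category_table (category_id, sub_category_field)"
        " VALUES (?, ?)",
        (category_id, name),
    )
    return True
```

Calling `insert_if_missing(conn, 1, "C")` twice should insert only once, while `insert_if_missing(conn, 7, "C")` is treated as a different sub category because its parent differs.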
If that's not good enough performance-wise, then it would help to have more information on your data and schema.
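Without knowing more about the schema, one commonly used alternative is to let SQLite enforce uniqueness itself and batch the inserts in a single transaction, so no separate lookup is needed per row. The sketch below is only an illustration under assumptions: the `category_id` column, the `insert_all` helper and the sample rows are made up, and Python's sqlite3 module is used just to keep it runnable outside Android. The `UNIQUE` constraint and `INSERT OR IGNORE` are standard SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sub_category_table ("
    " id INTEGER PRIMARY KEY,"
    " category_id INTEGER NOT NULL,"       # assumed column: parent category ID
    " sub_category_field TEXT NOT NULL,"
    # A sub category is unique per parent category, not globally.
    " UNIQUE (category_id, sub_category_field))"
)


def insert_all(conn, rows):
    """Insert (category_id, name) pairs; rows that already exist are skipped."""
    with conn:  # one transaction for the whole batch, committed on success
        conn.executemany(
            "INSERT OR IGNORE INTO sub_category_table"
            " (category_id, sub_category_field) VALUES (?, ?)",
            rows,
        )


# Sample rows modelled on the question; the last one is a duplicate.
feed = [(1, "C"), (1, "Z"), (1, "V"), (7, "O"), (7, "C"), (1, "C")]
insert_all(conn, feed)
```

Whether this is actually faster depends on the data; wrapping many inserts in one transaction is usually where most of the gain comes from in SQLite.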