For one of my clients I have to import a CSV of Medicare plans provided by the government (part one provided here) into Drupal 7. There are about 500,000 rows of data in that CSV, most of which differ only by the FIPS County code field - basically, every county that a plan is available in counts as one row.
Should I import all 500k rows into Drupal 7 as individual nodes, or create a single node for every plan and put the numerous FIPS codes associated with that plan in a multi-value text field? I opted for the latter route to begin with; however, when I looked in the plan database, it turned out that some plans are available in more than 10,000 counties. I'd like to find the most efficient, Drupal-esque solution to storing all these plans and where they are available.
Generally it is very useful to avoid storing any duplicate data, so you are right: creating 500k rows as individual nodes is a bad idea. I would rather create two content types (using CCK):
- Medicare Plan
- FIPS County code (or maybe just County)
And then create a many-to-many relationship between them (using CCK Node Reference, maybe Corresponding node references for mutual relationships if needed).
You can then create a view that will list all FIPS County codes attached to a particular Medicare Plan.
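To make the many-to-many idea concrete, here is a rough sketch in Python (not Drupal code) of how the flat CSV could be split before import: each plan stored once, each county stored once, and a set of (plan, county) pairs linking them. The column names `plan_id`, `plan_name` and `fips` and the sample rows are made up for illustration; the real file will have its own headers.

```python
import csv
import io

# Hypothetical column names and sample data; the real CSV uses its own headers.
SAMPLE_CSV = """plan_id,plan_name,fips
H0001,Example Plan A,01001
H0001,Example Plan A,01003
H0002,Example Plan B,01001
"""

def normalize(rows):
    """Split flat (plan, county) rows into plans, counties, and link pairs."""
    plans = {}        # plan_id -> plan attributes, stored once per plan
    counties = set()  # every distinct FIPS code, stored once
    links = set()     # (plan_id, fips) pairs: the many-to-many relationship
    for row in rows:
        plans.setdefault(row["plan_id"], {"name": row["plan_name"]})
        counties.add(row["fips"])
        links.add((row["plan_id"], row["fips"]))
    return plans, counties, links

def counties_for_plan(links, plan_id):
    """Return the sorted FIPS codes linked to one plan."""
    return sorted(fips for pid, fips in links if pid == plan_id)

plans, counties, links = normalize(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(len(plans), len(counties), len(links))
print(counties_for_plan(links, "H0001"))
```

In Drupal terms, `plans` and `counties` would map to the two content types and `links` to the node reference values, assuming the plan rows really are identical apart from the county.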
I ended up going with a row per plan - as it turned out, there were subtle differences between them that I missed. Thanks to all who answered!
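For anyone facing the same decision, a quick check like the one below can show whether rows for the same plan differ in anything besides the county before you commit to a structure. This is only a sketch in Python: the column names (`plan_id`, `fips`, `premium`) and the sample data are assumptions, not the actual layout of the government file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical columns and sample data, for illustration only.
SAMPLE_CSV = """plan_id,plan_name,premium,fips
H0001,Example Plan A,10.00,01001
H0001,Example Plan A,12.50,01003
H0002,Example Plan B,0.00,01001
H0002,Example Plan B,0.00,01005
"""

def plans_with_varying_rows(rows, plan_key="plan_id", county_key="fips"):
    """Return plan ids whose rows differ in some column other than the county."""
    variants = defaultdict(set)
    for row in rows:
        # Everything except the plan id and the county code, as a hashable tuple.
        rest = tuple(sorted(
            (k, v) for k, v in row.items() if k not in (plan_key, county_key)
        ))
        variants[row[plan_key]].add(rest)
    return sorted(pid for pid, seen in variants.items() if len(seen) > 1)

varying = plans_with_varying_rows(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(varying)
```

If the returned list is empty, collapsing to one record per plan loses nothing; if it is not, the per-county differences have to be stored somewhere.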