I'm new to Cassandra and wanted to start up trying a simple test
Our Data model with traditional RDBMS is as follows
Table Company (Id, Name)
Table Product (Id, Name, Company ID) where Company ID is FK reference to Company table
Table ProductInstance (Id, ProductID) where ProductID is FK reference to Product table
Table ProductInstanceRating (Id, ProductInstanceID, Comment) where ProductInstanceID is FK reference to ProductInstance table
Any suggestions on how the design should be with Cassandra ?
Update:
I tried to look from the querying perspective
Data to be captured
1) Product Reference is a composite key consisting of : Product Name, Product Lot Number, AND Customer ID Product Name : 12456 Product Lot Number : PQ23 Customer ID : 879456
Then unique Product reference will be something like 12456|PQ23|879456
2) Product Instance will be unique set of hash numbers for every instance of the aforementioned Product
AND each instance of the product will get a unique number 784A, 876T ,etc and this number will be unique for a particular Product Reference
Product Instance reference will be something like 12456|PQ23|879456|784A
3) Each Product Unique number can receive more than 1 rating
In that case, the queries will be something like
Query1) In order to add/insert rating for a Product Instance; Fetch Row for Product i.e 12456|PQ23|879456
Query2) Fetch the ProductInstance ID within this row (maybe name of column family)
Query3) Add 开发者_JAVA技巧the rating information as a column:value pair
Should the design be something like
12456|PQ23|879456 {
784A{timestamp1:{rating:valueA
person name:valueX}
timestamp2:{rating:valueB
person name:valueY}}
876T{timestamp1:{rating:valueC
person name:valueX}
timestamp2:{rating:valueB
person name:valueY}}
}
Thereafter, we would want to:
Query4) fetch all products that have instances
Query5) fetch all product instance that have ratings Query6) fetch highest rating for product Query7) fetch average rating for productIs there a better and more efficient way to implement this ?
In Cassandra, you need to think about what queries you wish to run (unlike a RDBMS where you can make a fairly abstract model of the domain and then devise SQL queries afterwards) because you can only do row-key lookups or use fairly limited secondary indexes. Any other indexes need to be explicitly created within your column families.
For this reason it is common to denormalise. However, maintaining the data may then be more difficult, as there are no cascading deletes etc in Cassandra.
One candidate structure might be to have a ProductInstance columnfamily to store the ratings, then a Product columnfamily as an index to find the instances of a given product, then a Company columnfamily as an index to find the products for a particular company. Or you could just denormalise and put the Company as a column in the Product columnfamily.
To repeat, the 'correct' structure for you can only be determined when you know what queries you wish to support.
精彩评论