开发者

Cassandra Design for Product ID management

开发者 https://www.devze.com 2023-03-01 11:30 出处:网络
I\'m new to Cassandra and wanted to start up trying a simple test Our Data model with traditional RDBMS is as follows

I'm new to Cassandra and wanted to start up trying a simple test

Our Data model with traditional RDBMS is as follows

Table Company (Id, Name)

Table Product (Id, Name, Company ID) where Company ID is FK reference to Company table

Table ProductInstance (Id, ProductID) where ProductID is FK reference to Product table

Table ProductInstanceRating (Id, ProductInstanceID, Comment) where ProductInstanceID is FK reference to ProductInstance table

Any suggestions on how the design should be with Cassandra ?

Update:

I tried to look from the querying perspective

Data to be captured

1) Product Reference is a composite key consisting of : Product Name, Product Lot Number, AND Customer ID Product Name : 12456 Product Lot Number : PQ23 Customer ID : 879456

Then unique Product reference will be something like 12456|PQ23|879456

2) Product Instance will be unique set of hash numbers for every instance of the aforementioned Product

AND each instance of the product will get a unique number 784A, 876T ,etc and this number will be unique for a particular Product Reference

Product Instance reference will be something like 12456|PQ23|879456|784A

3) Each Product Unique number can receive more than 1 rating

In that case, the queries will be something like

Query1) In order to add/insert rating for a Product Instance; Fetch Row for Product i.e 12456|PQ23|879456

Query2) Fetch the ProductInstance ID within this row (maybe name of column family)

Query3) Add 开发者_JAVA技巧the rating information as a column:value pair

Should the design be something like

12456|PQ23|879456 {
      784A{timestamp1:{rating:valueA
                      person name:valueX}
           timestamp2:{rating:valueB
                      person name:valueY}}

      876T{timestamp1:{rating:valueC
                      person name:valueX}
           timestamp2:{rating:valueB
                      person name:valueY}}
}

Thereafter, we would want to:

Query4) fetch all products that have instances

Query5) fetch all product instance that have ratings

Query6) fetch highest rating for product

Query7) fetch average rating for product

Is there a better and more efficient way to implement this ?


In Cassandra, you need to think about what queries you wish to run (unlike a RDBMS where you can make a fairly abstract model of the domain and then devise SQL queries afterwards) because you can only do row-key lookups or use fairly limited secondary indexes. Any other indexes need to be explicitly created within your column families.

For this reason it is common to denormalise. However, maintaining the data may then be more difficult, as there are no cascading deletes etc in Cassandra.

One candidate structure might be to have a ProductInstance columnfamily to store the ratings, then a Product columnfamily as an index to find the instances of a given product, then a Company columnfamily as an index to find the products for a particular company. Or you could just denormalise and put the Company as a column in the Product columnfamily.

To repeat, the 'correct' structure for you can only be determined when you know what queries you wish to support.

0

精彩评论

暂无评论...
验证码 换一张
取 消