开发者

Processing complex data filters

开发者 https://www.devze.com 2023-03-13 09:23 出处:网络
I\'m trying to figure out the best way to deal with the following situation. Let\'s say we have an entity type with attributes A, B and C. Each of these attributes can have a known set of possible va

I'm trying to figure out the best way to deal with the following situation.

Let's say we have an entity type with attributes A, B and C. Each of these attributes can have a known set of possible values. Entities have consumers that are only interested in entities with certain attribute values. This is defined using filters. For example (3 filters):

A   B   C

A1  B12 C4
A3  B2  - 
A11 -   C2

Ax, Bx and Cx are specific values, '-' means 'any value'. Filter values are AND'ed, multiple filters are OR'ed. The final combination would look like this:

(A=A1 AND B=B12 AND C=C4) OR (A=A3 AND B=B2) OR (A=A11 AND C=C2)

When an entity is saved into the database, I want to notify certain consumers about it. The consumers to be notified are those who have defined at least one filter that matches attribute values of the saved entity.

I'm expecting to have thousands of consumers, each of them with multiple filters defined. There will probably be ~30 attributes as well. The whole thing will use a relational database (MySQL).

The first thing that comes to mind is having some continuosly running process that periodically does entity searches for each consumer based on their defined filters and then notifies the consumer if new entities are found. This would work, but I'm not sure having thousands of somewhat complex queries running all the time is a g开发者_Python百科ood idea. Are there any less performance intensive alternatives, or is my concern about performance premature here?


Try using a binary key. If you take the example that you have above, if there are a limited number of options for A, B, and C then assign a value to each option and represent this as a binary value. Example there are options between 1 and 8 for A, B and C. In a binary string a can then be any value form 000 to 111. The same can be said of options B and C. Then by adding the numbers together you have one Key that can represent any option selected for options of A , B and C. Use either the first or last option to represent ALL (I.e no filter).

A sample key may then be 010 001 111 (010=A1; 001=B12; 111 = Any Option in C)

Converting the binary key to an integer value : E.g 010001111 = 143 as a decimal.

Then a SQL Query of SELECT * FROM Customers WHERE CustomerStoredFilterKEY IN (143, 155, 152)

Will return a result very quickly


This is very high-level because I've never done anything like this before, and it depends a lot on what the other processes are, but off the top of my head...

You could have a trigger on insert call some code to write a message to a message queue which specifies what filter(s) the new record matches. Your consumers then get messages somehow from the queue, and find out about the inserted records they are interested in consuming. This removes the problem of having a process constantly querying and polling the table, but adds overhead to inserting into the table.


If you are building this query dynamically then the performance should be fine, assuming reasonable indexes. After all, it will resolve to a single select statement for each query.

If it were me, I'd be more concerned about hardcoding each attribute as a table column. You might want to consider making the attributes a child table -- but then, that would have a negative impact on performance.

I'm less sure about this, but I wonder if MySQL would handle those ORs better if they were done as seperate select statements that then went through a union. Of course, you can test that with 'explain', etc.

0

精彩评论

暂无评论...
验证码 换一张
取 消