开发者

Easily filterable database table with large amounts of entries

开发者 https://www.devze.com 2023-03-12 06:59 出处:网络
I am developing a issue logger for my project and am running into an issue when analyzing the logged data. The problem being that this table grows very fast and that the filters used to search on data

I am developing a issue logger for my project and am running into an issue when analyzing the logged data. The problem being that this table grows very fast and that the filters used to search on data in the table can vary in almost every way, seeing as we're not always interested in the same fields. So indexes aren't really an option.

The table is currently on a MySQL database, with the following structure:

CREATE TABLE `log_issues` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `id_user` int(11) DEFAULT NULL,
  `type` varchar(50) NOT NULL,
  `title` varchar(100) NOT NULL DEFAULT '',
  `message` mediumtext NOT NULL,
  `debug` mediumtext,
  `duration` float DEFAULT NULL,
  `date` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `date` (`date`,`title`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Now my question is, how can I run queries on this table when it has millions of entries without having to wait forever for a result? For example just filt开发者_StackOverflowering on the id of a user takes forever. I know I can place an index on the id_user portion, but I might want to combine it with other fields, or due to the way the query is generated by the tool that views these logs it might not utilize the indexes properly.

I think I might be better off using MongoDB or a different NoSQL database, but I don't have any experience with them. Do document based database have an easier time filtering a large dataset without indexes or will I always be stuck with this problem no matter the database?

To summarize:

I have a table with a large amount of data, indexes can't be used (least not if they need to be ordered) and I need to get results without waiting for over 10 seconds. What technologies can I use?

Any suggestions would be much appreciated.


First you should decide whether you want to remain in RDBMSes land or not. Nowadays it doesn't make much sense for most scenarios, especially ones with complex data structure or a requirement to scale a lot.

You may want to check RavenDB. You can get a prototype working with it in hours, including the initial learning of concepts there.

Indexes are required anywhere, definitely in any NoSQL too. The real question is how easy it is to create and maintain them. With RavenDB you get hands-free indexing; indexes are created automatically as you go, based on the type of queries you make. It is recommended to pre-define them to reduce staleness, but yet they are the same indexes also when they're created automatically.

I see in another answer you tackled the issue with Mongo; well, with Raven you don't HAVE to define indexes yet they will be created for you.


First, what is "forever"? How long are we talking here?

Second start indexing. I know that you can search on any field, but what's wrong with 8 indexes?

If you don't have an index it's going to do a table scan to find the information, and that will be slow.

Also, if you are consistently searching on one field, you might consider doing a Clustered Index on that field.

EDIT

Another option, keep your log table as is. Then create some jobs to run (hourly?) that will organize your data. For example, we have an EventLog table. We only ever insert into that table. We then have EventsByDate, EventsByHour, EventsByAccountId etc as separate tables. These are then indexed and we hit these to look at the data.

This allows you to not define indexes, and have the inserts be as fast as possible, while at the same time being able to search data at a reasonable rate.


You might consider partitioning your table. Some kind of date based partition makes sense in many cases. Otherwise you could partition by type if you're never going to query over multiple types or if you can manage the types separately. The key with partitioning is to never span the partitions in a query. Tables that go on "forever" really need to be partitioned or cleaned up at some point. Very few applications can scale indefinitely.

0

精彩评论

暂无评论...
验证码 换一张
取 消