开发者

Massive DB and mysql

开发者 https://www.devze.com 2023-02-05 12:08 出处:网络
A new project we are working required a lot of data analysis but we are finding this to be VERY slow, we are looking for ways to change our approach with software and or hardware.

A new project we are working required a lot of data analysis but we are finding this to be VERY slow, we are looking for ways to change our approach with software and or hardware.

We are currently running on a amazon ec2 instance (linux):

High-CPU Extra Large Instance

7 GB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: c1.xlarge


processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           E5506  @ 2.13GHz
stepping        : 5
cpu MHz         : 2133.408
cache size      : 4096 KB

MemTotal:      7347752 kB
MemFree:        728860 kB
Buffers:         40196 kB
Cached:      开发者_运维问答  2833572 kB
SwapCached:          0 kB
Active:        5693656 kB
Inactive:       456904 kB
SwapTotal:           0 kB
SwapFree:            0 kB

One part of the db is articles and entities and a link table for example:

mysql> DESCRIBE articles_entities;
+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| id         | char(36)     | NO   | PRI | NULL    |       | 
| article_id | char(36)     | NO   | MUL | NULL    |       | 
| entity_id  | char(36)     | NO   | MUL | NULL    |       | 
| created    | datetime     | YES  |     | NULL    |       | 
| modified   | datetime     | YES  |     | NULL    |       | 
| relevance  | decimal(5,4) | YES  | MUL | NULL    |       | 
| analysers  | text         | YES  |     | NULL    |       | 
| anchor     | varchar(255) | NO   |     | NULL    |       | 
+------------+--------------+------+-----+---------+-------+
8 rows in set (0.00 sec)

As you can see from the table below we have a lot of assoications growing at a rate of 100,000+ a day

mysql> SELECT count(*) FROM articles_entities;
+----------+
| count(*) |
+----------+
|  2829138 | 
+----------+
1 row in set (0.00 sec)

A simple query like the one below is taking too much time (12 secs)

mysql> SELECT count(*) FROM articles_entities WHERE relevance <= .4 AND relevance > 0;
+----------+
| count(*) |
+----------+
|   357190 | 
+----------+
1 row in set (11.95 sec)

What should we be considering to improve our lookup times? Different DB storage? Different hardware.


As mrorigo asked, please provide the SHOW CREATE TABLE articles_entities so we can see the actual indexes of your table.

As a note from MySQL documentation http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. 
For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).

MySQL cannot use an index if the columns do not form a leftmost prefix of the index

So if relevance is part of a multi-column index, but isn't the leftmost column of that index, then the index is not used for your query.

This is a common issue that is often overlooked.


Using char(36) for keys is not the fastest you can do with MySQL. Use INT-types for keys if possible. If you index CHAR columns, the indexes will be VERY large compared to an (BIG)INT index (if not 'properly' created)

However, if your column values are not numeric, you are stuck with CHAR columns (which ARE still faster than VARCHAR, but can create large indexes).

Please provide a SHOW CREATE TABLE of tables to see key/index parameters, and also as the previous answer said, an EXPLAIN for the queries in question could help provide a better answer.

PS. Use SHOW TABLE STATUS LIKE '{table_name}' to see index (and data) sizes of the table.


There are three things that matter when it comes to query performance:

Indexes. Memory. Everything else.

The first thing to do is check your indexes. Do an EXPLAIN on your queries to find out how MySQL is processing them.

If that looks sensible, the next thing would be to check memory. How big is your total database? Memory is cheap these days, and queries that run from memory will be much, much faster than queries that have to read from disk.

After you've explored those, if performance is still slow, then it might be time to consider other options.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号