counting surnames in a range a-d, e-h, i-l.... with MySQL_问答_开发者

counting surnames in a range a-d, e-h, i-l.... with MySQL

开发者 https://www.devze.com 2023-04-11 23:50 出处：网络

I have a table of surnames and I want to count the number of surnames in each alphabetic range of A-D or E-H etc.

相关专题：

I have a table of surnames and I want to count the number of surnames in each alphabetic range of A-D or E-H etc.

I came up with the following query, which works, and I woul开发者_JAVA技巧d like to hear people's opinions on it and perhaps ways to do it better.

select count(*) FROM people 
group by surname REGEXP '^[a-d].*', 
         surname REGEXP '^[e-h].*', 
         surname REGEXP '^[i-l].*', 
         surname REGEXP '^[m-p].*', 
         surname REGEXP '^[q-t].*', 
         surname REGEXP '^[u-z].*';

Here's the best way to achieve this (using regex anyway):

select
    sum(surname REGEXP '^[a-dA-D].*') as ad_count,
    sum(surname REGEXP '^[e-hE-H].*') as eh_count,
    sum(surname REGEXP '^[i-lI-L].*') as il_count,
    sum(surname REGEXP '^[m-pM-P].*') as mp_count,
    sum(surname REGEXP '^[q-tQ-T].*') as qd_count,
    sum(surname REGEXP '^[u-zU-Z].*') as uz_count
from people

This elegant brevity works due to the fact that in mysql, true is 1 and false is 0, thus sum(some condition) is the count of how many time it's true.

btw, I added uppercase to your regex.

You would get better performance by selecting from an inner select that does the work of calculating the group more efficiently (eg by using a case on substr(surname, 1, 1)), then summing on tests against that calculated vale value.

Regex is overkill and completely not needed here.

Perhaps something like this, using basic string algebra:

SELECT
   SUM(CASE WHEN SUBSTR(`surname`, 1, 1) BETWEEN 'a' AND 'd' THEN 1 ELSE 0 END) AS `SUM_a-d`,
   SUM(CASE WHEN SUBSTR(`surname`, 1, 1) BETWEEN 'e' AND 'h' THEN 1 ELSE 0 END) AS `SUM_e-h`,
   SUM(CASE WHEN SUBSTR(`surname`, 1, 1) BETWEEN 'i' AND 'l' THEN 1 ELSE 0 END) AS `SUM_i-l`,
   SUM(CASE WHEN SUBSTR(`surname`, 1, 1) BETWEEN 'm' AND 'p' THEN 1 ELSE 0 END) AS `SUM_m-p`,
   SUM(CASE WHEN SUBSTR(`surname`, 1, 1) BETWEEN 'q' AND 't' THEN 1 ELSE 0 END) AS `SUM_q-t`,
   SUM(CASE WHEN SUBSTR(`surname`, 1, 1) BETWEEN 'u' AND 'z' THEN 1 ELSE 0 END) AS `SUM_u-z`
FROM `people`

You can make the query a bit more explicit like so:

SELECT 
  SUM(CASE WHEN  surname REGEXP '^[a-d].*' THEN 1 ELSE 0 END) AS a_d_count
  ,SUM(CASE WHEN surname REGEXP '^[e-h].*' THEN 1 ELSE 0 END) AS e_h_count
  ,SUM(CASE WHEN surname REGEXP '^[i-l].*' THEN 1 ELSE 0 END) AS i_l_count
  ,SUM(CASE WHEN surname REGEXP '^[m-p].*' THEN 1 ELSE 0 END) AS m_p_count
  ,SUM(CASE WHEN surname REGEXP '^[q-t].*' THEN 1 ELSE 0 END) AS q_t_count
  ,SUM(CASE WHEN surname REGEXP '^[u-z].*' THEN 1 ELSE 0 END) AS u_z_count
FROM (SELECT surname FROM people ORDER BY surname ASC) p

Avoiding regexes and conditionals, you might do this:

SELECT CONCAT(LEFT(UPPER(surname),1), '-', CHAR(ASCII(UPPER(surname))+3)) AS r, 
  count(id) 
FROM people
GROUP BY ROUND((ASCII(UPPER(surname)-65)/4),0);

This sets your ranges to 4 letters long, which does mean that the last range is'yz', but you can fiddle that around with a bit more math.