how does a SQL query work?_问答_开发者_运维开发者技术经验分享

How does a SQL query work? Ho开发者_StackOverfloww does it get compiled? Is the from clause compiled first to see if the table exists? How does it actually retrieve data from the database? How and in what format are the tables stored in a database?

I am using phpmyadmin, is there any way I can peek into the files where data is stored? I am using MySQL

sql execution order:

FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> DISTINCT -> ORDER BY -> LIMIT .

SQL Query mainly works in three phases .

1) Row filtering - Phase 1: Row filtering - phase 1 are done by FROM, WHERE , GROUP BY , HAVING clause.

2) Column filtering: Columns are filtered by SELECT clause.

3) Row filtering - Phase 2: Row filtering - phase 2 are done by DISTINCT , ORDER BY , LIMIT clause.

In here i will explain with an example . Suppose we have a students table as follows:

id_	name_	marks	section_
1	Julia	88	A
2	Samantha	68	B
3	Maria	10	C
4	Scarlet	78	A
5	Ashley	63	B
6	Abir	95	D
7	Jane	81	A
8	Jahid	25	C
9	Sohel	90	D
10	Rahim	80	A
11	Karim	81	B
12	Abdullah	92	D

Now we run the following sql query:

select section_,sum(marks) from students where id_<10 GROUP BY section_ having sum(marks)>100 order by section_ LIMIT 2;

Output of the query is:

section_	sum
A	247
B	131

But how we got this output ?

I have explained the query step by step . Please read bellow:

1. FROM , WHERE clause execution

Hence from clause works first therefore from students where id_<10 query will eliminate rows which has id_ greater than or equal to 10 . So the following rows remains after executing from students where id_<10 .

id_	name_	marks	section_
1	Julia	88	A
2	Samantha	68	B
3	Maria	10	C
4	Scarlet	78	A
5	Ashley	63	B
6	Abir	95	D
7	Jane	81	A
8	Jahid	25	C
9	Sohel	90	D

2. GROUP BY clause execution

now GROUP BY clause will come , that's why after executing GROUP BY section_ rows will make group like bellow:

id_	name_	marks	section_
9	Sohel	90	D
6	Abir	95	D
1	Julia	88	A
4	Scarlet	78	A
7	Jane	81	A
2	Samantha	68	B
5	Ashley	63	B
3	Maria	10	C
8	Jahid	25	C

3. HAVING clause execution

having sum(marks)>100 will eliminates groups . sum(marks) of D group is 185 , sum(marks) of A groupd is 247 , sum(marks) of B group is 131 , sum(marks) of C group is 35 . So we can see tha C groups's sum is not greater than 100 . So group C will be eliminated . So the table looks like this:

id_	name_	marks	section_
9	Sohel	90	D
6	Abir	95	D
1	Julia	88	A
4	Scarlet	78	A
7	Jane	81	A
2	Samantha	68	B
5	Ashley	63	B

4. SELECT clause execution

select section_,sum(marks) query will only decides which columns to prints . It is decided to print section_ and sum(marks) column .

section_	sum
D	185
A	245
B	131

5. ORDER BY clause execution

order by section_ query will sort the rows ascending order.

section_	sum
A	245
B	131
D	185

6. LIMIT clause execution

LIMIT 2; will only prints first 2 rows.

section_	sum
A	245
B	131

This is how we got our final output .

Well...

First you have a syntax check, followed by the generation of an expression tree - at this stage you can also test whether elements exist and "line up" (i.e. fields do exist WITHIN the table). This is the first step - any error here any you just tell the submitter to get real.
Then you have.... analysis. A SQL query is different from a program in that it does not say HOW to do something, just WHAT THE RESULT IS. Set based logic. So you get a query analyzer in (depending on product bad to good - oracle long time has crappy ones, DB2 the most sensitive ones even measuring disc speed) to decide how best to approach this result. This is a really complicated beast - it may try dozens or hundreds of approaches to find one he believes to be fastest (cost based, basically some statistics).
Then that gets executed.

The query analyzer, by the way, is where you see huge differences. Not sure about MySQL - SQL Server (Microsoft) shines in that it does not have the best one (but one of the good ones), but that it really has nice visual tools to SHOW the query plan, compare the estimates the the analyzer to the real needs (if they differ too much table statistics may be off so the analyzer THINKS a large table is small). They present that nicely visually.

DB2 had a great optimizer for some time, measuring - i already said - disc speed to put it into it's estimates. Oracle went "left to right" (no real analysis) for a long time, and took user provided query hints (crap approach). I think MySQL was VERY primitive too in the start - not sure where it is now.

Table format in database etc. - that is really something you should not care for. This is documented (clearly, especially for an open source database), but why should you care? I have done SQL work for nearly 15 years or so and never had that need. And that includes doing quite high end work in some areas. Unless you try building a database file repair tool.... it makes no sense to bother.

The order of SQL statement clause execution-

FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY

My answer is specific to Oracle database, which provides tutorials pertaining to your queries. Well, when SQL database engine processes any SQL query/statement, It first starts parsing and within parsing it performs three checks Syntax, Semantic and Shared Pool. To know how do these checks work? Follow the link below.

Once query parsing is done, it triggers the Execution plan. But hey Database Engine! you are smart enough. You do check if this SQL query has already been parsed (Soft Parse), if so then you directly jump on execution plan or else you deep dive and optimize the query (Hard Parse). While performing hard parse, you also use a software called Row Source Generation which provides Iterative Execution Plan received from optimizer. Enough! see the SQL query processing stages below.

how does a SQL query work?

Note - Before execution plan, it also performs Bind operations for variable's values and once the query is executed It performs Fetch to obtain the records and finally store into result set. So in short, the order is-

PASRE -> BIND -> EXECUTE -> FETCH

And for in depth details, this tutorial is waiting for you. This may be helpful to someone.

If you're using SSMS for Sql Server and want to know where your data files are stored, you can use this query

SELECT

    mdf.database_id, 

    mdf.name, 

    mdf.physical_name as data_file, 

    ldf.physical_name as log_file, 

    db_size = CAST((mdf.size * 8.0)/1024 AS DECIMAL(8,2)), 

    log_size = CAST((ldf.size * 8.0 / 1024) AS DECIMAL(8,2))

FROM (SELECT * FROM sys.master_files WHERE type_desc = 'ROWS' ) mdf

JOIN (SELECT * FROM sys.master_files WHERE type_desc = 'LOG' ) ldf

ON mdf.database_id = ldf.database_id

Here's a copy of the output

how does a SQL query work?