I am currently developing a an application to allow users to search through a database of documents using various paramaters and returning a set of paged results. I am building it in PHP/MySQL, which is not my usual development platform, but its been grand so far.
The problem I am having is that in order to return a full set of results I have to use LEFT JOIN on every table, which completely destroys my performance. The person who developed the database has said that the query I am using will return the correct results, so thats what I have to use. The query is below, I am by no means an SQL Guru and could use some help on this.
I have been thinking that it might be better to split the query into sub-queries? Below is my current query:
SELECT d.title, d.deposition_id, d.folio_start, d.folio_end, pl.place_id, p.surname, p.forename, p.person_type_id, pt.person_type_desc, p.age, d.manuscript_number, dt.day, dt.month, dt.year, plc.county_id, c.county_desc
FROM deposition d
LEFT JOIN person AS p ON p.deposition_id = d.deposition_id
LEFT JOIN person_type AS pt ON p.person_type_id = pt.person_type_id
LEFT JOIN place_link AS pl ON pl.deposition_id = d.deposition_id
LEFT JOIN date AS dt ON dt.deposition_id = d.deposition_id
LEFT JOIN place AS plc ON pl.place_id = plc.place_id
LEFT JOIN county AS c ON plc.county_id = c.county_id
WHERE 1 AND 开发者_如何学Pythond.manuscript_number = '840'
GROUP BY d.deposition_id ORDER BY d.folio_start ASC
LIMIT 0, 20
Any help or guidance would be greatly appreciated!
Deposition Table:
CREATE TABLE IF NOT EXISTS `deposition` (
`deposition_id` varchar(11) NOT NULL default '',
`manuscript_number` int(10) NOT NULL default '0',
`folio_start` varchar(4) NOT NULL default '0',
`folio_end` varchar(4) default '0',
`page` int(4) default NULL,
`deposition_type_id` int(10) NOT NULL default '0',
`comments` varchar(255) default '',
`title` varchar(255) default NULL,
PRIMARY KEY (`deposition_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Date Table
CREATE TABLE IF NOT EXISTS `date` (
`deposition_id` varchar(11) NOT NULL default '',
`day` int(2) default NULL,
`month` int(2) default NULL,
`year` int(4) default NULL,
PRIMARY KEY (`deposition_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Person_Type
CREATE TABLE IF NOT EXISTS `person_type` (
`person_type_id` int(10) NOT NULL auto_increment,
`person_type_desc` varchar(255) NOT NULL default '',
PRIMARY KEY (`person_type_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=59 ;
Seems that you want to select one person, place etc. per deposition.
The query you wrote will return you this, but it's not guaranteed which one will it return, and the query is inefficient.
Try this:
SELECT d.title, d.deposition_id, d.folio_start, d.folio_end, pl.place_id, p.surname, p.forename, p.person_type_id, pt.person_type_desc, p.age, d.manuscript_number, dt.day, dt.month, dt.year, plc.county_id, c.county_desc
FROM deposition d
LEFT JOIN
person p
ON p.id =
(
SELECT id
FROM person pi
WHERE pi.deposition_id = d.deposition_id
ORDER BY
pi.deposition_id, pi.id
LIMIT 1
)
LEFT JOIN
place_link AS pl
ON pl.id =
(
SELECT id
FROM place_link AS pli
WHERE pli.deposition_id = d.deposition_id
ORDER BY
pli.deposition_id, pi.id
LIMIT 1
)
LEFT JOIN
date AS dt
ON dt.id =
(
SELECT id
FROM date AS dti
WHERE dti.deposition_id = d.deposition_id
ORDER BY
dti.deposition_id, pi.id
LIMIT 1
)
LEFT JOIN
place AS plc
ON plc.place_id = pl.place_id
LEFT JOIN
county AS c
ON c.county_id = plc.county_id
WHERE d.manuscript_number = '840'
ORDER BY
d.manuscript_number, d.folio_start
LIMIT 20
Create an index on deposition (manuscript_number, folio_start)
for this to work fast
Also create a composite index on (deposition_id, id)
on person
, place_link
and date
.
The poor performance is almost certainly from lack of indexes. Your deposition table doesn't have any indexes, and that probably means the other tables you're referencing don't have any either. You can start by adding an index to your deposition table. From the MySQL shell, or phpMyAdmin, issue the following query.
ALTER TABLE deposition ADD INDEX(deposition_id, manuscript_number);
You know you're on the right track if the query executes faster after adding the index. From there you might want to put indexes on the other tables on the referenced columns. For instance for this part of your query "LEFT JOIN person AS p ON p.deposition_id = d.deposition_id", you could try adding an index to the person table using.
ALTER TABLE person ADD INDEX(deposition_id);
You only need a LEFT JOIN
if the joined table might not have a matching value. Is it possible in your database schema for a person
to not have a matching person_type
? Or deposition
to not have a matching row in date
? A place
not have a matching county
?
For any of those relationships that must exist for the result to make sense you can change the LEFT JOIN
to an INNER JOIN
.
These columns should have indexes (unique if possible):
person.deposition_id
date.deposition_id
place_link.deposition_id
place_link.place_id
The date
table looks like a bad design; I can't think of a reason to have a table of dates instead of just putting a column of type date
(or datetime
) in the deposition
table. And date
is a terrible name for a table because it's a SQL reserved word.
精彩评论