I have a question about the general approach to a fairly large MySQL database. I've written some PHP code to interact with the database, and I'm trying to analyse a hefty-ish set of data (~130k rows, 200 columns) using various methods. I've been learning a great deal along the way, and I feel as though I am close to getting it set up to be really speedy, but I'm still a bit stuck.
I began firmly in the 'Excel' mindset: I continually added more and more columns to the dataset as I tried to pick various bits and pieces out for statistical analysis. Some of the PHP/MySQL scripts I had written took hours to run.
Then, with at least the basics working, I learned about joins. This was a bit of a revelation, but it also meant rewriting everything to get the joins to play nicely with my data. The net result was a massive increase in performance: what took hours before now takes about 15 seconds.
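In case it helps, here's a minimal sketch of the kind of rewrite I mean (all table and column names are made up):

```sql
-- Before: a small lookup fired over and over from a PHP loop:
SELECT row_count FROM sample_summary WHERE sample_id = 1;

-- After: one join that carries the summary info along with each data row:
SELECT d.measurement_a, s.row_count
FROM sample_001 AS d
JOIN sample_summary AS s ON s.sample_id = 1;
```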
After chatting with a few people, I came to the conclusion that I could still make it faster. I had it set up so that each sample of data was contained in its own table. Each table had its data summarised in a further table that was used as part of the joins: general info about that particular sample was stored in this secondary table for easy access and speed.
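Roughly, the layout looks like this (a simplified sketch; the real names and columns differ):

```sql
-- One table per sample of raw data...
CREATE TABLE sample_001 (
    row_id        INT PRIMARY KEY,
    measurement_a DOUBLE,
    measurement_b DOUBLE
    -- ...and so on, up to ~200 columns
);

-- ...plus a shared summary table holding general info per sample,
-- which the joins go through.
CREATE TABLE sample_summary (
    sample_id INT PRIMARY KEY,
    row_count INT,
    mean_a    DOUBLE,
    collected DATE
);
```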
Now, my question is this: would it be better to change the way my database and application work so that all these different samples of data are combined into a single, large table? I've been experimenting with this for a bit, and so far it doesn't seem to be any faster than my current method.
In other words, is it better to run lots of 'little' queries involving multi-table joins, as I am doing at the moment, than a single, gigantic query involving multi-table joins? Examining the execution times, it seems the joins are what slow the new method down.
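To make the comparison concrete (again with made-up names):

```sql
-- Current approach: many small joins, one per sample table,
-- looped over from PHP:
SELECT d.measurement_a, s.mean_a
FROM sample_001 AS d
JOIN sample_summary AS s ON s.sample_id = 1;

-- Candidate approach: one big combined table with a sample_id
-- column, queried in one go:
SELECT d.measurement_a, s.mean_a
FROM all_samples AS d
JOIN sample_summary AS s ON s.sample_id = d.sample_id;
```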
I was under the impression that repeatedly sending small queries from PHP to MySQL was less efficient than sending a single query, but is there a tipping point for more complex queries where that stops being true? Does it seem like I have reached that point?
Doing query optimization in PHP is not the way to get the best performance out of the DB. A properly written SQL query and MySQL's built-in query optimizer will probably do the job better. (For example, MySQL's optimizer can do things like a block nested-loop join: http://en.wikipedia.org/wiki/Block_nested_loop)
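You can also ask MySQL what plan the optimizer actually picked. A quick sketch, using hypothetical table names:

```sql
-- EXPLAIN reports, per table, which index MySQL chose ('key') and an
-- estimate of how many rows it will examine ('rows').
EXPLAIN
SELECT d.measurement_a, s.mean_a
FROM all_samples AS d
JOIN sample_summary AS s ON s.sample_id = d.sample_id
WHERE s.collected > '2010-01-01';
```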
But the real answer depends on what you're trying to do. If speed is the number one priority, please list what you're trying to query and your data schema. Answers will typically involve adding/removing indices and tweaking your queries.
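For example, something along these lines (hypothetical names; the right columns to index depend on your actual queries):

```sql
-- Index the columns that appear in JOIN and WHERE clauses:
CREATE INDEX idx_sample_id ON all_samples (sample_id);
CREATE INDEX idx_collected ON sample_summary (collected);
```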
If keeping your database footprint small is your goal (which I highly doubt given how cheap disk space is), normalize everything.
If your JOIN queries are written and indexed correctly, I would expect them to be the better option.