I'm working on a shortest path a* algorithm in java with a mysql db. I'm executing the followin开发者_如何学Pythong SQL Query approx 300 times in the program to find route connections from a database of 10,000 bus connections. It takes approx 6-7 seconds to execute the query 300 times. Any suggestions on how I can speed this up or any ideas on a different method i can use ? Thanks
private HashMap<Coordinate,Node> closedNodes;
private PriorityQueue<Node> openNodes;
..
private List<Coordinate> calculatePath()
{
//While there are nodes in the open list
while (!openNodes.isEmpty())
{
//Get the node with the lowest gVal+hVal
Node node = openNodes.poll();
//Add it to the closed list
closedNodes.put(node);
//If it is not the goal node
if (!node.equals(goal))
{
//Get all the neighbours and Create neighbour node
List<Node> neighbours = helper.getNeighbours(node, goal);
//For each neighbour
for (Node neighbourNode : neighbours)
{
//Check if the neighbour is in the list of open nodes
boolean isInOpen = checkOpenNodes(neighbourNode);
//If it is not in the open nodes and not in the closed nodes
if ((!closedNodes.containsKey(neighbourNode))&& (!isInOpen))
{
//Add it to the list of open nodes
openNodes.add(neighbourNode);
}
}
}
else
{
// We found the path
path = backTrackPath(node);
break;
}
}
return path;
/**
* Gets the list of valid Nodes that are possible to travel to from <b>Node</b>
* @param stopNode Node to find neighbours for
* @param goal End Node
* @return list of neighbour Nodes
*/
public ArrayList<Node> getNeighbours(Node stopNode, Node goal)
{
ArrayList<Node> neighbours = new ArrayList<Node>();
Node neighbourNode;
//get neighbours connected to stop
try {
ResultSet rs = stmt.executeQuery("select To_Station_id, To_Station_routeID, To_Station_stopID," +
"To_Station_lat, To_Station_lng, Time from connections where Connections.From_Station_stopID ="
+stopNode.getCoord().getStopID()+" ORDER BY Connections.Time");
rs = stmt.getResultSet();
while (rs.next()) {
int id = rs.getInt("To_Station_id");
String routeID = rs.getString("To_Station_routeID");
String stopID = rs.getString("To_Station_stopID");
String stopName = rs.getString("To_Station_stopName");
Double lat = rs.getDouble("To_Station_lat");
Double lng = rs.getDouble("To_Station_lng");
int time = rs.getInt("Time");
neighbourNode = new Node(id, routeID, stopID, stopName, lat, lng);
neighbourNode.prev = stopNode;
neighbourNode.gVal = stopNode.gVal + time;
neighbourNode.hVal = heuristic.calculateHeuristic(neighbourNode, goal);
neighbours.add(neighbourNode);
}
}
catch (SQLException e) {
e.printStackTrace();
}
return neighbours;
}
- Make sure you have an index on
connections.From_Station_stopID
- Instead of
SELECT *
, only select the columns you need - If only the constant in the
WHERE
clause forFrom_Station_stopID
changes each time, use a parameterized, prepared query so that the database doesn't have to parse the query and build the execution path each time, or combine the queries into one usingWHERE From_Station_stopID IN (value1, value2, ...)
- If you're repeating the same queries often, ensure that MySQL is using query caching
If you showed us the rest of the code, where it's looping to call the query 300 times we might be able to help further.
In general, I'd say that if you're computing the shortest path each time, instead, you should build a table that works like a grid, with route distances precalculated for each stop, or even entire routes precalculated from each stop.
As I understand you have a graph with Stations as nodes and Connections as edges.
Try to create some object which will represent that graph (it can be a matrix in the simplest case) and perform your search on that object. Then you will not need to do 300 calls to your database which are very costly from the performance point of view.
For a start, you should be using a PreparedStatement, rather than a normal query, and just do stmt.setInt(1, StopId)
each time through.
Also, better to select the specific fields you are interested in, rather than select *
.
Those are just general JDBC tips, which will probably not have a huge impact on the runtime, but are worth doing.
After that, I would try to investigate the table indexes, to make sure that the query based on From_Station_stopID is indeed executing as fast as it can.
If it is, and the only overhead is the amount of separate calls to the database, the next step might be to try to combine the queries, perhaps by making it a select ... from connections where From_Station_stopID in (..., ..., ...)
.
Depending on the size of the table, you may simply want to load the whole thing in advance into memory (perhaps as a HashMap), and then you wouldn't need to go to the database on every iteration.
In short, it depends a lot on the different parameters of the problem, and you will need to check to see which solution works best for you.
In general, if your query is slow and expensive, try caching the results somewhere, so on the next lookup it will be retrieved quickly from cache. So you would (expensively) compute connection between point A and B, store the entire result set in other (temporary=cache) table in the database with a defined lifetime, so for the next X hours/days (or until the routes change) you can retrieve route from A to B from this cache table.
You can use the IN clause in order to run the query only once - select * from connections where Connections.From_Station_stopID IN (value1, value2, ...).
精彩评论