I have a database containing a lot of train times, for a lot of different trains. It looks something like this:
Table1
Station_id Station_name etc
Table2
StationName Arrival Departure TrainNumber etc
I know how to retrieve the time of departure for a given train, but how would i proceed if I wanted to retrieve a travel plan containing several connecting t开发者_StackOverflowrains based on a user selected departure station and arrival station?
E.g someone want to go from London to Manchester, but there are no direct trains travelling this route, so he would have to change train i Liverpool. How would I query the database to get this result?
I hope someone can help me with this as I am unable to find a solution.
It's true that finding the optimal solution to the problem given completely unknown data is way beyond the capabilities of SQL. That is, in theory the quickest route may be to take a trip with thousands of train changes, and the total number of possible combinations is far beyond the capabilities of any computer to process.
But in practice, I think it's safe to say that a trip with fewer train changes is almost always faster than a trip with more stops, and the traveller would probably prefer fewer train changes over a slightly shorter total trip anyway as it avoids the hassle of changing trains and the danger of missing a train because of delays. I don't know what your rail system is like (or even where you're from) but I'd guess that most trips can be completed with a modest number of train changes, probably rarely more than 2, right?
So David O'Neill's solution is basically right, though I think the query he gives is a little confused. He fails to specify the starting station, he seems to assume that the train number is unchanged throughout the trip, and perhaps most important, he calculates the time as the sum of the travel times. Surely what matters to the traveller is the total time between departure and arrival, not just the time on the train. Ten minutes on the train, four hours at the station waiting for the next train, and another ten minutes on the second train, is 4 hours and twenty minutes, not twenty minutes. Oh, and he doesn't seem to insure that the connecting train doesn't leave until after the first train arrives. A connection does you little good if the other train leaves ten minutes before you get there.
So I think the real query would be more like this:
Oh, your description of what's in the tables seems incompletet, so excuse me if I just invent my own tables with the data you would have to have to do this. Namely, you had better have:
Trip (train_number, departure_station_id, arrival_station_id, departure_time, arrival_time)
Station (station_id, station_name)
First query: Look for a direct trip:
select t1.train_number, t1.arrival_time-t1.departure_time as time
from trip t1
where t1.departure_station_id=?station_from
and t1.arrival_station_id=?station_to
order by time
If that turns up nothing, try again with one train change:
select t1.train_number, t2.train_number, t2.arrival_time-t1.departure_time as time
from trip t1
join trip t2 on (t2.departure_station_id=t1.arrival_station_id
and t2.departure_time>t1.arrival_time)
where t1.departure_station_id=?station_from
and t2.arrival_station_id=?station_to
order by time
If still nothing, try two train changes:
select t1.train_number, t2.train_number, t3.train_number, t3.arrival_time-t1.departure_time as time
from trip t1
join trip t2 on (t2.departure_station_id=t1.arrival_station_id
and t2.departure_time>t1.arrival_time)
join trip t3 on (t3.departure_station_id=t2.arrival_station_id
and t3.departure_time>t2.arrival_time)
where t1.departure_station_id=?station_from
and t3.arrival_station_id=?station_to
order by time
Etc.
You could combine all that into a single query with a UNION, filling in nulls for the extra train numbers in the shorter queries. That would have the advantage that if a trip with extra train changes really was shorter, you'd find that one. But I suspect that rarely happens. And to make the union work, you'd have to go up to the maximum number of changes -- whatever you're going to allow -- with every query, and performance would probably suck.
I haven't tried this, but I'd guess that with proper indexes the query for no-changes would be lightning fast, one change would be very fast, two changes would start getting slow, and beyond that would probably be a very long query, because the number of possible combinations would be huge. Note that a simple query like this would include absurd itineries. Like you want to go from London to Paris? We could try London to Moscow to Rome to Paris. Or London to Glasglow back to London then to Paris. Of course these would have long times and the ORDER BY would sort them to the bottom, but they'd show up as possible routes that would have to be eliminated. You could add criteria to eliminate routes that come back on themselves, but other absurd trips are harder to identify.
Have fun!
This cannot be reduced to a simple SQL query. Or a complex one, for that matter :)
You want to solve the Shortest Path Problem.
You will need to build a graph using your stations as vertices, and paths between vertices representing a possible journey from one station to another (Note that you will have many paths from one node to another, keyed by departure time). The weight of the each path is its (the journey's) duration.
You then solve the shortest path problem using one of the algorithms suggested by the Wikipedia link, bearing in mind that you have to add the cost of waiting at station N until the departure time of the train to station N+1.
Sounds like a fun hack!
I'm a little unsure about what information is in table2. For the moment, I'm assuming that table2 contains both a departure time and a destination station (dest_station_name)
select t1.station_name, t2.station_name, t3.station_name,
t4.station_name, t5.station_name,
sum(t2.arrival - t1.departure + /*the rest of them */) as total_time from table2 t1
join table2 t2 on t1.dest_station_name = t2.station_name
and t1.train_number = t2.train_number
join table2 t3 on t2.dest_station_name = t3.station_name
and t2.train_number = t3.train_number
join table2 t4 on t3.dest_station_name = t4.station_name
and t3.train_number = t4.train_number
join table2 t5 on t4.dest_station_name = t5.station_name
and t4.train_number = t5.train_number
where t5.station_name = :goal_station
order by total_time desc;
Since you won't know ahead of time how many jumps you'll want to take, you'll probably want to write a series of these, each with a different number of train hops. Run 1 hop and look for no data returned. It nothing was returned, run the 2 hop query. If that returned nothing, run the 3 hop query, and so forth.
Edited formatting
NOTE: As is noted in the other answers and comments: this cannot be used to guarantee the shortest path. This CAN be used to find the path with the fewest hops. For example, the technique I wrote up would recommend going from New York to Washinton DC via the route New York to Los Angeles to Washington DC even if New York to Trenton NJ to Lanchester PA to Washington DC was available.
I suspect that you have stumbled across one of those particluarly hard problems that may not have any real answer. It may not be possible with one or even several queries. Suppose a user selects station A to Station B as a journey.
- Find all trains that start at A and end at B. If you find any then you can stop.
- If not find all trains that start at A.
- For each station S1, S2, ..., Sn that a train that starts at A end at query for trains that start at Sx to B. If you find a train then stop. But, if not you have to repeat. You have to be careful at this point that you do not end up in a loop. that is you find a train from A - E, then E - F then F - A.
You may need to use a recursive function like this(Be careful, this is not completely thought through):
journey(A, B)
if A = B then return empty journey else
query for trains that go from A to B, if found return the one step journey
else
query for all trains that start at A
for each endpoint of trains that start at A => E
rest = journey(E, B)
return A-E+rest
i don't think you really need a shortest path solution for this.
usually websites know max 2 train routes between cities. there usually aren't more anyway. if a user wants to use some other route then you have a "travel via" field.
so you'd just connect the routes in one.
I suppose the way to solve this you will need to implement some sort of searching algorithm - it will not be solvable with a simple query. You should load your train times in a structure, then look for all possible connections from A to B.
When finding all possible routes within the given parameters (lets say you know the date of the planned trip) you should sort them by several criteria - like shortest journey, fastest journey, cheapest journey - if you store ticket cost information, least number of train switches, etc ... and then display the most favorable results.
Sounds like a fun project to work on for a while. You might have given me an idea for a new hobby project for the weekend ;)
精彩评论