Say you want to display the latest visitors on a users 开发者_如何学Pythonprofile page.
How would you structure this? Perhaps a table called uservisitors
:
- userid (the user that gets a visit)
- visitorid (the visitor)
- time
And how would you select this with mysql? Without any duplicates, What I mean is if User 1 visits user 2's profile, then 5min later visits again, I dont want it to show both entries only the latest
cheers!
SELECT visitorid, MAX(time) AS lastvisit
FROM uservisitors
WHERE userid = ?
GROUP BY
userid, visitorid
ORDER BY
lastvisit DESC
LIMIT 5
Create a composite index on (userid, visitorid, time)
for this to work faster.
This query:
SELECT visitorid
FROM uservisitors ui
WHERE userid = ?
NOT EXISTS
(
SELECT NULL
FROM uservisitors uo
WHERE uo.userid = ui.userid
AND uo.visitorid = ui.visitorid
AND uo.time > ui.time
)
ORDER BY
time DESC
LIMIT 5
may be more efficient if you have lots of distinct visitors.
In this case, you'll need an index on (userid, time, visitorid)
.
Update:
This article in my blog compares both approaches:
- Latest visitors
Something like this should work:
SELECT visitorid, MAX(time)
FROM uservisitors
WHERE userid = 1
GROUP BY visitorid
Find distinct visitors in the last 5 minutes. You can change the INTERVAL to whatever time period such as INTERVAL 1 DAY for the last 24 hours.
SELECT distinct visitorid
FROM uservisitors
WHERE userid = 1 AND time > DATE_SUB(NOW(), INTERVAL 5 MIN);
Assuming you only care about the last visit by a visitor and not ALL the visits, how about using a unique key pair for userid/visitorid and then using INSERT INTO.. ON DUPLICATE KEY:
CREATE TABLE uservisitors (... UNIQUE (userid, visitorid) );
INSERT INTO uservisitors (userid, visitorid, time) VALUES (....) ON DUPLICATE KEY UPDATE time=NOW();
Then its a simple select to get the top 5 visitors:
SELECT visitorid FROM uservisitors WHERE user_id=1 ORDER BY time DESC LIMIT 5;
No duplicate records or having to use group by.
How do you plan on deleting stuff from the table later on? You might need to add another index for that to work efficiently.
If you only want the last 5 visits, I'd denormalize the table (saves space, index and only uses one single primary key lookup) and in the process, I would no longer have to worry about deleting old data. Each user will only occupy one row in the table so it will not grow much over time. So:
CREATE TABLE user_visitors (user_id int primary key,
visitor_1_id int,
visitor_1_time timestamp,
..
visitor_5_id int,
visitor_5_time timestamp);
To store a visit you would either insert a new row with visitor_1 as the visitor. On duplicates, you would shift down previously stored values:
INSERT INTO user_visitors SET ...
... ON DUPLICATE KEY UPDATE visitor_5_id = visitor_4_id, visitor_5_time = visitor_4_time ...
... visitor_1_id = ?, visitor_1_time = ?
If you worry about duplicates you might be able to work it out by adding IFs in the UPDATE section, such that the row won't get updated if the visitor_id is already present in the row. In code you can check for updated row count. If zero, handle the duplicate visitor time update from there. It involves some work but it will be fast and easy to understand.
I've dealt with this a few times, here's my take on it.
My table looks like this:
CREATE TABLE visitors (
userid int,
visitorid int,
last_visit datetime,
primary key(userid, visitorid),
index(visitorid)
index(userid, last_visit)
) engine = memory;
Inserting data:
INSERT INTO visitors (userid, last_visit) VALUES ( 50, now() )
ON DUPLICATE KEY UPDATE last_visit = now();
Selecting:
select * from visitors WHERE userid=10 order by last_visit limit 10; # or whatever you need
Occasionally purge the table
DELETE from visitors WHERE last_visit < date_add(now(), INTERVAL -1 WEEK);
This is the way to go for a few reasons.
- You're using an in memory table, so you never touch disk.
- Index is on (visitorid, last_update), so it's pure index lookup. Very fast. Even without it, it should be fast.
- Pulling users via a separate query will let you cache them, but in theory they should be cached already if they've visited the site recently. Even if you aren't using cache, an in() query on user (primary key) should be very fast.
You can run a cron to backup this table once a minute
SELECT * from visitors INTO OUTFILE "/tmp/visitors.txt"
精彩评论