I have a Rails 3 project running on top of PostgreSQL 9.0.
Use Case: Users can request to follow Artists by name. To do this, they submit a list of names to a REST resource. If I can't find the Artist by name in the local collection, I consult last.fm for information about them and cache that information locally. This process can take some time, so it is delegated to a background job called IndexArtistJob.
Problem: IndexArtistJob will be run in parallel. Thus, two users may request to add the same Artist at the same time. Both users should have the Artist added to their collection, but only one Artist should end up in the local database.
Relevant portions of the Artist model are:
require 'services/lastfm'

class Artist < ActiveRecord::Base
  validates_presence_of :name
  validates_uniqueness_of :name, :case_sensitive => false

  def self.lookup(name)
    # Match case-insensitively, like the unique index on lower(name);
    # a plain find_by_name is case-sensitive in PostgreSQL.
    artist = Artist.where('lower(name) = ?', name.downcase).first
    return artist if artist

    info = LastFM.get_artist_info(name)
    return if info.nil?

    # Check local DB again for corrected name.
    if name.downcase != info.name.downcase
      artist = Artist.where('lower(name) = ?', info.name.downcase).first
      return artist if artist
    end

    Artist.new(
      :name      => info.name,
      :image_url => info.image_url,
      :bio       => info.bio
    )
  end
end
The IndexArtistJob class is defined as:
class IndexArtistJob < Struct.new(:user_id, :artist_name)
  def perform
    user = User.find(user_id)

    # May return a new, uncommitted Artist model, or an existing, committed one.
    artist = Artist.lookup(artist_name)
    return if artist.nil?

    # Presume the thread is pre-empted here for a long enough time such that
    # the work done by this worker violates the DB's unique constraint.
    user.artists << artist
  rescue ActiveRecord::RecordNotUnique # Lost race, defer to winning model
    user.artists << Artist.lookup(artist_name)
  end
end
What I'm trying to do here is let each worker commit the new Artist it finds, hoping for the best. If a conflict does occur, I want the slower worker(s) to abandon the work they did in favor of the Artist that was just inserted, and add that Artist to the specified user.
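The "commit, and on a conflict defer to the winner" idea can be tried out without Rails or a database. The following is a minimal plain-Ruby simulation under stated assumptions, not ActiveRecord code: `ArtistTable` and `RecordNotUnique` are hypothetical stand-ins, a Mutex-guarded hash keyed on the lowercased name plays the role of PostgreSQL's unique index on lower(name), and the `sleep` stands in for the slow last.fm call or pre-emption described in the job's comment.

```ruby
# Plain-Ruby simulation of "insert, and on a unique violation defer to the winner".
# ArtistTable and RecordNotUnique are hypothetical stand-ins, not ActiveRecord classes.

class RecordNotUnique < StandardError; end

class ArtistTable
  def initialize
    @rows = {}        # lower(name) => stored name, like a unique index on lower(name)
    @lock = Mutex.new # serializes inserts, as the database does for one unique key
  end

  # Case-insensitive lookup, mirroring WHERE lower(name) = ?
  def find(name)
    @lock.synchronize { @rows[name.downcase] }
  end

  # Inserts the name, or raises RecordNotUnique if that key already exists.
  def insert!(name)
    @lock.synchronize do
      key = name.downcase
      raise RecordNotUnique, name if @rows.key?(key)
      @rows[key] = name
    end
  end

  def count
    @lock.synchronize { @rows.size }
  end
end

# One worker: find the artist, insert it if missing, and on a lost race
# fall back to the row the winning worker committed.
def index_artist(table, user_artists, name)
  artist = table.find(name)
  unless artist
    sleep 0.01 # stand-in for the slow last.fm call (or a pre-emption)
    begin
      artist = table.insert!(name)
    rescue RecordNotUnique
      artist = table.find(name) # lost the race: reuse the winner's row
    end
  end
  user_artists << artist
end

table = ArtistTable.new
alice = []
bob = []
[
  Thread.new { index_artist(table, alice, "Radiohead") },
  Thread.new { index_artist(table, bob, "radiohead") }
].each(&:join)

puts table.count # prints 1: only one artist row exists
p alice, bob     # both users hold the same stored name
```

Whichever thread wins, the table holds a single row and both users end up with it. In the real job the unique index does the mutex's work; validates_uniqueness_of alone cannot, because its SELECT and the later INSERT are separate statements.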
I'm aware that Rails validators are no substitute for actual data integrity checking at the database level. To handle this, I added a unique index on the lowercased name column of the Artist table (which I also use for searching). Now, if I understand the documentation correctly, an ActiveRecord association collection commits changes to the item being added (Artist in this case) and to the underlying collection in a single transaction. But I can't be sure the Artist will be added.
Am I doing this correctly? If so, is there a nicer way to do it? I feel like structuring it around exceptions accentuates the fact that the problem is one of concurrency, and thus a bit subtle.
Sounds like you could use a simple queuing mechanism. You could do this using a database table:
When a "front-end" thread discovers a missing Artist, have it write the Artist name to the table with status "waiting" (have a unique index on Artist name so this can only happen once).
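The enqueue step can be sketched in plain Ruby to show why the unique index makes it safe for many front-end threads to race. This is an illustration under assumptions, not ActiveRecord code: `PendingArtists` is a hypothetical in-memory stand-in for the queue table, and a Mutex plays the role of the unique index. In a real table, the losing INSERT would presumably raise a unique-violation error that the front-end rescues and ignores.

```ruby
# Front-end side of the queue-table idea: enqueue a missing artist at most once.
# PendingArtists is a hypothetical in-memory stand-in for a table with a unique
# index on artist name and a status column; it is not ActiveRecord code.

class PendingArtists
  def initialize
    @rows = {}        # lowercased artist name => status
    @lock = Mutex.new # plays the role of the unique index: one insert wins
  end

  # Returns true if this call queued the name, false if it was already queued.
  def enqueue(name)
    @lock.synchronize do
      key = name.downcase
      return false if @rows.key?(key)
      @rows[key] = "waiting"
      true
    end
  end

  def status(name)
    @lock.synchronize { @rows[name.downcase] }
  end
end

queue = PendingArtists.new
results = Array.new(5) { Thread.new { queue.enqueue("Radiohead") } }.map(&:value)

p results.count(true)       # prints 1: only one front-end thread queued the artist
p queue.status("radiohead") # prints "waiting"
```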
Meanwhile a background thread/process sits in a loop and queries the table for new jobs:
a) start transaction
b) find first Artist with status="waiting"
c) update Artist status to "processing"
d) end transaction
The background thread then indexes the Artist. No one else will try, because they can see its status is "processing".
When finished, the background thread deletes the Artist from the table.
Using this method, you could run multiple background threads to increase concurrency on your Artist indexing.
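The worker loop (steps a) to d), then indexing and deleting the row) can be sketched with several threads to show that no artist is processed twice. This is a plain-Ruby illustration, not the answer's exact implementation: `JobTable`, `claim` and `delete` are hypothetical names, and the Mutex stands in for the transaction. In PostgreSQL, the select in step b) would need to lock the row (e.g. `SELECT ... FOR UPDATE`), or the UPDATE in step c) would need a `status = 'waiting'` condition plus a check of the affected row count; otherwise two workers could both read the same waiting row and both claim it.

```ruby
# Background side of the queue-table idea: several workers claim jobs without
# ever processing the same artist twice. JobTable is a hypothetical in-memory
# stand-in; the mutex plays the role of the transaction in steps a) to d).

class JobTable
  def initialize(names)
    @rows = {}
    names.each { |n| @rows[n] = "waiting" }
    @lock = Mutex.new
  end

  # Steps a) to d): atomically find the first waiting row and mark it processing.
  # Returns the claimed name, or nil when nothing is waiting.
  def claim
    @lock.synchronize do
      name, _status = @rows.find { |_name, status| status == "waiting" }
      @rows[name] = "processing" if name
      name
    end
  end

  # Called when indexing is finished: the row leaves the table.
  def delete(name)
    @lock.synchronize { @rows.delete(name) }
  end

  def empty?
    @lock.synchronize { @rows.empty? }
  end
end

jobs = JobTable.new(["Radiohead", "Bjork", "Portishead", "Massive Attack"])
indexed = []
indexed_lock = Mutex.new

workers = Array.new(3) do
  Thread.new do
    while (name = jobs.claim)
      sleep 0.01 # stand-in for indexing the artist via last.fm
      indexed_lock.synchronize { indexed << name }
      jobs.delete(name)
    end
  end
end
workers.each(&:join)

p indexed.sort # each artist appears exactly once
p jobs.empty?  # prints true
```

Each worker exits once nothing is left in "waiting", so adding threads only changes how quickly the table drains, not which artists get indexed.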
Also look at something like beanstalkd to manage this process. See http://railscasts.com/episodes/243-beanstalkd-and-stalker.