
Entity resolution for venues and other geo locations

Developer https://www.devze.com 2022-12-20 01:39 Source: web

Say I want to build a check-in aggregator that counts visits across platforms, so that I can know for a given place how many people have checked in there on Foursquare, Gowalla, BrightKite, etc. Is there a good library or set of tools I can use out of the box to associate the venue entries in each service with a unique place identifier of my own?

I basically want a function that maps a pair of (placename, address, lat/long) tuples to a confidence score in [0,1) that the two refer to the same real-world location.

Someone must have done this already, but my google-fu is weak.


Yes, you can submit the two addresses using geocoder.net (assuming you're a .NET developer; you didn't say). It provides a common interface for address verification and geocoding, so once both addresses are standardized you can be reasonably sure whether one address equals another.

If you can't get them to standardize and match, you can compute the distance between their coordinates and assume they are the same place if they are closer than a certain threshold.
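
A minimal sketch of that distance-threshold fallback, assuming Python, coordinates in decimal degrees, and the haversine great-circle formula. The names `haversine_m` and `probably_same_place` and the 50 m default threshold are hypothetical choices for illustration; they are not part of geocoder.net or any other library.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def probably_same_place(p, q, threshold_m=50.0):
    """True if two (lat, lon) points lie within threshold_m metres.

    The 50 m default is a hypothetical value; a sensible threshold depends on
    how precise each service's venue coordinates are.
    """
    return haversine_m(p[0], p[1], q[0], q[1]) <= threshold_m
```

For example, `probably_same_place((40.7484, -73.9857), (40.7486, -73.9855))` is true, since the two points are under 30 m apart. A fixed threshold is crude in dense areas, where distinct venues can share a building.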


I'm pessimistic that such a tool is already available.

Based on the entity resolution literature, a good way to match pairs would be to

  • get the placenames, and define and use a good distance function on them (e.g. edit distance),
  • get the addresses, standardize them (e.g. with the geocoder.net tools mentioned above), and also define a distance between them,
  • get the coordinates and compute a distance (this is easy: there are lots of libraries and tools for geographic distance calculations, and it seems to be a good metric),
  • turn the distances into probabilities ("what is the probability of such a distance, if we suppose these are the same place?") (not straightforward),
  • and combine the probabilities (also not straightforward).
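
The list above can be sketched as a single scoring function. This is a minimal Python illustration under loudly stated assumptions: `normalize` is a toy lowercase-plus-abbreviation step standing in for a real address standardizer; each distance is turned into a probability with an assumed exponential curve that equals 0.5 at a hand-picked scale; and the three probabilities are combined by summing log-odds, a naive-Bayes-style simplification that treats the signals as independent. All function names and scale parameters are hypothetical and would need calibration against labeled pairs.

```python
import math
import re


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (unit cost per edit)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def text_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


# Hypothetical, deliberately tiny abbreviation table (toy standardization).
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "blvd": "boulevard"}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, expand a few common abbreviations."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = phi2 - phi1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6_371_000.0 * math.asin(math.sqrt(a))


def distance_to_probability(d: float, scale: float) -> float:
    """Assumed calibration: 0.5 at d == scale, tending to 1 as d -> 0.

    Clamped to [0.01, 0.99] so no single signal can dominate the sum below.
    """
    p = 0.5 ** (d / scale)
    return min(max(p, 0.01), 0.99)


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def match_confidence(v1, v2, name_scale=0.3, addr_scale=0.3, geo_scale_m=100.0):
    """v1, v2: (placename, address, (lat, lon)). Returns a value in (0, 1)."""
    name1, addr1, (lat1, lon1) = v1
    name2, addr2, (lat2, lon2) = v2
    probs = [
        distance_to_probability(text_distance(normalize(name1), normalize(name2)), name_scale),
        distance_to_probability(text_distance(normalize(addr1), normalize(addr2)), addr_scale),
        distance_to_probability(haversine_m(lat1, lon1, lat2, lon2), geo_scale_m),
    ]
    # Naive-Bayes-style combination: add log-odds, assuming independent signals.
    return 1 / (1 + math.exp(-sum(logit(p) for p in probs)))
```

With these toy settings, two hypothetical records such as `("Joe's Coffee", "12 Main St", (40.7128, -74.0060))` and `("Joes Coffee House", "12 Main Street", (40.7129, -74.0061))` score above 0.9, while a differently named venue roughly 900 m away scores below 0.1. The clamping and the independence assumption are the weakest parts; learning weights from labeled matches would be the natural next step.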

A closure-like algorithm (transitively closing the set by merging every pair above a given probability threshold) could then help find all the matches, for example when different names accumulate for a given venue.
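
A minimal sketch of that closure step, assuming any pairwise scoring function `confidence(a, b)` that returns a value in [0, 1], such as a scorer built along the lines of the list above. It uses a union-find (disjoint-set) structure so merges propagate transitively: if A matches B and B matches C, all three land in one cluster even when A and C score low directly. `cluster_venues` and the 0.9 default threshold are hypothetical.

```python
from itertools import combinations


def cluster_venues(venues, confidence, threshold=0.9):
    """Group venue records by transitively merging every pair whose
    confidence(a, b) is at or above threshold, using union-find."""
    parent = list(range(len(venues)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Score every pair; a high-confidence pair joins the two clusters.
    for i, j in combinations(range(len(venues)), 2):
        if confidence(venues[i], venues[j]) >= threshold:
            parent[find(i)] = find(j)

    # Collect records by their cluster root.
    clusters = {}
    for i in range(len(venues)):
        clusters.setdefault(find(i), []).append(venues[i])
    return list(clusters.values())
```

Scoring every pair is quadratic in the number of venues. A common mitigation, which the answer above does not specify, is to score only pairs that fall in the same or neighboring geographic grid cells.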

It would make a useful tool or service, though.
