My web app needs to access an arbitrary e-commerce store and determine whether or not it has a product data feed (e.g. a Google Base feed, or an RSS/Atom feed of all products in the store). I also need to extract the location of this feed.
The best solution I can think of so far is to maintain a comprehensive list of known locations of these feeds for given E-Commerce platforms and check them one by one for the site, crossing them off the list as they come back 404.
Two questions:
- Can you think of a better approach?
- How would I go about generating this list of known product data feed locations? In my experience, they are generally not made public (unlike blog RSS feeds).
Thanks so much! :)
-Rich
Can you think of a better approach?
Use search engine APIs to discover feeds. You could try using the Google, Bing and Yahoo search APIs to discover product feeds on the domains you are interested in. This could be done as follows:
- List the public feed formats you are interested in (e.g. Google Base, Shopzilla, etc.).
- Examine each feed spec for unique strings you can search on.
- Craft search API queries that return relevant results (restrict by domain, file type, etc.).
- Test the links you get back for product feeds.
Obviously, this assumes that the feeds have been found and indexed by the search engines.
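As a rough illustration of the query-crafting and link-testing steps, here is a minimal Python sketch. It assumes the search API you use accepts `site:` and `filetype:` operators in the query string, and it uses the Google Base XML namespace (`http://base.google.com/ns/1.0`) as the "unique string" for that format. The function names `build_queries` and `looks_like_google_base_feed` are made up for this example, and the actual call to a search API is left out because it depends on the provider.

```python
import xml.etree.ElementTree as ET

# XML namespace declared by Google Base style product feeds.
GOOGLE_BASE_NS = "http://base.google.com/ns/1.0"


def build_queries(domain, markers, filetypes=("xml", "rss")):
    """Build query strings restricted to one domain and file type.

    Assumes the search API understands `site:` and `filetype:` operators.
    """
    queries = []
    for marker in markers:
        for filetype in filetypes:
            queries.append(f'site:{domain} filetype:{filetype} "{marker}"')
    return queries


def looks_like_google_base_feed(body):
    """Return True if `body` is well-formed XML containing at least one
    element in the Google Base namespace (e.g. <g:price>)."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    prefix = "{" + GOOGLE_BASE_NS + "}"
    return any(
        isinstance(el.tag, str) and el.tag.startswith(prefix)
        for el in root.iter()
    )
```

You would send each query from `build_queries` to the search API, download the result links, and keep those for which `looks_like_google_base_feed` returns True. Other feed formats would need their own marker strings and checks.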
How would I go about generating this list of known product data feed locations?
I don't believe there is such a thing as a "known location" for a product data feed. However, you could try including the following patterns in your algorithm:
- URL patterns from any feeds you already know about.
- URL patterns you have guessed (put yourself in the webmaster's shoes and think what he/she would name them).
- Review the documentation for commonly used eCommerce software and product data feed plugins to determine their default feed locations. Include their URL patterns.
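Once you have collected some patterns, checking them against a store is straightforward. The sketch below is only an illustration: the paths in `GUESSED_PATHS` are placeholder guesses, not documented defaults of any platform, and the helper names (`candidate_urls`, `http_status`, `find_feeds`) are invented for this example. It sends a HEAD request to each candidate and keeps the ones that answer 200.

```python
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# Placeholder guesses only; replace with patterns gathered from real
# feeds and from the documentation of the platforms you care about.
GUESSED_PATHS = ["/products.xml", "/feed.xml", "/googlebase.xml"]


def candidate_urls(store_url, paths=GUESSED_PATHS):
    """Join each path pattern onto the store's base URL."""
    base = store_url if store_url.endswith("/") else store_url + "/"
    return [urljoin(base, path.lstrip("/")) for path in paths]


def http_status(url, timeout=10):
    """Return the HTTP status of a HEAD request, or None if the request
    could not be completed (DNS failure, timeout, refused connection)."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # e.g. 404 for a pattern that does not exist
    except OSError:
        return None


def find_feeds(store_url, paths=GUESSED_PATHS, status_of=http_status):
    """Return the candidate URLs that respond with HTTP 200."""
    return [url for url in candidate_urls(store_url, paths)
            if status_of(url) == 200]
```

A 200 response alone does not prove the URL is a product feed, since some stores return 200 for missing pages and some servers reject HEAD requests, so it is worth downloading each hit and checking its content before accepting it.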