Large Product catalog with statistics - alternatives to Sql Server?


I am building UI for a large product catalog (millions of products).

I am using Sql Server, FreeText search and ASP.NET MVC.

Tables are normalized and indexed. Most queries take less than a second to return.

The issue is this. Let's say a user searches by keyword. On the search results page I need to display/query for:

  1. The first 20 matching products (paged, sorted)
  2. The total count of matching products, for paging
  3. The list of distinct stores across all matching products
  4. The list of distinct brands across all matching products
  5. The list of distinct colors across all matching products

Each query takes about 0.5 to 1 second, so altogether it is around 5 seconds.
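
Roughly, the five queries look like this (the table and column names below are simplified for illustration, not my actual schema):

```sql
-- Sketch of the five queries behind one results page.
-- Table and column names are illustrative only.
DECLARE @keyword nvarchar(100) = N'white';

-- 1) First 20 matching products (first page, sorted by relevance)
SELECT TOP (20) p.ProductId, p.Name, p.Price
FROM dbo.Products AS p
JOIN FREETEXTTABLE(dbo.Products, (Name, Description), @keyword) AS ft
  ON ft.[KEY] = p.ProductId
ORDER BY ft.[RANK] DESC;

-- 2) Total count of matches, for paging
SELECT COUNT(*) AS Total
FROM FREETEXTTABLE(dbo.Products, (Name, Description), @keyword);

-- 3, 4, 5) Distinct stores / brands / colors of all matches (one query each)
SELECT DISTINCT p.StoreId
FROM dbo.Products AS p
JOIN FREETEXTTABLE(dbo.Products, (Name, Description), @keyword) AS ft
  ON ft.[KEY] = p.ProductId;
-- ...and the same shape again for BrandId and Color.
```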

I would like to get the whole page to load under 1 second.

There are several approaches:

  1. Optimize the queries even more. I have already spent a lot of time on this one, so I am not sure it can be pushed much further.

  2. Load the products first, then load the rest of the information via AJAX. This is more of a workaround and would require revising the UI.

  3. Re-organize the data to be more report-friendly. I have already aggregated a lot of fields.

I checked out several similar sites, for example zappos.com. Not only do they display the same information as I would like in under 1 second, they also include statistics (the number of results in each category).

The following is the search for the keyword "white": http://www.zappos.com/white

How do sites like Zappos and Amazon make their results, filters and stats appear almost instantly?


So you asked specifically "how does Zappos.com do this". Here is the answer from our Search team.

An alternative for your issue would be to use a search index such as Solr. Basically, the way these work is that you load your data set into the system and it does a huge amount of indexing up front. My projects include product catalogs with 200+ data points for each of 140k products, and the average return time is less than 20 ms.

The search indexing system I would recommend is Solr, which is based on Lucene. Both projects are open source and free to use.

Solr fits your described use case perfectly in that it can actually do all of those things in a single query. You can use facets (essentially GROUP BY in SQL) to return the list of distinct values across all applicable results. For keywords it also lets you search across multiple fields in one query without performance degradation.
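
For illustration, a single request against Solr's standard HTTP API along these lines could return the first page, the total match count, and all three lists at once (the core name and field names below are placeholders for whatever you actually index):

```
http://localhost:8983/solr/products/select
    ?q=white
    &defType=edismax
    &qf=name+description
    &rows=20
    &start=0
    &facet=true
    &facet.field=store
    &facet.field=brand
    &facet.field=color
```

In the response, numFound gives the total for paging, and facet_counts lists the distinct stores, brands and colors of the matching documents together with how many results fall under each one, which is the same kind of per-category count you see next to the filters on sites like Zappos.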


You could try replacing your aggregate queries with materialized indexed views of those aggregates. This will pre-compute the aggregates and will be as fast as selecting regular row data.
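
For example, a minimal sketch of an indexed view that pre-computes product counts per brand (assuming a dbo.Products table with a BrandId column) would be:

```sql
-- Sketch only: assumes dbo.Products with a BrandId column.
CREATE VIEW dbo.vBrandProductCounts
WITH SCHEMABINDING
AS
SELECT BrandId,
       COUNT_BIG(*) AS ProductCount  -- COUNT_BIG(*) is required in an indexed view with GROUP BY
FROM dbo.Products
GROUP BY BrandId;
GO

-- The unique clustered index is what actually materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vBrandProductCounts
    ON dbo.vBrandProductCounts (BrandId);
```

Keep in mind that on editions other than Enterprise the optimizer will not use the view automatically; query it directly with the NOEXPAND hint.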


0.5 seconds per query is too long on appropriate hardware. I agree with Aaronaught: the first thing to do is to combine this into a single SQL batch, or better a stored procedure, to ensure it is compiled only once.

Analyze your queries to see if you can create even better indexes (consider covering indexes), fine-tune existing indexes, and employ partitioning.
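
For instance, a covering index along these lines (column names assumed) lets the brand/store/color lookups be satisfied entirely from the index:

```sql
-- Hypothetical covering index: the INCLUDE columns let the query be answered
-- from the index alone, without key lookups into the base table.
CREATE NONCLUSTERED INDEX IX_Products_Brand_Covering
    ON dbo.Products (BrandId)
    INCLUDE (StoreId, Color);
```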

Make sure you have an appropriate hardware configuration: data, log, tempdb and even index files should be located on independent spindles. Make sure you have enough RAM and CPUs, and I hope you are running a 64-bit platform.

After all this, if you still need more, analyze the most-used keywords and create aggregate result tables for the top 10 keywords.
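
A rough sketch of such a pre-aggregated table, refreshed periodically (for example by a SQL Agent job), might look like this; all names here are illustrative:

```sql
-- Hypothetical pre-aggregated results for the most popular keywords.
CREATE TABLE dbo.TopKeywordBrandCounts
(
    Keyword      nvarchar(100) NOT NULL,
    BrandId      int           NOT NULL,
    ProductCount int           NOT NULL,
    CONSTRAINT PK_TopKeywordBrandCounts PRIMARY KEY (Keyword, BrandId)
);

-- Rebuild the rows for one keyword; the search page then reads this table
-- instead of aggregating over the full-text matches at request time.
DECLARE @keyword nvarchar(100) = N'white';

DELETE FROM dbo.TopKeywordBrandCounts WHERE Keyword = @keyword;

INSERT INTO dbo.TopKeywordBrandCounts (Keyword, BrandId, ProductCount)
SELECT @keyword, p.BrandId, COUNT(*)
FROM dbo.Products AS p
JOIN FREETEXTTABLE(dbo.Products, *, @keyword) AS ft
  ON ft.[KEY] = p.ProductId
GROUP BY p.BrandId;
```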

About Amazon: they most likely use superior hardware and also take advantage of CDNs. They also have thousands of servers serving up the content, so there are no single performance bottlenecks; data is duplicated multiple times across several data centers.

As a completely separate approach, you may want to look into "in-memory" databases such as InterSystems Caché; this is about the fastest you can get on the database side.
