I'm working with Azure Storage Tables and RIA Services to create a rather basic Silverlight UI for accessing and managing some of the data.
Nothing too fancy: a data model with some annotations and a DomainDataSource make it all a breeze.
With one minor exception.
When I have a really, really large table, loading it takes a LOT of time, at least when I'm running on the Development Fabric, and even when I use the LoadSize parameter of the DomainDataSource.
My guess is that the following happens (sketched in code after the list):
- The Silverlight client asks the RIA service for, say, the first 100 rows of the table.
- Since Azure Table Storage queries are extremely basic, the service retrieves ALL the rows from storage, and THEN performs Take(100) on that set.
- The client gets the first 100 rows.
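To illustrate, here is a minimal sketch of the kind of query method I suspect is effectively running on the server. The entity and service names are made up for illustration; the point is only where Take() ends up being evaluated:

```csharp
using System.Linq;
using System.ServiceModel.DomainServices.Hosting;
using System.ServiceModel.DomainServices.Server;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// Hypothetical entity, just for illustration.
public class CustomerEntity : TableServiceEntity
{
    public string Name { get; set; }
}

[EnableClientAccess]
public class CustomerDomainService : DomainService
{
    private readonly TableServiceContext _context =
        CloudStorageAccount.DevelopmentStorageAccount
            .CreateCloudTableClient()
            .GetDataServiceContext();

    public IQueryable<CustomerEntity> GetCustomers()
    {
        // ToList() materializes EVERY row on the server first; the
        // Take(100) generated by LoadSize then runs against the
        // in-memory list, saving bandwidth but not CPU or storage I/O.
        return _context.CreateQuery<CustomerEntity>("Customers")
                       .ToList()
                       .AsQueryable();
    }
}
```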
This is OK for conserving bandwidth, but it's far from optimal when you consider that one must pay for the processing power used by an Azure-based application.
Is there any way to optimise the RIA service queries? Is it even possible to use the Take() and Skip() methods on Azure Storage Tables?
EDIT: I am using WCF RIA Services SP1 Beta and have followed several tutorials on Silverlight with RIA Services (which are all very similar). Combining the two, I'm simply wondering whether loading a large table into Silverlight can be improved by adding the LoadSize parameter. At the moment it seems to conserve bandwidth only (the correct number of rows is sent to the Silverlight client), but the whole process still uses as much CPU time as it does for the whole table.
Have you tried RIA Services SP1 Beta? It includes a TableDomainService that handles all of this logic by managing entities in the specified Azure Storage account. You can also inspect the traffic between your service and Azure Storage with Fiddler to find out where the problem is.
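For reference, a TableDomainService-based service looks roughly like this. I'm writing this from memory of the beta bits, so treat the type names and constructor signature as assumptions and check Kyle's posts (linked below) for the exact API:

```csharp
using System.Linq;
using System.ServiceModel.DomainServices.Hosting;
using Microsoft.ServiceModel.DomainServices.WindowsAzure;
using Microsoft.WindowsAzure;

// Entity and context names are my own; the base types come from the
// SP1 Beta's Microsoft.ServiceModel.DomainServices.WindowsAzure assembly.
public class MyEntity : TableEntity
{
    public string Name { get; set; }
}

public class MyEntityContext : TableEntityContext
{
    public MyEntityContext()
        : base(CloudStorageAccount.DevelopmentStorageAccount.TableEndpoint.ToString(),
               CloudStorageAccount.DevelopmentStorageAccount.Credentials)
    {
    }

    public TableEntitySet<MyEntity> MyEntities
    {
        get { return GetEntitySet<MyEntity>(); }
    }
}

[EnableClientAccess]
public class MyDomainService : TableDomainService<MyEntityContext>
{
    public IQueryable<MyEntity> GetMyEntities()
    {
        // The base class takes care of composing queries against
        // the underlying Azure Storage table.
        return EntityContext.MyEntities;
    }
}
```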
References:
- WCF RIA Services SP1 Beta announcement
- Kyle McClellan's blog - he developed TableDomainService, and his blog has a lot of posts about this feature.
EDIT: As Kyle McClellan mentioned in the comments below, this problem does exist in WCF RIA Services SP1 Beta.
Do you mean that you retrieve all entities in code to perform the query, or that the Table service has to scan all entities to return the results? In either case, you are probably using Table Storage the wrong way.
Azure Table Storage is not an RDBMS and you shouldn't treat it this way. It is designed to allow ultra-fast and scalable modifications, easy partitioning and indexed access. The only indexed access is by the partition and row key. It is definitely not suitable for ad-hoc queries and reports. In this respect it is not that different from most NoSQL databases.
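To make that concrete, here is a small sketch with an invented PostEntity type: an exact PartitionKey + RowKey lookup is the one indexed read, while filtering on any other property forces a scan:

```csharp
using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// Illustrative entity type, not from the question.
public class PostEntity : TableServiceEntity
{
    public string Author { get; set; }
}

public class IndexedAccessDemo
{
    public static void Run()
    {
        var context = CloudStorageAccount.DevelopmentStorageAccount
            .CreateCloudTableClient()
            .GetDataServiceContext();

        // Fast: exact PartitionKey + RowKey is the only indexed lookup.
        var post = (from p in context.CreateQuery<PostEntity>("Posts")
                    where p.PartitionKey == "thread-42" && p.RowKey == "post-007"
                    select p).FirstOrDefault();

        // Slow: filtering on a non-key property scans the whole table
        // (or the whole partition, if you also constrain PartitionKey).
        var byAuthor = (from p in context.CreateQuery<PostEntity>("Posts")
                        where p.Author == "alice"
                        select p).ToList();
    }
}
```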
I suspect you are trying to retrieve query results using ad-hoc filters and display them in a datagrid. This is an ad-hoc query/reporting scenario for which the Table Storage is not suitable.
That said, there are ways you can optimize reporting even with Table Storage. The most important thing is to understand that reporting data (let's consider datagrids as reports for this discussion) should be separate from transactional data - that's what Command Query Responsibility Segregation (CQRS) is all about.
- You can use multiple tables to store the results for different reports and update them asynchronously while processing transactions. In each table you can use partition keys that combine multiple field values, which makes retrieving a report's rows easier (see the sketch after this list). You still have to define your reports at the server level. In this case, the scenario is closer to retrieving report snapshots than to querying.
- You can extract the reporting data to SQL Azure and use SQL for ad-hoc reporting. Updating will have to be done asynchronously as before, but you gain ad-hoc querying. SQL Azure is much more expensive though.
- An unusual solution, suited for fairly static reports (e.g. monthly or weekly summaries), is to use BLOB storage to store the report data in JSON format (or whatever suits you). You still have to define the appropriate partition and row keys. This is similar to creating report snapshots in SQL Server Reporting Services.
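A minimal sketch of the first idea, with invented names: a denormalized report row whose partition key concatenates the values the report filters on, so a single partition query returns exactly one report's rows:

```csharp
using Microsoft.WindowsAzure.StorageClient;

// Illustrative only: one row of a pre-computed monthly sales report.
public class MonthlySalesReportRow : TableServiceEntity
{
    public MonthlySalesReportRow() { } // parameterless ctor required by the storage client

    public MonthlySalesReportRow(string region, string month, string orderId)
        : base(string.Format("{0}_{1}", region, month), // e.g. "EMEA_2011-03"
               orderId)
    {
    }

    // double rather than decimal: Table Storage has no decimal property type.
    public double Total { get; set; }
}

// Querying PartitionKey == "EMEA_2011-03" then fetches the whole
// report page with a single indexed partition scan.
```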
The most suitable solution depends on your specific scenario:
- If your application processes business transactions (e.g. orders, phone calls, etc.), your main table should use keys that optimize transaction processing, and the report tables should use keys that optimize report retrieval.
- If you are building a forum, web site or CMS service, you can probably create one "Posts" table per forum for the posts, partitioned by thread id (just an idea), and separate "index" tables for tags, users etc. that link back to the posts table.
That said, Table Storage does support a subset of the OData/WCF Data Service query operators: filter ($filter) and take ($top) are executed by the service itself, but skip ($skip) is not supported, so paging beyond the first page is done with continuation tokens (see the sketch below).
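With the v1 StorageClient library, for example, the filter and take below are pushed to the service, and AsTableServiceQuery() handles the continuation tokens (reusing the illustrative PostEntity type from the earlier sketch):

```csharp
using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class PagedQueryDemo
{
    public static void Run()
    {
        var context = CloudStorageAccount.DevelopmentStorageAccount
            .CreateCloudTableClient()
            .GetDataServiceContext();

        // $filter and $top are evaluated by the Table service itself;
        // AsTableServiceQuery() transparently follows continuation tokens.
        CloudTableQuery<PostEntity> page =
            (from p in context.CreateQuery<PostEntity>("Posts")
             where p.PartitionKey == "thread-42"   // -> $filter
             select p)
            .Take(100)                             // -> $top=100
            .AsTableServiceQuery();

        foreach (PostEntity post in page)
        {
            // process the page...
        }

        // Note: .Skip(n) would fail at runtime; resuming past the first
        // page is done via the continuation token, not $skip.
    }
}
```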