SOA/Web Service Pagination

In SOA we should not be building or holding state (or designing dependencies) between client and server. This is understood. But what patterns can be followed when a client wants to consume a real-time service that may return an open-ended number of 'rows'?

Web applications, similar to SOA but allowing for state (sessions), have solved this with pagination. Pagination requires (in most cases, especially with SQL) that the server holds the data and that the client requests the data in chunks.

If we were to consider pagination-like scenarios for web services, what patterns would these follow that would still allow the tenets of SOA to be adhered to (or as closely as possible)?

Some rules for the thinkers:
1) Backed by a SQL database (therefore there is no concept of a row number in a select set)
2) It is important not to skip a row or duplicate a row in a set during pagination
3) Data may be inserted into and deleted from the database at any time by other clients
4) There is no need to consider the dataset a live (updatable) dataset

Personally, I think that 1 and 2 above already spell out the solution by constraining the solution space with the requirements.

My proposed solution would have the data (as much as is selected) stored in a read-only store/cache, where it can be assigned a row number within the result set, and allow pagination to occur on this data snapshot. I would have infrastructure to store snapshots (servers, external caches, memcached or ehcache; this must scale quite large). The result of such a query would be a snapshot ID, and clients could retrieve the data from the snapshot using a snapshot API (web services) and the snapshot ID. Results would be processed in a read-only, forward-only manner, x records at a time, where x is something reasonable.

Competing thoughts and ideas, criticisms or accolades would be greatly appreciated.

Returning paginated results from a web service is actually quite easy to achieve.

All you have to do is add two parameters to the web service call: Page Size, Page Number.

Page Size is the number of results to include in a page. Page Number is the number of the page of results you are looking for.

Your web service then goes back to the database (or cache), retrieves the results, figures out which results fit on the requested page, and returns only those results.

The client then has to make a single request per page of results they want from the service.
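The Page Size / Page Number approach described above can be sketched as follows. This is only a minimal illustration, assuming an SQLite table named `items` with columns `id` and `name` and a helper called `get_page`; none of these names come from the original answer.

```python
import sqlite3


def get_page(conn, page_size, page_number):
    """Return one page of rows. page_number starts at 1 (an assumption)."""
    if page_size < 1 or page_number < 1:
        raise ValueError("page_size and page_number must be >= 1")
    offset = (page_number - 1) * page_size
    # A stable ORDER BY is required, otherwise pages are not well defined.
    cur = conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return cur.fetchall()


# Hypothetical sample data: ten rows with ids 1..10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 11)],
)

page_two = get_page(conn, page_size=3, page_number=2)
```

Note that each call re-runs the query against live data, so if other clients insert or delete rows between calls, a row can be skipped or returned twice. That is the situation rules 2 and 3 of the question are concerned with.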

What you propose with memcached will also work with a caching table. The first service call would (1) INSERT results INTO the caching table with a snapshot ID (2) return the first page from the caching table and the snapshot ID. Subsequent calls would return pages based on page size and page number by querying the caching table using the snapshot ID.
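A rough sketch of the caching-table variant is shown below. The table names (`items`, `snapshot_rows`), column names, and the functions `create_snapshot` and `get_snapshot_page` are all assumptions made for illustration; row numbers are assigned in application code here rather than in SQL, to keep the example portable.

```python
import sqlite3
import uuid


def create_snapshot(conn, page_size):
    """Copy the current result set into the caching table; return (snapshot_id, first page)."""
    snapshot_id = uuid.uuid4().hex
    rows = conn.execute("SELECT id, name FROM items ORDER BY id").fetchall()
    # Assign a row number within the result set, starting at 1.
    conn.executemany(
        "INSERT INTO snapshot_rows (snapshot_id, row_num, item_id, name) "
        "VALUES (?, ?, ?, ?)",
        [(snapshot_id, n, item_id, name)
         for n, (item_id, name) in enumerate(rows, start=1)],
    )
    return snapshot_id, get_snapshot_page(conn, snapshot_id, page_size, 1)


def get_snapshot_page(conn, snapshot_id, page_size, page_number):
    """Read one page from the frozen snapshot, ignoring later changes to items."""
    first = (page_number - 1) * page_size + 1
    last = first + page_size - 1
    return conn.execute(
        "SELECT item_id, name FROM snapshot_rows "
        "WHERE snapshot_id = ? AND row_num BETWEEN ? AND ? ORDER BY row_num",
        (snapshot_id, first, last),
    ).fetchall()


# Hypothetical sample data: five rows with ids 1..5.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE snapshot_rows (snapshot_id TEXT, row_num INTEGER, "
    "item_id INTEGER, name TEXT, PRIMARY KEY (snapshot_id, row_num))"
)
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 6)],
)

snapshot_id, first_page = create_snapshot(conn, page_size=2)

# Another client changes the live table between page requests.
conn.execute("DELETE FROM items WHERE id = 3")
conn.execute("INSERT INTO items (id, name) VALUES (6, 'item-6')")

second_page = get_snapshot_page(conn, snapshot_id, page_size=2, page_number=2)
```

In this sketch the second page still contains the deleted row 3 and never sees the new row 6, which is the trade-off of paginating over a snapshot. A real service would also need some way to expire old snapshots, which is not shown here.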

I should think this could also be optimized by using an in-memory caching table, but that depends on whether your database supports INSERT-INTO from a disk table to an in-memory table. That might get complicated in a clustered environment though.

Such a cache is stateful by its very nature if you are retaining a client-specific copy between requests, whether the storage is a session object, a database table or a memcached data store. Given the requirements, though, you have no choice but to cache results in some form or another, although doing so risks returning deleted or no-longer-relevant records as legitimate results.

SOA is not meant for such low level functionality.

SOA is meant to glue together business areas, not frontends to backends. Your application does not become a "SOA" application just because it talks to its back end through web services. That is nonsense, since SOA is meaningless in the context of one isolated system.

From that point of view, it is clear that, in SOA, the caller should not know about the SQL table you are paginating; that is an implementation detail that SOA should hide. On the other hand, the server should not know about the client's state, because it should be agnostic to the details of its clients in order to be really open.

So, just understand that pagination is not SOA. Do as you wish, but understand that the web service you use to paginate is an internal artifact of your application, not something to expose to external clients on a SOA bus. Also remember that it cannot be transactionally consistent without state on the server. The problem is probably that you have a single service layer for both the application's UI and the SOA bus; you need to separate them.

Using this web service on a SOA bus would be bad. It cannot be consistent as the user paginates, and as other applications attach to it they become tied to the specific SQL.

... then you might as well have granted direct SQL access to the table for all that matters.

SOA is for business messages between systems, not to glue an application's frontend to the backend.

Same problem, resolved using the Navision approach.

$ws->getList($first_record_id, $limit)

This returns a page of $limit elements, starting after the passed id, ordered by id ascending:

select * from collection where collection.id > $first_record_id order by collection.id ASC limit $limit

Navision uses a key (each element has one), but in MySQL an auto-increment id is better.

In this case pagination is intended for handling large result sets, not for frontend pagination...
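The id-based call above could look roughly like this in Python with SQLite. It is a sketch under assumptions: the `collection` table has an auto-incrementing integer `id` plus a `name` column, ids are positive so 0 can serve as the starting value, and `get_list` and `iter_all` are hypothetical names mirroring `$ws->getList`.

```python
import sqlite3


def get_list(conn, first_record_id, limit):
    """Return up to `limit` rows whose id is greater than first_record_id."""
    return conn.execute(
        "SELECT id, name FROM collection WHERE id > ? ORDER BY id ASC LIMIT ?",
        (first_record_id, limit),
    ).fetchall()


def iter_all(conn, limit):
    """Walk the whole collection page by page, remembering only the last id seen."""
    last_id = 0  # assumes ids start above 0
    while True:
        page = get_list(conn, last_id, limit)
        if not page:
            return
        yield from page
        last_id = page[-1][0]


# Hypothetical sample data: seven rows with ids 1..7.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE collection (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.executemany(
    "INSERT INTO collection (name) VALUES (?)",
    [(f"item-{i}",) for i in range(1, 8)],
)

first_page = get_list(conn, 0, 3)

# Other clients delete an already-seen row and append a new one.
conn.execute("DELETE FROM collection WHERE id = 2")
conn.execute("INSERT INTO collection (name) VALUES ('item-8')")

next_page = get_list(conn, first_page[-1][0], 3)
```

Because the client carries the last id it saw, the server keeps no per-client state, and a delete of an earlier row does not shift later pages. Rows are neither skipped nor duplicated as long as new rows always receive larger ids than existing ones.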

I am not sure SOA is the concern here. The problem you have seems to be with paginating your APIs. I will point you to how Twitter handles its pagination: dev.twitter.com/rest/public/timelines