I'm doing a little research for a server application I'm planning to build. The main function will be - lots of users will be able to perform live editing.
So looking into all the options for a scalable string, which basically be some kind of stringbuffer but be able to handle lots(hundreds?) of threads working on it at the same time with quite a large amount of text.
Instead of reinventing the w开发者_StackOverflowheel, I hope to see libraries shared that have such features :) I couldn't really find much with Google.
You might want to take a look at the sourcecode of Etherpad - it's a Java-based collaborative text editing web app, so it has to have some sort of string implementation that allows concurrent write access to separate regions of the string, presumably without losing data. Of course, whether it fulfills your performance requirements is a different matter...
String itself is thread-safe (since it is immutable) and fairly high-performance for most use cases.
The main performance issue with String is that in O(n) in the length of the string for mutations (due to the need to take a complete copy).
If you need to deal with very long strings you probably want to use a Rope data structure. There are several implementations available in Java:
- Ropes for Java - seems a very good implementation
- My own implementation - which is less general but might be faster for the implemented operations as it is designed for very low overhead.
Both of the Rope implementations above conform to the CharSequence interface (which String also implements), so if you design your application to work with CharSequences instead of Strings then you can start with Strings and switch to Ropes later if you decide you need them.
This talk (free reg. required) describes the kinds of algorithms used by the likes of Google Docs etc for multi-user editing, and demonstrates a simple implementation. In Scala but applicable to any language.
Edit: oops, a bit late! Might be useful to someone anyway...
Well, StringBuffer is thread-safe. You could base your system on that.
It seems to me that it would be a challenge. You need to allow individual users to "lock" the parts of the string they're working on, while others can lock and work on other parts. So basically what you're talking about is a form of database.
You could probably do it with SQL, but you'd have to invent a protocol.
The main issue was keep track of changes since you no longer rely on the index, it made things difficult.
So I'm currently looking at storing the strings in something like a LinkHashMap or ListOrderedMap now. But still doing more research in the correct data structure ...
Edit: At this stage I will go with a ListOrderedMap to store my strings in and see how that goes...
精彩评论