Tree structures and threads_问答_开发者_运维开发者技术经验分享

I have a speed critical multithreaded program which involves data in a tree structure. Implemented as follows:

typedef struct
{
    // data pertaining to linkages, defining the architecture of the tree
    int parent_node;
    int child_node[MAX_CHILD_NODES];
    int number_of_children;

    // data pertaining to info at each node
    float interesting_info_A;
    char interesting_info_B[STRING_LEN];
    long interesting_info_C;
}
node_type;

node_type node[MAX_NUMBER_OF_NODES];

CRITICAL_SECTION node_critsec[MAX_NUMBER_OF_NODES];

The program enters and leaves critical sections controlled by node_critsec[]. So when I need to process interesting_info_A/B/C of node n then I enter the critical section of that node (node_critsec[n]), do my processing and then leave. The program meanders around the tree taking all sorts of complex paths following links to parents and children. The program will also grow the tree, i.e. add new n开发者_StackOverflowodes and modify the parent/children links of other nodes accordingly (the tree never shrinks). I try and ensure that each thread only ever locks one node at a time to avoid the risk of deadlocks. But then I have a problem- if I'm adding a new node then I may want to lock the node's parent so that I can adjust its list of children. Getting this to all to work without either deadlocks or two threads attempting to modify data in the same node is getting to be a bit of a nightmare. Is there some guiding principle I should be following about when and which nodes to lock/unlock that I should be following - or do I just have to be super-smart and work out every permutation of events that can occur?

A simple rule: to avoid deadlock when locking multiple items always lock them in the same order everywhere. So if you have items A, B, C, and D you should always lock them in, say, alphabetical order and no other. If you've locked C and decide that you need B, then you have to release C and then lock B and re-lock C.

In a tree, I suppose, you could always lock from the top, down. If you need to lock a parent then release and re-acquire the locks as needed.

There are other schemes that work just as well, but this one's straightforward.

Edit: You can read a little about it here.

If growing the tree is relatively uncommon, one possibility would be to use a read/write lock that allows multiple readers but only one writer. Use a single R/W lock to lock the tree itself during traversal (acquire the read lock). Any number of threads could do this. When a thread needed to add a new node, it would acquire the write lock. This would block all readers for the duration of the update. Note that you would probably need to set up the read/write lock to give precedence to writers to avoid starving them.

Using that mechanism would eliminate the need for a single thread to acquire multiple critical sections for individual nodes and thus simplify the process.

This is reminiscent of node locking in file systems. If you are looking for reference material, you could check out the VFS layers in linux, BSD, open solaris, .... However, be warned it can be complicated stuff, and consequently may not be the best example for reference.

I would like to add some points in addition to those that clintp made (heed his points well).

It could be worthwhile to use a mutex to lock access to the entire tree when you need it and then unlock it when you are done. Although this single threads the application in this critical section, it can be useful to get things up and going quickly and safely. Who knows, the performance that this offers might be good enough. If it is not, at least it gets you going forward. The bottlenecks can be loosened a little if you use a read-write semaphore in lieu of a mutex; everything becomes single threaded for writes, while reads can be concurrent.
Make a list of all the operations you would like to make to your tree and categorize them (reading, writing, updating, moving, renaming, deleting, ...) and figure out what degree of concurrency you want. From what you have written, you need more than read-only. Do you want thread A to be able to read nodes that are not locked for writing by thread B? I speak from experience that skipping this step can cost you a lot of time.

Hope this helps.