开发者

Large array and multiple threads - spinlocks or local buffers or something else?

开发者 https://www.devze.com 2023-01-09 22:03 出处:网络
I have an array with 250k entities (with a size of 20 bytes) and 4 threads. Each thread modifies ~100k random entities (that means, you can\'t predict which entities it will touch). It can modify the
  • I have an array with 250k entities (with a size of 20 bytes) and 4 threads.
  • Each thread modifies ~100k random entities (that means, you can't predict which entities it will touch). It can modify the same entity multiple times.
  • Each entity gets modified up to ~10 times.
  • The modification takes ~10-6 seconds.
  • Each entity gets modified by multiple threads

The both last points are the most important facts. The 5th point implies that I need a mechanism to protect my entities from getting corrupted due to concurrent access/modification. The 4th point makes me worry whether classical locking mechanisms like mutexes, which block threads, create to much overhead considering the short timespan a lock would be acquired for.

I've come up with two ideas:

  • Using spinlocks to overcome the overhead (presupposing my assumption about the overhead is correct in the first place).
  • Giving each thread a local copy of the array it can modify without interruptions. After all threads have finished, merging all arrays into one. This is possible because I'm able to pick a winner, if there are multiple copies of an entity.

What do You recommend? Do You agree on one of my ideas or do You recommend something else? Does Your recommendation change if I change the numbe开发者_如何转开发rs to?:

  • 1M entities
  • 8 threads
  • ~500k random accesses
  • ~100 modifications per entity

Please also point me towards implementations in C#/.Net. Thanks in advance.

Additional Informations

The entities are value types (structs). I can not afford to create new objects for each write operation - only modify existing primitives.


As they say, there's more than one way to skin a cat (though why anybody wants skinned cat is another question) :-)

With 250K objects and 4 threads, you'd have to guess that conflicts will be (relatively) rare. That doesn't mean that we can ignore them, but it may affect how we look for them. Testing a critical section is very fast, unless there is actually a conflict. That means that it might be feasible to check a critical section for every transaction, in the knowledge that relatively few checks will take more that a few CPU ticks.

Is it feasible to create 250K critical sections? Maybe, I'm not sure. You can create a very lightweight spinlock with:

while (0 != ::InterlockedExchange(&nFlag, 1)) {};
DoStuff();
nFlag = 0;

An alternate approach might to partition the dataset and have each thread work on a unique set of objects. That makes conflicts impossible so no locking is needed. Depending on the nature of the problem, you might achieve this by having each thread operate on a range of data, or possibly by operating a queue for each worker thread and have one or more scanning threads identify objects needing processing and pushing them onto the appropriate processing queue.


It seems like the simplest solution can fit here. The simplest solution is to lock the instance the thread currently manipulates.

I base it on a simple execution with and without locks.

This run, that locks the instance, takes ~10.09 seconds. The same run, without lock takes ~9.03 seconds:

const int numOfPersons = 250000;
var persons = new List<Person>(numOfPersons);
for (int i = 0; i < numOfPersons; i++)
{
    persons.Add(new Person());
}

var rand = new Random();

var sw = Stopwatch.StartNew();

for (int j = 0; j < 100; j++)
{
    for (int i = 0; i < 100000; i++)
    {
        int index = rand.Next(0, numOfPersons);

        Person person = persons[index];
        lock (person)
        {
            person.Name += "a";
        }
    }
}

Console.WriteLine(sw.Elapsed);

Since the ratio of elements to threads is big enough, the expectation of time each thread needs to wait for an instance is negligible.

As can be seen from the example, the time overhead for locking the instances is ~1 second. This code does 100 times 100,000 modifications in a collection of size of 250,000 items. The 1 second time is roughly constant no matter what the modifications are.


It seems that your entities are structures (of 20 bytes each). This is a wild guess, because I have no idea what you're actually trying to do, but can't you make those entities immutable reference types?

When you make immutable reference types, your array will only consist of references, which will be 4 bytes (or 8 bytes on 64 bits) in size and changing a reference will always be an atomic operation (unless you explicitly change alignment of course). Changing an entity means creating a new one and replacing the reference in the array from the old to the new. This way changes are atomic. However, you can still loose changes when two threads write to the same slot shortly after each other (but you don't seem to worry about that, because you are talking about 'picking a winner').

I have no idea what this will do to the performance, because your might explicitly choose for a value type array instead of a reference type array. However, sometimes it is good to make the solution simpler, not harder. This solution might also improve cache locality, because you are talking about random access to a big array. This array will therefore not fit into the CPU’s cache and you will have a lot of cache misses.

0

精彩评论

暂无评论...
验证码 换一张
取 消