I've been studying a little about garbage collection, mostly applied to server-side / real-time applications, and I've started to sketch an algorithm with which it would be possible to have an asynchronous garbage collection system. Since I'm starting on this topic now, so I don't know very deeply about gc algorithms, I was wondering about the possible pitfalls of an implementation like that. The algorithm is very crude, and with many undefined parts.
Here's how I thought about it:
- each thread has its own heap space to manage, and stores a list of pointers it owns that are in use by other threads With that, garbage collection works totally asynchronous to the running threads, and:
- phase 1 start following the threads' roots and marking all objects reacheable by them. If we get into another thread's space, we stop following this pointer, and mark this pointer as "in use" on the owner thread
- after we have marked all regions, we select a region (maybe with most dead references as possible), and start copying its live object references to another space. (it might be the whole thread heap space, but I think it could be too memory-intensive this operation)
- copying starts with setting with CAS a flag which states that the object is being copied. Any mutable action to be performed on this particular object while that flag is set will spin lock until a new address is set by the gc thread. When copying finishes, a new address is set on the old one, and any mutable reference to be performed on the object will be routed to the new object
- after updating all references made to those pointers using CAS, the old space is finally freed (no new pointers will be updated with the wrong address, since every mutator will first check to see if the reference has changed locations)
That's it!
Anyway, I'm quite excited with a possible implementation that doesn't stops-the-world, and using only fast spin-locks that apply only to the object being copied. But I'd like to know if this is possible to implement, or if there is any possibility of having a dangling pointer somewhere, or memory leaks, or it it's inneficient, etc. Any info that will contribute to this will be greatly appreciated!
I'm not at all too sure about how e.g. it would handle circular references from different threads. I think this would be 开发者_JAVA百科handled naturally since we update all hazard pointers the current gc'ed thread has. There might also be some kind of concurrent access I wasn't considering.
Thank you!
---- EDIT: Thanks to Ernest's contribution, I'm thinking about not using a copying algorithm, but maybe a simple mark & sweep. That's because we'd need to check every time we access an object's variable if the pointer has been updated. This seems to me a quite big overhead. Doesn't it?
Your ideas are good but I believe there are lots of subtle problems that will require a lot of work to solve and, once solved, you won't be able to attain competitive throughput performance.
each thread has its own heap space to manage, and stores a list of pointers it owns that are in use by other threads With that, garbage collection works totally asynchronous to the running threads, and:
If there is a shared object, which thread owns it? If a concurrent collection has objects added to it from different threads, will there be many inter-heap pointers? How do you handle cycles between heaps?
phase 1 start following the threads' roots and marking all objects reacheable by them. If we get into another thread's space, we stop following this pointer, and mark this pointer as "in use" on the owner thread
If this is done concurrently with the mutator running, how do you prevent race conditions where the mutators alter the topology to introduce or eliminate inter-heap references while the GC is doing this marking?
after we have marked all regions, we select a region (maybe with most dead references as possible), and start copying its live object references to another space. (it might be the whole thread heap space, but I think it could be too memory-intensive this operation)
If this is for multicore, copying will saturate global memory bandwidth and destroy scalability.
copying starts with setting with CAS a flag which states that the object is being copied. Any mutable action to be performed on this particular object while that flag is set will spin lock until a new address is set by the gc thread. When copying finishes, a new address is set on the old one, and any mutable reference to be performed on the object will be routed to the new object
Lock-free is tricky. You're making assumptions about memory models. You may need memory barriers. Looks like you're just emulating a lock, in which case you're talking about acquiring and releasing a lock around every write of a reference into the heap which would be prohibitively expensive.
Spin locks are also bad for concurrency because you're not only tying up a thread but also burning a core that can no longer do the work that you are waiting on! See the last core slowdown bug in GHC. A wait free solution would use CAS to get the mutator thread to help with the GC work rather than block waiting for another thread to do it.
after updating all references made to those pointers using CAS, the old space is finally freed (no new pointers will be updated with the wrong address, since every mutator will first check to see if the reference has changed locations)
Ok.
The major issue I see is synchronization. You need memory barriers to ensure that the various threads see up-to-date data on each other's heaps, and I don't see really where those would go and still maintain your fully asynchronous operational model.
Just saw this recently http://java-monitor.com/forum/showthread.php?t=890
As far as I understand you are talking about model similar to what is used by Erlang VM - each threads has its own heap. It is possible by Erlang nature - no spin locks are required (at least for the thread heap).
精彩评论