I am working on a scientific application that has readily separable parts that can proceed in parallel. So, I've written those parts to each run as independent threads, though not for what appears to be the standard reason for separating things into threads (i.e., not blocking some quit command or the like).
A few questions:
Does this actually buy me anything on standard multi-core desktops - i.e., will the threads actually run on the separate cores if I have a current JVM, or do I have to do something else?
I have a few objects which are read (though never written) by all the threads. Potential problems with that? Solutions to those problems?
For actual clusters, can you recommend frameworks to distribute the threads to the various nodes so that I don't have to manage that myself (well, if such exist)? CLARIFICATION: by this, I mean either something that automatically converts threads into tasks for individual nodes or makes the entire cluster look like a single JVM (i.e., so it could send threads to whatever processors it can access) or whatever. Basically, implement the parallelization in a useful way on a cluster, given that I've built it into the algorithm, with the minimal job husbandry on my part.
Bonus: Most of the evaluation consists of set comparisons (e.g., union, intersection, contains) with some mapping from keys to get the pertinent sets. I have some limited experience with FORTRAN, C, and C++ (semester of scientific computing for the first, and HS AP classes 10 years ago for the other two) - what sort of speed/ease of parallelization gains might I find if I tied my Java front-end to an algorithmic back-end in one of those languages, and what sort of headache might my level of experience find implementing those operations in those languages?
Yes, using independent threads will use multiple cores in a normal JVM, without you having to do any work.
If anything is only ever read, it should be fine to be read by multiple threads. If you can make the objects in question immutable (to guarantee they'll never be changed), that's even better.
I'm not sure what sort of clustering you're considering, but you might want to look at Hadoop. Note that distributed computing distributes tasks rather than threads (normally, anyway).
Multi-core Usage
Java runtimes conventionally schedule threads to run concurrently on all available processors and cores. I think it's possible to restrict this, but it would take extra work; by default, there is no restriction.
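To make the point above concrete, here is a minimal sketch that asks the JVM how many processors it sees and spreads independent work across a thread pool of that size. The class name ParallelParts and the sumRange workload are hypothetical stand-ins for the separable parts of the real application; nothing here is specific to the asker's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelParts {
    // Hypothetical stand-in for one independent part of the computation.
    static long sumRange(long fromInclusive, long toExclusive) {
        long sum = 0;
        for (long i = fromInclusive; i < toExclusive; i++) {
            sum += i;
        }
        return sum;
    }

    // Splits [0, n) into one chunk per worker and sums the chunks in parallel.
    static long parallelSum(long n, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            long chunk = (n + workers - 1) / workers; // ceiling division
            List<Future<Long>> futures = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final long from = Math.min(n, w * chunk);
                final long to = Math.min(n, from + chunk);
                futures.add(pool.submit(() -> sumRange(from, to)));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // blocks until that part has finished
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("processors reported by the JVM: " + cores);
        System.out.println("sum = " + parallelSum(1_000_000L, cores));
    }
}
```

No extra configuration is needed for the pool's threads to land on different cores; the operating system scheduler decides that. Whether the parallel version is actually faster depends on how much work each part does relative to the cost of handing it off.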
Immutable Objects
For read-only objects, declare their member fields as final, which will ensure that they are assigned when the object is created and never changed. If a field is not final, even if it is never changed after construction, there can be some "visibility" issues in a multi-threaded program. This could result in the assignments made by one thread never becoming visible to another.
Any mutable fields that are accessed by multiple threads should be declared volatile, be protected by synchronization, or use some other concurrency mechanism to ensure that changes are consistent and visible among threads.
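As an illustration of the advice above, here is a sketch of a read-only lookup object built around a single final field. The name SetIndex and its key-to-set layout are assumptions, chosen only because the question mentions mapping keys to sets; the real objects will look different.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical read-only lookup structure shared by all worker threads.
final class SetIndex {
    // final: assigned exactly once in the constructor, so any thread that
    // obtains a reference to a fully constructed SetIndex sees this value.
    private final Map<String, Set<Integer>> setsByKey;

    SetIndex(Map<String, Set<Integer>> source) {
        // Defensive deep copy, so later changes to the caller's collections
        // cannot leak into this object.
        Map<String, Set<Integer>> copy = new HashMap<>();
        for (Map.Entry<String, Set<Integer>> e : source.entrySet()) {
            copy.put(e.getKey(), Collections.unmodifiableSet(new HashSet<>(e.getValue())));
        }
        this.setsByKey = Collections.unmodifiableMap(copy);
    }

    // Returns an unmodifiable set; an unknown key yields an empty set.
    Set<Integer> lookup(String key) {
        Set<Integer> found = setsByKey.get(key);
        if (found == null) {
            return Collections.emptySet();
        }
        return found;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Set<Integer>> raw = new HashMap<>();
        raw.put("a", new HashSet<>(Arrays.asList(1, 2, 3)));
        SetIndex index = new SetIndex(raw);

        // Both readers share the same instance without any locking.
        Runnable reader = () -> System.out.println(
                Thread.currentThread().getName() + " sees " + index.lookup("a"));
        Thread t1 = new Thread(reader, "reader-1");
        Thread t2 = new Thread(reader, "reader-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

The copy plus the unmodifiable wrappers matter as much as the final keyword: final only fixes the reference, so the collections behind it also have to be made unchangeable for the object to be truly read-only.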
Distributed Computing
The most widely used framework for distributed processing of this nature in Java is called Hadoop. It uses a paradigm called map-reduce.
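For readers unfamiliar with the paradigm, the sketch below shows the shape of a map-reduce computation inside a single JVM using plain Java streams. It is not Hadoop's API and does not distribute anything across machines; the class MapReduceSketch and the word-count task are purely illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class MapReduceSketch {
    // Counts how often each whitespace-separated token occurs in the input.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.parallelStream()
                // "map" step: turn each input record into a stream of keys
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(word -> !word.isEmpty())
                // "shuffle" and "reduce" steps: group equal keys, then count each group
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("union intersection", "contains union");
        System.out.println(countWords(lines));
    }
}
```

The property that matters for a cluster is that the map step handles each record independently and the reduce step only combines values that share a key. An algorithm that can be expressed that way is a candidate for a map-reduce framework; one that needs threads to share state freely is a much poorer fit.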
Native Code Integration
Integrating with other languages is unlikely to be worthwhile. Because of its adaptive bytecode-to-native compiler, Java is already extremely fast on a wide range of computing tasks. It would be wrong to assume that another language is faster without actual testing. Also, integrating with "native" code using JNI is extremely tedious, error-prone, and complicated; using simpler interfaces like JNA is very slow and would quickly erase any performance gains.
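In the spirit of testing before assuming, here is a sketch of the set operations the question mentions, written against java.util.HashSet, with a crude timing in main. The class name SetOps and the test data are made up for illustration, and a single unwarmed System.nanoTime measurement is only a rough indication; a proper harness such as JMH would be needed for numbers worth trusting.

```java
import java.util.HashSet;
import java.util.Set;

class SetOps {
    // Returns a new set holding every element of a or b; inputs are not modified.
    static <T> Set<T> union(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.addAll(b);
        return result;
    }

    // Returns a new set holding the elements present in both a and b.
    static <T> Set<T> intersection(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> evens = new HashSet<>();
        Set<Integer> threes = new HashSet<>();
        for (int i = 0; i < 1_000_000; i++) {
            if (i % 2 == 0) evens.add(i);
            if (i % 3 == 0) threes.add(i);
        }
        long start = System.nanoTime();
        Set<Integer> both = intersection(evens, threes);
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println("intersection size " + both.size()
                + " in about " + elapsedMicros + " us (single unwarmed run)");
    }
}
```

If a measurement like this shows the set operations dominating the run time, that is the point at which comparing against a C or C++ implementation starts to make sense.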
As some people have said, the answers are:
Threads on cores - Yes. Java has had support for native threads for a long time. Most OSes have provided kernel threads which automagically get scheduled to any CPUs you have (implementation performance may vary by OS).
The simple answer is it will be safe in general. The more complex answer is that you have to ensure that your Object is actually created & initialized before any threads can access it. This is solved one of two ways:
Let the class loader solve the problem for you using a Singleton (and lazy class loading):
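    public class MyImmutableObject {
        private MyImmutableObject() { }

        // The holder class is not loaded until getInstance() is first called;
        // class initialization guarantees the instance is fully built by then.
        private static class MyImmutableObjectInstance {
            private static final MyImmutableObject instance = new MyImmutableObject();
        }

        public static MyImmutableObject getInstance() {
            return MyImmutableObjectInstance.instance;
        }
    }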
Explicitly using acquire/release semantics to ensure a consistent memory model:
    MyImmutableObject foo = null;
    volatile boolean objectReady = false;

    // initializer thread:
    ....
    // create & initialize object for use by multiple threads
    foo = new MyImmutableObject();
    foo.initialize();
    // release barrier
    objectReady = true;
    // start worker threads

    public void run() {
        // acquire barrier
        if (!objectReady)
            throw new IllegalStateException("Memory model violation");
        // start using immutable object foo
    }
I don't recall off the top of my head how you can exploit the memory model of Java to perform the latter case. I believe, if I remember correctly, that a write to a volatile variable is equivalent to a release barrier, while a read from a volatile variable is equivalent to an acquire barrier. Also, the reason for making the boolean volatile as opposed to the object is that access of a volatile variable is more expensive due to the memory model constraints - thus, the boolean allows you to enforce the memory model & then the object access can be done much faster within the thread.
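For what it's worth, the recollection above matches the Java memory model: a write to a volatile field happens-before every later read of that same field, so everything written before the flag is set is visible to a thread that reads the flag as true. Below is a complete, runnable sketch of that handoff. The names VolatilePublish, Table and runOnce are hypothetical, and the worker is deliberately started before the object exists so that the volatile flag is the only thing making the publication safe.

```java
class VolatilePublish {
    // Hypothetical shared object with a plain (non-final) field on purpose.
    static final class Table {
        int[] values;

        void initialize() {
            values = new int[] {1, 2, 3};
        }

        int sum() {
            int s = 0;
            for (int v : values) {
                s += v;
            }
            return s;
        }
    }

    static Table table;            // plain field, written before the flag
    static volatile boolean ready; // volatile write publishes, volatile read observes

    // Starts a worker, then publishes the table; returns what the worker computed.
    static int runOnce() throws InterruptedException {
        table = null;
        ready = false;
        final int[] result = new int[1];

        Thread worker = new Thread(() -> {
            while (!ready) {       // volatile read on every iteration
                Thread.yield();
            }
            result[0] = table.sum(); // safe: the flag was seen as true
        });
        worker.start();

        Table t = new Table();
        t.initialize();
        table = t;                 // ordinary write
        ready = true;              // volatile write, ordered after the writes above

        worker.join();             // join makes result[0] visible to this thread
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker computed " + runOnce());
    }
}
```

If the worker threads are only started after the shared object has been fully built, the flag is not strictly needed, because a call to Thread.start() already happens-before everything the started thread does.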
As mentioned, there are all sorts of RPC mechanisms. There's also RMI, which is Java's built-in approach for running code on remote targets. There are also frameworks like Hadoop which offer a more complete solution and might be more appropriate.
For calling native code, it's pretty ugly - Sun really discourages use by making JNI an ugly complicated mess, but it is possible. I know that there was at least one commercial Java framework for loading & executing native dynamic libraries without needing to worry about JNI (not sure if there are any free or OSS projects).
Good luck.