General methods for optimizing program for speed_问答_开发者

What are some generic methods for optimizing a program in Java,开发者_如何转开发 in terms of speed. I am using a DOM Parser to parse an XML file and then store certain words in an ArrayList, remove any duplicates then spell check those words by creating Google search URL's for each word, get the html document, locate the corrected word and save it to another ArrayList.

Any help would be appreciated! Thanks.

Why do you need to improve performance? From your explanation, it is pretty obvious that the big bottleneck here (or performance hit) is going to be the IO resulting from the fact that you are accessing a URL.

This will surely dwarf by orders of magnitude any minor improvements you make in data structures or XML frameworks.

It is a good general rule of thumb that your big performance problems will involve IO. Humorously enough, I am at this very moment waiting for a database query to return in a batch process. It has been running for almost an hour. But I welcome any suggested improvements to my XML parsing library nevertheless!

Here are my general methods:

Does your program perform any obviously expensive task from the perspective of latency (IO)? Do you have enough logging to see that this is where the delay is (if significant)?
Is your program prone to lock-contention (i.e. can it wait around, doing nothing, waiting for some resource to be "free")? Perhaps you are locking an entire Map whilst you make an expensive calculation for a value to store, blocking other threads from accessing the map
Is there some obvious algorithm (perhaps for data-matching, or sorting) that might have poor characteristics?
Run up a profiler (e.g. jvisualvm, which ships with the JDK itself) and look at your code hotspots. Where is the JVM spending its time?

SAX is faster than DOM. If you don't want to go through the ArrayList searching for duplicates, put everything in a LinkedHashMap -- no duplicates, and you still get the order-of-insertion that ArrayList gives you.

But the real bottleneck is going to be sending the HTTP request to Google, waiting for the response, then parsing the response. Use a spellcheck library, instead.

Edit: But take my educated guesses with a grain of salt. Use a code profiler to see what's really slowing down your program.

Generally the best method is to figure out where your bottleneck is, and fix it. You'll usually find that you spend 90% of your time in a small portion of your code, and that's where you want to focus your efforts.

Once you've figured out what's taking a lot of time, focus on improving your algorithms. For example, removing duplicates from an ArrayList can be O(n²) complexity if you're using the most obvious algorithm, but that can be reduced to O(n) if you leverage the correct data structures.

Once you've figured out which portions of your code are taking the most time, and you can't figure out how best to fix it, I'd suggest narrowing down your question and posting another question here on StackOverflow.

Edit

As @oxbow_lakes so snidely put it, not all performance bottlenecks are to be found in the code's big-O characteristics. I certainly had no intention to imply that they were. Since the question was about "general methods" for optimizing, I tried to stick to general ideas rather than talking about this specific program. But here's how you can apply my advice to this specific program:

See where your bottleneck is. There are a number of ways to profile your code, ranging from high-end, expensive profiling software to really hacky. Chances are, any of these methods will indicate that your program spends the 99% of its time waiting for a response from Google.
Focus on algorithms. Right now your algorithm is (roughly):
1. Parse the XML
2. Create a list of words
3. For each word
  1. Ping Google for a spell check.
4. Return results

Since most of your time is spent in the "ping Google" phase, an obvious way to fix this would be to avoid doing that step more times than necessary. For example:

Parse the XML
Create a list of words
Send list of words to spelling service.
Parse results from spelling service.
Return results

Of course, in this case, the biggest speed boost would probably be by using spell checker that runs on the same machine, but that isn't always an option. For example, TinyMCE runs as a javascript program within the browser, and it can't afford to download the entire dictionary as part of the web page. So it packages up all the words into a distinct list and performs a single AJAX request to get a list of those words that aren't in the dictionary.

These folks are probably right, but a few random pauses will turn *probably" into "definitely, and here's why".