How to remove all specific elements from Vector

Source: https://www.devze.com, 2023-04-08 08:35 (reposted from the web)

I already have a solution to the problem in the title, but my approach seems wasteful because it creates an extra List object.

So my question is: is there a more efficient approach?

In this case, I want to remove every extra " " and every extra "a" element from a Vector.

My vector includes:

{"a", "rainy", " ", "day", "with", " ", "a", "cold", "wind", "day", "a"}

Here is my code:

List<String> lt = new ArrayList<String>();
lt.add("a");
lt.add(" ");
vec1.removeAll(lt);

The extra spaces end up in the Vector because I use it to hold the words that I read and split out of a Word document, and the document sometimes contains extra spaces caused by human error.


Your current approach does suffer from the problem that deleting an element from a Vector is an O(N) operation, and you are potentially doing this M times (5 in your example).

Assuming that you have multiple "stop words" and that you can change the data structures, here's a version that should (in theory) be more efficient:

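    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;

    // Returns a new list holding every element of input that is not a stop word.
    public List<String> removeStopWords(
            List<String> input, HashSet<String> stopWords) {
        List<String> output = new ArrayList<String>(input.size());
        for (String elem : input) {
            if (!stopWords.contains(elem)) {
                output.add(elem);
            }
        }
        return output;
    }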

    // This could be saved somewhere, assuming that you are always filtering
    // out the same stop words.
    HashSet<String> stopWords = new HashSet<String>();
    stopWords.add(" ");
    stopWords.add("a");
    // ... and more

    List<String> newList = removeStopWords(list, stopWords);
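
For reference, the fragments above can be assembled into one self-contained program that runs on the Vector contents from the question. This is only a minimal sketch: the class name StopWordFilter and the main method are hypothetical scaffolding, and the parameter is widened to Set<String> purely for convenience.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Vector;

// Hypothetical wrapper class; the name is an assumption for illustration.
public class StopWordFilter {

    // Copies every element that is not a stop word into a new list.
    static List<String> removeStopWords(List<String> input, Set<String> stopWords) {
        List<String> output = new ArrayList<String>(input.size());
        for (String elem : input) {
            if (!stopWords.contains(elem)) {
                output.add(elem);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // The Vector contents listed in the question.
        Vector<String> vec1 = new Vector<String>(Arrays.asList(
                "a", "rainy", " ", "day", "with", " ", "a", "cold", "wind", "day", "a"));
        Set<String> stopWords = new HashSet<String>(Arrays.asList(" ", "a"));

        List<String> filtered = removeStopWords(vec1, stopWords);
        System.out.println(filtered); // prints [rainy, day, with, cold, wind, day]
    }
}
```

The input Vector is left untouched; only the returned list has the stop words filtered out.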

Points of note:

  • The above creates a new list. If you have to reuse the existing list, clear it and then addAll the new list's elements. (This is another O(N - M) step, so don't do it if you don't have to.)

  • If there are multiple stop words, then using a HashSet for the lookups, as done above, will be more efficient. I'm not sure exactly where the break-even point is (versus using a List), but I suspect it is between 2 and 3 stop words.

  • The above creates a new list, but it only copies N - M elements. By contrast, the removeAll algorithm when applied to a Vector could copy O(NM) elements.

  • Don't use a Vector unless you need a thread-safe data structure. An ArrayList has a similar internal data structure, and doesn't incur synchronization overheads on each call.
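
If the existing list really has to be reused, the clear-then-addAll step from the first point might look like the sketch below. The class and method names are hypothetical. The second method uses Collection.removeIf, which is a different technique from the one in this answer and is only available on Java 8 or later; it is shown purely as an alternative to consider.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Vector;

// Hypothetical helper class; all names here are assumptions for illustration.
public class InPlaceFilter {

    // Builds the filtered copy, then swaps it into the original list.
    static void replaceContents(List<String> list, Set<String> stopWords) {
        List<String> kept = new ArrayList<String>(list.size());
        for (String elem : list) {
            if (!stopWords.contains(elem)) {
                kept.add(elem);
            }
        }
        list.clear();
        list.addAll(kept);
    }

    // Alternative (Java 8+): let the collection drop matching elements itself.
    static void removeWithPredicate(List<String> list, Set<String> stopWords) {
        list.removeIf(stopWords::contains);
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<String>(Arrays.asList(" ", "a"));

        Vector<String> vec1 = new Vector<String>(Arrays.asList("a", "cold", " ", "wind", "a"));
        replaceContents(vec1, stopWords);
        System.out.println(vec1); // prints [cold, wind]

        List<String> words = new ArrayList<String>(Arrays.asList("a", "rainy", " ", "day"));
        removeWithPredicate(words, stopWords);
        System.out.println(words); // prints [rainy, day]
    }
}
```

Both methods modify the list passed in, so any other code holding a reference to that list sees the filtered contents.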
