I have written a Java program to read in a text file of image metadata. These files contain names, sometimes over 4000 of them. Unfortunately, many of the names are duplicates, so I wrote a program that takes the list in a .txt file, removes the duplicates, and writes the cleaned-up, alphabetically sorted list to an output .txt file.
Additionally, the program wraps each name in HTML list tags so that I can copy and paste them wherever I need to.
Example text file:
Chatty Little Kitty
Chatty Little Kitty
Bearly Nuf Taz
Got Lil Pepto
However, it does not seem to be working properly, as I still have duplicates in my output file. The code looks correct to me, which is why I am asking whether there is an issue with how I'm setting up my reads and writes.
My Code:
/* This program takes in a text file that has a bunch of words listed. It then creates a single
 * alphabetically organized HTML list from that data. It also strips the data of duplicates.
 */
import java.io.*;
import java.util.Arrays;

public class readItWriteIt
{
    public static void main(String args[])
    {
        int MAX = 10000;
        String[] lines = new String[MAX];
        boolean valid = true;
        try {
            //Set up Input
            FileInputStream fstream = new FileInputStream("test.txt");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String strLine;
            //Set up Output
            FileWriter ostream = new FileWriter("out.txt");
            BufferedWriter out = new BufferedWriter(ostream);
            //counters
            int count = 0;
            int second_count = 0;
            //start reading in lines from the file
            while ((strLine = br.readLine()) != null) {
                //check to make sure that there aren't duplicates. If a line is the same as another line
                //set boolean valid to false else set to true.
                if ((second_count++ > 0) && (count > 0)) {
                    for (int i = 0; i < count; i++)
                    {
                        if (lines[i].equals(strLine)) {
                            valid = false;
                        }
                        else
                        {
                            valid = true;
                        }
                    }
                }
                //only copy the line to the local array if it is not a duplicate. Else do nothing with it.
                if (valid == true) {
                    lines[count] = strLine.trim();
                    count++;
                }
                else {}
                second_count++;
            }
            //create a second array so that you can get rid of all the null values. It is the size of the
            //used length in the first array called "lines"
            String[] newlines = new String[count];
            //copy data from array lines to array called newlines
            for (int i = 0; i < count; i++) {
                newlines[i] = lines[i];
            }
            //sort the array alphabetically
            Arrays.sort(newlines);
            //write it out to file in alphabetical order along with the list syntax for html
            for (int i = 0; i < count; i++)
            {
                out.write("<li>" + newlines[i] + "</li>");
                out.newLine();
            }
            //close I/O
            in.close();
            out.close();
        } catch (Exception e) { //Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }
}
I rewrote it like this:
import java.util.HashSet;
import java.util.Set;
import java.io.*;
import java.util.Arrays;
public class converter {
    public static void main(String[] args) {
        try {
            //Set up Input
            FileInputStream fstream = new FileInputStream("test.txt");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String strLine;
            //Set up Output
            FileWriter ostream = new FileWriter("out.txt");
            BufferedWriter out = new BufferedWriter(ostream);
            Set<String> lines = new HashSet<String>();
            boolean result;
            while ((strLine = br.readLine()) != null) {
                result = lines.add(strLine.trim());
            }
            String[] newlines = new String[lines.size()];
            lines.toArray(newlines);
            Arrays.sort(newlines);
            //write it out to file in alphabetical order along with the list syntax for html
            for (int i = 0; i < lines.size(); i++)
            {
                out.write("<li>" + newlines[i] + "</li>");
                out.newLine();
            }
            out.close();
            in.close();
        } catch (Exception e) { //Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }
}
But thanks to ewernli, it's now much more efficient.
If you add the lines into a Set (as the keys) rather than an Array you'll find you don't need to do any of the duplicate processing. It'll be taken care of for you and your program will be simpler and shorter.
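For example, something like this (a minimal sketch with hard-coded sample lines standing in for the file reading, so the I/O setup from your program is omitted):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class DedupSketch {
    public static void main(String[] args) {
        // Hard-coded sample lines standing in for the contents of test.txt
        String[] input = { "Chatty Little Kitty", "Chatty Little Kitty",
                           "Bearly Nuf Taz", "Got Lil Pepto" };
        // Set.add() ignores an element that is already present (and returns
        // false), so no manual duplicate checking is needed
        Set<String> lines = new HashSet<String>();
        for (String s : input) {
            lines.add(s.trim());
        }
        // Copy into an array and sort alphabetically, as before
        String[] newlines = lines.toArray(new String[0]);
        Arrays.sort(newlines);
        for (String name : newlines) {
            System.out.println("<li>" + name + "</li>");
        }
    }
}
```

This prints each unique name exactly once, in sorted order.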
Arrays are not the data structure you want here (do you really need a fixed-length, ordered structure with mutable elements?). Have a look at the collection types in java.util. In particular, look at the SortedSet implementations like TreeSet. This will:
- Expand to hold the data
- Eliminate duplicates (it is a Set)
- Sort its contents as you add them (see Comparator implementations like String.CASE_INSENSITIVE_ORDER)
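The points above can be sketched like this (again with hard-coded sample names in place of the file input):

```java
import java.util.TreeSet;

public class TreeSetSketch {
    public static void main(String[] args) {
        // A TreeSet keeps its elements unique and sorted as they are added.
        // String.CASE_INSENSITIVE_ORDER makes case variants count as
        // duplicates; drop the comparator if you want plain ordering.
        TreeSet<String> names = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        names.add("Chatty Little Kitty");
        names.add("Chatty Little Kitty");  // exact duplicate, silently ignored
        names.add("Bearly Nuf Taz");
        names.add("Got Lil Pepto");
        names.add("GOT LIL PEPTO");        // case-insensitive duplicate, ignored
        // Iteration order is already sorted; no Arrays.sort needed
        for (String name : names) {
            System.out.println("<li>" + name + "</li>");
        }
    }
}
```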
Actually your code needs some improvements, but what strikes me as most wrong is that you compare against the untrimmed string, while storing the trimmed string of the fetched line in the lines array.
lines[i].equals(strLine) // instead use "lines[i].equals(strLine.trim())"
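Applied to your original loop, the fix could look roughly like this (trimming once, up front). Note it also adds a break: in your version, a later non-matching line resets valid back to true, which lets duplicates through even when trimming is consistent.

```java
public class TrimFixSketch {
    public static void main(String[] args) {
        // Sample input standing in for lines read from the file;
        // note the trailing space on the second entry
        String[] input = { "Chatty Little Kitty", "Chatty Little Kitty ", "Bearly Nuf Taz" };
        String[] lines = new String[10000];
        int count = 0;
        for (String raw : input) {
            String strLine = raw.trim();         // trim once, before comparing
            boolean valid = true;
            for (int i = 0; i < count; i++) {
                if (lines[i].equals(strLine)) {  // both sides are now trimmed
                    valid = false;
                    break;  // stop at the first match so later lines can't reset the flag
                }
            }
            if (valid) {
                lines[count++] = strLine;
            }
        }
        System.out.println(count);  // 2: the trailing-space duplicate was caught
    }
}
```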