I have written a Java program to read in a text file of image metadata. These files contain names, sometimes over 4000 of them. Unfortunately, many of the names are duplicates, so I wrote a program that takes the list in a .txt file, removes the duplicates, and writes the cleaned-up, alphabetically sorted list to an output .txt file.
Additionally, the program wraps each name in HTML list tags so that I can copy and paste them wherever I need to.
Example text file:
Chatty Little Kitty
Chatty Little Kitty
Bearly Nuf Taz
Got Lil Pepto
However, it does not seem to be working properly, as I still have duplicates in my output file. The code looks correct to me, which is why I am asking whether there is an issue with how I'm setting up my reads and writes.
My Code:
/* This program takes in a text file that has a bunch of words listed. It then creates a single
 * alphabetically organized HTML list from that data. It also strips the data of duplicates.
 */
import java.io.*;
import java.util.Arrays;

public class readItWriteIt
{
    public static void main(String args[])
    {
        int MAX = 10000;
        String[] lines = new String[MAX];
        boolean valid = true;
        try {
            //Set up Input
            FileInputStream fstream = new FileInputStream("test.txt");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String strLine;
            //Set up Output
            FileWriter ostream = new FileWriter("out.txt");
            BufferedWriter out = new BufferedWriter(ostream);
            //counters
            int count = 0;
            int second_count = 0;
            //start reading in lines from the file
            while ((strLine = br.readLine()) != null) {
                //check to make sure that there aren't duplicates. If a line is the same as another line
                //set boolean valid to false else set to true.
                if ((second_count++ > 0) && (count > 0)) {
                    for (int i = 0; i < count; i++)
                    {
                        if (lines[i].equals(strLine)) {
                            valid = false;
                        }
                        else
                        {
                            valid = true;
                        }
                    }
                }
                //only copy the line to the local array if it is not a duplicate. Else do nothing with it.
                if (valid == true) {
                    lines[count] = strLine.trim();
                    count++;
                }
                else {}
                second_count++;
            }
            //create a second array so that you can get rid of all the null values. It is the size of the
            //used length in the first array called "lines"
            String[] newlines = new String[count];
            //copy data from array lines to array called newlines
            for (int i = 0; i < count; i++) {
                newlines[i] = lines[i];
            }
            //sort the array alphabetically
            Arrays.sort(newlines);
            //write it out to file in alphabetical order along with the list syntax for html
            for (int i = 0; i < count; i++)
            {
                out.write("<li>" + newlines[i] + "</li>");
                out.newLine();
            }
            //close I/O
            in.close();
            out.close();
        } catch (Exception e) { //Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }
}
I rewrote it like this:
import java.util.HashSet;
import java.util.Set;
import java.io.*;
import java.util.Arrays;
public class converter {
    public static void main(String[] args) {
        try {
            //Set up Input
            FileInputStream fstream = new FileInputStream("test.txt");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String strLine;
            //Set up Output
            FileWriter ostream = new FileWriter("out.txt");
            BufferedWriter out = new BufferedWriter(ostream);
            Set<String> lines = new HashSet<String>();
            boolean result;
            while ((strLine = br.readLine()) != null) {
                result = lines.add(strLine.trim());
            }
            String[] newlines = new String[lines.size()];
            lines.toArray(newlines);
            Arrays.sort(newlines);
            //write it out to file in alphabetical order along with the list syntax for html
            for (int i = 0; i < lines.size(); i++)
            {
                out.write("<li>" + newlines[i] + "</li>");
                out.newLine();
            }
            out.close();
            in.close();
        } catch (Exception e) { //Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }
}
But thanks to ewernli, it's now much more efficient.
If you add the lines into a Set (as the keys) rather than an Array you'll find you don't need to do any of the duplicate processing. It'll be taken care of for you and your program will be simpler and shorter.
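For example, something like this (a minimal sketch with hard-coded sample lines standing in for the file reading, so the I/O setup from your program is omitted):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class DedupSketch {
    public static void main(String[] args) {
        // Hard-coded sample lines standing in for the contents of test.txt
        String[] input = { "Chatty Little Kitty", "Chatty Little Kitty",
                           "Bearly Nuf Taz", "Got Lil Pepto" };
        // Set.add() ignores an element that is already present (and returns
        // false), so no manual duplicate checking is needed
        Set<String> lines = new HashSet<String>();
        for (String s : input) {
            lines.add(s.trim());
        }
        // Copy into an array and sort alphabetically, as before
        String[] newlines = lines.toArray(new String[0]);
        Arrays.sort(newlines);
        for (String name : newlines) {
            System.out.println("<li>" + name + "</li>");
        }
    }
}
```

This prints each unique name exactly once, in sorted order.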
Arrays are not the data structure you want here (do you really need a fixed-length, ordered structure with mutable elements?). Have a look at the collection types in java.util. In particular, look at the SortedSet implementations like TreeSet. This will:
- Expand to hold the data
- Eliminate duplicates (it is a Set)
- Sort its contents as you add them (see Comparator implementations like String.CASE_INSENSITIVE_ORDER)
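The points above can be sketched like this (again with hard-coded sample names in place of the file input):

```java
import java.util.TreeSet;

public class TreeSetSketch {
    public static void main(String[] args) {
        // A TreeSet keeps its elements unique and sorted as they are added.
        // String.CASE_INSENSITIVE_ORDER makes case variants count as
        // duplicates; drop the comparator if you want plain ordering.
        TreeSet<String> names = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        names.add("Chatty Little Kitty");
        names.add("Chatty Little Kitty");  // exact duplicate, silently ignored
        names.add("Bearly Nuf Taz");
        names.add("Got Lil Pepto");
        names.add("GOT LIL PEPTO");        // case-insensitive duplicate, ignored
        // Iteration order is already sorted; no Arrays.sort needed
        for (String name : names) {
            System.out.println("<li>" + name + "</li>");
        }
    }
}
```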
Actually your code needs some improvements, but what strikes me as most wrong is that you compare against the untrimmed string, while storing the trimmed string of the fetched line in the lines array.
lines[i].equals(strLine) // instead use "lines[i].equals(strLine.trim())"
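Applied to your original loop, the fix could look roughly like this (trimming once, up front). Note it also adds a break: in your version, a later non-matching line resets valid back to true, which lets duplicates through even when trimming is consistent.

```java
public class TrimFixSketch {
    public static void main(String[] args) {
        // Sample input standing in for lines read from the file;
        // note the trailing space on the second entry
        String[] input = { "Chatty Little Kitty", "Chatty Little Kitty ", "Bearly Nuf Taz" };
        String[] lines = new String[10000];
        int count = 0;
        for (String raw : input) {
            String strLine = raw.trim();         // trim once, before comparing
            boolean valid = true;
            for (int i = 0; i < count; i++) {
                if (lines[i].equals(strLine)) {  // both sides are now trimmed
                    valid = false;
                    break;  // stop at the first match so later lines can't reset the flag
                }
            }
            if (valid) {
                lines[count++] = strLine;
            }
        }
        System.out.println(count);  // 2: the trailing-space duplicate was caught
    }
}
```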