I am trying to read a file in Java and modify it at the same time. This is what I need to do. My file has the following format:
aaa
bbb
aaa
ccc
ddd
ddd
I need to read through the file, count the occurrences of each line, and collapse the duplicates to get the following file:
aaa - 2
bbb - 1
ccc - 1
ddd - 2
I tried using RandomAccessFile to do this, but couldn't get it to work. Can somebody help me out with the code for this one?
It's far easier if you don't do two things at the same time. The best way is to run through the entire file, count the occurrences of each string in a hash map, and then write all the results out to another file. Then, if you need to, move the new file over the old one.
You never want to read and write to the same file at the same time. The offsets within the file shift every time you make a write, and the read cursor will not keep track of that.
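A minimal sketch of that count-then-replace approach, assuming each line of the file is one entry and the file is UTF-8 text. The names `LineCounter`, `countLines` and `rewriteWithCounts` are made up for illustration. A `LinkedHashMap` is used so the output keeps the order in which entries first appear, as in the expected output of the question.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.Map;

class LineCounter {
    // Pass 1: read the whole file and count each line, keeping first-seen order.
    static Map<String, Integer> countLines(Path source) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                counts.merge(line, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Pass 2: write "entry - count" lines to a temp file, then move it over the original.
    static void rewriteWithCounts(Path source) throws IOException {
        Map<String, Integer> counts = countLines(source);
        Path dir = source.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "counts", ".tmp");
        try (BufferedWriter writer = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
            for (Map.Entry<String, Integer> entry : counts.entrySet()) {
                writer.write(entry.getKey() + " - " + entry.getValue());
                writer.newLine();
            }
        }
        Files.move(temp, source, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        rewriteWithCounts(Paths.get(args[0]));
    }
}
```

The original file is never open for reading and writing at the same time: the reader is closed before the temp file is written, and the replacement happens in a single move at the end. This sketch keeps all distinct entries in memory, so it only fits files whose set of distinct lines is reasonably small.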
I'd do it this way:
- Parse the original file and save all entries into a new file. Use fixed-length data blocks to write entries to the new file. Say your longest string is 10 bytes long: take 10 + x as the block length, where x is for the extra info you want to save along with each entry. The entry at zero-based index 10 then starts at byte position 10*(10+x). You also have to know the number of entries to create the file, so the file size is noOfEntries*blockLength; use a RandomAccessFile and setLength to set this file length.
- Use the quicksort algorithm to sort the entries in the file. My idea is to have a sorted file in the end, which makes the last step far easier and faster. Hashing would theoretically work too, but then you'd have to rearrange duplicate entries to get all duplicates grouped, so it's not really a choice here.
- Parse the file with the now sorted entries. Save a pointer to the first occurrence of an entry. Increment the number of duplicates until a new entry appears. Then write the first entry, together with the additional info you want to have there, into a new "final result" file. Continue this way with all remaining entries in the sorted file.
Conclusions: I think this should be reasonably fast and use a reasonable amount of resources. However, it depends on the data you have. If you have a very large number of duplicates, quicksort performance can degrade. Also, if your longest data entry is much longer than the average, the fixed block length will waste file space.
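The fixed-length block layout and the final counting pass can be sketched as follows. This is only an illustration under stated assumptions: the class `FixedRecords`, the block length of 16 bytes, ASCII-only entries padded with spaces, and entries without leading or trailing spaces are all made up here. The on-disk sort is left out, so `countRuns` assumes the records are already sorted.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FixedRecords {
    // Hypothetical block length in bytes; must be >= the longest entry.
    static final int BLOCK = 16;

    // Write one entry, space-padded, at byte offset index * BLOCK.
    static void write(RandomAccessFile raf, long index, String entry) throws IOException {
        byte[] data = entry.getBytes(StandardCharsets.US_ASCII);
        if (data.length > BLOCK) {
            throw new IllegalArgumentException("entry too long: " + entry);
        }
        byte[] block = new byte[BLOCK];
        Arrays.fill(block, (byte) ' ');
        System.arraycopy(data, 0, block, 0, data.length);
        raf.seek(index * BLOCK);
        raf.write(block);
    }

    // Read the entry stored at the given zero-based index and strip the padding.
    static String read(RandomAccessFile raf, long index) throws IOException {
        byte[] block = new byte[BLOCK];
        raf.seek(index * BLOCK);
        raf.readFully(block);
        return new String(block, StandardCharsets.US_ASCII).trim();
    }

    // Walk a file of SORTED records and emit "entry - count" for each run of duplicates.
    static List<String> countRuns(RandomAccessFile raf) throws IOException {
        List<String> result = new ArrayList<>();
        long total = raf.length() / BLOCK;
        long i = 0;
        while (i < total) {
            String first = read(raf, i);
            long j = i + 1;
            while (j < total && read(raf, j).equals(first)) {
                j++;
            }
            result.add(first + " - " + (j - i));
            i = j;
        }
        return result;
    }
}
```

Because every record has the same size, any entry can be reached with a single `seek`, which is what makes sorting the file in place feasible without loading it into memory.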
If you have to, there are ways you can manipulate the same file and update the counters, without having to open another file or keep everything in memory. However, the simplest of the approaches would be very slow.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

class WordFrequencyCountTest
{
    public static void main(String[] args)
    {
        System.out.println("Enter the file name:");
        Scanner sc = new Scanner(System.in);
        String fname = sc.next();
        File f1 = new File(fname);
        if (!f1.exists())
        {
            System.out.println("Source file does not exist");
            System.exit(1);
        }
        // TreeMap keeps the words sorted alphabetically.
        Map<String, Integer> map = new TreeMap<String, Integer>();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader br = new BufferedReader(new FileReader(f1)))
        {
            String str;
            while ((str = br.readLine()) != null)
            {
                // Split on runs of whitespace so repeated spaces do not yield empty tokens.
                for (String token : str.trim().split("\\s+"))
                {
                    if (token.isEmpty())
                    {
                        continue; // blank line
                    }
                    // First occurrence starts at 1; later ones increment the stored count.
                    Integer count = map.get(token);
                    map.put(token, count == null ? 1 : count + 1);
                }
            }
        }
        catch (IOException e)
        {
            // Report the failure instead of silently swallowing it.
            System.err.println("Could not read " + fname + ": " + e.getMessage());
            System.exit(1);
        }
        System.out.println("========");
        // Print in the "word - count" format asked for in the question.
        for (Map.Entry<String, Integer> entry : map.entrySet())
        {
            System.out.println(entry.getKey() + " - " + entry.getValue());
        }
    }
}