I've seen how to get a random line from a text file, but the method given there (the accepted answer) runs horrendously slowly. It is very slow on my 598 KB text file, and still slow on a shortened version of that file which keeps only one out of every 20 lines, at 20 KB. I never get past the "a" section (it's a wordlist).
The original file has 64141 lines; the shortened one has 2138 lines. To generate these files, I took the Linux Mint 11 /usr/share/dict/american-english wordlist and used grep to remove anything with uppercase or an apostrophe (grep -v [[:upper:]] | grep -v \').
The code I'm using is:
    String result = null;
    final Random rand = new Random();
    int n = 0;
    for (final Scanner sc = new Scanner(wordList); sc.hasNext();) {
        n++;
        if (rand.nextInt(n) == 0) {
            final String line = sc.nextLine();
            boolean isOK = true;
            for (final char c : line.toCharArray()) {
                if (!(constraints.isAllowed(c))) {
                    isOK = false;
                    break;
                }
            }
            if (isOK) {
                result = line;
            }
            System.out.println(result);
        }
    }
    return result;
which is slightly adapted from Itay's answer.
The object constraints is a KeyboardConstraints, which basically has the one method isAllowed(char):
    public boolean isAllowed(final char key) {
        if (allAllowed) {
            return true;
        } else {
            return allowedKeys.contains(key);
        }
    }
where allowedKeys and allAllowed are provided in the constructor. The constraints variable used here has "aeouhtns".toCharArray() as its allowedKeys, with allAllowed off.
Essentially, what I want the method to do is to pick a random word that satisfies the constraints (e.g. for these constraints, "stout" would work, but not "worker", because "w" is not in "aeouhtns".toCharArray()).
How can I do this?
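You have a bug in your implementation: you should read the line before you draw the random number. As written, the scanner only advances when rand.nextInt(n) == 0, which happens with probability 1/n on the n-th iteration, so most iterations consume no input and the number of lines actually read grows only about logarithmically with the number of iterations. That is why you never get past the "a" section. Change this:
    n++;
    if (rand.nextInt(n) == 0) {
        final String line = sc.nextLine();
To this (as in the original answer):
    n++;
    final String line = sc.nextLine();
    if (rand.nextInt(n) == 0) {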
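You should also check the constraints before drawing a random number. A line that fails the constraints should be ignored entirely, so that n counts only acceptable lines. Something like this:
    n++;
    String line;
    // Skip lines that fail the constraints; stop if the input runs out.
    do {
        if (!sc.hasNext()) { return result; }
        line = sc.nextLine();
    } while (!meetsConstraints(line));
    // line is the n-th acceptable line; keep it with probability 1/n.
    if (rand.nextInt(n) == 0) {
        result = line;
    }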
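For reference, here is a minimal sketch of the whole loop with both fixes applied. It is restructured slightly (a plain while loop with continue instead of the do/while above), and the class name RandomWordPicker, the method pick, and the Set<Character> form of the allowed keys are assumptions made for illustration, not part of the code in the question.

```java
import java.util.Random;
import java.util.Scanner;
import java.util.Set;

class RandomWordPicker {
    // Hypothetical helper: true if the line is non-empty and every
    // character is in the allowed set.
    static boolean meetsConstraints(String line, Set<Character> allowed) {
        if (line.isEmpty()) {
            return false;
        }
        for (int i = 0; i < line.length(); i++) {
            if (!allowed.contains(line.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Reads every line exactly once and returns one acceptable line chosen
    // uniformly at random, or null if no line is acceptable.
    static String pick(Scanner sc, Set<Character> allowed, Random rand) {
        String result = null;
        int n = 0; // number of acceptable lines seen so far
        while (sc.hasNextLine()) {
            String line = sc.nextLine(); // always consume the line first
            if (!meetsConstraints(line, allowed)) {
                continue; // rejected lines do not count towards n
            }
            n++;
            // Replace the current choice with probability 1/n.
            if (rand.nextInt(n) == 0) {
                result = line;
            }
        }
        return result;
    }
}
```

Because every iteration consumes one line, this makes a single pass over the file regardless of how the random draws turn out.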
I would read in all the lines, save these somewhere and then select a random line from that. This takes a trivial amount of time because a single file of less than 1 MB is a trivial size these days.
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.LinkedHashSet;
    import java.util.Random;
    import java.util.Set;

    public class Main {
        public static void main(String... args) throws IOException {
            long start = System.nanoTime();
            RandomDict dict = RandomDict.load("/usr/share/dict/american-english");
            final int count = 1000000;
            for (int i = 0; i < count; i++)
                dict.nextWord();
            long time = System.nanoTime() - start;
            System.out.printf("Took %.3f seconds to load and find %,d random words.", time / 1e9, count);
        }
    }

    class RandomDict {
        public static final String[] NO_STRINGS = {};
        final Random random = new Random();
        final String[] words;

        public RandomDict(String[] words) {
            this.words = words;
        }

        // Loads the word list once, skipping words with an apostrophe and
        // lower-casing the rest; the set removes duplicates created by that.
        public static RandomDict load(String filename) throws IOException {
            BufferedReader br = new BufferedReader(new FileReader(filename));
            Set<String> words = new LinkedHashSet<String>();
            try {
                for (String line; (line = br.readLine()) != null; ) {
                    if (line.indexOf('\'') >= 0) continue;
                    words.add(line.toLowerCase());
                }
            } finally {
                br.close();
            }
            return new RandomDict(words.toArray(NO_STRINGS));
        }

        // Picks a uniformly random word from the in-memory array.
        public String nextWord() {
            return words[random.nextInt(words.length)];
        }
    }
prints
Took 0.091 seconds to load and find 1,000,000 random words.
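The same idea can probably be extended to the keyboard constraints from the question by filtering once at load time and then drawing from the filtered list. The following is only a sketch under assumptions: the class ConstrainedDict and its methods are hypothetical names, and the allowed keys are passed as a plain String rather than through the KeyboardConstraints class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class ConstrainedDict {
    private final List<String> words = new ArrayList<>();
    private final Random random = new Random();

    // Keeps only the non-empty words whose characters all appear in allowedKeys.
    ConstrainedDict(Iterable<String> allWords, String allowedKeys) {
        for (String w : allWords) {
            if (!w.isEmpty() && isAllowed(w, allowedKeys)) {
                words.add(w);
            }
        }
    }

    // Hypothetical stand-in for the per-character check in the question.
    static boolean isAllowed(String word, String allowedKeys) {
        for (int i = 0; i < word.length(); i++) {
            if (allowedKeys.indexOf(word.charAt(i)) < 0) {
                return false;
            }
        }
        return true;
    }

    int size() {
        return words.size();
    }

    // Returns a uniformly random acceptable word, or null if none passed the filter.
    String nextWord() {
        return words.isEmpty() ? null : words.get(random.nextInt(words.size()));
    }
}
```

For example, new ConstrainedDict(Arrays.asList("worker", "stout", "noun"), "aeouhtns") keeps only "stout" and "noun", and each call to nextWord() then costs a single array lookup.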