I am writing an NLP program that uses a function eliminateStopWords(). This function reads from a 2D array "sentTokens" (of detected tokens). In the code below, index i is the sentence number and j indexes each token in the ith sentence.
Here is what my eliminateStopWords() does:
1. It reads stop words from a text file and stores them in a TreeSet.
2. It reads tokens from the sentTokens array and checks them for stop words. If two tokens form a collocation, they should not be checked for stop words; they are just dumped into a finalTokens array. If they are not a collocation, they are individually checked for stop words and added to the finalTokens array only if they are not stop words.
The problem occurs in the loop of step 2. Here is the code (I have marked // HERE at the location where the error actually occurs; it's near the end):
private void eliminateStopWords() {
    try {
        // Loading TreeSet for stopwords from the file.
        stopWords = new TreeSet<String> ();
        fin = new File("stopwords.txt");
        fScan = new Scanner(fin);
        while (fScan.hasNextLine())
            stopWords.add(fScan.nextLine());
        fScan.close();
        /* Test code to print all read stopwords
        iter2 = stopWords.iterator();
        while (iter2.hasNext())
            System.out.println(iter2.next()); */
        int k=0,m=0; // additional indices for finalTokens array
        System.out.println(NO_OF_SENTENCES);
        newSentence: for(i=0; i < NO_OF_SENTENCES; i++)
        {
            System.out.println("i = " + i);
            for (j=0; j < sentTokens[i].length; j+=2)
            {
                System.out.println("j = " + j);
                // otherwise, get two successive tokens
                String currToken = sentTokens[i][j];
                String nextToken = sentTokens[i][j+1];
                System.out.println("i = " + i);
                System.out.println(currToken + " " + nextToken);
                if ( isCollocation(currToken, nextToken) ) {
                    // if the current and next tokens form a bigram collocation, they are not checked for stop words
                    // but are directly dumped into finalTokens array
                    finalTokens[k][m] = currToken; m++;
                    finalTokens[k][m] = nextToken; m++;
                }
                if ( !stopWords.contains(currToken) )
                { finalTokens[k][m] = currToken; m++; }
                if ( !stopWords.contains(nextToken) )
                { finalTokens[k][m] = nextToken; m++; }
                // if current token is the last in the sentence, do not check for collocations, only check for stop words
                // this is done to avoid ArrayIndexOutOfBounds Exception in sentences with odd number of tokens
                // HERE
                System.out.println("i = " + i);
                if ( j==sentTokens[i].length - 2) {
                    String lastToken = sentTokens [i][++j];
                    if (!stopWords.contains(lastToken))
                    { finalTokens[k][m] = lastToken; m++; }
                    // after analyzing last token, move to analyzing the next sentence
                    continue newSentence;
                }
            }
            k++; // next sentence in finalTokens array
        }
        // Test code to print finalTokens array
        for(i=0; i < NO_OF_SENTENCES; i++) {
            for (j=0; j < finalTokens[i].length; j++)
                System.out.print( finalTokens[i][j] + " " );
            System.out.println();
        }
    }
    catch (Exception e) {
        e.printStackTrace();
    }
}
I print the indices i and j at the entry of their respective for loops. Everything works fine at the start of the first iteration, but when I print the value of i again just before the end of the loop body, it comes out as 14. So i:
- starts the first iteration at 0,
- is not modified anywhere in the loop,
- and yet, before the first iteration has even finished, prints as 14.
This is seriously the weirdest error I have ever come across while working with Java. It throws an ArrayIndexOutOfBoundsException just before the final if block. It's like magic: the code does nothing to the variable, yet its value changes. How can this happen?
You never declared i or j in your code, which leads me to believe that they are fields.
I'm pretty sure that some of your other methods re-use those variables and thus mess with your result. isCollocation looks like a candidate for that.
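Here is a minimal sketch of how that could happen. It assumes i is an instance field and that isCollocation runs its own loop over the same field. The body of isCollocation and the class name SharedCounterDemo are hypothetical, since the question does not show them:

```java
class SharedCounterDemo {
    // Assumption: the loop counter is a field, as the question's code suggests.
    private int i;

    // Hypothetical stand-in for isCollocation: it reuses the field i for its own loop.
    private boolean isCollocation(String a, String b) {
        for (i = 0; i < 14; i++) {
            // pretend to scan a list of 14 known collocations
        }
        return false;
    }

    // Returns the value of i seen right after the helper call in the first iteration.
    int valueAfterFirstCall() {
        for (i = 0; i < 3; i++) {
            isCollocation("New", "York");
            return i; // the helper's loop left the shared field at 14
        }
        return -1;
    }

    public static void main(String[] args) {
        // prints "i = 14"
        System.out.println("i = " + new SharedCounterDemo().valueAfterFirstCall());
    }
}
```

The outer loop never assigns to i inside its body, yet the value changes, because the helper writes to the same field.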
The counters in for loops should always be local variables, ideally declared inside the for statement itself (for minimal scope). Everything else is just asking for trouble (as you see).
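As a rough sketch of that advice: when the counters are declared inside the for statements, no other method can touch them. The names LocalCounterDemo and countNonStopWords, the sample data, and the body of isCollocation are made up for illustration and are not from the question:

```java
import java.util.Set;
import java.util.TreeSet;

class LocalCounterDemo {
    // Hypothetical helper: it has its own local counter, so it cannot disturb the caller's.
    static boolean isCollocation(String a, String b) {
        for (int i = 0; i < 14; i++) {
            // pretend to scan a list of known collocations
        }
        return false;
    }

    // Counts tokens that are not stop words; i and j exist only inside their loops.
    static int countNonStopWords(String[][] sentTokens, Set<String> stopWords) {
        int count = 0;
        for (int i = 0; i < sentTokens.length; i++) {
            for (int j = 0; j < sentTokens[i].length; j++) {
                isCollocation(sentTokens[i][j], ""); // cannot change this method's i or j
                if (!stopWords.contains(sentTokens[i][j])) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new TreeSet<>();
        stopWords.add("the");
        stopWords.add("is");
        String[][] sentTokens = { { "the", "sky", "is", "blue" }, { "hello" } };
        System.out.println(countNonStopWords(sentTokens, stopWords)); // prints 3
    }
}
```

Using sentTokens[i].length as the inner bound also means a sentence with an odd number of tokens needs no special case.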