I need to parse a raw text file that has one item per line, with tab-delimited fields.
How can I detect tab characters and line breaks in a plain text document? I was thinking of using the Java APIs for it, but if you know of a language that is faster and easier to use for text parsing, please let me know.
Thanks.
String str = "Hello\tworld\nHello Universe";
System.out.println(str);
System.out.println(str.contains("\t"));
System.out.println(str.indexOf("\t"));
System.out.println(str.contains("\n"));
System.out.println(str.indexOf("\n"));
Output:
Hello world
Hello Universe
true
5
true
11
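You can try this:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;

try (BufferedReader br = new BufferedReader(new FileReader(file1)))
{
    String strLine;
    // Call readLine() once per iteration; it returns null at end of file
    while ((strLine = br.readLine()) != null)
    {
        Scanner str = new Scanner(strLine);
        str.useDelimiter("\t");
        while (str.hasNext())
        {
            // Each token is one tab-delimited field of the current line
            String field = str.next();
            System.out.println(field);
        }
    }
} catch (IOException e)
{
    // Report the failure instead of silently swallowing it
    e.printStackTrace();
}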
You can use the Guava library from Google.
Have a look at CharMatcher and the Guava slides.
Here is an example:
@Test
public void testGuavaMatcher() {
    String str = "Hello\tworld\nHello Universe";
    CharMatcher tabMatcher = CharMatcher.is('\t');
    CharMatcher newLineMatcher = CharMatcher.is('\n');
    assertThat(tabMatcher.indexIn(str), is(5));
    assertThat(tabMatcher.matchesAnyOf(str), is(true));
    assertThat(newLineMatcher.indexIn(str), is(11));
    assertThat(newLineMatcher.matchesAnyOf(str), is(true));
    CharMatcher tabAndNewLineMatcher = tabMatcher.or(newLineMatcher);
    assertThat(tabAndNewLineMatcher.removeFrom(str), is("HelloworldHello Universe"));
}
You can also have a look at the CharMatcher.BREAKING_WHITESPACE constant (in more recent Guava versions, the CharMatcher.breakingWhitespace() method).
Text files do not have 'mark up' as such. Get each line using BufferedReader.readLine(). Tabs can be found by searching the lines for "\t".
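As a rough sketch of that approach, the following reads input line by line and splits each line on tabs. The class name TabLineParser and the parse helper are made up for illustration, and the sample text stands in for a real file; for an actual file you would pass a FileReader instead of the StringReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TabLineParser {

    // Hypothetical helper: returns one list of fields per line of input.
    public static List<List<String>> parse(Reader source) throws IOException {
        List<List<String>> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            // readLine() strips the line terminator and returns null at end of input
            while ((line = reader.readLine()) != null) {
                // A limit of -1 keeps trailing empty fields instead of dropping them
                rows.add(Arrays.asList(line.split("\t", -1)));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Sample data standing in for the contents of a real file
        String sample = "Hello\tworld\nHello Universe\n";
        for (List<String> row : parse(new StringReader(sample))) {
            System.out.println(row.size() + " field(s): " + row);
        }
    }
}
```

With the sample above this prints "2 field(s): [Hello, world]" followed by "1 field(s): [Hello Universe]". Using split with a limit of -1 is a design choice: without it, Java drops trailing empty fields, which can shift columns when a row ends in a tab.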