Parsing of data structure in a plain text file

Developer https://www.devze.com 2022-12-19 11:34 Source: web
How would you parse a structure similar to this in Java?


\\Header (name)\\\
1JohnRide  2MarySwanson
 1 password1
 2 password2
\\\1 block of data name\\\
  1.ABCD
  2.FEGH
  3.ZEY
\\\2-nd block of data name\\\
1. 123232aDDF dkfjd ksksd
2. dfdfsf dkfjd
....
etc

Suppose it comes from a text buffer (a plain file).

Each line of text is terminated by "\n". Words are separated by spaces.

The structure is more or less defined. There may be some ambiguity, though: the number of fields in each line may differ, some blocks of data may be missing, and the number of lines in each block may vary as well.

The question is: how can this be done most effectively?

The first solution that comes to mind is to use regular expressions.

But are there other solutions? Problem-oriented ones? Maybe a Java library that has already been written?


Check out UTAH: https://github.com/sonalake/utah-parser

It's a tool that's pretty good at parsing this kind of semi-structured text.


As no one has recommended a library, my suggestion would be: use regex.
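For illustration, here is a minimal sketch of the regex route. It assumes that block headers are lines wrapped in one or more backslashes and that data lines start with a number, optionally followed by a dot, as in the sample from the question. The class and method names (RegexSketch, headerName, entry) are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSketch {
    // Assumed header shape: backslashes, a name, backslashes.
    private static final Pattern HEADER =
            Pattern.compile("\\\\+\\s*(.*?)\\s*\\\\+");
    // Assumed entry shape: optional indent, digits, optional dot, data.
    private static final Pattern ENTRY =
            Pattern.compile("\\s*(\\d+)\\.?\\s*(.*)");

    // Returns the block name, or null if the line is not a header.
    static String headerName(String line) {
        Matcher m = HEADER.matcher(line.trim());
        return m.matches() ? m.group(1) : null;
    }

    // Returns {number, data}, or null if the line is not a numbered entry.
    static String[] entry(String line) {
        Matcher m = ENTRY.matcher(line);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        // Three backslashes on each side, escaped for a Java string literal.
        System.out.println(headerName("\\\\\\1 block of data name\\\\\\"));
        String[] e = entry("  2.FEGH");
        System.out.println(e[0] + " -> " + e[1]);
    }
}
```

Note that a line such as "1JohnRide  2MarySwanson" holds two records, so it would need a second pass (for example a repeated find over a digits-then-letters pattern) rather than a single whole-line match.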


From what you have posted it looks like the data is delimited by whitespace. One idea is to use a Scanner or a StringTokenizer to get one token at a time. You can then check the first char of a token to see if it is a digit (in which case the part of the token after the digit(s) will be the data, if there is any).
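A rough sketch of that idea, using a Scanner with its default whitespace delimiter. It assumes that a token starting with digits opens a new record, and that any following tokens without a leading digit belong to that record. That holds for the header lines in the sample, but not for lines like "1. 123232aDDF dkfjd ksksd", where the data itself starts with a digit. The names TokenSketch and parseLine are invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

class TokenSketch {
    // Parses one whitespace-delimited line into number -> data pairs.
    static Map<Integer, String> parseLine(String line) {
        Map<Integer, String> fields = new LinkedHashMap<>();
        Integer current = null; // number of the record being filled, if any
        try (Scanner scanner = new Scanner(line)) {
            while (scanner.hasNext()) {
                String token = scanner.next();
                if (Character.isDigit(token.charAt(0))) {
                    // Split the token into its leading digits and the rest.
                    int i = 0;
                    while (i < token.length() && Character.isDigit(token.charAt(i))) {
                        i++;
                    }
                    current = Integer.parseInt(token.substring(0, i));
                    fields.put(current, token.substring(i));
                } else if (current != null) {
                    // No leading digit: append to the current record's data.
                    String soFar = fields.get(current);
                    fields.put(current, soFar.isEmpty() ? token : soFar + " " + token);
                }
                // Tokens seen before any number are ignored in this sketch.
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("1JohnRide  2MarySwanson"));
        System.out.println(parseLine(" 1 password1"));
    }
}
```

With the sample lines, the first call maps 1 to JohnRide and 2 to MarySwanson, and the second maps 1 to password1.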


This sounds like a homework problem, so I'm going to try to answer it in a way that guides you (rather than giving the final solution).

First, you need to consider each object of data you're reading. Is it a number then a text field? A number then 3 text fields? Variable numbers and text fields?

After that, you need to determine what you're going to use to delimit each field and each object. For example, in many files you'll see something like a semicolon between the fields and a newline at the end of the object. From what you said, it sounds like yours is different.

If an object can go across multiple lines you'll need to bear that in mind (don't stop partway through an object).

Hopefully that helps. If you research this and still have problems, post the code you've got so far along with some sample data, and I'll help you solve them (I'll teach you to fish... not give you fish :-) ).


If the fields are fixed length, you could use a DataInputStream to read your file. Or, since your format is line-based, you could use a BufferedReader to read lines and write yourself a state machine which knows what kind of line to expect next, given what it's already seen. Once you have each line as a string, then you just need to split the data appropriately.

E.g., the password can be extracted from a password line like this (trimming first, because the sample lines start with a space):

final String trimmed = line.trim();
final int pos = trimmed.indexOf(' ');
String passwd = trimmed.substring(pos + 1);
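The BufferedReader idea can be sketched as a very small state machine in which the only state is the block currently being read. This is a sketch under assumptions, not a complete parser: it treats any line that starts and ends with a backslash as a block header, skips blank lines, and ignores lines that appear before the first header. BlockReaderSketch and readBlocks are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BlockReaderSketch {
    // Groups the trimmed data lines under the name of the block they follow.
    static Map<String, List<String>> readBlocks(BufferedReader reader) throws IOException {
        Map<String, List<String>> blocks = new LinkedHashMap<>();
        List<String> current = null; // state: the block being filled
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            if (trimmed.startsWith("\\") && trimmed.endsWith("\\")) {
                // Header line: strip the surrounding backslashes, open a new block.
                String name = trimmed.replaceAll("^\\\\+|\\\\+$", "").trim();
                current = new ArrayList<>();
                blocks.put(name, current);
            } else if (current != null) {
                current.add(trimmed);
            }
        }
        return blocks;
    }

    public static void main(String[] args) throws IOException {
        String text = "\\\\Header (name)\\\\\\\n"
                + " 1 password1\n"
                + "\\\\\\1 block of data name\\\\\\\n"
                + "  1.ABCD\n";
        System.out.println(readBlocks(new BufferedReader(new StringReader(text))));
    }
}
```

Each collected line can then be split with whatever rule fits its block. A stricter version could keep an explicit enum of expected line kinds and reject input that arrives out of order.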