Parsing SQL table definition using regular expression in Java

Developer https://www.devze.com 2023-04-02 19:15 Source: Internet

I'm trying to parse a SQL table creation script in Java.

I've currently got the following pattern:

Pattern p = Pattern.compile("(.+)([ ]+)(.+)([ ]+)(.+)");

That is, a group of any characters (the column name), followed by one or more spaces, followed by another group of characters (the column type), followed by one or more spaces, followed by the remaining characters (i.e. not null etc.).

And this is used by the following code:

Matcher m = p.matcher(field);
if(m.find()){
    String column = m.group(1).trim();
    String type = m.group(3).trim();
    String clauses = m.group(5).trim();
}

And yet when I run this on:

firstColumn         varchar(4)   not null,

The first group is:

firstColumn         varchar(4)

I would expect the three extracted fields to be firstColumn, varchar(4) and not null respectively.

Any ideas?

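(.+) is greedy: it consumes as much as possible and gives characters back only when the rest of the pattern would otherwise fail. To make it consume as little as possible, change it to the reluctant form (.+?).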

Try something like this:

String input = "firstColumn         varchar(4)   not null,";

Pattern p = Pattern.compile("(.+?)\\s+(.+?)\\s+(.*)");
Matcher m = p.matcher(input);

if (m.find()) {
    System.out.println(m.group(1));
    System.out.println(m.group(2));
    System.out.println(m.group(3));
}

Output:

firstColumn
varchar(4)
not null,
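
To see the two behaviours side by side, here is a minimal sketch that runs the pattern from the question and the reluctant pattern from this answer against the same input. The class name GreedyVsReluctant and the helper groups are made up for illustration; only java.util.regex is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class name; not part of the original question or answer.
final class GreedyVsReluctant {

    // Returns the trimmed text of the requested groups,
    // or null if the pattern is not found in the input.
    static String[] groups(String regex, String input, int... indexes) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        String[] out = new String[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            out[i] = m.group(indexes[i]).trim();
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "firstColumn         varchar(4)   not null,";

        // Greedy: group 1 swallows the type too, because (.+) only gives back
        // as many characters as the rest of the pattern strictly needs.
        String[] greedy = groups("(.+)([ ]+)(.+)([ ]+)(.+)", input, 1, 3, 5);

        // Reluctant: each (.+?) stops at the first run of whitespace.
        String[] reluctant = groups("(.+?)\\s+(.+?)\\s+(.*)", input, 1, 2, 3);

        System.out.println("greedy:    " + String.join(" | ", greedy));
        System.out.println("reluctant: " + String.join(" | ", reluctant));
    }
}
```

With the greedy pattern the three groups come out as the name plus type, then not, then null, which is the behaviour described in the question. Note that the reluctant version still leaves the trailing comma in the last group.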

An alternative to lazy matching (IMHO a preferable one) is to specify which characters may appear in each word, that is:

([^ ]+)([ ]+)([^ ]+)([ ]+)(.+)

The difference is that this approach will never put spaces in the column name, whereas the lazy match still might if the rest of the pattern fails and the engine backtracks.
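
A small sketch of that backtracking effect, assuming the whole line must match (matches() rather than find()). The input line, the two-group patterns and the class name LazyBacktracking are hypothetical and chosen only to force the backtrack; they are not taken from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class name; the patterns below are simplified two-group variants.
final class LazyBacktracking {

    // Returns group 1 if the WHOLE input matches the regex, otherwise null.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical malformed line: three words where two are expected.
        String input = "first column varchar(4)";

        // Lazy group: the first attempt ("first") cannot match the whole line,
        // so the engine backtracks and lets group 1 grow across a space.
        System.out.println(firstGroup("(.+?)\\s+(\\S+)", input));

        // Negated class: group 1 can never contain a space, so the match
        // simply fails and null is returned.
        System.out.println(firstGroup("([^ ]+)[ ]+([^ ]+)", input));
    }
}
```

Failing outright is usually easier to detect and report than a column name that silently contains a space.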

BTW, the data type can still contain spaces (e.g. CHAR(20) CHARACTER SET xxxxx; the parentheses may also be surrounded by spaces), so this approach is not really going to work in general.
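
One way to cope with the spaced-parentheses case is to describe the type explicitly as a word followed by an optional parenthesised argument list. The sketch below is only that: the class ColumnDefParser, the method parse and the pattern are assumptions for illustration, not a full SQL grammar. In particular, a suffix such as CHARACTER SET is not recognised as part of the type and ends up in the clauses group, and quoted identifiers or nested parentheses are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch; not a complete parser for CREATE TABLE column definitions.
final class ColumnDefParser {

    // name, then a type word with an optional "( ... )" that may be preceded
    // by spaces, then whatever clauses remain, minus one trailing comma.
    private static final Pattern COLUMN = Pattern.compile(
            "\\s*(\\S+)\\s+(\\w+(?:\\s*\\([^)]*\\))?)\\s*(.*?)\\s*,?\\s*");

    // Returns {name, type, clauses}, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = COLUMN.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "firstColumn         varchar(4)   not null,",
            "price decimal (10, 2) default 0,",   // hypothetical line with spaced parens
            "id int"
        };
        for (String line : lines) {
            String[] parts = parse(line);
            System.out.println(parts == null
                    ? "no match: " + line
                    : String.join(" | ", parts));
        }
    }
}
```

For anything beyond simple scripts, a real SQL parser is likely a safer choice than extending the regular expression further.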