
How can I tokenize this string in Java?

https://www.devze.com 2022-12-21 07:03 Source: web
How can I split these simple mathematical expressions into separate strings?

I know that I basically want to use the regular expression "[0-9]+|[*+-^()]", but it appears String.split() won't work because it consumes the delimiter tokens as well.

I want it to split all integers: 0-9, and all operators *+-^().

So, 578+223-5^2

Will be split into:

578  
+  
223  
-  
5  
^  
2  

What is the best approach to do that?


You could use StringTokenizer(String str, String delim, boolean returnDelims), with the operators as delimiters and returnDelims set to true. This way, you at least get each token individually (including the delimiters). You could then determine what kind of token you're looking at.
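A minimal sketch of that idea, assuming the operator set is +-*/^() and the input contains no whitespace (the class and method names here are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class StringTokenizerDemo {
    // Operator characters used as delimiters; this exact set is an assumption.
    private static final String OPERATORS = "+-*/^()";

    static List<String> tokenize(String expr) {
        // returnDelims = true makes the tokenizer return each delimiter as its own token.
        StringTokenizer st = new StringTokenizer(expr, OPERATORS, true);
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("578+223-5^2")); // [578, +, 223, -, 5, ^, 2]
    }
}
```

Whitespace is not handled here: any spaces around a number would stay attached to that number's token, so you would need to trim the tokens or strip spaces from the input first.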


Going at this laterally, and assuming your intention is ultimately to evaluate the String mathematically, you might be better off using the ScriptEngine:

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

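public class Evaluator {
    private final ScriptEngineManager sm = new ScriptEngineManager();
    // getEngineByName returns null when no JavaScript engine is installed
    // (the bundled Nashorn engine was removed in JDK 15).
    private final ScriptEngine sEngine = sm.getEngineByName("js");

    public double stringEval(String expr) {
        if (sEngine == null) {
            throw new IllegalStateException("No JavaScript engine available");
        }
        try {
            // Note: in JavaScript, ^ is bitwise XOR, not exponentiation.
            Object res = sEngine.eval(expr);
            return Double.parseDouble(res.toString());
        } catch (ScriptException se) {
            // Fail loudly instead of trying to parse an empty result.
            throw new IllegalArgumentException("Cannot evaluate: " + expr, se);
        }
    }
}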

Which you can then call as follows:

Evaluator evr = new Evaluator();  
String sTest = "+1+9*(2 * 5)";  
double dd = evr.stringEval(sTest);  
System.out.println(dd); 

I went down this road when working on evaluating Strings mathematically and it's not so much the operators that will kill you in regexps but complex nested bracketed expressions. Not reinventing the wheel is a) safer b) faster and c) means less complex and nested code to maintain.


This works for the sample string you posted:

String s = "578+223-5^2";
String[] tokens = s.split("(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)");

The regex is made up entirely of lookaheads and lookbehinds; it matches a position (not a character, but a "gap" between characters), that is either preceded by a digit and followed by a non-digit, or preceded by a non-digit and followed by a digit.
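To make the behaviour concrete, here is a small sketch (the class and method names are made up) that runs the same split on the sample string and on an input with two adjacent non-digits. The second call shows the limitation: because the pattern only splits at digit/non-digit boundaries, neighbouring operators and parentheses stay glued together.

```java
import java.util.Arrays;
import java.util.List;

final class LookaroundSplitDemo {
    // Splits at every boundary between a digit and a non-digit; no characters are consumed.
    static List<String> split(String expr) {
        return Arrays.asList(expr.split("(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)"));
    }

    public static void main(String[] args) {
        System.out.println(split("578+223-5^2")); // [578, +, 223, -, 5, ^, 2]
        System.out.println(split("2*(3+4)"));     // [2, *(, 3, +, 4, )]
    }
}
```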

Be aware that regexes are not well suited to the task of parsing math expressions. In particular, regexes can't easily handle balanced delimiters like parentheses, especially if they can be nested. (Some regex flavors have extensions which make that sort of thing easier, but not Java's.)

Beyond this point, you'll want to process the string using more mundane methods like charAt() and substring() and Integer.parseInt(). Or, if this isn't a learning exercise, use an existing math expression parsing library.

EDIT: ...or eval() it as @Syzygy recommends.


You can't use String.split() for that, since whatever characters match the specified pattern are removed from the output.

If you're willing to require spaces between the tokens, you can do...

"578 + 223 - 5 ^ 2 ".split(" ");

which yields...

578
+
223
-
5
^
2


Here's a short Java program that tokenizes such strings. If you want to evaluate the expression as well, I can (shamelessly) point you at this post: An Arithemetic Expressions Solver in 64 Lines

  import java.util.ArrayList;
  import java.util.List;

  public class Tokenizer {
     private String input;

     public Tokenizer(String input_) { input = input_.trim(); }

     private char peek(int i) {
        return i >= input.length() ? '\0' : input.charAt(i);
     }

     private String consume(String... arr) {
        for(String s : arr)
           if(input.startsWith(s))
              return consume(s.length());
        return null;
     }

     private String consume(int numChars) {
        String result = input.substring(0, numChars);
        input = input.substring(numChars).trim();
        return result;
     }

     private String literal() {
        for (int i = 0; true; ++i)
           if (!Character.isDigit(peek(i)))
              return consume(i);
     }

     public List<String> tokenize() {
        List<String> res = new ArrayList<String>();
        if(input.isEmpty())
           return res;

        while(true) {
           res.add(literal());
           if(input.isEmpty())
              return res;

           String s = consume("+", "-", "/", "*", "^");
           if(s == null)
              throw new RuntimeException("Syntax error " + input);
           res.add(s);
        }
     }

     public static void main(String[] args) {
        Tokenizer t = new Tokenizer("578+223-5^2");
        System.out.println(t.tokenize());
     }   
  }


Put only the delimiters in the split pattern. Also, inside a character class the - denotes a range, so it has to be escaped.

"578+223-5^2".split("[*+\\-^()]")


You need to escape the -. I believe the quantifiers (+ and *) and the parentheses lose their special meaning inside a character class. If it doesn't work, try escaping those as well.
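A quick way to see why the hyphen matters: in the asker's class, "+-^" is read as the range from '+' (U+002B) to '^' (U+005E), which also covers digits, uppercase letters and '/'. The sketch below (names are made up for illustration) compares the unescaped and escaped classes on a few single characters.

```java
import java.util.regex.Pattern;

final class CharClassRangeDemo {
    // Unescaped: "+-^" is interpreted as the range '+' (U+002B) through '^' (U+005E).
    static final Pattern UNESCAPED = Pattern.compile("[*+-^()]");
    // Escaped: the hyphen is a literal character.
    static final Pattern ESCAPED = Pattern.compile("[*+\\-^()]");

    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"-", "5", "A", "/"}) {
            System.out.println(s + " unescaped=" + matches(UNESCAPED, s)
                    + " escaped=" + matches(ESCAPED, s));
        }
    }
}
```

With the unescaped class all four characters match; with the escaped class only "-" does.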


Here is my tokenizer solution that allows for negative numbers (unary).

So far it has been doing everything I needed it to:

private static List<String> tokenize(String expression)
    {
        char c;
        List<String> tokens = new ArrayList<String>();
        String previousToken = null;
        int i = 0;
        while(i < expression.length())
        {
            c = expression.charAt(i);
            StringBuilder currentToken = new StringBuilder();

            if (c == ' ' || c == '\t') // Matched Whitespace - Skip Whitespace
            {
                i++;
            }
            else if (c == '-' && (previousToken == null || isOperator(previousToken)) && 
                    ((i+1) < expression.length() && Character.isDigit(expression.charAt((i+1))))) // Matched Negative Number - Add token to list
            {
                currentToken.append(expression.charAt(i));
                i++;
                while(i < expression.length() && Character.isDigit(expression.charAt(i)))
                {
                    currentToken.append(expression.charAt(i));
                    i++;
                }   
            }
            else if (Character.isDigit(c)) // Matched Number - Add to token list
            {
                while(i < expression.length() && Character.isDigit(expression.charAt(i)))
                {
                    currentToken.append(expression.charAt(i));
                    i++;
                }
            }
            else if (c == '+' || c == '*' || c == '/' || c == '^' || c == '-') // Matched Operator - Add to token list
            {
                currentToken.append(c);
                i++;
            }
            else // No Match - Invalid Token!
            {
                i++;
            }

            if (currentToken.length() > 0)
            {
                tokens.add(currentToken.toString());    
                previousToken = currentToken.toString();    
            }
        }   
        return tokens;
    }
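The method above calls isOperator, which is not shown in the post. A plausible version, assuming it only needs to recognise the five single-character operators that the tokenizer itself emits, might look like this (the helper body and the wrapping class are hypothetical, not the original author's code):

```java
final class OperatorCheck {
    // Hypothetical helper: true only if the token is exactly one of + - * / ^.
    static boolean isOperator(String token) {
        return token.length() == 1 && "+-*/^".indexOf(token.charAt(0)) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isOperator("-"));  // true
        System.out.println(isOperator("-5")); // false: a negative number, not an operator
    }
}
```

To compile the tokenizer itself you would also need to import java.util.ArrayList and java.util.List.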


You have to escape the '-' in Java; escaping the "()" as well is not required inside a character class, but it is harmless:

myString.split("[0-9]+|[\\*\\+\\-^\\(\\)]");
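Passed to split(), a pattern like this treats numbers and operators alike as delimiters, so no tokens are left over. The same pattern fits better with Matcher.find(), which returns the matched text instead of discarding it. A minimal sketch under that assumption (the class and method names are made up):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatcherTokenizer {
    // Either a run of digits or a single operator/parenthesis character.
    private static final Pattern TOKEN = Pattern.compile("[0-9]+|[\\*\\+\\-^\\(\\)]");

    static List<String> tokenize(String expr) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expr);
        // find() advances to the next match each time; group() is the matched token.
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("578+223-5^2")); // [578, +, 223, -, 5, ^, 2]
        System.out.println(tokenize("2*(3+4)"));     // [2, *, (, 3, +, 4, )]
    }
}
```

Note that find() silently skips anything the pattern does not match (spaces, letters, or the / operator, which this pattern omits), so a real tokenizer would want to check for gaps between matches and report them as errors.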
