Can anyone recommend a decent JavaScript parser for Java? I believe Rhino can be used, but it seems like overkill for parsing alone. Or is it the only decent solution? Any suggestions would be greatly appreciated. Thanks.
From https://github.com/google/caja/blob/master/src/com/google/caja/parser/js/Parser.java
The grammar below is a context-free representation of the grammar this parser parses. It disagrees with EcmaScript 262 Edition 3 (ES3) where implementations disagree with ES3. The rules for semicolon insertion and the possible backtracking in expressions needed to properly handle backtracking are commented thoroughly in code, since semicolon insertion requires information from both the lexer and parser and is not determinable with finite lookahead.
Noteworthy features
- Reports warnings on a queue where an error doesn't prevent any further errors, so that we can report multiple errors in a single compile pass instead of forcing developers to play whack-a-mole.
- Does not parse Firefox style catch (<Identifier> if <Expression>) since those don't work on IE and many other interpreters.
- Recognizes const since many interpreters do (not IE) but warns.
- Allows, but warns, on trailing commas in Array and Object constructors.
- Allows keywords as identifier names but warns since different interpreters have different keyword sets. This allows us to use an expansive keyword set.
To parse strict code, pass in a PedanticWarningMessageQueue that converts MessageLevel#WARNING and above to MessageLevel#FATAL_ERROR.
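The warning-queue idea described above can be sketched in plain Java. This is only an illustration of the pattern, not Caja's API: the class names MessageQueueSketch, Queue and PedanticQueue, the Level enum and the sample messages are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is NOT Caja's MessageQueue API.
class MessageQueueSketch {
    // Declared from least to most severe so that compareTo reflects severity.
    enum Level { LINT, WARNING, ERROR, FATAL_ERROR }

    static final class Message {
        final Level level;
        final String text;

        Message(Level level, String text) {
            this.level = level;
            this.text = text;
        }
    }

    /** Collects every message so that one problem does not hide the next. */
    static class Queue {
        private final List<Message> messages = new ArrayList<>();

        void add(Level level, String text) {
            messages.add(new Message(level, text));
        }

        List<Message> messages() {
            return messages;
        }

        boolean hasFatal() {
            for (Message m : messages) {
                if (m.level == Level.FATAL_ERROR) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Strict variant: anything at WARNING or above is promoted to FATAL_ERROR. */
    static class PedanticQueue extends Queue {
        @Override
        void add(Level level, String text) {
            Level effective = level.compareTo(Level.WARNING) >= 0 ? Level.FATAL_ERROR : level;
            super.add(effective, text);
        }
    }

    /** Stand-in for a parser pass that reports two recoverable problems. */
    static void reportSampleProblems(Queue queue) {
        queue.add(Level.WARNING, "trailing comma in array constructor");
        queue.add(Level.WARNING, "keyword used as identifier");
    }

    public static void main(String[] args) {
        Queue lenient = new Queue();
        Queue strict = new PedanticQueue();
        reportSampleProblems(lenient);
        reportSampleProblems(strict);
        System.out.println("lenient fatal? " + lenient.hasFatal());
        System.out.println("strict fatal? " + strict.hasFatal());
    }
}
```

Both queues keep every reported message, so a single pass can surface several problems at once. Only the strict queue treats them as fatal.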
CajaTestCase.js shows how to set up a parser, and fromResource and fromString in the same class show how to get an input of the right kind.
When using Java 8, there is a trick you can use to parse with the Nashorn implementation that comes out of the box. By looking at the unit tests in the OpenJDK source code, you can see how to use the parser on its own, without doing all the extra compilation:
Options options = new Options("nashorn");
options.set("anon.functions", true);
options.set("parse.only", true);
options.set("scripting", true);
ErrorManager errors = new ErrorManager();
Context context = new Context(options, errors, Thread.currentThread().getContextClassLoader());
Source source = new Source("test", "var a = 10; var b = a + 1;" +
"function someFunction() { return b + 1; } ");
Parser parser = new Parser(context.getEnv(), source, errors);
FunctionNode functionNode = parser.parse();
Block block = functionNode.getBody();
List<Statement> statements = block.getStatements();
Once this code runs, you will have the Abstract Syntax Tree (AST) for the three statements in the 'statements' list.
This can then be interpreted or manipulated to your needs.
The previous example works with the following imports:
import java.util.List;
import jdk.nashorn.internal.ir.Block;
import jdk.nashorn.internal.ir.FunctionNode;
import jdk.nashorn.internal.ir.Statement;
import jdk.nashorn.internal.parser.Parser;
import jdk.nashorn.internal.runtime.Context;
import jdk.nashorn.internal.runtime.ErrorManager;
import jdk.nashorn.internal.runtime.Source;
import jdk.nashorn.internal.runtime.options.Options;
You might need to add an access rule to make jdk/nashorn/internal/** accessible.
In my context, I am using JavaScript as an expression language for my own Domain Specific Language (DSL), which I then compile to Java classes at runtime and use. The AST lets me generate appropriate Java code that captures the intent of the JavaScript expressions.
Nashorn is available with Java SE 8.
The link to information about getting the Nashorn source code is here: https://wiki.openjdk.java.net/display/Nashorn/Building+Nashorn
A previous answer describes a way to get under the covers of JDK 8 to parse JavaScript. This is now being mainlined in Java 9. Nice!
This means that you don't need to include any libraries; instead you can rely on an official implementation from the Java team. Parsing JavaScript programmatically becomes much easier, without stepping into taboo areas of the JDK's internal code.
One application is using JavaScript for a rules engine whose rules are parsed and compiled into some other language at runtime. The AST lets you 'understand' the logic as written in concise JavaScript and then generate less pretty logic in some other language or framework for execution or evaluation.
http://openjdk.java.net/jeps/236
Summary from the link above:
Define a supported API for Nashorn's ECMAScript abstract syntax tree.
Goals
- Provide interface classes to represent Nashorn syntax-tree nodes.
- Provide a factory to create a configured parser instance, with configuration done by passing Nashorn command-line options via an API.
- Provide a visitor-pattern API to visit AST nodes.
- Provide sample/test programs to use the API.
Non-Goals
- The AST nodes will represent notions in the ECMAScript specification insofar as possible, but they will not be exactly the same. Wherever possible the javac tree API's interfaces will be adopted for ECMAScript.
- No external parser/tree standard or API will be used.
- There will be no script-level parser API. This is a Java API, although scripts can call into Java and therefore make use of this API.
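To make the "visitor-pattern API to visit AST nodes" goal concrete, here is a minimal sketch of the pattern in plain Java. It is not the Nashorn tree API: the node types NumberLiteral, Identifier and Binary, the Visitor interface and the IdentifierCollector class are hypothetical and only show how a visitor walks a tree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node and visitor types; NOT the API defined by JEP 236.
class VisitorSketch {
    interface Visitor<R> {
        R visitNumber(NumberLiteral node);
        R visitIdentifier(Identifier node);
        R visitBinary(Binary node);
    }

    interface Node {
        <R> R accept(Visitor<R> visitor);
    }

    static final class NumberLiteral implements Node {
        final double value;
        NumberLiteral(double value) { this.value = value; }
        public <R> R accept(Visitor<R> visitor) { return visitor.visitNumber(this); }
    }

    static final class Identifier implements Node {
        final String name;
        Identifier(String name) { this.name = name; }
        public <R> R accept(Visitor<R> visitor) { return visitor.visitIdentifier(this); }
    }

    static final class Binary implements Node {
        final String operator;
        final Node left;
        final Node right;
        Binary(String operator, Node left, Node right) {
            this.operator = operator;
            this.left = left;
            this.right = right;
        }
        public <R> R accept(Visitor<R> visitor) { return visitor.visitBinary(this); }
    }

    /** Collects identifier names in left-to-right order. */
    static final class IdentifierCollector implements Visitor<Void> {
        final List<String> names = new ArrayList<>();

        public Void visitNumber(NumberLiteral node) { return null; }

        public Void visitIdentifier(Identifier node) {
            names.add(node.name);
            return null;
        }

        public Void visitBinary(Binary node) {
            node.left.accept(this);
            node.right.accept(this);
            return null;
        }
    }

    static List<String> identifiersIn(Node root) {
        IdentifierCollector collector = new IdentifierCollector();
        root.accept(collector);
        return collector.names;
    }

    public static void main(String[] args) {
        // Hand-built tree for the expression (a + 1) + b.
        Node tree = new Binary("+",
                new Binary("+", new Identifier("a"), new NumberLiteral(1)),
                new Identifier("b"));
        System.out.println(identifiersIn(tree));
    }
}
```

A code generator like the rules-engine use case mentioned earlier would follow the same shape, with a visitor that returns target-language text instead of collecting names.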
Here are two ANTLR grammars for EcmaScript. Both are more or less working, though neither may be complete (see the comments on this post):
- http://www.antlr.org/grammar/1206736738015/JavaScript.g (incomplete?)
- http://www.antlr.org/grammar/1153976512034/ecmascriptA3.g (buggy?)
From ANTLR 5 minute intro:
ANTLR reads a language description file called a grammar and generates a number of source code files and other auxiliary files. Most uses of ANTLR generates at least one (and quite often both) of these tools:
A Lexer: This reads an input character or byte stream (i.e. characters, binary data, etc.), divides it into tokens using patterns you specify, and generates a token stream as output. It can also flag some tokens such as whitespace and comments as hidden using a protocol that ANTLR parsers automatically understand and respect.
A Parser: This reads a token stream (normally generated by a lexer), and matches phrases in your language via the rules (patterns) you specify, and typically performs some semantic action for each phrase (or sub-phrase) matched. Each match could invoke a custom action, write some text via StringTemplate, or generate an Abstract Syntax Tree for additional processing.
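The lexer half of that description can be illustrated with a small hand-written sketch. This is not ANTLR-generated code and does not use the ANTLR runtime: the class LexerSketch, its Token type and the tiny token set are hypothetical, chosen only to show a character stream being split into tokens with whitespace and comments flagged as hidden.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written illustration; NOT ANTLR output and NOT a full JavaScript lexer.
class LexerSketch {
    static final class Token {
        final String type;
        final String text;
        final boolean hidden; // true for tokens a parser should skip

        Token(String type, String text, boolean hidden) {
            this.type = type;
            this.text = text;
            this.hidden = hidden;
        }
    }

    private static final String[] TYPES = {"WS", "COMMENT", "NUMBER", "IDENT", "PUNCT"};

    // One named group per token type; every alternative consumes at least one character.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<WS>\\s+)"
            + "|(?<COMMENT>//[^\\n]*)"
            + "|(?<NUMBER>\\d+)"
            + "|(?<IDENT>[A-Za-z_$][A-Za-z0-9_$]*)"
            + "|(?<PUNCT>[=+;(){}])");

    /** Splits the input into tokens, marking whitespace and comments as hidden. */
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            matcher.region(pos, input.length());
            if (!matcher.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at offset " + pos);
            }
            for (String type : TYPES) {
                String text = matcher.group(type);
                if (text != null) {
                    boolean hidden = type.equals("WS") || type.equals("COMMENT");
                    tokens.add(new Token(type, text, hidden));
                    break;
                }
            }
            pos = matcher.end();
        }
        return tokens;
    }

    /** Returns the text of the tokens a parser would actually see. */
    static List<String> visibleText(List<Token> tokens) {
        List<String> visible = new ArrayList<>();
        for (Token token : tokens) {
            if (!token.hidden) {
                visible.add(token.text);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        System.out.println(visibleText(tokenize("var a = 10; // note")));
    }
}
```

A parser would then consume only the visible tokens, which is the role the hidden-token protocol plays in the quoted description.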
An EcmaScript 5 parser for Java: https://github.com/DigiArea/es5-model
For me, the best solution is running Acorn (https://github.com/marijnh/acorn) under Rhino.
I just don't think Caja is getting attention anymore.