I'm in a situation where I need to parse arguments from a string in the same way that they would be parsed if provided on the command-line to a Java/Clojure application.
For example, I need to turn "foo \"bar baz\" 'fooy barish' foo"
into ("foo" "bar baz" "fooy barish" "foo").
I'm curious if there is a way to use the parser that Java or Clojure uses to do this. I'm not opposed to using a regex, but I suck at regexes, and I'd fail hard if I tried to write one for this.
Any ideas?
Updated with a new, even more convoluted version. This is officially ridiculous; the next iteration will use a proper parser (or c.c.monads and a little bit of Parsec-like logic on top of that). See the revision history on this answer for the original.
This convoluted bunch of functions seems to do the trick (not at my DRYest with this one, sorry!):
;; Assumes the `str` alias refers to clojure.string.
(require '[clojure.string :as str])

;; Split the input into blocks: quotes, backslashes and whitespace each
;; become their own block, everything else stays together.
(defn initial-state [input]
  {:expecting nil
   :blocks (mapcat #(str/split % #"(?<=\s)|(?=\s)")
                   (str/split input #"(?<=(?:'|\"|\\))|(?=(?:'|\"|\\))"))
   :arg-blocks []})

;; Consume one block and return [result new-state]; result is non-nil
;; only once all blocks have been consumed.
(defn arg-parser-step [s]
  (if-let [bs (seq (:blocks s))]
    (if-let [d (:expecting s)]
      ;; Inside a quoted argument, waiting for the closing delimiter d.
      (loop [bs bs]
        (cond (= (first bs) d)
              [nil (-> s
                       (assoc-in [:expecting] nil)
                       (update-in [:blocks] next))]
              (= (first bs) "\\")
              [nil (-> s
                       (update-in [:blocks] nnext)
                       (update-in [:arg-blocks]
                                  #(conj (pop %)
                                         (conj (peek %) (second bs)))))]
              :else
              [nil (-> s
                       (update-in [:blocks] next)
                       (update-in [:arg-blocks]
                                  #(conj (pop %) (conj (peek %) (first bs)))))]))
      ;; Outside quotes.
      (cond (#{"\"" "'"} (first bs))
            [nil (-> s
                     (assoc-in [:expecting] (first bs))
                     (update-in [:blocks] next)
                     (update-in [:arg-blocks] conj []))]
            (str/blank? (first bs))
            [nil (-> s (update-in [:blocks] next))]
            :else
            [nil (-> s
                     (update-in [:blocks] next)
                     (update-in [:arg-blocks] conj [(.trim (first bs))]))]))
    [(->> (:arg-blocks s)
          (map (partial apply str)))
     nil]))

(defn split-args [input]
  (loop [s (initial-state input)]
    (let [[result new-s] (arg-parser-step s)]
      (if result result (recur new-s)))))
Somewhat encouragingly, the following yields true:
(= (split-args "asdf 'asdf \" asdf' \"asdf ' asdf\" asdf")
   '("asdf" "asdf \" asdf" "asdf ' asdf" "asdf"))
So does this:
(= (split-args "asdf asdf ' asdf \" asdf ' \" foo bar ' baz \" \" foo bar \\\" baz \"")
   '("asdf" "asdf" " asdf \" asdf " " foo bar ' baz " " foo bar \" baz "))
This should trim regular arguments but not quoted ones, and it handles both double and single quotes, including backslash-escaped double quotes inside a double-quoted argument. Note that it currently treats backslash-escaped single quotes inside a single-quoted argument in the same way, which is apparently at variance with the *nix shell way... argh. It's basically a computation in an ad-hoc state monad, just written in a particularly ugly way and in dire need of DRYing up. :-P
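For comparison, the same state-machine idea can be sketched character by character in plain Java. This is only an illustration, not the code from this answer: the class name `QuoteAwareSplitter` and its `split` method are hypothetical. It also differs in two ways: adjacent quoted and unquoted pieces are joined into one argument (as a shell would), and an unterminated quote simply runs to the end of the input. Like the Clojure version, it honours backslash escapes inside both kinds of quotes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: a character-level state machine for splitting arguments. */
final class QuoteAwareSplitter {
    private QuoteAwareSplitter() {}

    static List<String> split(String input) {
        List<String> args = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char expecting = 0;     // closing quote we are waiting for; 0 means "outside quotes"
        boolean inArg = false;  // true once an argument has started (so "" counts as an argument)

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (expecting != 0) {
                if (c == '\\' && i + 1 < input.length()) {
                    current.append(input.charAt(++i)); // keep the escaped character literally
                } else if (c == expecting) {
                    expecting = 0;                     // closing quote: leave quoted state
                } else {
                    current.append(c);                 // whitespace is preserved inside quotes
                }
            } else if (c == '"' || c == '\'') {
                expecting = c;                         // opening quote: remember which one
                inArg = true;
            } else if (Character.isWhitespace(c)) {
                if (inArg) {                           // unquoted whitespace ends the argument
                    args.add(current.toString());
                    current.setLength(0);
                    inArg = false;
                }
            } else {
                current.append(c);
                inArg = true;
            }
        }
        if (inArg) {
            args.add(current.toString());              // flush the last argument
        }
        return args;
    }
}
```

Calling `QuoteAwareSplitter.split("foo \"bar baz\" 'fooy barish' foo")` gives the four arguments from the question.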
This bugged me, so I got it working in ANTLR. The grammar below should give you an idea of how to do it. It includes rudimentary support for backslash escape sequences.
Getting ANTLR working in Clojure is too much to write in this text box. I wrote a blog entry about it though.
grammar Cmd;

options {
    output=AST;
    ASTLabelType=CommonTree;
}

tokens {
    DQ = '"';
    SQ = '\'';
    BS = '\\';
}

@lexer::members {
    String strip(String s) {
        return s.substring(1, s.length() - 1);
    }
}

args: arg (sep! arg)* ;

arg : BAREARG
    | DQARG
    | SQARG
    ;

sep : WS+ ;

DQARG : DQ (BS . | ~(BS | DQ))+ DQ
        {setText( strip(getText()) );};

SQARG : SQ (BS . | ~(BS | SQ))+ SQ
        {setText( strip(getText()) );} ;

BAREARG: (BS . | ~(BS | WS | DQ | SQ))+ ;

WS : ( ' ' | '\t' | '\r' | '\n');
I ended up doing this:
(filter seq
        (flatten
         (map #(%1 %2)
              (cycle [#(s/split % #" ") identity])
              (s/split (read-line) #"(?<!\\)(?:'|\")"))))
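The trick here is that splitting on unescaped quote characters makes the segments alternate: even-numbered segments are outside quotes (so they are split on spaces), and odd-numbered ones are inside quotes (so they are kept whole). A rough Java rendering of the same idea, with a hypothetical class name `AlternatingSplitter`, might look like this. It shares the snippet's limitations: it does not check that the opening and closing quotes match, it drops empty quoted arguments, and a backslash before an escaped quote stays in the output.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: split on unescaped quotes, then alternate between two treatments. */
final class AlternatingSplitter {
    private AlternatingSplitter() {}

    static List<String> split(String line) {
        // Same regex as the Clojure snippet: a quote character not preceded by a backslash.
        String[] segments = line.split("(?<!\\\\)(?:'|\")");
        List<String> args = new ArrayList<>();
        for (int i = 0; i < segments.length; i++) {
            if (i % 2 == 0) {
                // Outside quotes: split on spaces and skip the empty pieces.
                for (String word : segments[i].split(" ")) {
                    if (!word.isEmpty()) {
                        args.add(word);
                    }
                }
            } else if (!segments[i].isEmpty()) {
                // Inside quotes: keep the segment as a single argument.
                args.add(segments[i]);
            }
        }
        return args;
    }
}
```

This is compact, but it is only reliable for simple input; anything with nested or mismatched quotes needs a real tokenizer.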
I know this is a very old thread, but I came across this same problem and used Java interop to call:
(CommandLineUtils/translateCommandline cmd-line)
from Plexus Common Utilities.