How to write these patterns?
1) [ the/DT$ government/NN ] has/VBZ n't/RB [ any/DT authority/NN ] to/TO issue/VB [ new/JJ debt/N$ obligations/NNS ] of/IN [ any/DT kind/NN ] [ the/DT Treasury/NNP ] said/VBD...
How to get DT$, VBZ, RB, DT, NN... or the part between '/' and space?
2) This is the tagset for the Brown corpus. Is there a pattern for all tags in this link: http://www.scs.leeds.ac.uk/amalgam/tagsets/brown.html
Can 1) and 2) be combined as one pattern?
We are new to regex, please help. Thank you very much.
Edit: 1) We want to extract the part between / and the following space. For example, the text below is a section from a tagged corpus, and we want to extract only the tags, not the words/tokens. A tag consists of uppercase letters, optionally followed by $, as shown below. Are we making the question clear? The tag rule is:
uppercase letter or uppercase letters or uppercase letters + $
[ the/DT$ government/NN ] has/VBZ n't/RB [ any/DT authority/NN ]...
How can we write a pattern that extracts only DT$, NN, VBZ, RB, DT, NN...?
In other words, we should get the part between / and the next space.
We are using a Tperlregex wrapper that supports most functions and patterns. The regex may be something like /\w+|$, but we do not know.
We do not know if we have made it clear.
I think you should use this: "/[A-Z]+\$?\ " (without the quotes, of course).
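Note that the pattern above also matches the slash and the trailing space, so the tag still has to be trimmed out of each match. A capture group plus a lookahead avoids that. Here is a minimal sketch in Python rather than Tperlregex (the function names are made up for illustration); the syntax is Perl-compatible, so the patterns should carry over, but I have not tested them in the wrapper. The second pattern is a guess at covering question 2: it assumes every tag is the run of non-space, non-slash characters after the last slash of a token, which is an assumption about the corpus format, not something taken from the tagset page.

```python
import re

# Strict rule from the question: uppercase letters, optionally followed by "$".
# The lookahead requires whitespace or end of text after the tag without consuming it.
STRICT_TAG = re.compile(r"/([A-Z]+\$?)(?=\s|$)")

# Looser rule (assumption): a tag is whatever follows the last "/" of a token,
# i.e. any run of characters that are neither "/" nor whitespace.
ANY_TAG = re.compile(r"/([^/\s]+)(?=\s|$)")


def extract_tags(text):
    """Return tags made of uppercase letters with an optional trailing '$'."""
    return STRICT_TAG.findall(text)


def extract_any_tags(text):
    """Return whatever follows the last '/' of each token (hypothetical helper)."""
    return ANY_TAG.findall(text)


sample = "[ the/DT$ government/NN ] has/VBZ n't/RB [ any/DT authority/NN ]"
print(extract_tags(sample))
# prints ['DT$', 'NN', 'VBZ', 'RB', 'DT', 'NN']
```

With findall and a single capture group you get only the captured tag back, so there is nothing to strip afterwards. If the tagged text contains hyphenated or punctuation tags, the strict pattern skips them and the looser one picks them up.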