开发者

How to write these patterns?

开发者 https://www.devze.com 2023-01-24 07:01 Source: Internet

1) [ the/DT$ government/NN ] has/VBZ n't/RB [ any/DT authority/NN ] to/TO issue/VB [ new/JJ debt/N$ obligations/NNS ] of/IN [ any/DT kind/NN ] [ the/DT Treasury/NNP ] said/VBD...

How can we get DT$, VBZ, RB, DT, NN, ..., that is, the part between '/' and the space?

2) This is the tagset for the Brown corpus. Is there a pattern that matches all the tags listed at this link: http://www.scs.leeds.ac.uk/amalgam/tagsets/brown.html

Can 1) and 2) be combined as one pattern?

We are new to regex, so please help. Thank you very much.

Edit: 1) We want to extract the part between / and the space. For example, the line below is a section from a tagged corpus, and we just want to extract the tags, not the words/tokens. A tag consists of uppercase letters, or uppercase letters followed by $, as shown below. We want to get only the tags. Is the question clear? The tag rule is:

one or more uppercase letters, optionally followed by $

[ the/DT$ government/NN ] has/VBZ n't/RB [ any/DT authority/NN ]...

How can we write a pattern that extracts only DT$, NN, VBZ, RB, DT, NN, ...?

In other words, we should get the part between / and the space.

We are using a TPerlRegEx wrapper that supports most functions and patterns. The regex may be something like /\w+|$, but we are not sure.

We do not know if we have made it clear.


I think you should use this: "/[A-Z]+\$?\ " (without the quotes, of course).
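One thing to watch with that pattern: the match itself includes the leading / and the trailing space, and a tag at the very end of the text (with no space after it) is skipped. A capture group plus a lookahead gets around both. Here is a rough sketch in Python rather than TPerlRegEx, since I cannot test against your wrapper; the regex syntax should carry over to any PCRE-style engine, but treat that as an assumption. The names TAG_PATTERN, ANY_TAG_PATTERN and extract_tags are made up for this example. The looser second pattern is only a guess at what part 2) needs, in case the full tagset has tags containing characters other than uppercase letters and $.

```python
import re

# Tag rule from the question: one or more uppercase letters, optionally
# followed by "$". The capture group keeps only the tag (no "/"), and the
# lookahead requires whitespace or end of input without consuming it, so
# the last tag in the text is found as well.
TAG_PATTERN = re.compile(r"/([A-Z]+\$?)(?=\s|$)")

# Looser variant (an assumption, not from the question): accept any run of
# characters that are neither whitespace nor "/" as the tag. This may fit a
# larger tagset whose tags contain symbols such as "-", "+" or "*".
ANY_TAG_PATTERN = re.compile(r"/([^\s/]+)(?=\s|$)")


def extract_tags(text, pattern=TAG_PATTERN):
    """Return the tags found in word/TAG tokens, in order of appearance."""
    return pattern.findall(text)


SAMPLE = "[ the/DT$ government/NN ] has/VBZ n't/RB [ any/DT authority/NN ]"

if __name__ == "__main__":
    print(extract_tags(SAMPLE))  # ['DT$', 'NN', 'VBZ', 'RB', 'DT', 'NN']
```

If 1) and 2) have to be one pattern, the looser variant is probably the simpler choice, because it does not need to list every tag; the strict one is safer when you want to reject anything that does not follow the uppercase rule.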
