tokenize
Python: Tokenizing with phrases
I have blocks of text I want to tokenize, but I don't want to tokenize on whitespace and punctuation, as seems to be the standard with tools like NLTK. There are particular phrases that I want to be [Details]
2023-02-21 15:32 Category: Q&A
Programming a simple compiler
I am writing a compiler for a simple language. I made a lexer/tokenizer that takes a file and prints the tokens to stdout. [Details]
2023-02-19 20:44 Category: Q&A
String tokenizer in C++
Hello, I have this code: char * cip = "192.168.0.1\t\t78.90.56.4"; I want to convert it to [Details]
2023-02-19 20:20 Category: Q&A
What is a HtmlTokenizer?
What does a HtmlTokenizer really do? What is its utility? How can I use it in a C# application? It converts HTML elements to tokens, like this: [Details]
2023-02-18 23:32 Category: Q&A
Lucene standard analyzer split on period
How do I make Lucene's Standard Analyzer tokenize on the '.' char? For example, on querying for "B" I need it to return the B in "A.B.C" as the result. I need to treat numbers the wa [Details]
2023-02-16 13:12 Category: Q&A
How do I loop over several files, keeping the base name for further processing?
I have multiple text files that need to be tokenised, POS-tagged and NER-tagged. I am using the C&C taggers and have run their tutorial, but I am wondering if there is a way to tag multiple files rather than one by [Details]
2023-02-13 22:50 Category: Q&A
Web server - how to parse requests? Asynchronous Stream Tokenizer?
I'm attempting to create a simple web server in C# in an asynchronous socket programming style. The purpose is very narrow: a Comet server (HTTP long-polling). [Details]
2023-02-13 13:37 Category: Q&A
Bi-directional Text Parsing Recommendations
I'm looking at the feasibility of implementing a bi-directional text parsing framework to allow formatted text to be processed using a combination of common paradigms such as Markdow [Details]
2023-02-13 10:44 Category: Q&A
Generating PHP code (from Parser Tokens)
Is there any available solution for (re-)generating PHP code from the Parser Tokens returned by token_get_all? Other solutions for generating PHP code are welcome as well, preferably [Details]
2023-02-12 04:17 Category: Q&A
Tokenizing with Perl and Unstructured data
I have the following data (from a text file). I would like to split it and get each element, even those elements that are blank (some grades, as you can see, are not listed, which means they are 0, so I [Details]
2023-02-10 09:03 Category: Q&A