text-processing
Problems with strtok()
I have been wrestling with this for a while. I know it's a lot of code to look at, but I have no idea where the problem lies and can't seem to narrow it down. I will bounty it.
2023-04-12 07:19 Category: Q&A
In a *nix environment, how would I group columns together?
I have the following text file: A,B,C A,B,C A,B,C Is there a way, using standard *nix tools (cut, grep, awk, sed, etc.), to process such a text file and get the following output:
2023-04-12 06:29 Category: Q&A
Counting number of words after last occurrence of every word in an array of strings
I am working on text. I want to find the number of words after the last occurrence of a particular word in an array of strings. For instance, ...
2023-04-08 16:32 Category: Q&A
ANTLR for Writing JAPE Grammar
I am using GATE to process texts written in natural language. I have to extract height, weight, BP, etc. from the text and store it in structured form. Now, these things (i.e. height, weight, etc.) can be w...
2023-04-08 01:22 Category: Q&A
Running a macro till the end of text file in Emacs
I have a text file with some sample content as shown here: Sno = 1p Sno = 2p Sno = 3p
2023-04-07 00:19 Category: Q&A
How to deal with Unicode character encoding issues while converting documents from PDF to text
I am trying to extract text from a PDF. The PDF contains text in Hindi (Unicode). The utility for extraction I am using is Apache PDFBox (http://pdfbox.apache.org/). The extractor extr...
2023-04-05 22:31 Category: Q&A
Best way to parse a list of numbers
I have a problem in that I need to process a list of numbers, which will be in an English sentence. It could be in the following formats:
2023-04-04 19:32 Category: Q&A
Parse numbers from large text, possibly without regex (performance critical)
I'm extremely familiar with regex, before you all start answering with variations of: \d+. I want to know if there are alternatives to regex for parsing numbers out of a large text file.
2023-04-04 16:50 Category: Q&A
How to classify text when predefined categories are not available
I have a problem and am not sure which algorithm to apply. I am thinking of applying clustering in case two, but I have no idea about case one:
2023-04-04 15:47 Category: Q&A
Multiline pattern matching
Problem: In a large file (plain text), there are some "interesting" lines which contain some specific words. The aim is to extract all those lines that contain such words. However, i...
2023-04-03 21:34 Category: Q&A