text-processing
Big text file processing
I need to implement lazy loading in Mathematica. I have a 600 MB CSV text file which I need to process. This file contains a lot of duplicated records: [Details]
2023-01-27 06:22 Category: Q&A
How to join each double lines?
I have a text file: a1 a2 b1 b2 c1 c2 ... I want to join every two lines so that it can be sorted: a1:a2 b1:b2 c1:c2 [Details]
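One possible way to do the pairing described above, as a minimal Python sketch. It assumes each value (a1, a2, ...) sits on its own line; the function name `join_pairs` and the `sep` parameter are hypothetical, not taken from the question.

```python
from typing import Iterable, Iterator


def join_pairs(lines: Iterable[str], sep: str = ":") -> Iterator[str]:
    """Yield consecutive lines joined two at a time with `sep`.

    Assumption: one value per input line. A trailing unpaired line
    is yielded unchanged.
    """
    it = (line.rstrip("\n") for line in lines)
    for first in it:
        # Pull the partner line from the same iterator; None if none is left.
        second = next(it, None)
        yield first if second is None else f"{first}{sep}{second}"


if __name__ == "__main__":
    sample = ["b1\n", "b2\n", "a1\n", "a2\n", "c1\n", "c2\n"]
    # Joined records can then be sorted as whole units.
    for record in sorted(join_pairs(sample)):
        print(record)
```

Because the function accepts any iterable of lines, an open file object can be passed directly, so the file does not have to be read into memory first.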
2023-01-25 07:01 Category: Q&A
Perl: With Text::CSV can I write out a hash ref?
I have a Perl script that reads in a CSV file, changes the column names of the original, adds new ones (output CSV column names are stored in the array, header_line), adds new field values for each row [Details]
2023-01-24 12:41 Category: Q&A
Python: translating/replacing in a string words that aren't the ones you want
Basically, I've got a bunch of phrases and I'm only interested in the ones that contain certain words. What I want to do is 1) find out if that word is there and if it is, 2) era [Details]
2023-01-23 10:37 Category: Q&A
Perl: Update a field in each line of a CSV file
Say I have a CSV file with thousands of lines similar to this one below: 1,fred,smith,"11, erewhon avenue","XYZ Company, 101 the road","020 123456",UK [Details]
2023-01-23 07:03 Category: Q&A
Python: How to loop through blocks of lines
How to go through blocks of lines separated by an empty line? The file looks like the following: ID: 1 [Details]
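A minimal sketch of one way to iterate over such blocks in Python, using `itertools.groupby` to split on blank lines. The function name `iter_blocks` and the `Name:` field in the sample data are hypothetical; the excerpt only shows a line of the form `ID: 1`.

```python
from itertools import groupby
from typing import Iterable, Iterator, List


def iter_blocks(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each run of non-blank lines as a list of strings.

    Runs are delimited by one or more blank (empty or whitespace-only) lines.
    """
    stripped = (line.rstrip("\n") for line in lines)
    # groupby clusters consecutive lines by whether they are blank.
    for is_blank, group in groupby(stripped, key=lambda s: not s.strip()):
        if not is_blank:
            yield list(group)


if __name__ == "__main__":
    # Hypothetical sample; only the "ID: 1" shape comes from the question.
    sample = ["ID: 1\n", "Name: X\n", "\n", "ID: 2\n", "Name: Y\n"]
    for block in iter_blocks(sample):
        print(block)
```

The generator reads its input lazily, so it can be handed an open file object and will hold only one block in memory at a time.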
2023-01-19 12:08 Category: Q&A
Algorithm for text classification
I have millions of short (up to 30 words) documents which I need to split into several known categories. It's possible that a document matches several of the categories (seldom, but possible). It' [Details]
2023-01-19 10:11 Category: Q&A
How can I trim the contents of a file in Perl?
I would like to remove the contents of a file from a certain character to a certain character in Perl. How do I do that using a script? [Details]
2023-01-18 11:03 Category: Q&A
Parsing Random Web Pages
I need to parse a bunch of random pages and add them to a DB. I am thinking of using regular expressions but I was wondering if there are any 'special' techniques (other than looking [Details]
2023-01-17 11:00 Category: Q&A
Code for identifying programming language in a text file [closed]
Closed. This question needs to be more focused. It is not currently accepting answers. Want to improve this question? Update the question so it focuses on one problem only by editing this [Details]
2023-01-13 21:52 Category: Q&A