Find and replace text in a 47GB large file [closed]

开发者 https://www.devze.com 2023-03-25 05:53 (source: web)
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.

Closed 2 years ago.


I have to do some find-and-replace tasks on a rather big file, about 47 GB in size.

Does anybody know how to do this? I tried tools like TextCrawler, EditPad Lite, and others, but nothing supports a file this large.

I'm assuming this can be done via the command line.

Do you have any idea how this can be accomplished?


Sed (stream editor for filtering and transforming text) is your friend.

sed -i 's/old text/new text/g' file

Sed performs text transformations in a single pass.
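A minimal sketch of the command above in action (the file name and strings here are placeholders, not from the question). Because sed streams the file rather than loading it, memory use stays flat no matter how large the file is:

```shell
# Stand-in for the 47 GB file; sed behaves identically at any size.
printf 'old text here\nkeep this line\n' > big.txt

# -i edits the file in place in one streaming pass (GNU sed syntax).
sed -i 's/old text/new text/g' big.txt
```

Note that GNU sed's -i writes a temporary copy and renames it over the original, so you need free space on the same filesystem roughly equal to the file's size; on macOS/BSD sed the flag takes a suffix argument (sed -i '' ...).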


I use FART - Find And Replace Text by Lionello Lunesu.

It works very well on Windows 7 x64.

You can find and replace the text using this command:

fart -c big_filename.txt "find_this_text" "replace_to_this"

GitHub


On Unix or Mac:

sed 's/oldstring/newstring/g' oldfile.txt > newfile.txt

fast and easy...


I solved the problem by first using split to break the large file into smaller files of 100 MB each.
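The split-then-replace route described above can be sketched like this (file names and strings are placeholders): cut the big file into 100 MB pieces, run sed on each piece, then reassemble.

```shell
# Stand-in for the large file.
printf 'foo\nbar\nfoo\n' > big.txt

# Cut into 100 MB pieces named chunk_aa, chunk_ab, ...
split -b 100M big.txt chunk_

# Replace in each piece, then stitch them back together in order.
for f in chunk_*; do
  sed -i 's/foo/baz/g' "$f"
done
cat chunk_* > replaced.txt
rm chunk_*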


If you are using a Unix-like system, then you can use cat | sed to do this:

cat hosted_domains.txt | sed s/com/net/g

This example replaces com with net in a list of domain names; you can then redirect the output to a file.


For me, none of the tools suggested here worked well. TextCrawler ate all my computer's memory, sed didn't work at all, and EditPad complained about memory...

The solution is to write your own script in Python, Perl, or even C++.
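In the same spirit as a short Python or Perl script, a one-line awk program is a hand-rolled streaming replace: it processes one line at a time, so memory use stays constant regardless of file size. A minimal sketch (file names and the old/new strings are placeholders):

```shell
# Stand-in for the large file.
printf 'old one\nold two\n' > big.txt

# gsub replaces every match on each line as it streams past.
awk '{ gsub(/old/, "new"); print }' big.txt > replaced.txt
```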

Or use the tool PowerGREP; this is the easiest and fastest option.

I haven't tried FART; it's command-line only and maybe not very friendly.
Some hex editors, such as UltraEdit, also work well.


I used

sed 's/[nN]//g' oldfile.fasta > newfile.fasta

to strip all instances of n and N from my 7 GB file.

If I omitted the > newfile.fasta part, it took ages, scrolling every line of the file up the screen.

With the > newfile.fasta redirect, it ran in a matter of seconds on an Ubuntu server.

