Edit a large text file in the Mac terminal

开发者 https://www.devze.com 2023-03-23 10:46 Source: web
I have this very large dictionary file with 1 word on each line, and I would like to trim it down.

What I would like to do is keep only the 3-6 letter words that are not proper nouns, so it has to filter the words based on these rules:

  1. if the word is less than 3 letters, delete it
  2. if the word is more than 6 letters, delete it
  3. if the word has a capital letter, delete it
  4. if the word has a single quote or space, delete it.

I used this:

cat Downloads/en-US/en-US.dic | egrep '[a-z]{3,6}' > Downloads/3-6.txt

but the output is incorrect. It does drop the words shorter than 3 characters, but that is as far as I have gotten.

So how do I go about doing this in the Mac terminal? There must be a way to do this, right?


The following command will select only words that consist of exactly three to six lowercase a-z letters:

egrep '^[a-z]{3,6}$' /usr/share/dict/words > filtered.txt

Replace /usr/share/dict/words with your input file, and filtered.txt with a name for your output file. I just verified that this works on my Mac. Hope this helps!
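
To see why the anchors matter, here is a small sketch run against a made-up word list. The sample words and the variable names are assumptions for illustration, not taken from your dictionary file. Without ^ and $, the pattern matches any line that merely contains a run of three lowercase letters, which is why the command in the question let long words and words with capitals, quotes, or spaces through. grep -E is the equivalent of egrep, and LC_ALL=C is added on the assumption that you want [a-z] to mean strictly ASCII lowercase regardless of locale.

```shell
# Hypothetical sample list, one word per line (not from the real dictionary).
sample=$(printf '%s\n' 'at' 'cat' "cat's" 'Boston' 'garden' 'gardens' 'ice cream')

# Unanchored: a line passes if ANY run of 3 to 6 lowercase letters appears in it.
unanchored=$(printf '%s\n' "$sample" | LC_ALL=C grep -E '[a-z]{3,6}')

# Anchored: the WHOLE line must consist of 3 to 6 lowercase letters.
anchored=$(printf '%s\n' "$sample" | LC_ALL=C grep -E '^[a-z]{3,6}$')

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
# unanchored keeps everything except "at"; anchored keeps only "cat" and "garden"
```

The only difference between the two patterns is the pair of anchors, so the same fix should carry over to the command in the question.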


Use grep and write a regex rule to match the lines you want to keep. You can get info on grep by typing man grep in the terminal.
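
If a single regex feels hard to reason about, one possible sketch is to give each rule from the question its own grep step. The sample words and variable names below are made up for illustration, and LC_ALL=C is an assumption so that [A-Z] means only ASCII capitals. Note that this version follows the four rules literally, so unlike a pattern restricted to a-z it would also keep lines containing digits or other punctuation.

```shell
# Hypothetical sample list, one word per line (not from the real dictionary).
words=$(printf '%s\n' 'at' 'cat' "cat's" 'Boston' 'garden' 'gardens' 'ice cream')

# Rules 1 and 2: keep only lines that are 3 to 6 characters long.
by_length=$(printf '%s\n' "$words" | LC_ALL=C grep -E '^.{3,6}$')

# Rule 3: drop any line that contains a capital letter (-v inverts the match).
no_caps=$(printf '%s\n' "$by_length" | LC_ALL=C grep -v '[A-Z]')

# Rule 4: drop any line that contains a single quote or a space.
kept=$(printf '%s\n' "$no_caps" | LC_ALL=C grep -v "[' ]")

printf '%s\n' "$kept"
# prints "cat" and "garden", one per line
```

For a real file you would read from the file instead of a variable and chain the three grep commands with pipes; the separate steps here are only to make each rule visible.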
