I have this very large dictionary file with 1 word on each line, and I would like to trim it down.
What I would like to do is keep only the 3-6 letter improper (non-proper) nouns, so it has to filter the words based on these rules:
- if the word is less than 3 letters, delete it
- if the word is more than 6 letters, delete it
- if the word has a capital letter, delete it
- if the word has a single quote or space, delete it.
I used this:
cat Downloads/en-US/en-US.dic | egrep '[a-z]{3,6}' > Downloads/3-6.txt
but the output is incorrect. It keeps the words with 3 or more characters all right, but that's about all my progress so far.
So how do I go about doing this in the Mac terminal? There must be a way to do this, right?
The following command will select only words that consist of exactly three to six lowercase a-z letters:
egrep '^[a-z]{3,6}$' /usr/share/dict/words > filtered.txt
Replace /usr/share/dict/words with your input file, and filtered.txt with a name for your output file. I just verified that this works on my Mac. Hope this helps!
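To see why the anchors matter, here is a small sketch that compares the unanchored pattern from the question with the anchored one. The file name sample.txt and the words in it are made up for illustration. I use grep -E, which is the same as egrep, and I set LC_ALL=C on the assumption that you want [a-z] to mean exactly the 26 ASCII lowercase letters (in some locales the range can match more than that).

```shell
# Hypothetical sample input, one entry per line (not taken from the real dictionary).
printf '%s\n' 'an' 'cat' 'Cat' "don't" 'ice cream' 'banana' 'bananas' > sample.txt

# Unanchored: keeps any line that CONTAINS a run of 3-6 lowercase letters,
# so longer words and lines with quotes or spaces still get through.
unanchored=$(LC_ALL=C grep -E '[a-z]{3,6}' sample.txt)

# Anchored: ^ and $ force the WHOLE line to be 3-6 lowercase letters.
anchored=$(LC_ALL=C grep -E '^[a-z]{3,6}$' sample.txt)

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
```

With this sample, the unanchored pattern should still let through don't, ice cream and bananas, while the anchored one should keep only cat and banana. Because the anchored pattern allows nothing but lowercase letters on the line, it covers the capital-letter, quote and space rules at the same time as the length rules.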
Use grep and write a regex rule to match the lines you want to keep. You can get info on grep by typing man grep in the terminal.