finding unique values in a data file


I can do this in Python, but I was wondering if I could do it in Linux.

I have a file like this:

name1 text text 123432re text
name2 text text 12344qp text
name3 text text 134234ts text

I want to find all the different values in the 3rd column for a particular username, let's say name1.

grep name1 filename gives me all the matching lines, but there must be some way to list just the distinct values. (I don't want to display duplicate values for the same username.)


grep name1 filename | cut -d ' ' -f 4 | sort -u

This will find all lines that have name1, then get just the fourth column of data and show only unique values.
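For example, on the sample file above (only one line matches name1, so a single value comes back):

$ grep name1 filename | cut -d ' ' -f 4 | sort -u
123432re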


I tried using cat.

The file contains (here the file is foo.sh; you can use any file name):

$ cat foo.sh

tar
world
class
zip
zip
zip
python
jin
jin
doo
doo

uniq prints each word only once. Note that uniq only collapses adjacent duplicate lines, which is why the input must be sorted first:

$ cat foo.sh | sort | uniq

class
doo
jin
python
tar
world
zip

uniq -u prints only the words that appear exactly once in the file:

$ cat foo.sh | sort | uniq -u

class
python
tar
world

uniq -d prints only the duplicated words, each shown just once:

$ cat foo.sh | sort | uniq -d

doo
jin
zip
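
To see why -u and -d split the list this way, uniq -c prefixes each line with its repeat count:

$ sort foo.sh | uniq -c
      1 class
      2 doo
      2 jin
      1 python
      1 tar
      1 world
      3 zip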


You can have sort use the 4th field as the key, and then ask for only the records with unique keys:

grep name1 filename | sort -k4 -u
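
Note that -k4 makes the key run from field 4 to the end of the line; to compare on field 4 alone, give the key an end position as well:

grep name1 filename | sort -k4,4 -u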


As an all-in-one awk solution:

awk '$1 == "name1" && ! seen[$1" "$4]++ {print $4}' filename
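
If you don't want to hard-code the name, awk's -v option can pass it in from the shell; a minimal variation (the variable name user is just illustrative):

awk -v user=name1 '$1 == user && ! seen[$4]++ {print $4}' filename

Since the first condition already pins $1, keying seen on $4 alone is enough.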


IMHO Michał Šrajer got the best answer, but note that a filename is needed after grep name1. I've also got this fancy solution using an associative array:

user=name1

# Split grep's output on newlines only, so each matching line
# becomes a single array element; then restore the old IFS.
IFSOLD=$IFS; IFS=$'\n'; test=( $(grep "$user" filename) ); IFS=$IFSOLD

declare -A index    # associative array (requires bash 4+)
for item in "${test[@]}"; {
    sub=( $item )         # re-split the line into whitespace-separated fields
    name=${sub[3]}        # the 4th field (arrays are 0-indexed)
    index[$name]=$item    # keep one line per distinct 4th-field value
}

for item in "${index[@]}"; { echo "$item"; }


In my opinion, you need to first select the field from which you need the unique values. I was trying to retrieve unique source IPs from an iptables log.

cat /var/log/iptables.log | grep "May  5" | awk '{print $11}' | sort -u

Here is the output of the above command:

SRC=192.168.10.225
SRC=192.168.10.29
SRC=192.168.20.125
SRC=192.168.20.147
SRC=192.168.20.155
SRC=192.168.20.183
SRC=192.168.20.194

So the best approach is to select the field first and then filter it down to the unique values.
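
If you want just the addresses without the SRC= prefix, one option is to match the field explicitly and cut off the key (assuming the log format shown above):

grep "May  5" /var/log/iptables.log | grep -oE 'SRC=[0-9.]+' | cut -d= -f2 | sort -u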


The following command worked for me:

sudo cat AirtelFeb.txt | awk '{print $3}' | sort -u

It prints the unique values of the third column. (sudo is only needed if the file isn't readable by your user, and it applies to cat only, not to the rest of the pipeline.)


I think you meant the fourth column. You can try:

cat Filename.txt | awk '{print $4}' | sort | uniq
