I am trying to create a script that loops through a list.
I need to look through a finite list of 400 competency identifiers (e.g. 124, 129, etc.; plain ints).
I then have a dictionary that records which competencies each user has. The key is the user name, and the value for each key is a list of integers (i.e. the competencies that user has).
For example:
User X - [124, 198, 2244 ...]
User Y - [129, 254, 198, 2244 ...]
I am looking to compile a matrix highlighting how often each competency occurs with every other competency - an adjacency matrix.
For example, in the lists above, competency 198 occurs together with competency 2244 twice, whereas competencies 254 and 124 never occur together.
I am currently using this code:
fe = []
count = 0
competency_matches = 0
for comp in competencies_list:
    common_competencies = str("")
for comp2 in competencies_list:
    matches = int(0)
    for person in listx:
        if comp and comp2 in d1[person]:
            matches = matches + 1
        else:
            matches = matches
    common_competencies = str(common_competencies) + str(matches) + ","
fe.append(common_competencies)
print fe
print count
count = count + 1
This doesn't work: it simply returns how many times each competency has occurred overall. I think the problem is with the "if comp and comp2 in d1[person]:" line.
My worry is that, for example, if a person had the competencies [123, 1299, 1236] and I searched for competency 123, it would be counted twice because 123 appears in both the 123 and 1236 entries. Is there a way to force an EXACT match when using "if __ and __ in __"?
Or does anyone have a better suggestion for how to achieve this?
Thanks in advance for any pointers. Cheers
You're misinterpreting how "and" works. To test if two values are in a list, use:
if comp1 in d1[person] and comp2 in d1[person]:
    ...
Your version does something else. It binds like this: if (comp1) and (comp2 in d1[person]). In other words, it interprets comp1 as a truth value, and then does a boolean "and" with your list inclusion check. This is valid code, but it doesn't do what you want.
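As a small illustration of the point above, and of the exact-match worry in the question, here is a minimal Python 3 sketch. The dictionary contents and the names skills, exact_hit, partial_hit, wrong and right are made up for the example; only d1, comp1 and comp2 come from the thread. It shows that "in" on a list of ints compares whole values, so 123 never matches 1236, and that the single-"in" form only checks the second value.

```python
# Hypothetical data mirroring the question's example list.
d1 = {"person_a": [123, 1299, 1236]}
skills = d1["person_a"]

# Membership on a list of ints compares whole values, not digits.
exact_hit = 123 in skills     # 123 is an element of the list
partial_hit = 12 in skills    # no element equals 12, so no match

# `a and b in seq` parses as `a and (b in seq)`: only b is looked up.
comp1, comp2 = 999, 1299      # 999 is NOT in the list, 1299 is
wrong = bool(comp1 and comp2 in skills)        # 999 is merely truthy
right = comp1 in skills and comp2 in skills    # both are really tested

print(exact_hit, partial_hit, wrong, right)    # True False True False
```

The substring-style false match described in the question would only arise if the competencies were stored as one string rather than as a list of integers.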
This should run quite a bit faster because it removes an extra layer of iteration. Hope it helps.
from collections import defaultdict
from itertools import combinations

def get_competencies():
    return {
        "User X": [124, 198, 2244],
        "User Y": [129, 254, 198, 2244]
    }

def get_adjacency_pairs(c):
    # pairs[a][b] counts the users who hold both a and b
    pairs = defaultdict(lambda: defaultdict(int))
    for items in c.values():  # .values() works on Python 2 and 3
        items = set(items)  # remove duplicates
        for a, b in combinations(items, 2):
            pairs[a][b] += 1
            pairs[b][a] += 1
    return pairs

def make_row(lst, fmt):
    return ''.join(fmt(i) for i in lst)

def make_table(p, fmt="{0:>8}".format, nothing=''):
    labels = sorted(p)
    return [
        make_row([""] + labels, fmt)
    ] + [
        make_row([a] + [p[a][b] if b in p[a] else nothing for b in labels], fmt)
        for a in labels
    ]

def main():
    c = get_competencies()
    p = get_adjacency_pairs(c)
    print('\n'.join(make_table(p)))

if __name__ == "__main__":
    main()
results in
             124     129     198     254    2244
     124                       1               1
     129                       1       1       1
     198       1       1               1       2
     254               1       1               1
    2244       1       1       2       1
... obviously a 400-column table is a bit much to print to screen; I suggest using csv.writer() to save it to a file which you can then work on in Excel or OpenOffice.
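A rough sketch of the csv.writer() suggestion, assuming Python 3. The helper names count_pairs and write_matrix_csv are invented for this example, and it writes 0 for pairs that never co-occur (including the diagonal) instead of leaving the cell blank. An in-memory buffer stands in for a real file so the snippet runs anywhere.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations

def count_pairs(user_competencies):
    """Count how often each pair of competencies is held by the same user."""
    pairs = defaultdict(lambda: defaultdict(int))
    for items in user_competencies.values():
        for a, b in combinations(sorted(set(items)), 2):
            pairs[a][b] += 1
            pairs[b][a] += 1
    return pairs

def write_matrix_csv(pairs, out):
    """Write a header row of labels, then one row of counts per competency."""
    labels = sorted(pairs)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([""] + labels)
    for a in labels:
        # .get() avoids creating entries for pairs that never co-occur
        writer.writerow([a] + [pairs[a].get(b, 0) for b in labels])

users = {"User X": [124, 198, 2244], "User Y": [129, 254, 198, 2244]}
buf = io.StringIO()  # stand-in for a real file object
write_matrix_csv(count_pairs(users), buf)
csv_text = buf.getvalue()
print(csv_text)
```

To save to disk instead, pass a file opened with open("matrix.csv", "w", newline="") in place of the buffer; the file name is only a placeholder.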
Your indentation here means that your two loops aren't nested. You first iterate through competencies_list and set common_competencies to the empty string 400 times, then iterate through competencies_list again and do what phooji explained. I'm pretty sure that's not what you want to do.
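For comparison, here is one way the original triple loop might look with the second loop actually nested inside the first and with the membership test corrected. This is only a sketch in Python 3: the sample data is made up, it iterates over the keys of d1 rather than the question's listx, and it collects each row as a list of ints instead of a comma-joined string.

```python
# Hypothetical stand-ins for the question's variables.
competencies_list = [124, 129, 198, 254, 2244]
d1 = {"User X": [124, 198, 2244], "User Y": [129, 254, 198, 2244]}

fe = []
for comp in competencies_list:
    row = []
    for comp2 in competencies_list:      # nested inside the outer loop
        matches = 0
        for person in d1:
            held = d1[person]
            if comp in held and comp2 in held:
                matches += 1
        row.append(matches)
    fe.append(row)                       # one row per outer competency

for comp, row in zip(competencies_list, fe):
    print(comp, row)
```

Note that with this approach the diagonal (comp equal to comp2) ends up holding the number of users who have that competency at all.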