I have a text file with numbers in it as follows:
1231313123123123
1432423432535345
3532523452345345
1231423432453455
3434535345345345
3452353453253453
All the lines are the same length. I want to calculate the entropy of each line and get output like:
2.64234234
2.65464564
2.35355435
etc.
Right now the code below gives me the same entropy for every line. What am I doing wrong?
Thanks.
#!/usr/bin/env python
import math

def H(data):
    if not data:
        return 0
    entropy = 0
    for x in range(256):
        p_x = float(data.count(chr(x)))/len(data)
        if p_x > 0:
            entropy += - p_x*math.log(p_x, 2)
    return entropy

failas = open('text.txt', 'r')
for row in failas:
    print H('failas')
failas = open('text.txt', 'r')
for row in failas:
    print H(row)
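Perhaps you meant print H(row). As written, H('failas') computes the entropy of the literal string 'failas' on every iteration, so the result never changes.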
All of the above, plus you probably don't want to include the \n at the end of each line in the entropy calculation. Use H(row.rstrip('\n')).
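Putting both suggestions together, a minimal sketch might look like the following. It is written in Python 3 syntax (an assumption; the question uses Python 2), and the names shannon_entropy, line_entropies and sample_lines are made up for illustration. It also swaps the range(256) loop for collections.Counter, which is a different way to get the same character counts, not something the answers above proposed.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Return the Shannon entropy of text, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)  # occurrences of each distinct character
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def line_entropies(lines):
    """Yield the entropy of each line, ignoring the trailing newline."""
    for line in lines:
        yield shannon_entropy(line.rstrip("\n"))


# The first three sample lines from the question, standing in for text.txt.
sample_lines = [
    "1231313123123123\n",
    "1432423432535345\n",
    "3532523452345345\n",
]

for value in line_entropies(sample_lines):
    print(value)
```

Iterating over an open file yields its lines one at a time, so passing the result of open('text.txt') to line_entropies in place of sample_lines should work the same way.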
You can answer a lot of your own questions by examining the data that is being tossed around by your code. In this case, inserting print repr(data) after the line def H(data): would have shown you what the problem was straight away.
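As a rough illustration of that tip, the sketch below prints the repr of what the question's loop actually passes next to what was intended. It uses Python 3 syntax and an in-memory io.StringIO in place of text.txt so it runs without a file; the helper name debug_repr is hypothetical.

```python
import io

# In-memory stand-in for text.txt, holding two lines from the question.
sample = io.StringIO("1231313123123123\n1432423432535345\n")


def debug_repr(data):
    """Return repr(data), i.e. what a debugging print inside H would show."""
    return repr(data)


for row in sample:
    print(debug_repr('failas'))  # what the question's code passes every time
    print(debug_repr(row))       # the actual line, with its trailing '\n' visible
```

The first print shows 'failas' on every iteration, which exposes the bug, and the second makes the trailing newline visible.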