When I tried the examples of MaxentClassifier from http://nltk.googlecode.com/svn/trunk/doc/howto/classify.html, I got the error below:
Grad eval #0
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
classifier = MaxentClassifier.train(train)
File "C:\Python27\lib\site-packages\nltk\classify\maxent.py", line 323, in train
gaussian_prior_sigma, **cutoffs)
File "C:\Python27\lib\site-packages\nltk\classify\maxent.py", line 1456, in train_maxent_classifier_with_scipy
model.fit(algorithm=algorithm)
File "C:\Python27\lib\site-packages\scipy\maxentropy\maxentropy.py", line 1026, in fit
return model.fit(self, self.K, algorithm)
File "C:\Python27\lib\site-packages\scipy\maxentropy\maxentropy.py", line 226, in fit
callback=callback)
File "C:\Python27\lib\site-packages\scipy\optimize\optimize.py", line 636, in fmin_cg
gfk = myfprime(x0)
File "C:\Python27\lib\site-packages\scipy\optimize\optimize.py", line 176, in function_wrapper
return function(x, *args)
File "C:\Python27\lib\site-packages\scipy\maxentropy\maxentropy.py", line 420, in grad
G = self.expectations() - self.K
ValueError: operands could not be broadcast together with shapes (54) (12)
Python Code:
from nltk.classify import MaxentClassifier  # import needed for the train() call below
train = [(dict(a=1,b=1,c=1), 'y'),
(dict(a=1,b=1,c=1), 'x'),
(dict(a=1,b=1,c=0), 'y'),
(dict(a=0,b=1,c=1), 'x'),
(dict(a=0,b=1,c=1), 'y'),
(dict(a=0,b=0,c=1), 'y'),
(dict(a=0,b=1,c=0), 'x'),
(dict(a=0,b=0,c=0), 'x')]
test = [(dict(a=1,b=0,c=1)), # unseen
(dict(a=1,b=0,c=0)), # unseen
(dict(a=0,b=1,c=1)), # seen 3 times, labels=y,y,x
(dict(a=0,b=1,c=0)) # seen 1 time, label=x
]
classifier = MaxentClassifier.train(train)
But I don't know how to solve it. Any help would be appreciated, thanks!
It works if you set the algorithm:
>>> algorithm = nltk.classify.MaxentClassifier.ALGORITHMS[0]
>>> algorithm
'GIS'
>>> classifier = nltk.MaxentClassifier.train(train, algorithm)
==> Training (100 iterations)
Iteration Log Likelihood Accuracy
---------------------------------------
1 -0.69315 0.556
2 -0.65164 0.778
3 -0.62713 0.778
4 -0.61084 0.667
5 -0.59935 0.667
6 -0.59096 0.667
.................................
.................................
(Note that you left out one line of the training corpus.)
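One way to spot the missing line is to count how often each test feature set occurs in the training list from the question. The comment in the test data says a=0,b=1,c=1 was "seen 3 times, labels=y,y,x", but the list as posted contains it only twice. This is just a quick sketch with the standard library; the helper name freeze is made up for illustration and is not part of nltk.

```python
from collections import Counter

# Training and test data exactly as posted in the question.
train = [(dict(a=1, b=1, c=1), 'y'),
         (dict(a=1, b=1, c=1), 'x'),
         (dict(a=1, b=1, c=0), 'y'),
         (dict(a=0, b=1, c=1), 'x'),
         (dict(a=0, b=1, c=1), 'y'),
         (dict(a=0, b=0, c=1), 'y'),
         (dict(a=0, b=1, c=0), 'x'),
         (dict(a=0, b=0, c=0), 'x')]
test = [dict(a=1, b=0, c=1),
        dict(a=1, b=0, c=0),
        dict(a=0, b=1, c=1),
        dict(a=0, b=1, c=0)]


def freeze(features):
    """Hypothetical helper: turn a feature dict into a hashable key."""
    return tuple(sorted(features.items()))


# How many times each feature set appears in the training data.
seen = Counter(freeze(features) for features, _ in train)

# Which labels were attached to each feature set, in order of appearance.
labels_seen = {}
for features, label in train:
    labels_seen.setdefault(freeze(features), []).append(label)

for features in test:
    key = freeze(features)
    print(features, "seen", seen[key], "time(s), labels:", labels_seen.get(key, []))
```

If the count for a=0,b=1,c=1 comes out lower than the comment claims, the training list is incomplete.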
Edit: Several nltk algorithms fail, including 'CG'. The problem is probably the same as the one reported here. If so, it will probably be fixed in an upcoming nltk release. You could also report the bug to nltk, which would help both the developers and yourself.
Since the reported bug seems to be related to numpy broadcasting and outdated uses of numpy, you could also try an older version of numpy.
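For what it's worth, the ValueError at the bottom of the traceback is the generic numpy error you get when subtracting two 1-D arrays of different lengths. The sketch below only reproduces that kind of error with the two sizes from the traceback (54 and 12); it does not touch nltk or scipy.maxentropy, and the function and variable names are made up for illustration.

```python
import numpy as np


def try_subtract(n_expectations, n_targets):
    """Subtract two zero vectors; return the result shape or the error text."""
    expectations = np.zeros(n_expectations)  # stand-in for self.expectations()
    targets = np.zeros(n_targets)            # stand-in for self.K
    try:
        return (expectations - targets).shape
    except ValueError as exc:
        # Lengths 54 and 12 are not compatible under numpy's broadcasting rules.
        return str(exc)


mismatch = try_subtract(54, 12)   # incompatible lengths, as in the traceback
matching = try_subtract(12, 12)   # equal lengths subtract element-wise

print(mismatch)
print(matching)
```

So the two vectors being compared have different numbers of entries, which points at the library code building them inconsistently rather than at your training data.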