ValueError occurs when I try to use CG algorithm of MaxentClassifier in nltk

Developer (https://www.devze.com), 2023-02-27 18:39. Source: the web

When I tried the examples of MaxentClassifier from http://nltk.googlecode.com/svn/trunk/doc/howto/classify.html, I got the error below:

Grad eval #0

Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    classifier = MaxentClassifier.train(train)
  File "C:\Python27\lib\site-packages\nltk\classify\maxent.py", line 323, in train
    gaussian_prior_sigma, **cutoffs)
  File "C:\Python27\lib\site-packages\nltk\classify\maxent.py", line 1456, in train_maxent_classifier_with_scipy
    model.fit(algorithm=algorithm)
  File "C:\Python27\lib\site-packages\scipy\maxentropy\maxentropy.py", line 1026, in fit
    return model.fit(self, self.K, algorithm)
  File "C:\Python27\lib\site-packages\scipy\maxentropy\maxentropy.py", line 226, in fit
    callback=callback)
  File "C:\Python27\lib\site-packages\scipy\optimize\optimize.py", line 636, in fmin_cg
    gfk = myfprime(x0)
  File "C:\Python27\lib\site-packages\scipy\optimize\optimize.py", line 176, in function_wrapper
    return function(x, *args)
  File "C:\Python27\lib\site-packages\scipy\maxentropy\maxentropy.py", line 420, in grad
    G = self.expectations() - self.K
ValueError: operands could not be broadcast together with shapes (54) (12)

Python Code:

from nltk.classify import MaxentClassifier

train = [(dict(a=1,b=1,c=1), 'y'),
         (dict(a=1,b=1,c=1), 'x'),
         (dict(a=1,b=1,c=0), 'y'),
         (dict(a=0,b=1,c=1), 'x'),
         (dict(a=0,b=1,c=1), 'y'),
         (dict(a=0,b=0,c=1), 'y'),
         (dict(a=0,b=1,c=0), 'x'),
         (dict(a=0,b=0,c=0), 'x')]
test = [(dict(a=1,b=0,c=1)), # unseen
        (dict(a=1,b=0,c=0)), # unseen
        (dict(a=0,b=1,c=1)), # seen 3 times, labels=y,y,x
        (dict(a=0,b=1,c=0)) # seen 1 time, label=x
        ]
classifier = MaxentClassifier.train(train)

But I don't know how to solve it. Help me, thanks!

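It works if you pass the algorithm explicitly instead of relying on the default (which, per your traceback, ends up in scipy's CG optimizer):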

>>> import nltk
>>> algorithm = nltk.classify.MaxentClassifier.ALGORITHMS[0]
>>> algorithm
'GIS'
>>> classifier = nltk.MaxentClassifier.train(train, algorithm)

  ==> Training (100 iterations)

      Iteration    Log Likelihood    Accuracy
      ---------------------------------------
             1          -0.69315        0.556
             2          -0.65164        0.778
             3          -0.62713        0.778
             4          -0.61084        0.667
             5          -0.59935        0.667
             6          -0.59096        0.667
            .................................
            .................................
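As a script rather than a shell session, the same workaround might look like the sketch below. It assumes nltk (and numpy, which it needs for GIS) is installed, uses the training and test lists exactly as you posted them, and picks `trace=0` and `max_iter=10` only to keep the run short and quiet. Those two values are my choice, not something from the howto.

```python
from nltk.classify import MaxentClassifier

# Training and test data copied from the question as posted.
train = [(dict(a=1, b=1, c=1), 'y'),
         (dict(a=1, b=1, c=1), 'x'),
         (dict(a=1, b=1, c=0), 'y'),
         (dict(a=0, b=1, c=1), 'x'),
         (dict(a=0, b=1, c=1), 'y'),
         (dict(a=0, b=0, c=1), 'y'),
         (dict(a=0, b=1, c=0), 'x'),
         (dict(a=0, b=0, c=0), 'x')]
test = [dict(a=1, b=0, c=1),
        dict(a=1, b=0, c=0),
        dict(a=0, b=1, c=1),
        dict(a=0, b=1, c=0)]

# Name the algorithm explicitly so the scipy-based CG path is never taken.
# trace=0 silences the iteration table; max_iter=10 is an arbitrary cutoff.
classifier = MaxentClassifier.train(train, algorithm='GIS', trace=0, max_iter=10)

# Label each test feature set with the trained model.
predictions = [classifier.classify(featureset) for featureset in test]
for featureset, label in zip(test, predictions):
    print(featureset, '->', label)
```

The predicted labels depend on how many iterations you allow, so treat the printed values as illustrative.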

(Note that your training list is missing one line compared with the howto example.)
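A quick way to see the missing training line is to count the rows in the list as you posted it and compare with the comments in your own test list, which say that (a=0, b=1, c=1) was "seen 3 times, labels=y,y,x". The sketch below uses only the standard library; the helper name `key` is just for illustration.

```python
from collections import Counter

# Training list copied from the question as posted.
train = [(dict(a=1, b=1, c=1), 'y'),
         (dict(a=1, b=1, c=1), 'x'),
         (dict(a=1, b=1, c=0), 'y'),
         (dict(a=0, b=1, c=1), 'x'),
         (dict(a=0, b=1, c=1), 'y'),
         (dict(a=0, b=0, c=1), 'y'),
         (dict(a=0, b=1, c=0), 'x'),
         (dict(a=0, b=0, c=0), 'x')]

def key(features):
    """Turn a feature dict into a hashable, order-independent key."""
    return tuple(sorted(features.items()))

# How many times does each feature set occur in the training list?
rows = Counter(key(features) for features, _ in train)

target = dict(a=0, b=1, c=1)
labels_seen = sorted(label for features, label in train if features == target)

print(len(train))                       # 8
print(rows[key(target)], labels_seen)   # 2 ['x', 'y']
```

The posted list has that feature set only twice (labels x and y), so the dropped row is presumably a second (a=0, b=1, c=1) example labelled 'y'. That is an inference from your comment, so check it against the howto.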

Edit: Several nltk algorithms fail, including 'CG'. The problem is probably the same as the one reported here. If so, it will probably be fixed in a future nltk release. You could also report the bug to nltk, which would help both the developers and yourself.

Since the reported bug seems related to numpy broadcasting and outdated uses of numpy, you could also try an older version of numpy.
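For reference, the last line of the traceback is an ordinary numpy shape mismatch: two one-dimensional arrays of different lengths cannot be subtracted elementwise. The sketch below is not the scipy code. The array names are stand-ins for `self.expectations()` and `self.K`, and the lengths 54 and 12 are simply copied from the error message.

```python
import numpy as np

# Hypothetical stand-ins for the two vectors subtracted in maxentropy.grad().
expectations = np.zeros(54)   # length taken from the error message
K = np.zeros(12)              # length taken from the error message

message = None
try:
    # Lengths 54 and 12 differ and neither is 1, so broadcasting fails.
    G = expectations - K
except ValueError as err:
    message = str(err)
    print(message)
```

So the two vectors were built with different numbers of features, and the exception only surfaces at the subtraction.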