I'm trying to learn more about Python by implementing a k-Nearest Neighbor classifier. KNN works by labeling new data based on which existing data it's most similar to. So for a given table of data, you find the 3 most similar points (if k = 3) and pick whichever label is most frequent among them. There are different ways to determine "similarity", each expressed as some kind of distance function. So you can implement various distance functions (cosine distance, Manhattan, Euclidean, etc.) and pick whichever you want.
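To make the idea concrete, here's a minimal sketch of KNN with a swappable distance function. It isn't my actual class: the names knn_classify, euclidean and manhattan and the toy data are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical standalone distance functions (not the methods from my class).
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_classify(points, labels, query, k=3, distance=euclidean):
    """Return the most frequent label among the k points closest to query."""
    # Sort the point indices by distance to the query and keep the k closest.
    nearest = sorted(range(len(points)),
                     key=lambda i: distance(points[i], query))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two "b" points near the origin, three "g" points near (1, 1).
points = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
labels = ["b", "b", "g", "g", "g"]

print(knn_classify(points, labels, (1.0, 0.9), k=3))
print(knn_classify(points, labels, (0.0, 0.0), k=3, distance=manhattan))
```

Swapping the metric here is just a matter of passing a different function, which is the behavior I want to keep in the class version.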
I'm trying to make something that lets me swap distance functions in and out easily without doing cases, and the solution I have so far is to just store a list of references to methods. This is great, but it breaks on deepcopy and I want to figure out how to either fix my implementation or come up with a compromise between not needing to do cases and getting deepcopy to work.
Here's my pared-down class:
class DataTable:
    def __init__(self, filename, TrueSymbol, FalseSymbol):
        self.t = self.parseCSV(filename)
        self.TrueSymbol = TrueSymbol
        self.FalseSymbol = FalseSymbol
        # This is the problem line of code
        self.distList = [self.euclideanDistance,
                         self.manhattanDistance,
                         self.cosineDistance]

    def nearestNeighbors(self, entry, k=None, distanceMetric=None):
        """
        distanceMetrics you can choose from:
        0 = euclideanDistance
        1 = manhattanDistance
        2 = cosineDistance
        """
        if distanceMetric is None:
            distanceFunction = self.euclideanDistance
        else:
            distanceFunction = self.distList[distanceMetric]
        # etc..

    def euclideanDistance(self, entry):
        pass

    def manhattanDistance(self, entry):
        pass

    def cosineDistance(self, entry):
        pass

    # open up that csv
    def parseCSV(self, filename):
        pass
And here's the code that calls it:
import deepcopytestDS
import copy
data = deepcopytestDS.DataTable("ionosphere.data","g","b")
deepCopy = copy.deepcopy(data) # crash.
Here's the call stack:
>>> deepCopy = copy.deepcopy(data)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.6/copy.py", line 162, in deepcopy
y = copier(x, memo)
File "/usr/lib/python2.6/copy.py", line 292, in _deepcopy_inst
state = deepcopy(state, memo)
File "/usr/lib/python2.6/copy.py", line 162, in deepcopy
y = copier(x, memo)
File "/usr/lib/python2.6/copy.py", line 255, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python2.6/copy.py", line 162, in deepcopy
y = copier(x, memo)
File "/usr/lib/python2.6/copy.py", line 228, in _deepcopy_list
y.append(deepcopy(a, memo))
File "/usr/lib/python2.6/copy.py", line 189, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/usr/lib/python2.6/copy.py", line 323, in _reconstruct
y = callable(*args)
File "/usr/lib/python2.6/copy_reg.py", line 93, in __newobj__
return cls.__new__(cls, *args)
TypeError: instancemethod expected at least 2 arguments, got 0
What does this crash mean, and is there some way to make deepcopy work without getting rid of my shortcut for swapping distance functions?
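It's a bug in the copy module: deepcopy doesn't know how to copy bound instance methods, which is exactly what your distList holds. See http://bugs.python.org/issue1515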
You can put this at the top of your file to make it work:
import copy
import types
def _deepcopy_method(x, memo):
    return type(x)(x.im_func, copy.deepcopy(x.im_self, memo), x.im_class)
copy._deepcopy_dispatch[types.MethodType] = _deepcopy_method
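If you'd rather not patch the copy module, one possible compromise is to store the method names instead of the bound methods and look them up with getattr when needed. The instance then holds no bound methods, so deepcopy has nothing it can trip over. This is only a sketch under assumptions: the two-argument distance signatures, the distanceFunction helper and the DIST_NAMES tuple are made up for illustration, and the constructor takes rows directly instead of parsing a CSV. As far as I know, the fix for the issue above shipped in Python 2.7 and 3.2, so on those versions your original code should also deep-copy without any workaround.

```python
import copy
import math

class DataTable(object):
    # Method names rather than bound methods; resolved with getattr on use.
    DIST_NAMES = ("euclideanDistance", "manhattanDistance", "cosineDistance")

    def __init__(self, rows):
        self.t = rows  # stand-in for self.parseCSV(filename)

    def distanceFunction(self, distanceMetric=None):
        """Return the bound distance method for an index (default: Euclidean)."""
        index = 0 if distanceMetric is None else distanceMetric
        return getattr(self, self.DIST_NAMES[index])

    # Hypothetical signatures: each takes two equal-length numeric sequences.
    def euclideanDistance(self, a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def manhattanDistance(self, a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    def cosineDistance(self, a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return 1.0 - dot / (norm_a * norm_b)

data = DataTable([[1.0, 2.0], [3.0, 4.0]])
clone = copy.deepcopy(data)   # no bound methods in the instance dict to copy
clone.t[0][0] = 99.0          # the copy's table is independent of the original
dist = clone.distanceFunction(1)
print(dist([0, 0], [1, 2]))
```

You still pick a metric by index with no if/elif chain, and the method you get back is bound to whichever object you asked, copy or original.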