Hi all, what I want should be really simple for somebody here. I want to remove a row from a numpy array in a loop, like:
for i in range(len(self.Finalweight)):
    if self.Finalweight[i] >= self.cutoffOutliers:
        "remove line[i] from self.wData"
I'm trying to remove outliers from a dataset. The full code of the method is:
def calculate_Outliers(self):
    def calcWeight(Value):
        pFinal = abs(Value - self.pMed)/ self.pDev_abs_Med
        gradFinal = abs(gradient(Value) - self.gradMed) / self.gradDev_abs_Med
        return pFinal * gradFinal
    self.pMed = median(self.wData[:,self.yColum-1])
    self.pDev_abs_Med = median(abs(self.wData[:,self.yColum-1] - self.pMed))
    self.gradMed = median(gradient(self.wData[:,self.yColum-1]))
    self.gradDev_abs_Med = median(abs(gradient(self.wData[:,self.yColum-1]) - self.gradMed))
    self.workingData= self.wData[calcWeight(self.wData)<self.cutoffOutliers]
    self.xData = self.workingData[:,self.xColum-1]
    self.yData = self.workingData[:,self.yColum-1]
I'm getting the following error:
  File "bin/dmtools", line 201, in plot_gride
    self.calculate_Outliers()
  File "bin/dmtools", line 188, in calculate_Outliers
    self.workingData= self.wData[calcWeight(self.wData)>self.cutoffOutliers]
ValueError: too many indices for array
There is actually a tool in NumPy specifically made to mask out outliers and invalid data points: masked arrays. Example from the linked page:
import numpy
x = numpy.array([1, 2, 3, -1, 5])
mx = numpy.ma.masked_array(x, mask=[0, 0, 0, 1, 0])  # 1 marks an entry as masked (ignored)
print(mx.mean())
prints
2.75
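As for the error in the question: it appears to come from passing the whole 2-D `self.wData` to `calcWeight`. `gradient` of a 2-D array returns one array per axis, so the comparison no longer has one value per row and cannot be used to index the rows. Below is a minimal sketch of row removal with a 1-D boolean mask, assuming the weights are computed from the y column only. The data, the `cutoff` value, and the simplified weight (median absolute deviation only, no gradient term) are illustrative assumptions, not the asker's actual method.

```python
import numpy as np

# Hypothetical stand-in for self.wData: column 0 is x, column 1 is y.
w_data = np.array([
    [0.0, 1.0],
    [1.0, 1.1],
    [2.0, 0.9],
    [3.0, 9.0],   # obvious outlier in y
    [4.0, 1.0],
    [5.0, 1.2],
])
cutoff = 3.0  # hypothetical threshold, plays the role of self.cutoffOutliers

y = w_data[:, 1]                       # 1-D column, one entry per row
med = np.median(y)
mad = np.median(np.abs(y - med))       # median absolute deviation of y
weight = np.abs(y - med) / mad         # simplified weight, still 1-D

keep = weight < cutoff                 # boolean mask with one entry per row
working_data = w_data[keep]            # rows where keep is False are dropped

# Masked-array alternative: keep every row, but hide the outlying y values.
masked_y = np.ma.masked_where(weight >= cutoff, y)

print(working_data)
print(masked_y.mean())
```

Because the mask has exactly one entry per row, no explicit loop is needed, and `w_data` itself is left unchanged. The masked-array variant is useful when the original row positions must be preserved.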