I'm making a rather large plot with PyPlot (Python matplotlib): 600,000 values, each 32 bits. In practice I guess I could simply do something like this:
import matplotlib.pyplot as plt
plt.plot([1,2,3,4], [1,4,9,16], 'ro')
plt.axis([0, 6, 0, 20])
That is two arrays, both allocated in memory. However, sooner or later I'll have to plot files that contain several gigabytes of this data.
How do I avoid passing two arrays into plt.plot()?
I still need a complete plot, however, so I suppose just using an iterator and passing the values one line at a time won't work.
If you're talking about gigabytes of data, you might consider loading and plotting the data points in batches, then layering the image data of each rendered plot over the previous one. Here is a quick example, with comments inline:
import matplotlib.pyplot as plt
import numpy
from PIL import Image

N = 20
size = 4
x_data = y_data = list(range(N))

fig = plt.figure()
prev = None
for n in range(0, N, size):
    # clear figure
    plt.clf()
    # set figure and axes backgrounds transparent for plots n > 0
    if n:
        fig.patch.set_alpha(0.0)
        axes = plt.axes()
        axes.patch.set_alpha(0.0)
    plt.axis([0, N, 0, N])
    # here you'd read the next x/y values from disk into memory and plot
    # them. simulated by grabbing batches from the arrays.
    x = x_data[n:n+size]
    y = y_data[n:n+size]
    plt.plot(x, y, 'ro')
    del x, y
    # render the points
    fig.canvas.draw()
    # copy the rendered RGBA pixels; the array has shape (height, width, 4)
    buf = numpy.array(fig.canvas.buffer_rgba())
    img = Image.fromarray(buf)
    if prev is not None:
        # composite the current (transparent) plot over the previous image
        img = Image.alpha_composite(prev, img)
    prev = img

# save the final image
prev.save('plot.png')
Do you actually need to plot individual points? With so many data points available, a density plot would probably work just as well. You might look into pylab's hexbin or numpy.histogram2d. For such large files, though, you'd probably have to use numpy.memmap or work in batches, as @samplebias says.
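A rough sketch of how the two suggestions could be combined: numpy.memmap exposes the files as arrays without loading them, and numpy.histogram2d is called once per batch with the counts summed. It assumes the x and y values are stored as raw 32-bit floats in two separate files and that the value range is known in advance. The file names, the helper names batched_histogram2d and save_density_plot, and the synthetic data are all made up for illustration.

```python
import os
import tempfile

import numpy as np


def batched_histogram2d(x, y, bins, extent, batch_size=100_000):
    """Sum numpy.histogram2d counts over fixed-size slices of x and y.

    x and y may be numpy.memmap arrays, so only one batch is read at a time.
    extent is [[xmin, xmax], [ymin, ymax]]; it must be known up front so that
    every batch is binned on the same grid.
    """
    counts = np.zeros((bins, bins), dtype=np.int64)
    xedges = np.linspace(extent[0][0], extent[0][1], bins + 1)
    yedges = np.linspace(extent[1][0], extent[1][1], bins + 1)
    for start in range(0, len(x), batch_size):
        stop = start + batch_size
        h, _, _ = np.histogram2d(x[start:stop], y[start:stop],
                                 bins=[xedges, yedges])
        counts += h.astype(np.int64)
    return counts, xedges, yedges


def save_density_plot(counts, xedges, yedges, filename):
    """Render the accumulated counts as an image (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render off-screen
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # histogram2d puts x along axis 0, so transpose for imshow
    image = ax.imshow(counts.T, origin="lower", aspect="auto",
                      extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
    fig.colorbar(image, ax=ax, label="points per bin")
    fig.savefig(filename)
    plt.close(fig)


# Synthetic stand-in for the real data: two raw float32 files (assumed layout).
rng = np.random.default_rng(0)
n_points = 600_000
tmpdir = tempfile.mkdtemp()
for name in ("x.dat", "y.dat"):
    rng.random(n_points, dtype=np.float32).tofile(os.path.join(tmpdir, name))

# memmap maps the files into the address space instead of reading them fully
x = np.memmap(os.path.join(tmpdir, "x.dat"), dtype=np.float32, mode="r")
y = np.memmap(os.path.join(tmpdir, "y.dat"), dtype=np.float32, mode="r")
counts, xedges, yedges = batched_histogram2d(
    x, y, bins=50, extent=[[0, 1], [0, 1]])
```

Calling save_density_plot(counts, xedges, yedges, "density.png") would then write the picture. Memory use is governed by bins and batch_size rather than by the file size, at the cost of having to fix the axis limits before the first batch is read.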