How do I use Avro to process a stream that I cannot seek?

Source: https://www.devze.com, 2023-01-17 14:49 (reposted from the web)
I am using Avro 1.4.0 to read some data out of S3 via the Python avro bindings and the boto S3 library. When I open an avro.datafile.DataFileReader on the file-like objects returned by boto, it immediately fails when it tries to seek(). For now I am working around this by reading the S3 objects into temporary files.

I would like to be able to stream through any Python object that supports read(). Can anybody provide advice?
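For reference, the temporary-file workaround mentioned in the question might look roughly like the sketch below. It uses only the standard library; the function name spool_to_seekable and the max_memory default are made up for illustration, and the source is assumed to be any object with a read(n) method.

```python
import shutil
import tempfile


def spool_to_seekable(source, max_memory=16 * 1024 * 1024):
    """Copy a read()-only stream into a seekable temporary file.

    Hypothetical helper: the data stays in memory until it exceeds
    max_memory bytes, after which it is spilled to disk.
    """
    spooled = tempfile.SpooledTemporaryFile(max_size=max_memory)
    shutil.copyfileobj(source, spooled)  # only calls source.read(n)
    spooled.seek(0)  # rewind so the consumer starts at the first byte
    return spooled
```

The returned object supports read, seek, and tell, at the cost of copying the whole S3 object before any record can be decoded.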


I am not entirely sure about this, so it may not be the answer. I was under the impression that

diter = datafile.DataFileReader(..) 

returns an iterator so that you could do the following

for data in diter:
    ....

Correct me if I am wrong here.

Revisiting my answer:

You are right: datafile.DataFileReader does not work with a reader whose seek() fails.

It uses avro.io.BinaryDecoder, which accepts a reader:

class BinaryDecoder(object):
    """Read leaf values."""
    def __init__(self, reader):
        """
        reader is a Python object on which we can call read, seek, and tell.
        """
        self._reader = reader

What you can do is create your own reader class that provides these three methods (read, seek, and tell) but internally uses the boto S3 library to read the data.
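A minimal sketch of such a reader class, assuming the underlying object only offers read(n): it keeps every byte it has pulled from the source in memory, so seeking backwards is served from that buffer and seeking forwards reads more from the source. The names BufferedSeekableReader and ReadOnlyStream are hypothetical (ReadOnlyStream merely stands in for a boto object here), and I have not checked exactly which seek calls Avro 1.4.0 makes. If it seeks to the end of the file, this wrapper will buffer the entire object.

```python
import io


class ReadOnlyStream(object):
    """Stand-in for a non-seekable source (e.g. an S3 object): read() only."""

    def __init__(self, data):
        self._data = data
        self._offset = 0

    def read(self, n=-1):
        if n is None or n < 0:
            n = len(self._data) - self._offset
        chunk = self._data[self._offset:self._offset + n]
        self._offset += len(chunk)
        return chunk


class BufferedSeekableReader(object):
    """Hypothetical wrapper adding seek() and tell() on top of read()."""

    def __init__(self, source):
        self._source = source        # any object with read(n)
        self._buffer = io.BytesIO()  # every byte pulled from the source so far
        self._size = 0               # number of bytes in the buffer
        self._pos = 0                # logical position seen by the caller

    def _fill_to(self, target):
        """Buffer until `target` bytes are held (None: until the source ends)."""
        self._buffer.seek(0, io.SEEK_END)
        while target is None or self._size < target:
            want = 65536 if target is None else target - self._size
            chunk = self._source.read(want)
            if not chunk:
                break  # source exhausted
            self._buffer.write(chunk)
            self._size += len(chunk)

    def read(self, n=-1):
        if n is None or n < 0:
            self._fill_to(None)
            end = self._size
        else:
            self._fill_to(self._pos + n)
            end = min(self._pos + n, self._size)
        self._buffer.seek(self._pos)
        data = self._buffer.read(max(0, end - self._pos))
        self._pos += len(data)
        return data

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            target = offset
        elif whence == io.SEEK_CUR:
            target = self._pos + offset
        elif whence == io.SEEK_END:
            self._fill_to(None)  # the length is unknown until the source ends
            target = self._size + offset
        else:
            raise ValueError("invalid whence: %r" % (whence,))
        if target < 0:
            raise ValueError("negative seek position")
        self._fill_to(target)
        self._pos = target
        return self._pos

    def tell(self):
        return self._pos
```

An instance wrapping the boto object could then be handed to datafile.DataFileReader in place of the raw object. The trade-off is memory: nothing is ever dropped from the buffer, so for large objects the temporary-file approach from the question may still be the safer choice.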
