Using PARSE on a PORT! value

devze.com https://www.devze.com 2023-01-24 15:05 (source: web)

I tried using PARSE on a PORT! and it does not work:

>> parse open %test-data.r [to end]  
** Script error: parse does not allow port! for its input argument

Of course, it works if you read the data in:

>> parse read open %test-data.r [to end]  
== true

...but it seems it would be useful to be able to use PARSE on large files without first loading them into memory.

Is there a reason why PARSE couldn't work on a PORT! ... or is it merely not implemented yet?


The easy answer is: no, we can't...

The way PARSE works, it may need to roll back to a prior part of the input string, which might in fact be the head of the complete input, even when it has reached the last character of the stream.

Ports copy their data to a string buffer as they consume their input, so there is never any "prior" string for PARSE to roll back to. It's like quantum physics: once you've looked at it, it's not there anymore.

But as you know, in Rebol "no" isn't an answer. ;-)

That being said, there is a way to parse data from a port as it is being grabbed, but it's a bit more work.

What you do is use a buffer, and

APPEND buffer COPY/part connection amount

Depending on your data, amount could be 1 byte or 1 KB; use whatever makes sense.

Once the new input has been added to your buffer, parse it, with logic added to detect whether you matched part of that buffer.

If something matched, you REMOVE/part the matched portion from the buffer, and continue parsing until nothing parses anymore.

You then repeat the above until you reach the end of the input.
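Putting those steps together, a minimal sketch of the buffered approach might look like this in Rebol 2 (untested; `msg-rule` and `process-message` are hypothetical placeholders for your own grammar and handler, and the 1024-byte chunk size is arbitrary):

```rebol
buffer: copy ""
connection: open/direct/binary %test-data.r  ; /direct reads without buffering the whole file

while [data: copy/part connection 1024] [    ; returns none once the port is exhausted
    append buffer data
    ; consume as many complete messages as the buffer currently holds
    while [parse/all buffer [copy msg msg-rule mark: to end]] [
        process-message msg
        remove/part buffer mark              ; drop the matched portion, keep the rest
    ]
]
close connection
```

The inner loop keeps extracting messages until only a partial (unmatched) tail remains in the buffer, which the next chunk from the port will complete.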

I've used this in a real-time EDI TCP server with an "always on" TCP port, in order to break up a (potentially) continuous stream of input data that piggy-backs messages end to end.

Details

The best way to set up this system is to use /no-wait and loop until the port closes (you receive none instead of "").

Also make sure you have a way of checking for data-integrity problems (like a skipped byte, or an erroneous message) while you are parsing; otherwise, you will never reach the end.

In my system, when the buffer grew beyond a specific size, I tried an alternate rule which skipped bytes until a pattern might be found further down the stream. If one was found, an error was logged, the partial message was stored, and an alert was raised for the sysadmin to sort out the message.
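That resynchronization step could be sketched like this (untested; it assumes messages start with a literal "STX" marker, and `max-buffer-size`, `log-error`, and `alert-sysadmin` are hypothetical names standing in for your own configuration and handlers):

```rebol
if max-buffer-size < length? buffer [
    either parse/all buffer [to "STX" mark: to end] [
        log-error "skipped corrupt bytes up to the next STX marker"
        remove/part buffer mark   ; resume normal parsing at the marker
    ][
        clear buffer              ; no marker anywhere: discard and escalate
        alert-sysadmin "unrecoverable stream corruption"
    ]
]
```

The `to "STX"` rule fails if no marker is present at all, which is what routes the unrecoverable case to the second branch.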

HTH!


I think Maxim's answer is good enough. At the moment, PARSE on a port is not implemented. I don't think it's impossible to implement later, but we must solve other issues first.

Also, as Maxim says, you can do it even now, but it depends very much on what exactly you want to do.

You can certainly parse large files without reading them completely into memory. It's always good to know what you expect to parse. For example, all large files, like music and video files, are divided into chunks, so you can just use COPY or SEEK to get these chunks and parse them.

Or if you want to get just the titles of multiple web pages, you can read, say, the first 1024 bytes and look for the title tag there; if that fails, read more bytes and try again...
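Applying Maxim's buffering idea to that title example might look like this in Rebol 2 (an untested sketch; /direct is assumed so that COPY/part reads successive chunks, and the URL is just an example):

```rebol
page: open/direct http://www.rebol.com
buffer: copy ""
title: none
while [all [none? title  data: copy/part page 1024]] [
    append buffer data
    ; thru <title> fails until the opening tag has arrived;
    ; copy ... to </title> fails until the closing tag has arrived
    parse buffer [thru <title> copy title to </title> to end]
]
close page
print any [title "no title found"]
```

As soon as both tags are in the buffer, `title` is set and the loop stops reading, so only as much of the page as necessary is fetched.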

That's exactly what would have to be done to support PARSE on a port natively anyway.

And feel free to add a WISH in the CureCode database: http://curecode.org/rebol3/

