Fixing gaps in streamed USB data_问答_开发者_运维开发者技术经验分享

We have a hardware system with some FPGA's and an FTDI USB controller. The hardware streams data over USB bulk transfer to the PC at around 5MB/s and the software is tasked with staying in sync, checking the CRC and writing the data to file.

The FTDI chip has a 'busy' pin which goes high while its waiting for the PC to do its business. There is a limited amount of buffering in the FTDI and elsewhere on the hardware.

The busy line is going high for longer than the hardware can buffer (50-100ms) so we are losing data. To save us from having to re-design the hardware I have been asked to 'fix' this issue!

I think my code is quick enough as we've had it running up to 15MB/s, so that leaves an IO bottleneck somewhere. Are we just expecting too much from the PC/OS?

Here is my data entry point. Occasionally we get a dropped bit or byte. If the checksum doesn't compute, I shift through u开发者_JAVA技巧ntil it does. byte[] data is nearly always 4k.

    void ftdi_OnData(byte[] data)
    {
        List<byte> buffer = new List<byte>(data.Length);
        int index = 0;

        while ((index + rawFile.Header.PacketLength + 1) < data.Length)
        {
            if (CheckSum.CRC16(data, index, rawFile.Header.PacketLength + 2)) // <- packet length + 2 for 16bit checksum
            {
                buffer.AddRange(data.SubArray<byte>(index, rawFile.Header.PacketLength));                 
                index += rawFile.Header.PacketLength + 2; // <- skip the two checksums, we dont want to save them...
            }
            else
            {
                index++; // shift through
            }
        }

        rawFile.AddData(buffer.ToArray(), 0, buffer.Count);
    }

Tip: do not write to a file.... queue.

Modern computers have multiple processors. If you want certain things as fast as possible, use multiple processors.

Have on thread deal with the USB data, check checksums etc. It queues (ONLY) the results to a thread safe queue.
Another thread reads data from the queue and writes it to a file, possibly buffered.

Finished ;)

100ms is a lot of time for decent operations. I have successfully managed around 250.000 IO data packets per second (financial data) using C# without a sweat.

Basically, make sure your IO threads do ONLY that and use your internal memory as buffer. Especially dealing with hardware on one end the thread doing that should ONLY do that, POSSIBLY if needed running in high priority.

To get good read throughput on Windows on USB, you generally need to have multiple asynchronous reads (or very large reads, which is often less convenient) queued onto the USB device stack. I'm not quite sure what the FTDI drivers / libraries do internally in this regard.

Traditionally I have written mechanisms with an array of OVERLAPPED strutures and an array of buffers, and kept shovelling them into ReadFile as soon as they're free. I was doing 40+MB/s reads on USB2 like this about 5-6 years ago, so modern PCs should certainly be able to cope.

It's very important that you (or your drivers/libraries) don't get into a "start a read, finish a read, deal with the data, start another read" cycle, because you'll find that the bus is idle for vast swathes of time. A USB analyser would show you if this was happening.

I agree with the others that you should get off the thread that the read is happening as soon as possible - don't block the FTDI event handler for any longer than at takes to put the buffer into another queue.

I'd preallocate a circular queue of buffers, pick the next free one and throw the received data into it, then complete the event handling as quickly as possible.

All that checksumming and concatenation with its attendant memory allocation, garbage collection, etc, can be done the other side of potentially 100s of MB of buffer time/space on the PC. At the moment you may well be effectively asking your FPGA/hardware buffer to accommodate the time taken for you to do all sorts of ponderous PC stuff which can be done much later.

I'm optimistic though - if you can really buffer 100ms of data on the hardware, you should be able to get this working reliably. I wish I could persuade all my clients to allow so much...

So what does your receiving code look like? Do you have a thread running at high priority responsible solely for capturing the data and passing it in memory to another thread in a non-blocking fashion? Do you run the process itself at an elevated priority?

Have you designed the rest of your code to avoid the more expensive 2nd gen garbage collections? How large are you buffers, are they on the large object heap? Do you reuse them efficiently?