Reading memory in the correct order: need some help

开发者 https://www.devze.com 2023-03-20 07:23 Source: web
We are storing records in memory, laid out as follows:

------------------------------------------------
| EventID | Timestamp | Variable Data | Length |
------------------------------------------------

The field sizes are as follows:

EventID + Timestamp together take 12 bytes. The Length field is 4 bytes and gives the length of the Variable Data field.

Millions of such records are placed one after another, and I have a pointer to the current index. To read all the records, I read the 4-byte Length field going right to left, use it to fetch that record, and repeat this until I have read the complete memory space. The problem with this method is that it reads the records in the reverse of the order in which they were entered.
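For reference, the backward read described above might look roughly like this in C. This is only a sketch: it assumes the 12-byte EventID + Timestamp header from the question, a 4-byte Length stored in host byte order, and that `base` and `cur` bound the record area. The name `prev_record` is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { HEADER_SIZE = 12, LENGTH_SIZE = 4 };  /* sizes taken from the question */

/* Given `cur`, the address just past the end of a record, returns the start
   of that record (its EventID) and stores its data length in *data_len.
   Returns NULL when no complete record fits between base and cur. */
const uint8_t *prev_record(const uint8_t *base, const uint8_t *cur,
                           uint32_t *data_len)
{
    assert(base <= cur);
    size_t avail = (size_t)(cur - base);
    if (avail < HEADER_SIZE + LENGTH_SIZE)
        return NULL;                         /* nothing left to read */

    uint32_t len;
    memcpy(&len, cur - LENGTH_SIZE, sizeof len);  /* trailing Length field */
    if (avail - (HEADER_SIZE + LENGTH_SIZE) < len)
        return NULL;                         /* Length points before base */

    *data_len = len;
    return cur - LENGTH_SIZE - len - HEADER_SIZE;
}
```

Calling it repeatedly, starting with `cur` at the end of the newest record and feeding each result back in as the next `cur`, yields the records newest-first, which is the ordering problem in question.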

I need to devise a method that lets me read these records in the same order they were entered, with minimal space complexity.


I have another great solution for you!

  1. Read your records in reverse order (end to beginning) and, for each one, swap the in-memory values of the EventID and Length fields.
  2. When accessing rows afterwards, just keep the new layout in mind: Length | Timestamp | Data | EventID
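A rough sketch of the two steps above in C. It assumes EventID is 4 bytes (the same size as Length, so the swap can happen in place) and Timestamp is 8 bytes, which the question does not state explicitly; if the sizes differ, this does not work as written. The function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { ID_SIZE = 4, TS_SIZE = 8, LEN_SIZE = 4 };  /* ID_SIZE is an assumption */

/* Step 1: one backward pass that swaps the leading EventID with the trailing
   Length of every record. Returns the record count, or SIZE_MAX if a Length
   is inconsistent (the buffer may then be partially swapped). */
size_t swap_id_and_length(uint8_t *base, uint8_t *end)
{
    assert(base <= end);
    size_t count = 0;
    uint8_t *cur = end;                       /* one past the newest record */
    while (cur > base) {
        if ((size_t)(cur - base) < ID_SIZE + TS_SIZE + LEN_SIZE)
            return SIZE_MAX;
        uint8_t *len_field = cur - LEN_SIZE;
        uint32_t len;
        memcpy(&len, len_field, sizeof len);
        if ((size_t)(len_field - base) < (size_t)len + ID_SIZE + TS_SIZE)
            return SIZE_MAX;
        uint8_t *start = len_field - len - TS_SIZE - ID_SIZE;

        uint8_t id[ID_SIZE];
        memcpy(id, start, ID_SIZE);           /* save EventID            */
        memcpy(start, len_field, LEN_SIZE);   /* Length goes to the front */
        memcpy(len_field, id, ID_SIZE);       /* EventID goes to the back */

        cur = start;
        count++;
    }
    return count;
}

/* Step 2: forward step over the new layout Length | Timestamp | Data | EventID.
   Stores the data length and returns the start of the next record. */
uint8_t *next_swapped(uint8_t *rec, uint32_t *data_len)
{
    memcpy(data_len, rec, sizeof *data_len);
    return rec + LEN_SIZE + TS_SIZE + *data_len + ID_SIZE;
}
```

After the swap pass, the EventID of a record sits in the 4 bytes just before the pointer returned by `next_swapped`. The extra space is constant, at the cost of permanently changing the stored layout.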


Because the variable-length data section comes before the Length field, it is impossible to read the records starting from the beginning memory address. Assuming no changes can be made to the architecture or storage, one option is to use your current (backward) traversal to build an index of the variable data lengths. Once you reach the beginning of the data, you can then read the records in the correct order, using the previously built index to determine each record's data length.

However, you mention that this dataset contains millions of records, so storing an index of every data length before processing may not be feasible. One solution is to index only every other entry, or every fourth, eighth, etc., depending on your specific requirements. You can then start at each indexed record and work backwards, temporarily saving the data lengths until you reach a record you have already processed, and then work forward again using the saved data.

For example, say you index every 8th record on your first pass. You would start at record 8 and save its length, then go back through 7, 6, 5, 4, 3, 2, 1. Now you have saved the first 8 lengths, so process records 1, 2, 3, 4, 5, 6, 7, and 8. You don't know the length of record 9, so jump to record 16 and save the lengths of records 16, 15, 14, ..., 9. Then, as before, process records 9, 10, 11, ..., 16. Repeat.
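Here is a rough sketch of that checkpointing idea in C. It assumes a well-formed buffer, the 12-byte header from the question, and a 4-byte Length in host byte order. Because the first pass runs backwards, checkpoints are counted from the end rather than from record 1, and the sketch saves record start addresses instead of lengths (they carry the same information). `walk_in_order`, `STRIDE`, `Collector` and `collect` are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { HEADER_SIZE = 12, LENGTH_SIZE = 4, STRIDE = 8 };

typedef void (*record_fn)(const uint8_t *rec, void *ctx);

/* Start of the record whose last byte is just before rec_end. */
static const uint8_t *record_start(const uint8_t *rec_end)
{
    uint32_t len;
    memcpy(&len, rec_end - LENGTH_SIZE, sizeof len);
    return rec_end - LENGTH_SIZE - len - HEADER_SIZE;
}

/* Calls visit() on every record in insertion order. Extra memory is one
   pointer per STRIDE records plus a STRIDE-sized stack.
   Returns the record count (0 if empty or if allocation fails). */
size_t walk_in_order(const uint8_t *base, const uint8_t *end,
                     record_fn visit, void *ctx)
{
    size_t cap = 16, nchk = 0, total = 0;
    const uint8_t **chk = malloc(cap * sizeof *chk);
    if (chk == NULL)
        return 0;

    /* Pass 1 (backward): remember the boundary before every STRIDE-th record. */
    for (const uint8_t *cur = end; cur > base; cur = record_start(cur), total++) {
        if (total % STRIDE != 0)
            continue;
        if (nchk == cap) {
            const uint8_t **tmp = realloc(chk, 2 * cap * sizeof *chk);
            if (tmp == NULL) { free(chk); return 0; }
            chk = tmp;
            cap *= 2;
        }
        chk[nchk++] = cur;
    }

    /* Pass 2 (forward): checkpoints are stored newest-first, so take them from
       the back. Each segment holds at most STRIDE records. */
    for (size_t i = nchk; i-- > 0; ) {
        const uint8_t *seg_begin = (i + 1 == nchk) ? base : chk[i + 1];
        const uint8_t *stack[STRIDE];
        size_t n = 0;
        for (const uint8_t *cur = chk[i]; cur > seg_begin; ) {
            cur = record_start(cur);
            assert(n < STRIDE);
            stack[n++] = cur;              /* collected newest-first */
        }
        while (n > 0)
            visit(stack[--n], ctx);        /* replayed oldest-first  */
    }
    free(chk);
    return total;
}

/* Example callback: remembers where each visited record starts. */
typedef struct { const uint8_t *seen[64]; size_t n; } Collector;

static void collect(const uint8_t *rec, void *ctx)
{
    Collector *c = ctx;
    if (c->n < 64)
        c->seen[c->n++] = rec;
}
```

A larger `STRIDE` shrinks the checkpoint array but grows the per-segment stack. Each record's Length is read twice, once per pass.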


Try reversing the record order during a first fetch, and then do a second fetch using the same process (allocate the same amount of memory to hold the reversed copy).

Because the variable data has variable length and the length value comes last, I see no way to do this fetch from left to right.


There is another way to find the end of a row with no additional memory, based on these observations:

  1. All EventID values fall into a definite range, and could be sequential
  2. All Timestamp values fall into a definite range too (say, from 2009/09/09 through 2011/11/11)
  3. The Length of one row and the EventID and Timestamp of the next row are adjacent, and together have a fixed length of 16 bytes (4 for Length, 4 for EventID, and 8 for Timestamp).

Under these assumptions you could write a function that searches for the end of a row, e.g.

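#include <stdint.h>
#include <string.h>

typedef unsigned char byte;
typedef uint32_t DWORD; // 4-byte unsigned
typedef uint64_t QWORD; // 8-byte unsigned

// Heuristically finds where the row starting at rowStart ends.
// Returns the start of the next row, memBlockEnd if this is the last row,
// or NULL if no plausible boundary was found.
byte* FindNextRow(byte* rowStart, byte* memBlockEnd,
                  DWORD minEventID, DWORD maxEventID,
                  QWORD minTimestamp, QWORD maxTimestamp)
{
  byte* data = rowStart + 12; // skip EventID (4) + Timestamp (8)

  // ptr is a candidate position for this row's 4-byte 'length' field
  for (byte* ptr = data; memBlockEnd - ptr >= 4; ptr++) {

    // a real 'length' field equals the number of data bytes before it
    DWORD length;
    memcpy(&length, ptr, sizeof length); // memcpy avoids unaligned reads
    if (length != (DWORD)(ptr - data))
      continue;

    // check if this is the last row
    if (ptr + 4 == memBlockEnd)
      return memBlockEnd;

    // the next row needs at least 16 bytes (EventID + Timestamp + length)
    if (memBlockEnd - (ptr + 4) < 16)
      continue;

    // then check 'EventID' and 'Timestamp' for the next row
    DWORD eventID;
    memcpy(&eventID, ptr + 4, sizeof eventID);
    if (eventID < minEventID || eventID > maxEventID)
      continue; // you might add a sequence check against the EventID at rowStart

    QWORD timestamp;
    memcpy(&timestamp, ptr + 8, sizeof timestamp);
    if (timestamp < minTimestamp || timestamp > maxTimestamp)
      continue; // you might also require it to be later than the timestamp at rowStart + 4

    // this is the match
    return ptr + 4;
  }
  return NULL; // no candidate boundary found
}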

WARNING: this does not guarantee correctness, because data bytes can happen to look like a valid Length, EventID, and Timestamp. Still, you could try to build a workaround along these lines.


Is allocating one pointer (usually 4 bytes on a 32-bit machine) per message acceptable to you?

If it is, you could do the following, starting from the end:

  1. Read length at current position - 4
  2. Get the pointer to the 1st byte of event id with: current position - 4 - length - 12
  3. Resize the pointer array if needed
  4. Store that pointer in the array
  5. Repeat from 1

Of course, you would need to realloc() as the pointer array grows (no need to realloc every time, do it in chunks).

I am assuming you are treating the memory as a char array, so the difference between the char pointers of contiguous elements (n and n-1) gives you the size of the entire message.
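The steps above could be sketched in C roughly as follows. It assumes a well-formed buffer, the 12-byte header from the question, and a 4-byte Length in host byte order. `index_records` and `CHUNK` are names invented for this example, and the final in-place reversal is an addition so that the array ends up oldest-first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { HEADER_SIZE = 12, LENGTH_SIZE = 4, CHUNK = 1024 };

/* Builds an array of pointers to the first byte of each record, ordered from
   oldest to newest. The caller frees the array. Returns NULL (with *count set
   to 0) if there are no records or an allocation fails. */
const uint8_t **index_records(const uint8_t *base, const uint8_t *end,
                              size_t *count)
{
    assert(base <= end);
    const uint8_t **idx = NULL;
    size_t n = 0, cap = 0;
    *count = 0;

    for (const uint8_t *cur = end; cur > base; ) {
        uint32_t len;
        memcpy(&len, cur - LENGTH_SIZE, sizeof len);   /* 1. read the length  */
        cur = cur - LENGTH_SIZE - len - HEADER_SIZE;   /* 2. start of EventID */
        if (n == cap) {                                /* 3. grow in chunks   */
            const uint8_t **tmp = realloc(idx, (cap + CHUNK) * sizeof *idx);
            if (tmp == NULL) { free(idx); return NULL; }
            idx = tmp;
            cap += CHUNK;
        }
        idx[n++] = cur;                                /* 4. store the pointer */
    }

    /* The pointers were collected newest-first; reverse them in place. */
    for (size_t lo = 0, hi = n; lo + 1 < hi; lo++, hi--) {
        const uint8_t *t = idx[lo];
        idx[lo] = idx[hi - 1];
        idx[hi - 1] = t;
    }
    *count = n;
    return idx;
}
```

With the array in place, record i starts at `idx[i]` and its total size is `idx[i + 1] - idx[i]` (or `end - idx[i]` for the last record).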

This wastes memory. I know you don't want that, but if you can't do what Opillect suggested (swapping the EventID and Length fields) because they have different sizes, this seems like a good way to do it.
