开发者

disk access using a b-tree indexer

开发者 https://www.devze.com 2023-03-18 02:06 出处:网络
iv\'e implemented a B+tree , my leaf nodes point to start of line(record) positions in a CSV file , my question is :

iv'e implemented a B+tree , my leaf nodes point to start of line(record) positions in a CSV file ,

my question is :

my tree is des开发者_StackOverflow中文版igned to except a tree - ORDER value i.e. ( The number of Pointers in each tree Node )

to my understanding the order value is suppose to optimize disk access by being able to read an entire disk block into memory in one disk access operation .

what i don't understand is how this comes into play , lets say i know the disk's block size i give the Order an appropriate value according to some calculation

for example : (order * sizeof( Record ) ) < block_size

Accessing the Data :

The pointer as i said holds a file path and an offset to the beginning of the line(Record)

  StreamReader reader ;
  reader.BaseStream.Position = leaf.Pointers[i].offset ; // leaf is a leaf node in the      tree 
  string record = reader.ReadLine();  

1) is the ReadLine() operation equivalent to one disk Access ? if so the way in witch i access my data would be the same (Disk Access wise not search wise ) ,would not be affected by the ORDER (size) of my tree nodes .

2) how would one change the disk access method to be optimized according to disk block size ?


1) is the ReadLine() operation equivalent to one disk Access ? if so the way in witch i access my data would be the same (Disk Access wise not search wise ) ,would not be affected by the ORDER (size) of my tree nodes .

Yes, your ReadLine() is a disk access; however, this doesn't really apply to you. You are essentially storing all data 'out of row'. A typical BTree or B+Tree would store the data directly inside tree structure not in a related file. You are a little light on details about the BTree mechanics itself so I can't tell you what the order/size would impact. Are you storing the BTree structure or just building an in-memory index? If it is stored then how are you saving/loading a node in the graph?

2) how would one change the disk access method to be optimized according to disk block size ?

The first part of this response might shed some light on it. Basically you would need to move away from using the related CSV file and store the data in the tree itself.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号