Intro:
I have a bottleneck in my C# application where I need to load a page as a bitmap from a PDF or TIFF file and process that bitmap in memory. TIFF files load fairly fast, as do first-party PDFs (we can read our own). The bottleneck appears when the PDF file is third-party and we need to parse the PDF page and turn it into a bitmap. This is costly: roughly 500 times slower than first-party PDFs, to give an idea. Some of these PDF files get very large, so we avoid loading the whole document into memory first.
Hypothesis:
The work on the page is done in a separate process (magically) while my application waits for it to finish. Because of this, I believe that loading a small buffer (say, 5 pages at a time) asynchronously will speed up execution for these third-party PDF files.
Pseudocode (C#-ish):
IntPtr[] dibbuffer = new IntPtr[5];
dibbuffer[0] = LoadPage(0); //pre-emptive first page
BeginAsyncFillBuffer(dibbuffer);
for (int i = 0; i < NUM_PAGES; ++i)
{
    IntenseProcessing(dibbuffer[current_page_index_in_buffer]);
}
EndAsyncFillBuffer();
Problems:
- Will this really speed up the application? (some of the machines it will be running on are single core)
- Is this worth the hassle of trying to synchronize and sort the buffer on the processing thread?
- Any tips for synchronizing the process are welcome. I am using C#, so any .NET conventions or data structures can be used.
- Addendum: I would like it to be as lazy as possible (only load the next page when there is room free in the buffer).
This is what I ended up with. I wish that, instead of polling every X milliseconds, it were lazier and filled the buffer on the separate thread only when needed. If anyone can refine this, please do.
using System;
using System.Collections;
using System.Threading;

class MyGhettoBuffer
{
    const int MaxBufferedPages = 5;

    readonly object _sync = new object(); //guards _curpage and the queue
    Target _target = null; //contains info on the file @ hand
    Queue _q = null;
    Queue _synchQ = null;
    Thread _loop = null;
    ManualResetEvent _throttle = new ManualResetEvent(false);
    int _curpage = 0;

    private MyGhettoBuffer() { }

    public MyGhettoBuffer(Target target)
    {
        _target = target;
        _q = new Queue();
        _synchQ = Queue.Synchronized(_q);
        _loop = new Thread(MainLoop);
        _loop.Start();
    }

    public bool HasPagesLeft //determine when to stop processing queue
    {
        get
        {
            lock (_sync)
            {
                return _curpage < _target.NumPages || _synchQ.Count > 0;
            }
        }
    }

    //if the buffer hasnt caught up load the page on the processing thread
    public IntPtr GetNextPage()
    {
        lock (_sync)
        {
            if (_synchQ.Count == 0)
            {
                IntPtr dib = LoadDib(_target.FullPath, _curpage);
                _curpage++;
                return dib;
            }

            object o = _synchQ.Dequeue();
            if (o is IntPtr)
            {
                return (IntPtr)o;
            }
            throw new InvalidCastException("Object in page queue is not an IntPtr");
        }
    }

    private void MainLoop()
    {
        while (true)
        {
            lock (_sync)
            {
                //check under the lock: GetNextPage may have advanced _curpage
                if (_curpage >= _target.NumPages)
                {
                    return;
                }
                if (_synchQ.Count < MaxBufferedPages)
                {
                    IntPtr dib = LoadDib(_target.FullPath, _curpage);
                    _synchQ.Enqueue(dib);
                    _curpage++;
                }
            }
            _throttle.WaitOne(100, false); //poll every 100 ms instead of spinning
        }
    }
}
Then, in my processing thread, I do something like this:
MyGhettoBuffer buffer = new MyGhettoBuffer(target);
while (buffer.HasPagesLeft)
{
    IntPtr dib = buffer.GetNextPage();
    //Process the dib here
    FreeDib(dib);
}
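
One possible way to get the "lazy" behaviour without polling is a bounded producer/consumer queue. The sketch below is only an illustration and makes several assumptions: it needs .NET Framework 4.5 or later (for BlockingCollection<T> and Task.Run), the class name PagePrefetcher and the loadPage delegate are hypothetical stand-ins (loadPage would wrap something like LoadDib), and the consumer is assumed to read every page. The producer blocks inside Add while the buffer is full, so a page is loaded only when there is room for it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: loads pages on a background task and stays at most
// `capacity` pages ahead of the consumer.
sealed class PagePrefetcher
{
    private readonly BlockingCollection<IntPtr> _pages;
    private readonly Task _producer;

    // loadPage is an assumed stand-in for a call such as
    // LoadDib(target.FullPath, page).
    public PagePrefetcher(int numPages, int capacity, Func<int, IntPtr> loadPage)
    {
        _pages = new BlockingCollection<IntPtr>(capacity);
        _producer = Task.Run(() =>
        {
            try
            {
                for (int page = 0; page < numPages; page++)
                {
                    // Add blocks while the buffer is full, so no polling is needed.
                    _pages.Add(loadPage(page));
                }
            }
            finally
            {
                // Always signal completion so the consumer loop can end.
                _pages.CompleteAdding();
            }
        });
    }

    // Yields pages in order and blocks while the buffer is empty.
    public IEnumerable<IntPtr> Pages()
    {
        foreach (IntPtr dib in _pages.GetConsumingEnumerable())
        {
            yield return dib;
        }
        // Rethrow a loader failure, if any, once the buffer is drained.
        _producer.GetAwaiter().GetResult();
    }
}
```

With this shape, the processing thread would construct the prefetcher with a delegate like page => LoadDib(target.FullPath, page), iterate over Pages() with foreach, and call FreeDib on each page after processing it. If the consumer stops early, the producer stays blocked and any buffered pages are never freed, so a real version would also need cancellation and cleanup.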