I am curious how the Google Docs PDF viewer works. It's not Flash like scribd.com; it looks like pure HTML. Any idea how they did it?
Sample link to view the PDF
Google is simply serving up an image (right click -> save as), with an overlay to highlight text.
You should check out this SO question where others go into more detail.
You should also look through the source of your PDF link; it appears Google is passing the PDF link through to be converted into an image.
Example:
<script type="text/javascript">
var gviewElement = document.getElementById('gview');
var config = {
'api': false,
'chrome': true,
'csi': true,
'ddUrl': "http://www.idfcmf.com/downloads/monthly_fund/2009/IDFC-Premier-Equityfund-jan10.pdf",
'element': gviewElement,
'embedded': false,
'initialQuery': "",
'oivUrl': "http://docs.google.com/viewer?url\x3dhttp%3A%2F%2Fwww.idfcmf.com%2Fdownloads%2Fmonthly_fund%2F2009%2FIDFC-Premier-Equityfund-jan10.pdf",
'sdm': 200,
'userAuthenticated': true
};
var gviewApp = _createGView(config);
gviewApp.setProgress(50);
window.jstiming.load.name = 'view';
window.jstiming.load.tick('_dt');
</script>
Edit
Also, if you inspect the PDF viewer in Firefox with Firebug, you will notice that 'highlighting' text really only shows a set of divs. I'm guessing Google scans the document using OCR, detects where the text is, and builds a matrix of coordinates on which to base the div placement. When you click and drag, it interrogates the mouse pointer location to determine which divs to display.
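To make the guess above concrete, here is a minimal sketch of the hit-testing step. It is not Google's code: the `wordBoxes` data, its field names, and the helper functions are all hypothetical, and it assumes each word comes with a pixel rectangle on the page image.

```javascript
// Hypothetical word boxes in page pixel coordinates, as an OCR or
// text-extraction step might produce them (assumed format).
const wordBoxes = [
  { text: 'Monthly', x: 10, y: 10, w: 60, h: 12 },
  { text: 'Fund', x: 75, y: 10, w: 35, h: 12 },
  { text: 'Report', x: 10, y: 30, w: 50, h: 12 },
];

// Normalise a mouse drag from (x0, y0) to (x1, y1) into a rectangle,
// so dragging up or to the left works the same as down or to the right.
function dragRect(x0, y0, x1, y1) {
  return {
    x: Math.min(x0, x1),
    y: Math.min(y0, y1),
    w: Math.abs(x1 - x0),
    h: Math.abs(y1 - y0),
  };
}

// Axis-aligned rectangle overlap test.
function intersects(a, b) {
  return a.x < b.x + b.w && b.x < a.x + a.w &&
         a.y < b.y + b.h && b.y < a.y + a.h;
}

// Return the boxes whose highlight divs should be made visible.
function selectWords(boxes, rect) {
  return boxes.filter((box) => intersects(box, rect));
}

// A drag from bottom-right to top-left across the first line.
const selected = selectWords(wordBoxes, dragRect(100, 25, 5, 5));
console.log(selected.map((box) => box.text).join(' ')); // prints "Monthly Fund"
```

In a real page, the boxes returned by `selectWords` would have their overlay divs shown, and all others hidden, on every mouse move during the drag.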
The whole thing is an image. The text highlight overlay is easy to figure out. But when you press Ctrl+C and it copies to the clipboard, that part has me totally stumped, because it's not possible to write to the clipboard using JavaScript in Firefox, yet Ctrl+C on the image works fine in Firefox. http://www.google.com/support/forum/p/Google+Docs/thread?tid=67dcf21ef8579b4c&hl=en&fid=67dcf21ef8579b4c00047e4a2a9fcb12
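One technique that could explain this, offered as a guess rather than as what Google actually does: the script never writes to the clipboard itself. Instead it places the selected text in an off-screen textarea and selects it, so the browser's own Ctrl+C copies real text. The function name `primeClipboard` and the variable `currentSelectionText` below are hypothetical.

```javascript
// Put `text` into an off-screen textarea and select it, so that the
// browser's native copy command picks it up. `doc` is the page's
// document object (passed in to keep the function easy to exercise).
function primeClipboard(doc, text) {
  const area = doc.createElement('textarea');
  area.value = text;
  // Keep the textarea out of view without hiding it, since hidden
  // elements cannot hold a selection.
  area.style.position = 'fixed';
  area.style.left = '-9999px';
  doc.body.appendChild(area);
  area.focus();
  area.select();
  return area;
}

// Hypothetical wiring: prime the textarea as soon as Ctrl (or Cmd) goes
// down, before the browser handles the following "C" key press.
if (typeof document !== 'undefined') {
  let currentSelectionText = ''; // assumed to be updated by the highlight code
  document.addEventListener('keydown', (event) => {
    if ((event.ctrlKey || event.metaKey) && currentSelectionText) {
      primeClipboard(document, currentSelectionText);
    }
  });
}
```

A fuller version would reuse a single textarea and restore focus afterwards; this sketch leaves that out for brevity.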
I agree with some of the other answers: the PDF is rendered as a PNG, and very likely the text areas are layered on top, probably using absolute/relative positioning. You can extract the text information from the PDF itself. The PDF format is open, so anyone could do it (granted, it might not be easy). There are also open source tools (xPDF...) that can export PDF contents, for example to XML. It's possible that the exports include coordinates for where on the page text and images should display.
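As a rough illustration of the layering idea, the sketch below turns one extracted text box into an absolutely positioned div. The box format (`x`, `y`, `width`, `height` in PDF points, with `y` as the bottom edge) and both function names are assumptions, not the output of any particular tool. The one detail worth noting is that default PDF coordinates start at the bottom-left of the page, while CSS positions start at the top-left, so the vertical axis has to be flipped.

```javascript
// Convert a box in PDF points (origin bottom-left, y = bottom edge of
// the box) into CSS pixels (origin top-left) for a page image rendered
// at `scale` pixels per point.
function pdfBoxToCss(box, pageHeightPt, scale) {
  return {
    left: box.x * scale,
    top: (pageHeightPt - box.y - box.height) * scale,
    width: box.width * scale,
    height: box.height * scale,
  };
}

// Build the markup for one overlay div placed over the page image.
function overlayDivHtml(css) {
  return '<div style="position:absolute;' +
    `left:${css.left}px;top:${css.top}px;` +
    `width:${css.width}px;height:${css.height}px"></div>`;
}

// A 100x12pt box on a US Letter page (792pt tall), rendered at 2x.
const css = pdfBoxToCss({ x: 72, y: 700, width: 100, height: 12 }, 792, 2);
console.log(css); // { left: 144, top: 160, width: 200, height: 24 }
console.log(overlayDivHtml(css));
```

The container holding the page image would need `position: relative` so that these divs line up with the rendered PNG.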