PDF byte array: read first line

Developer https://www.devze.com 2023-01-24 05:40 Source: Internet
I have a PDF byte array and was wondering if there is an easy way to read the first line of text into a variable?

Thanks, rod


Check out "SimpleTextParser" and the rest of the com.itextpdf.text.pdf.parser package (or whatever it's called in C#-ville).

Note that "The First Line of Text" is a very slippery concept in PDF. Glyphs are drawn at specific coordinates. If a given clump of glyphs happens to share a baseline, they're visually on the same line. If a given shared baseline is the one closest to the top of the page, it's the "first".
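To make the baseline idea concrete, here is a minimal Java sketch. It assumes you have already extracted positioned text chunks from the page by some other means; the Chunk class, the firstLine method and the tolerance parameter are hypothetical names invented for this illustration and are not part of iText. It also assumes an unrotated page, where a larger y means higher on the page. The chunks are deliberately listed out of reading order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class FirstLineSketch {

    /** A positioned run of text; (x, y) is the baseline origin in PDF user space (y grows upward). */
    static final class Chunk {
        final String text;
        final double x;
        final double y;

        Chunk(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Returns the text of all chunks on the topmost baseline, ordered left to right. */
    static String firstLine(List<Chunk> chunks, double tolerance) {
        if (chunks.isEmpty()) {
            return "";
        }
        // The topmost baseline has the largest y, because the PDF origin is the bottom-left corner.
        double top = Double.NEGATIVE_INFINITY;
        for (Chunk c : chunks) {
            top = Math.max(top, c.y);
        }
        // Keep every chunk whose baseline is within the tolerance of the topmost one.
        List<Chunk> line = new ArrayList<>();
        for (Chunk c : chunks) {
            if (Math.abs(c.y - top) <= tolerance) {
                line.add(c);
            }
        }
        // Order by x so the result does not depend on the order the chunks were drawn in.
        line.sort(Comparator.comparingDouble((Chunk c) -> c.x));
        StringBuilder sb = new StringBuilder();
        for (Chunk c : line) {
            sb.append(c.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Chunk> drawn = List.of(
                new Chunk("body text", 72, 650),
                new Chunk("Report", 130, 700),
                new Chunk("Annual ", 72, 700));
        System.out.println(firstLine(drawn, 0.5)); // prints "Annual Report"
    }
}
```

The tolerance is there because chunks that look like one line may have baselines that differ by a fraction of a point; a sensible value depends on the document.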

Oh, and the page might be rotated, throwing everything into a special kind of hell called "matrix math".
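For the simplest case of that matrix math, the following Java sketch maps a point from unrotated page space into the space of the page as displayed. It assumes the page's /Rotate entry is a clockwise multiple of 90 degrees and that the page box starts at the origin (0, 0); toDisplayed and its parameters are hypothetical names for this illustration, not an iText API. Text drawn under its own arbitrary transformation matrix needs a full matrix multiplication, which this sketch does not cover.

```java
final class PageRotationSketch {

    /**
     * Maps (x, y) in unrotated page space to the page as displayed.
     * width and height are the unrotated page dimensions; rotate is the
     * clockwise /Rotate value and must be a multiple of 90.
     */
    static double[] toDisplayed(double x, double y, double width, double height, int rotate) {
        int normalized = ((rotate % 360) + 360) % 360; // also handles negative values
        switch (normalized) {
            case 0:
                return new double[] {x, y};
            case 90:
                // Displayed page is height wide and width tall.
                return new double[] {y, width - x};
            case 180:
                return new double[] {width - x, height - y};
            case 270:
                return new double[] {height - y, x};
            default:
                throw new IllegalArgumentException("rotation must be a multiple of 90: " + rotate);
        }
    }

    public static void main(String[] args) {
        // US Letter page (612 x 792 points); a point near the top-left of the unrotated page.
        double[] p = toDisplayed(72, 720, 612, 792, 90);
        System.out.println(p[0] + ", " + p[1]); // prints "720.0, 540.0"
    }
}
```

After normalizing coordinates this way, "closest to the top" once again means "largest y", so the baseline comparison can ignore the rotation.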

There's no particular requirement to write out text in PDF in a logical order. One could go through and write all the 'a's, then the 'b's, and so forth. Not bloody likely (or efficient), but perfectly legal. What IS likely is that all the text in a given font is drawn, followed by all the text in the next font, and so forth. If the first line of text happens to be in a couple of different fonts (bold, italic, etc.), you might find it harder than you would expect to locate the proper Line Of Text. A program might easily iterate through the fonts alphabetically, or store them in a hash map... don't depend on "the order things are drawn" matching the logical order. Sooner or later (probably sooner) you will be in for a rude shock.

I suggest you go read an iText FAQ or two. Your question betrays a level of ignorance that is easily cured with a little effort on your part. If nothing else, the freely available chapters from iText In Action (and its cornucopia of samples) should prove illuminating.


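import java.io.*;
import java.nio.charset.StandardCharsets;
// Caveat: this reads the first line of the raw file, typically the PDF header such as "%PDF-1.4", not the first line of page text.
byte[] pdf = "%PDF-1.4\n".getBytes(StandardCharsets.ISO_8859_1); // placeholder bytes; substitute your own PDF
String firstLine;
try (BufferedReader in = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(pdf), StandardCharsets.ISO_8859_1))) {
    firstLine = in.readLine(); // the reader is closed automatically by try-with-resources
}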