I've had a lot of problems converting the HTML content of a page to PDF and to DOC while making sure the images also appear in the converted article, and so far I have failed.
I understand that XML is something like a foundation format.
So, is it?
And how do I use it?
I mean, is there any guide on how to generate the XML of the page and then change its extension to the one needed (PDF, DOC)?
I'm using VS 2008, ASP.NET and C#.
The short answer is no.
If there were such a format, why wouldn't all applications use it in the first place?
A note on different formats
Almost all document applications understand plain text (though image applications and the like do not). The problem with plain text is that it contains no formatting: no pictures, no font sizes, no margins, nothing except text. Formatting is also the root cause of why there are so many different formats.
Take HTML, for example. HTML is good for flowing text on web sites, where a continuous block of text is navigated with a scrollbar. It has no page breaks and can adapt to different column widths depending on screen size. HTML is also very dynamic: pages can expand sections, replace content and react to user input.
In contrast, take PDF. PDF is page oriented, with a fixed width and height for each page, and it is targeted at viewing only. Text wrapping is fixed with explicit line breaks. (Copy the text from a PDF into a Word document and insert some text in the middle of a line, and the line breaks will be a real mess.) PDF emulates a printed page, with margins and everything.
Somewhere in the middle is the Word document. It is page oriented like PDF, but not as fixed in shape as a PDF document, so that it supports a nice editing experience. Sections of text reflow nicely when text is inserted in the middle. It is quite flexible when editing, but the final result is as strict in form as PDF: when printing a Word document, the printout will look exactly like it did on the screen.
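To make the difference between fixed line breaks and reflowing text concrete, here is a minimal Python sketch. It is only an illustration: the sample sentence, the 40-character width and the variable names are made up, and real PDF or Word files are of course far more involved than wrapped strings.

```python
import textwrap

WIDTH = 40  # hypothetical line width in characters

paragraph = (
    "PDF stores each line with a fixed break, while an editable "
    "document keeps paragraphs as continuous text and wraps them on demand."
)
extra = " (as a printed page would)"
position = 20  # insert right after "PDF stores each line"

# PDF-like: the text was wrapped once and the line breaks are frozen,
# so an insertion only changes the one line it lands in.
frozen = textwrap.wrap(paragraph, width=WIDTH)
frozen_after = list(frozen)
frozen_after[0] = frozen[0][:position] + extra + frozen[0][position:]

# Word-like: the paragraph is kept as continuous text and wrapped again
# after every edit.
reflowed_after = textwrap.wrap(
    paragraph[:position] + extra + paragraph[position:], width=WIDTH
)

for label, lines in (("frozen breaks", frozen_after), ("reflowed", reflowed_after)):
    print(label)
    for line in lines:
        print("  |" + line.ljust(WIDTH) + "|")
```

With frozen breaks, the edited line sticks out past the right margin while the other lines stay as they were. After reflowing, every line fits the width again.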
XML
XML is a very general format; you can think of it as a format for formats. XML in itself does not say anything about the content, so you need a separate description of how to interpret the XML for a given application. There exist specifications like DocBook that specify how to describe a document in XML. But that is not an exact description of how the document will look. It separates content from layout. You need to apply a layout/template to generate a visible output format. From a DocBook XML source you can generate PDF, HTML, etc.
There is no given way of converting an arbitrary document format to XML, not even to a specific XML format like DocBook. XML-based formats can be used as a source format from which different viewable formats are generated.
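As a rough illustration of XML as a source format, the Python sketch below renders one small XML document in two ways, as HTML and as plain text. The element names (`article`, `section`, `heading`, `para`, `image`) are a made-up, simplified vocabulary and not real DocBook, and the two renderer functions are hypothetical. Producing PDF or DOC from the same source would need a dedicated tool or library, which is not shown here.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, simplified document vocabulary (not real DocBook).
SOURCE = """\
<article>
  <title>Formats &amp; conversion</title>
  <section>
    <heading>XML</heading>
    <para>XML describes structure, not appearance.</para>
    <image src="diagram.png" alt="Format diagram"/>
  </section>
</article>
"""


def to_html(root):
    """Render the structure as HTML; the layout decisions live here."""
    parts = ["<h1>%s</h1>" % escape(root.findtext("title", ""))]
    for section in root.findall("section"):
        for node in section:
            if node.tag == "heading":
                parts.append("<h2>%s</h2>" % escape(node.text or ""))
            elif node.tag == "para":
                parts.append("<p>%s</p>" % escape(node.text or ""))
            elif node.tag == "image":
                parts.append('<img src="%s" alt="%s">' % (
                    escape(node.get("src", "")), escape(node.get("alt", ""))))
    return "\n".join(parts)


def to_text(root):
    """Render the same structure as plain text; images become placeholders."""
    title = root.findtext("title", "")
    lines = [title, "=" * len(title)]
    for section in root.findall("section"):
        for node in section:
            if node.tag == "heading":
                lines.extend(["", node.text or ""])
            elif node.tag == "para":
                lines.append(node.text or "")
            elif node.tag == "image":
                lines.append("[image: %s]" % node.get("alt", ""))
    return "\n".join(lines)


root = ET.fromstring(SOURCE)
html_output = to_html(root)
text_output = to_text(root)
print(html_output)
print(text_output)
```

Note that the XML says nothing about fonts, margins or page size. Each renderer adds its own layout decisions, which is why renaming an XML file to `.pdf` or `.doc` cannot work.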
A note on conversion
The problem of converting different formats to each other comes from the different purposes and strengths of each format. One format is often simply not suitable for, or even capable of, describing the properties of another format correctly. There is no general method of converting between formats, because formats like PDF do not reveal the document structure in a reusable way.
How to publish to different formats
The key to success when publishing to different formats is to separate content from layout. You need to specify what text you have, how it is structured (headers, sections, etc.), what images you have and how they relate to your sections of text. The text and structure description may be in XML, in a database or something else.
Then you need a tool that generates each output format from a template.
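A minimal sketch of that idea in Python, assuming the content lives in an ordinary data structure and each output format has its own set of templates. Everything here is hypothetical: the sample article, the template strings and the `render` function are invented for illustration, and Markdown merely stands in as a second output format. A real PDF or DOC backend would be a separate library plugged in at the same place.

```python
from string import Template

# Content and structure only, no layout information. All values are made up.
article = {
    "title": "Quarterly report",
    "sections": [
        {"heading": "Summary", "text": "Sales went up.", "image": "sales.png"},
        {"heading": "Outlook", "text": "More of the same.", "image": None},
    ],
}

# One set of templates per output format; the layout lives only here.
TEMPLATES = {
    "html": {
        "page": Template("<html><body><h1>$title</h1>\n$body\n</body></html>"),
        "section": Template("<h2>$heading</h2>\n<p>$text</p>$image"),
        "image": Template('\n<img src="$src">'),
        "separator": "\n",
    },
    "markdown": {
        "page": Template("# $title\n\n$body\n"),
        "section": Template("## $heading\n\n$text$image"),
        "image": Template("\n\n![]($src)"),
        "separator": "\n\n",
    },
}


def render(content, fmt):
    """Fill the templates of one format with the shared content.

    Values are inserted as-is; a real tool would also escape them
    according to the rules of the target format.
    """
    templates = TEMPLATES[fmt]
    sections = []
    for section in content["sections"]:
        image = ""
        if section["image"]:
            image = templates["image"].substitute(src=section["image"])
        sections.append(templates["section"].substitute(
            heading=section["heading"], text=section["text"], image=image))
    body = templates["separator"].join(sections)
    return templates["page"].substitute(title=content["title"], body=body)


html_page = render(article, "html")
markdown_page = render(article, "markdown")
print(html_page)
print(markdown_page)
```

Adding another output format then means adding another set of templates (or another backend) without touching the content itself.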