Book translation data format_问答_开发者_运维开发者技术经验分享

I'm thinking of translating a book from English to my native language. I can translate just fine, and I'm happy with vim as a text editor. My problem is that I'd like to somehow preserve the semantics, i.e. which parts of my translation correspond to the original.

I could basically create a simple XML-based markup language, that'd look something like

<book>
  <chapter>
    <paragraph>
      <sentence>
        <original>This is an example sentence.</original>
        <translation lang="fi">Tämä on esimerkkilause.</translation>
      </sentence>
    </paragraph>
  </chapter>
</book>

Now, that would probably have its benefits but I don't think editing that would be very fun.

Another possibility that I can think of would be to keep the original and translation in separate files. If I 开发者_如何学编程add a newline after each translation chunk and keep line numbering consistent, editing would be easy and I'd be able to programmatically match the original and translation.

original.txt:
  This is an example sentence.
  In this format editing is easy.

translation-fi.txt:
  Tämä on esimerkkilause.
  Tässä muodossa muokkaaminen on helppoa.

However, this doesn't seem very robust. It would be easy to mess up. Probably someone has better ideas. Thus the question:

What would be the best data format for making a book translation with a text editor?

EDIT: added tag vim, since I'd prefer to do this with vim and believe that some vim guru might have ideas.

EDIT2: started a bounty on this. I'm currently leaning to the second idea I describe, but I hope to get something about as easy to edit (and quite easy to implement) but more robust.

One thought: if you keep each translatable chunk (one or more sentences) in its own line, vim's option scrollbind, cursorbind and a simple vertical split would help you keeping the chunks "synchronized". It looks very much like to what vimdiff does by default. The files should then have the same amount of lines and you don't even need to switch windows!

But, this isn't quite perfect because wrapped lines tend to mess up a little bit. If your translation wraps over two or three more virtual lines than the original text, the visual correlation fades as the lines aren't one-on-one anymore. I couldn't find a solution or a script for fixing that behavior.

Other suggestion I would propose is to interlace the translation into the original. This approaches the diff method of Benoit's suggestion. After the original is split up into chunks (one chunk per line), I would prepend a >> or similar on every line. A translation of one chunk would begin by o. The file would look like this:

  >> This is an example sentence.
  Tämä on esimerkkilause.
  >> In this format editing is easy.
  Tässä muodossa muokkaaminen on helppoa.

And I would enhance the readability by doing a :match Comment /^>>.*$/ or similar, whatever looks nice with your colorscheme. Probably it would be worthwhile to write a :syn region that disables spell checking for the original text. Finally, as a detail, I'd bind <C-j> to do 2j and <C-k> to 2k to allow easy jumping between the parts that matter.

Pros for this latter approach also include that you could wrap things in 80 columns if you feel like I do :) It would still be trivial to write <C-j/k> to jump between translations.

Cons: buffer-completion suffers as now it completes both original and translated words. English words don't hopefully occur in the translations that often! :) But this is as robust as it gets. A simple grep will peel the original text off after you are done.

Why not use a simplified diff format?

it is linewise which is suitable for whole sentences.
The first character is significant (space, special, + or -)
It will be quite compact
Maybe you needn't those @@ parts
Vim will support it and color the English sentence and the Finnish sentence in distinct colors.

Assuming you want to keep the 1 - 1 relationship between the original text and the translated text, a database table makes the most sense.

You'd have one table with the following columns:

id - Integer - Autonum
original_text - Text - Not null
translated_text - Text - Nullable

You'd need a process to load the original text, and a process to show you one line of the original text and allow you to type the translated text. Perhaps the second process could show you 5 lines (2 before, the line you want to translate, and 2 after) to give you context.