I am building a php web app that requires internationalization. I have decided to use get-text for system related strings and perhaps some database tables for user generated content.
For example, a user might be able to post a blog post. He should be able to post different versions of that post in different languages. I can implement this by storing all the posts in the posts table with an extra column denoting the language.
The difficult bit is trying to internationalize system strings stored in the database.
For example, I have a table that stores privileges. Each privilege should have a string that describes what this privilege does.
At the moment, it is stored in a table like this:
app_privileges
- id
- privilege
- Some other columns
- Description
I plan to use an application like PoEdit to generate gettext files. It is able to search through all the php files to grab the strings. But in the cases like this where the string is stored in the database, it can be a fai开发者_运维百科r bit of work to extract the string for transation. What are some tricks and solutions to handling this?
Finally, lets say I have some data types and forms that users can create and define in the app. For example, defining a "product type" for a shopping cart. This means that the product will have its own unique set of attributes and descriptions. These attributes will require translating along with the description.
The same case with forms. The user can create a form which might be stored in a set of tables. These forms then need to be translated.
What are some database models I can use to store translations for forms and product types?
Cheers :)
For strings that are more "system"-oriented, like :
- buttons
- feature names
- privileges
- etc
Gettext is good.
Privileges, for instance, are more "system"-oriented (users don't create privileges, rather admins grant them to users but don't create new types of privileges). So, your privileges table can have a column "privilege_name" which is never displayed and contains only gettext keys, for instance : "Privilege : User can edit posts in the specified forum".
Strings in your application should also not be the text that the user will see, but something more detailed like "Menu: Edit Preferences".
Those strings go through gettext (even for english or the site's "mother tongue") and get translated to the proper user-visible strings.
You should also use numbered sprintf-style arguments, ie not "Price of %s is %s" but "Price of %(1)s is %(2)s".
This has several advantages :
- gettext provides very little context, and merges identical strings. A translator who sees "Button: Edit this post" will have a lot more help than a translator who sees "Edit". In some languages, a simple text "Edit" may translate into different words, or grammatical forms of the same verb, depending on what is "Edited".
- you can change the English text at will without breaking gettext keys for other languages
- numbered arguments handle languages with different ordering of subject/verb/etc
If you have text in images (like some buttons) you need to take care of this too. Gettext can translate filenames, too (images/buttons/en/submit.png => images/buttons/fr/valider.png although a simple regexp would be nice too) and don't forget the for blind people using screen readers.
For multilingual user-generated content (stored in tables), the usual relational approach is better.
Table posts( post_id ... ) Table posts_translated( post_id foreign key, language_id foreign key, title, text, etc )
This will allow you to use SQL for everything like showing the list of languages available for a post, allowing default languages, showing untranslated posts, fulltext search, etc.
Tricky question.
In my experience, the translation process is really hard to manage - and you don't want to hold up the release of your app because the Uzbek translation for the "Create posts" privilege hasn't been approved yet.
I've never worked with gettext, but the .Net and Java equivalents require you to put a file on the web server with translations; this is often treated as a deployment, and at the very least must go through the version control routines - this can also be a bit of a pain...
If you can get away with it, I'd agree on a convention of TableName_ColumnName for your gettext keys, and store all the user-visible system messages in gettext files.
I would argue that localization of system messages is a front-end concern, and shouldn't be in the database. Where your business entities - posts, shopping items etc. - can be localized, that's a domain issue and should be reflected in your database schema.
I think gettext is more appropriate to use for translation of messages, buttons, captions and so on. And not dynamic content. For such a purpose you can go several different ways depending on project requirements. First, you should decide:
- How flexible should the translation mechanism be. Do you need to add new translated tables frequently or update existing tables structure? Do you need to add new languages frequently?
- Do you need to track changes of original content and deactivate all translations when original was changed.
- Do you need a back-end for premoderation of translations (if there are several people involved in translation)
For a solution with maximum flexibility and extensibility you can go folowing way. Original tables are staying as is, with no changes. For translations you add a table translations
with columns:
id
primary keyobject_table
name of translated tableobject_id
reference to translated objectlanguage
orlanguage_id
if you need to manage languages dynamicallyfield
name of translated fieldoriginal_md5
hash of original field value (if you need to track changes)translation
author_id
,published
,date
and other fields required for moderation, if you need it.
Then, in another table or configuration file you describe what tables and fields are to be translated. You can also describe types of fields, e.g. text
, textarea
, html
, file
and so on.
For example:
$translatedFields = array(
// here key stands for translated table name
'posts' => array(
// here key stands for translated field
// and value for field's type
'title' => 'text',
'body' => 'html',
),
);
Then, in your data access layer you determine current language and substitute all SELECT queries to tables which are to be translated. A bit of regexp magic and strict syntax of your queries can help you here. The query SELECT title, body FROM posts WHERE posts.id='1'
turns into
SELECT
IF(posts_title_translation.translation IS NULL,
posts.title,
posts_title_translation.translation) AS title,
IF(posts_body_translation.translation IS NULL,
posts.body,
posts_body_translation.translation) AS body
FROM posts
LEFT JOIN translations AS posts_title_translation
ON posts_title_translation.object_id = posts.id
AND posts_title_translation.object_table = 'posts'
AND posts_title_translation.language = '$language'
AND posts_title_translation.field = 'title'
---- And if you need premoderation, then filter off unpublished translations
--AND posts_title_translation.published
LEFT JOIN translations AS posts_body_translation
ON posts_body_translation.object_id = posts.id
AND posts_body_translation.object_table = 'posts'
AND posts_body_translation.language = '$language'
AND posts_body_translation.field = 'body'
--AND posts_body_translation.published
WHERE posts.id = '1'
(The IF expression in SELECT section allows you to select original field when there is no translation of it done or published).
This is how you can have a flexible i18n system. Translations are made for each separate field and automatically substituted in data access layer while selecting.
It's a bit tricky, and, must say, partly influensed by Joom!Fish extension for Joomla!, but that's how I'd do this.
I'll add another answer for a simpler solution, because this is already too large.
For a simpler solution, but without updating structures of your tables you can store serialized arrays with translations in each translated field. For example:
id title
1 a:2:{s:7:"english";s:12:"Hello World!";s:7:"spanish";s:14:"¡Hola, mundo!";}
Then, when the record is loaded by id, you simply deserializing contents by calling unserialize
and select required translation. You can also use json_encode
and json_decode
for serialization and deserialization. It may be done in your model or base model class.
But when you have to support a lot of languages, it can be a problem since the size of all fields should be multiplied by a number of languages.
EDIT: But, as peufeu kindly pointed out, this solution breaks fulltext search and does not allow you to check existing translations with simple SQL query. So if you use fulltext search or need to select content by 'translated' flag, don't use it.
精彩评论