How to migrate Drupal data to Django?

Developer https://www.devze.com 2023-02-12 10:07 Source: Internet
I want to migrate part of a Drupal 6 site to a Django application, specifically a Drupal based questions and answers section that I think would work better with OSQA. I've already created another question related to the authentication part of this integration and for the purposes of this question we can assume that all Drupal users will be recreated, at least their usernames, in the Django database. This question is about the data migration from Drupal to Django.

In Drupal I have all questions as nodes of a 'question' content type with some CCK fields and the answers to these questions are standard comments. I need help to find the best way of moving this data to OSQA in Django.

At first I thought I could use South but I'm not sure if it would be the best fit for my needs.

For now I think my best approach would be to write a Django app that connects to the Drupal database, queries for all the questions with their corresponding comments and users, and then inserts them directly into Django's database using the correct models and Django methods.

Am I on the right path? Any other suggestions?

Thanks!


> At first I thought I could use South but I'm not sure if it would be the best fit for my needs.

No, South is not for this kind of migration. It is for intra-project migrations, and you will want to have it, but it doesn't really do you any good here.

"Migration" is really not a good term for what you need. What you really want to do is export data from Drupal and import it into Django.

I haven't made an in-depth analysis of the possible solutions for this, but were I asked to do the same thing, I would simply define a JSON- or XML-based interchange format for the transfer, then write one set of code to export the data from Drupal to this format, then another to import data from this format into Django. I strongly recommend against using a binary format for this interchange; the ability to load the data into a text editor to verify your data and fix things is really important.
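As a rough illustration of the import half of that idea, here is a minimal sketch of a JSON interchange document and a loader for it. The layout (a `version` key, a `questions` list, nested `answers`) and all the names such as `Question`, `Answer` and `load_questions` are assumptions made up for this example, not an existing format; the Drupal side would need to emit whatever structure you settle on.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Answer:
    author: str   # Drupal username of the commenter
    body: str
    created: int  # Unix timestamp, as Drupal stores it


@dataclass
class Question:
    drupal_nid: int  # original node id, kept so imports can be re-run safely
    author: str
    title: str
    body: str
    created: int
    answers: list = field(default_factory=list)


def load_questions(text):
    """Parse the interchange document, failing loudly on missing keys."""
    document = json.loads(text)
    if document.get("version") != 1:
        raise ValueError("unsupported interchange version")
    questions = []
    for item in document["questions"]:
        answers = [
            Answer(a["author"], a["body"], a["created"])
            for a in item.get("answers", [])
        ]
        questions.append(
            Question(item["nid"], item["author"], item["title"],
                     item["body"], item["created"], answers)
        )
    return questions


# Hypothetical sample of what the Drupal export step might produce.
SAMPLE = """
{
  "version": 1,
  "questions": [
    {
      "nid": 42,
      "author": "alice",
      "title": "How do I reset my password?",
      "body": "I cannot find the link.",
      "created": 1265000000,
      "answers": [
        {"author": "bob", "body": "Use the login page.", "created": 1265003600}
      ]
    }
  ]
}
"""

if __name__ == "__main__":
    for question in load_questions(SAMPLE):
        print(question.title, len(question.answers))
```

Because the file is plain text, a bad record can be inspected or corrected in an editor before the import is re-run, which is the main point of preferring a text format here.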

> For now I think my best approach would be to write a Django app that connects to the Drupal database, queries for all the questions with their corresponding comments and users, and then inserts them directly into Django's database using the correct models and Django methods.

If you want to skip the interchange file and do it in one step, then you don't want to write a new Django app just for the import; that's (IMHO) overkill. What you want to write is a Django management command within the app that you will be importing data into, and you probably want to use Django's support for multiple databases as well as model options (such as Meta.db_table and the per-field db_column) for using existing database schemas. This is why I recommend the interchange file method: you wouldn't need to reimplement Drupal tables in Django models.


Mike's answer is a good path to follow. In a real-world scenario, however, you may find it useful to mix techniques: for example, connect directly to the original Drupal database for files, referencing a local directory for the file content (the file queries are simple joins across a few tables), while processing the more structured data (e.g. nodes) via a custom JSON view.

In this case, a JSON view created with the Views Datasource module lets you design and select your data through a simple Drupal view. You can then write a management command to read and parse the data, as suggested above. Page the view so that no single request asks Drupal to process too much, and you can even issue asynchronous requests with gevent to speed up the retrieval.

This way I parsed more than 15k content items in less than 10 minutes: not very fast, but acceptable for a one-time import. If you want to store the content and process it later, you can save the raw data in a custom model in the database, or in an in-memory Redis data store via the Python Redis client. If you want more detail, I've written a detailed how-to on Drupal-to-Django migration that goes deeper into these techniques.
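The paged retrieval described above could be sketched like this. It is a plain sequential version using only the standard library, not the gevent-based one mentioned in the answer. The `?page=N` query parameter follows Drupal's usual zero-based pager, while the `{"nodes": [{"node": {...}}]}` payload shape is an assumption about how the view is configured; check the actual output of your view before relying on it. `fetch_page` and `iter_nodes` are names invented for this example.

```python
import json
from urllib.request import urlopen


def fetch_page(base_url, page):
    """Download and decode one page of the JSON view (pages count from 0)."""
    with urlopen("%s?page=%d" % (base_url, page)) as response:
        return json.loads(response.read().decode("utf-8"))


def iter_nodes(base_url, fetch=fetch_page, max_pages=1000):
    """Yield node dicts page by page until an empty page is returned.

    `fetch` is injectable so the loop can be exercised without a live
    Drupal site; `max_pages` is a safety cap against a pager that never
    runs dry.
    """
    for page in range(max_pages):
        payload = fetch(base_url, page)
        rows = payload.get("nodes", [])  # assumed top-level key
        if not rows:
            break
        for row in rows:
            yield row["node"]            # assumed per-row wrapper key
```

Inside a management command this would be consumed with something like `for node in iter_nodes(view_url): ...`, where `view_url` is the address of your JSON view. Fetching pages concurrently, as with gevent, additionally requires knowing the page count up front or tolerating a few wasted requests past the last page.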
