December 18, 2008

Migrate data from WordPress to Drupal 6

Filed under: Tips | Webopius @ 11:27 am

I’m currently working on a project that involves migrating an existing news site from WordPress 2.2 to Drupal 6. It’s still a work in progress but I’m discovering things along the way that I thought would be useful to the Drupal community.

The existing WordPress site is working fine, but it has grown well beyond a blog and I need to start making use of the scalability and features in Drupal – particularly taxonomy, CCK and Views. At the time of writing it holds over 770 news articles in WordPress, so it’s a solid test case for migrating to Drupal 6.

The migration tool

Related Drupal resources

wp2drupal
wp2drupal for Drupal 6
Pathauto module

The tool I would recommend for handling the migration is wp2drupal. The official Drupal link to this tool is listed under ‘Related Drupal resources’ above. The first problem I encountered was that it is written for Drupal 4.7 only: you are expected to install 4.7, then perform a Drupal upgrade from 4.7 to 5 and then to 6.

Well, this was no good. So I searched the net and found that DenRaf had already done the work to get wp2drupal running on Drupal 6 (see ‘wp2drupal for Drupal 6’ under ‘Related Drupal resources’). I have to say, though, that it would make everyone’s life a lot easier if all the versions of wp2drupal out there were in a single place under source control.

wp2drupal is a standard Drupal module, so you install it in the sites/all/modules directory and then enable and run it from the Drupal administration console. If you copy your WordPress wp-config.php file into the same module directory, wp2drupal will use your existing database settings.

Create a local version of your databases and Drupal and back them up

Because my Drupal install is running on another hosting account, I found that when I ran wp2drupal I got a connection error because my WordPress database rejected a remote connection. This actually was a good thing and forced me to consider what I was doing.

Although wp2drupal says it cannot affect your WordPress database, I would strongly recommend taking a backup of your live database, restoring it as a local copy on your PC and running the WordPress to Drupal migration on your own machine. I use a Mac, so I already have the excellent MAMP toolkit installed. This also lets me run Drupal locally, so I can perform the whole migration without any risk to my live sites. Running locally also lets me change PHP settings, which you may need to do and which some hosting companies do not allow.

Oh, and of course, back up everything!
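When working from a local copy, the wp-config.php file you copy into the module directory has to point at the local database rather than the live one. A minimal sketch of the relevant section follows. The four constant names are the standard WordPress ones, but every value shown is a placeholder, not a real setting:

```php
<?php
// Sketch of the database section of a copied wp-config.php.
// All values below are hypothetical placeholders for a local copy.
define('DB_NAME', 'wordpress_local');    // name of the local copy of the database
define('DB_USER', 'local_user');         // local MySQL user
define('DB_PASSWORD', 'local_password'); // local MySQL password
define('DB_HOST', 'localhost');          // the local MySQL server, not the live host
```

Pointing DB_HOST at the local server also avoids the remote connection error described above.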

UTF-8 character set issues in WordPress

The first time I successfully ran wp2drupal I hit a problem which I now know is very common for WordPress users who have upgraded from WordPress 2.1 or earlier. The WordPress posts came across into Drupal successfully but they contained odd characters – usually where an apostrophe or carriage return existed in the original WordPress post.

I checked the Drupal and WordPress databases and the content was exactly the same. After more digging the issue was clear. Prior to WordPress 2.2 the database was often created with MySQL’s default Latin1 character set (with its Swedish collation, latin1_swedish_ci), whereas now it is UTF-8. Although WordPress 2.2 and higher handle this perfectly well, any migration out of WordPress into another system such as Drupal exposes the problem.

Related WordPress resources

WordPress UTF-8 database conversion

Luckily, there’s a good fix out there. The UTF-8 Database Converter plugin (listed under ‘Related WordPress resources’ above) is a WordPress plugin that converts an existing Latin1-based WordPress database to a full UTF-8 system. Install and run this plugin on your WordPress database before migrating to Drupal and it should fix the odd character issue.
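To see where the odd characters come from, here is a small illustration. It is a sketch of the general effect, not what wp2drupal does internally, and it assumes PHP’s mbstring extension is available. It takes the UTF-8 bytes of a typographic apostrophe and converts them as if they were Latin1 text (MySQL’s latin1 is really Windows-1252):

```php
<?php
// Sketch: reproduce the "odd characters" effect with a single apostrophe.
// Assumption: the mbstring extension is installed.

// A typographic apostrophe (U+2019) stored as its three UTF-8 bytes.
$apostrophe = "\xE2\x80\x99";

// Misread those three bytes as three Windows-1252 characters and
// convert each of them to UTF-8, as a Latin1 connection would.
$garbled = mb_convert_encoding($apostrophe, 'UTF-8', 'Windows-1252');

print 'It' . $garbled . "s broken\n";
```

Run from the command line in a UTF-8 terminal, this prints ‘Itâ€™s broken’, which is the kind of garbage that showed up in the imported posts.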

wp2drupal silently failing

The next problem I hit was that wp2drupal appeared to have worked but had only moved across about 400 of my 770+ WordPress posts into Drupal. I re-ran it and it would fail on a different post each time which seemed to indicate a memory issue rather than a problem with a specific post.

I looked at the PHP logs and sure enough, PHP was reporting that it was unable to allocate any more memory.

phpinfo() showed that the memory limit was set to 16M, so I edited my php.ini file to set memory_limit to 64M.

Another re-run of wp2drupal and this time all of my WordPress posts came into Drupal perfectly.
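If you cannot edit php.ini, the same limit can often be raised from PHP itself. The snippet below is a sketch rather than part of wp2drupal: it assumes your host allows memory_limit to be changed with ini_set(), and where you put it (for example in Drupal’s settings.php) is up to you.

```php
<?php
// Sketch: raise PHP's memory limit at runtime instead of editing php.ini.
// Assumption: the host allows memory_limit to be changed with ini_set().
$previous = ini_set('memory_limit', '64M');

if ($previous === FALSE) {
  // ini_set() returns FALSE when the setting could not be changed.
  print "Could not change memory_limit; edit php.ini instead.\n";
}
else {
  // ini_get() reports the value now in effect for this request.
  print 'memory_limit is now ' . ini_get('memory_limit') . "\n";
}
```

The change only lasts for the current request, so it has to run on the same request that performs the import.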

Keep your post IDs consistent between WordPress and Drupal

Currently wp2drupal creates a new Drupal 6 node for every WordPress post. This node will have an ID which is different from the original WordPress post ID.

Keeping your WordPress post IDs consistent with your node IDs in Drupal and keeping most of your URLs the same during a site migration makes handling ‘old’ URLs and redirecting much, much simpler. Of course, this only works if you are migrating into a new Drupal 6 site and not if you are migrating into a Drupal site with existing data.

So, if you want to keep your WordPress Post IDs in Drupal 6, here’s what to do:

1. In the wp2drupal module, open the file ‘migrate_2.php’ for editing.
2. At approximately line 419, you’ll see the line node_save($node);
3. Immediately after this line, add the following code:

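// Renumber the freshly saved node so that its nid matches the WordPress post ID.
db_query('UPDATE {node} SET nid = %d WHERE nid = %d AND vid = %d', $post['ID'], $node->nid, $node->vid);
db_query('UPDATE {node_revisions} SET nid = %d WHERE nid = %d AND vid = %d', $post['ID'], $node->nid, $node->vid);
// Keep the in-memory node object in step with the database.
$node->nid = $post['ID'];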

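With this change, the import should keep the Drupal 6 node IDs consistent with the original WordPress post IDs. Bear in mind that it only renumbers the node and node_revisions tables, so anything else that recorded the new nid during node_save() would still hold the old value; spot-check a few imported posts afterwards. As noted above, this is only safe on a clean Drupal install and, of course, I can’t guarantee it will work for you.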

Keep your URLs consistent between WordPress and Drupal

To keep your URLs the same as on your original WordPress site, install the Pathauto module and configure an automated alias that matches your original WordPress URLs [Administer->Site Building->URL aliases then Automated Alias Settings then Node Path Settings]. On the site I’m building, my automated alias setting is: ‘news/[nid]/[title-raw]’.

If you change the automated alias settings after importing from WordPress, you will need to check the ‘Bulk generate aliases for nodes that are not aliased’ option before you save the Pathauto configuration.

There appears to be a bug (feature?) in Pathauto when the bulk generate option is checked. It will only generate about 50 new aliases each time the configuration is saved. I haven’t tested this, but the culprit appears to be in the file ‘pathauto_node.inc’ at about line 101:

$result = db_query_range($query, $pattern_types, 0, variable_get("pathauto_max_bulk_update", 50));

Notice the ‘50’ in this line? It is the default used when the ‘pathauto_max_bulk_update’ variable has not been set, so I think increasing this value should fix the problem. If you prefer not to do this, just re-save the automated alias settings page with the bulk generate option checked and a further 50 aliases will be generated.
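Because the quoted line reads its limit through variable_get(), it may be possible to raise the limit without editing the module at all, by storing a larger value with Drupal’s variable_set(). This is an untested sketch: it assumes the code runs inside a bootstrapped Drupal 6 site (for example from a small custom module) and the value 1000 is an arbitrary example.

```php
<?php
// Sketch only: must run inside a bootstrapped Drupal 6 site.
// Assumption: Pathauto reads 'pathauto_max_bulk_update' via variable_get(),
// as in the line quoted above, so a stored value replaces the default of 50.
variable_set('pathauto_max_bulk_update', 1000); // 1000 is an arbitrary example value
```

A stored variable survives module updates, whereas an edit to pathauto_node.inc would be lost the next time Pathauto is upgraded.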

Stuck?

Finally, if you are trying this with your own site and are having problems, post a comment below and Webopius will do all we can to help. If you would like us to perform the migration for you, we would be happy to review your project.
