Migrate data from WordPress to Drupal 6
I’m currently working on a project that involves migrating an existing news site from WordPress 2.2 to Drupal 6. It’s still a work in progress but I’m discovering things along the way that I thought would be useful to the Drupal community.
The existing WordPress site is actually working fine, but it has grown way beyond a blog and I need to start making use of the scalability and features in Drupal, particularly taxonomy, CCK and Views. Also, at the time of writing, it has over 770 news articles in WordPress, so it’s a solid site for testing a migration to Drupal 6.
The migration tool
The tool I would recommend for handling the migration is wp2drupal. You’ll see the official Drupal link to this tool on the right. The first problem I encountered was that it is written for Drupal 4.7 only: you are expected to install 4.7, run the migration there, and then upgrade Drupal from 4.7 to 5 and then to 6.
Well, this was no good. So I searched on the net and found that DenRaf had already done the work to get wp2drupal working on Drupal 6 (link on the right). I have to say that it would make everyone’s life a lot easier if all the versions of wp2drupal out there were in a single place under source control.
wp2drupal is a standard Drupal module, so you install it under the sites/all/modules directory and then enable and run it from the Drupal administration console. If you copy your WordPress wp-config.php file into the same module directory, wp2drupal will use your existing database settings.
Create a local version of your databases and Drupal and back them up
Because my Drupal install is running on another hosting account, I found that when I ran wp2drupal I got a connection error because my WordPress database rejected a remote connection. This actually was a good thing and forced me to consider what I was doing.
Although wp2drupal says it cannot affect your WordPress database, I would strongly recommend taking a backup of your live database, restoring it as a local copy on your own PC, and running the WordPress to Drupal migration locally. I use a Mac, so I already have the excellent MAMP toolkit installed. This also lets me run Drupal locally, so I can perform the whole migration without any risk to my live sites. Working locally also lets me change PHP settings, which you may need to do and which some hosting companies do not allow.
Oh, and of course, back up everything!
UTF-8 character set issues in WordPress
The first time I successfully ran wp2drupal I hit a problem which I now know is very common for WordPress users who have upgraded from WordPress 2.1 or earlier. The WordPress posts came across into Drupal successfully but they contained odd characters, usually where an apostrophe or carriage return existed in the original WordPress post.
I checked the Drupal and WordPress databases and the content was exactly the same. After more digging, the issue was clear. Prior to WordPress 2.2, the database tables were often created with MySQL’s default Latin1 character set (the one with the ‘Swedish’ collation, latin1_swedish_ci), whereas WordPress now uses UTF-8. Although WordPress 2.2 and higher handle this perfectly well, any migration out of WordPress into another system such as Drupal exposes the problem.
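To make the symptom concrete, here is a small Python sketch of how a curly apostrophe stored as UTF-8 turns into odd characters when its bytes are read as Latin1. It is only an illustration of the mechanism, not what WordPress, MySQL or wp2drupal actually run; the sample sentence is made up, and cp1252 is used because MySQL’s latin1 is essentially Windows-1252.

```python
# Illustration only: UTF-8 bytes misread as a single-byte Latin encoding.
original = "It\u2019s a test"            # contains a curly apostrophe (U+2019)

utf8_bytes = original.encode("utf-8")    # the bytes that were really stored
garbled = utf8_bytes.decode("cp1252")    # the same bytes misread as Latin1
print(garbled)                           # the apostrophe becomes three odd characters

# Undoing the misreading recovers the text, which is why the damage can be
# repaired as long as nothing has been truncated along the way.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)
```

The three junk characters that replace the apostrophe are its three UTF-8 bytes, each shown as a separate Latin1 character.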
Luckily, there’s a perfect fix out there. The UTF-8 Database Converter plugin (link on the right) is a WordPress plugin that converts an existing Latin1 based WordPress database to a full UTF-8 system. You should install and run this plugin on your WordPress database before migrating to Drupal and this should fix the odd character issue.
wp2drupal silently failing
The next problem I hit was that wp2drupal appeared to have worked but had only moved across about 400 of my 770+ WordPress posts into Drupal. I re-ran it and it would fail on a different post each time which seemed to indicate a memory issue rather than a problem with a specific post.
I looked at the PHP logs and sure enough, PHP was reporting that it was unable to allocate any more memory.
The phpinfo() output showed that the memory limit was the standard 16M, so I edited my php.ini file to set memory_limit to 64M.
Another re-run of wp2drupal and this time all of my WordPress posts came into Drupal perfectly.
Keep your post IDs consistent between WordPress and Drupal
Currently wp2drupal creates a new Drupal 6 node for every WordPress post. This node will have an ID which is different from the original WordPress post ID.
Keeping your WordPress post IDs consistent with your node IDs in Drupal and keeping most of your URLs the same during a site migration makes handling ‘old’ URLs and redirecting much, much simpler. Of course, this only works if you are migrating into a new Drupal 6 site and not if you are migrating into a Drupal site with existing data.
So, if you want to keep your WordPress Post IDs in Drupal 6, here’s what to do:
1. In the wp2drupal module, open the file ‘migrate_2.php’ for editing.
2. At approximately line 419, you’ll see the line node_save($node);
3. Immediately after this line, add the following code:
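// Re-key the freshly saved node and its revision to the original WordPress post ID.
db_query('UPDATE {node} SET nid = %d WHERE nid = %d AND vid = %d', $post['ID'], $node->nid, $node->vid);
db_query('UPDATE {node_revisions} SET nid = %d WHERE nid = %d AND vid = %d', $post['ID'], $node->nid, $node->vid);
// Keep the in-memory node object in sync for the rest of the import.
$node->nid = $post['ID'];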
This should import all your posts into Drupal 6 and keep the node IDs consistent with the original post IDs. This can only be done on a clean Drupal install and of course, I can’t guarantee it will work for you.
Keep your URLs consistent between WordPress and Drupal
To keep your URLs the same as on your original WordPress site, install the Pathauto module and configure an automated alias to match your original WordPress URLs [Administer->Site Building->URL aliases then Automated Alias Settings then Node Path Settings]. On the site I’m building, my automated alias setting is: ‘news/[nid]/[title-raw]’.
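As a rough picture of what such a pattern produces, the Python sketch below expands ‘news/[nid]/[title-raw]’ for a made-up node. The function name and the title-cleaning rules (lowercase, runs of non-alphanumeric characters collapsed to a hyphen) are assumptions for illustration; Pathauto applies its own configurable rules (separator, case, words to strip, length limits), so its real output may differ.

```python
import re

def build_alias(pattern: str, nid: int, title: str) -> str:
    """Expand a Pathauto-style pattern into a URL path (rough approximation)."""
    # Assumed cleaning: lowercase, then turn every run of characters that is
    # not a letter or digit into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return pattern.replace("[nid]", str(nid)).replace("[title-raw]", slug)

# Hypothetical node: ID 900 with an invented title.
print(build_alias("news/[nid]/[title-raw]", 900, "Migrating WordPress to Drupal 6"))
```

Because the node ID is part of the path, the alias only lines up with the old WordPress URL when the post ID and node ID match.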
If you change the automated alias settings after importing from WordPress, you will need to check the ‘Bulk generate aliases for nodes that are not aliased’ option before you save the Pathauto configuration.
There appears to be a bug (feature?) in Pathauto when the bulk generate option is checked. It will only generate about 50 new aliases each time the configuration is saved. I haven’t tested this, but the culprit appears to be in the file ‘pathauto_node.inc’ at about line 101:
$result = db_query_range($query, $pattern_types, 0, variable_get("pathauto_max_bulk_update", 50));
Notice the ‘50’ in this line? It is the fallback used when the pathauto_max_bulk_update variable has not been set, so I think increasing this value should fix the problem. If you prefer not to do this, just re-save the automated alias settings page with the bulk generate option checked and a further 50 aliases will be generated.
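For a sense of scale, here is the arithmetic for the re-save workaround as a short Python sketch. It assumes roughly 770 imported posts (the count mentioned earlier) and the batch size of 50 seen in the code line above; with more posts the number goes up accordingly.

```python
import math

total_posts = 770    # approximate post count from this migration
batch_size = 50      # fallback value of pathauto_max_bulk_update in the code

# Each save aliases at most one batch, so round up to cover the remainder.
saves_needed = math.ceil(total_posts / batch_size)
print(saves_needed)  # 16
```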
Stuck?
Finally, if you are trying this with your own site and are having problems, post a comment below and Webopius will do all we can to help. If you would like us to perform the migration for you, we would be happy to review your project.
Thank you for the useful tips on database migration. I think you could also add some information about themes.
This is a great page, thanks.
I’d been messing around with wp2drupal for a while, and your instructions on getting post ID to match node ID is exactly what I needed. There’s one more step I need to make for post redirects, however, and I’m not sure how to do that.
In my .htaccess file, I know how to take a query to the old wordpress database (?p=900, for example) and redirect it to /node/900:
RewriteCond %{QUERY_STRING} ^p=900$
RewriteRule ^$ /new/node/900? [R=301,L]
Is there a way to tell .htaccess to redirect all queries ?p=n to /node/n?
Thanks Vivek,
You’ll see I’ve made a slight change to the code in the post that makes sure the Drupal nodes match the WP posts IDs. I was missing the vid previously.
To answer your question, if you already have WordPress URLs with the post ID in, such as ‘/somepath/p=n’ and you want to redirect all of these to the equivalent node id based URL in Drupal, something like this should work in your .htaccess file:
RewriteRule ^(.*)p=(.*)$ node/$2
(I haven’t tested this so you may need to alter it slightly)
Thanks for your response! From what I understand, RewriteRule doesn’t match against the query string on its own, hence the need for the RewriteCond. I might be wrong about that, but this is how I got it to work:
Put the following in .htaccess:
RewriteCond %{QUERY_STRING} ^p=(.*)$
RewriteRule ^$ /node/%1? [R=301,L]
Obviously for this to work the WordPress post ID has to match the Drupal node ID.
These lines also assume that Drupal has replaced WordPress in the same directory.
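The decision those two lines make can be sketched in Python as follows. This is only a model of the matching logic, not how Apache evaluates rules; the function name is made up, and the pattern is tightened to digits only, which is an assumed refinement of the (.*) used above.

```python
import re
from typing import Optional

# Same idea as the RewriteCond, but restricted to numeric post IDs
# (an assumption; the original pattern accepts any characters).
POST_QUERY = re.compile(r"p=(\d+)")

def redirect_target(path: str, query_string: str) -> Optional[str]:
    """Return the Drupal path to 301-redirect to, or None for no redirect."""
    # "RewriteRule ^$" only matches the directory root (an empty path).
    if path != "":
        return None
    match = POST_QUERY.fullmatch(query_string)
    return f"/node/{match.group(1)}" if match else None

print(redirect_target("", "p=900"))      # /node/900
print(redirect_target("", "p=abc"))      # None
print(redirect_target("blog", "p=900"))  # None
```

The last call shows why the rule does nothing when the old blog lived in a subdirectory and the .htaccess file sits at the site root: the path is no longer empty, so the rule never fires.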
Again, hats off to you for the wp2drupal hack. It’s a great way to redirect URLs for those of us who stuck with the default WordPress permalinks!
FYI, I’ve set up a project for the D6 version of WP2Drupal:
http://drupal.org/project/wp2drupal
I’ve stabilized it and will be adding features to solidify the data migration.
It’s the first time I have commented here, and I must say you share genuine, quality information for bloggers! Good job.
p.s. You have a very good template for your blog. Where did you find it?
Thank you for helpful article.
FYI – according to WP users UTF converter truncates posts.
I am going to try WP2Drupal in a few minutes, and hope it works.
But I am going to start by removing all revisions in WP DB, which you did not mention.
Best wishes, Mitchell
My current blog is at:
mysite.com/blog/
A typical blog post URL is (‘ugly’ WordPress URLs):
mysite.com/blog/?p=1234
I’m migrating to Drupal and, on a test site, have followed these steps but when I visit:
mysite.com/blog/?p=1234
It displays the landing page for all blog posts and the URL transforms to (the final slash is removed):
mysite.com/blog?p=1234
Any thoughts?
A million thanks for posting this. At least a million.
I have all my data in drupal now. I don’t know about nodes and paths yet (just starting out). Is it possible to do the pathauto module at some time in the future, or will that present a problem?
What an amazing review! Thank you so much for all the tips! As of March 16th, the only change to the above was to update wp2drupal once I installed the one from the link you provided!
Thanks!
Hi, the value of ‘50’ that you referred to is simply a default. You can change that in the admin settings for bulk alias updates.