January 22, 2009

Creating a Google News sitemap feed in Drupal 6

Filed under: Tips. Posted by Webopius @ 10:22 pm

A news site such as Google News gets millions of visitors each day. If you manage to get your site approved for Google News it can give you a great traffic boost.

Once you’re in, you need to tell Google about new articles as quickly as possible. You do this with a Google News sitemap feed, which you submit through Google Webmaster Tools.

Pre-written Drupal Module
A Google News feed module for Drupal 6.
Note: This module is provided ‘as is’ without warranty or support.

We built a news site just over a year ago and were fortunate to get approved by Google News. With a Google News sitemap, new articles usually appear in Google News within about half an hour of being posted on the site.

Google’s rules for building a news sitemap…

Before we get into building the news feed, let’s look at Google’s rules for the sitemap:

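  • A news sitemap must declare Google’s news namespace (xmlns:news="http://www.google.com/schemas/sitemap-news/0.9") alongside the standard sitemap namespace
  • A news sitemap should only contain URLs for articles published in the last three days
  • Each URL submitted needs to include the article publication date in W3C format, either in UTC (yyyy-mm-ddThh:mm:ssZ) or with an explicit offset (yyyy-mm-ddThh:mm:ss+00:00)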

Here’s an example of a typical Google news compliant sitemap:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
<url>
  <loc>http://www.cotswoldnews.com/news/957/cotswold-council-tax-set-to-rise-by-29</loc>
  <news:news>
    <news:publication_date>2009-01-20T18:06:57+00:00</news:publication_date>
  </news:news>
</url>
</urlset>
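
As a rough illustration of the date and three-day rules, here is a minimal plain-PHP sketch. The function names example_w3c_date and example_is_recent are made up for this example and are not part of any module; the timestamp is simply the one that corresponds to the publication date in the sample sitemap above.

```php
<?php
// Hypothetical helpers for illustration only; not part of the module
// described in this article.

// Format a Unix timestamp as a W3C date in UTC (offset +00:00).
function example_w3c_date($timestamp) {
  return gmdate(DATE_W3C, $timestamp);
}

// TRUE when an article created at $created falls inside the
// three-day window that ends at $now (both are Unix timestamps).
function example_is_recent($created, $now) {
  return $created <= $now && ($now - $created) <= 3 * 86400;
}

echo example_w3c_date(1232474817), "\n"; // 2009-01-20T18:06:57+00:00
```

Using gmdate() rather than date() keeps the output independent of the server's timezone setting; either form is a valid W3C date as long as the offset is included.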

Building a Google News sitemap in Drupal 6

Drupal’s excellent module system makes building a Google News sitemap pretty straightforward: all you need to do is create your own module. In this example, let’s call the module “newsmodule”.

First, let’s create the module’s .info file (newsmodule.info) and save it in sites/all/modules/newsmodule:


name = newsmodule
description = Example Google news sitemap generator
core = 6.x
version = "6.x-1.0"

Next, create the module file itself (newsmodule.module) and save it in the same directory as the .info file.

Within the module file, create a function to retrieve the published nodes for the past 3 days. Note: In our case, we have a content type of ‘news’ set which is specific to our site. You need to change this to suit your own site’s content configuration.

Having found the right nodes, the function simply generates the Google News sitemap format and outputs the result as an XML feed:

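function _newsmodule_getgooglenews() {
  // Serve the response as XML rather than as a themed HTML page.
  drupal_set_header('Content-Type: text/xml');

  // XML declaration and root element with both sitemap namespaces.
  $content = '<?xml version="1.0" encoding="UTF-8"?>';
  $content .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"';
  $content .= ' xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">';

  // Published 'news' nodes created in the last three days, newest first.
  // Note: from_unixtime(), date_sub() and curdate() are MySQL functions.
  $sql = "select nid, created from {node} where status=1 and type='news' ";
  $sql .= "and from_unixtime(created) >= date_sub(curdate(),interval 3 day) order by created desc";
  $res = db_query($sql);

  // One <url> entry per node.
  while ($data = db_fetch_object($res)) {
    $nid = $data->nid;
    // Publication date in W3C format, e.g. 2009-01-20T18:06:57+00:00.
    $node_date = date(DATE_W3C, $data->created);
    // url() returns the aliased path, including the site's base path.
    $node_url = url("node/$nid");
    $content .= '<url>';
    // Replace YOURSITEDOMAIN with your own domain.
    $content .= '<loc>http://www.YOURSITEDOMAIN.com' . $node_url . '</loc>';
    $content .= '<news:news>';
    $content .= '<news:publication_date>' . $node_date . '</news:publication_date>';
    $content .= '</news:news>';
    $content .= '</url>';
  }
  $content .= '</urlset>';
  print $content;
}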
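
One possible variation, sketched here in plain PHP so it can be tried outside Drupal, is to separate the XML generation from the database query. The function name example_build_news_sitemap and the row layout ('loc' and 'created' keys) are assumptions made for this sketch, not part of the module above. It also computes the three-day cutoff in PHP instead of in SQL and escapes each URL before writing it into the feed.

```php
<?php
// Hypothetical helper for illustration; not the module's actual code.
// $rows: list of array('loc' => absolute URL, 'created' => Unix timestamp).
// $now:  current Unix timestamp, passed in so the function is testable.
function example_build_news_sitemap(array $rows, $now) {
  // Three-day window computed in PHP rather than in SQL.
  $cutoff = $now - 3 * 86400;

  $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
  $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"';
  $xml .= ' xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">' . "\n";

  foreach ($rows as $row) {
    if ($row['created'] < $cutoff) {
      continue; // Older than three days: leave it out of the feed.
    }
    $xml .= "<url>\n";
    // Escape &, <, > and quotes so the URL is safe inside XML.
    $xml .= '  <loc>' . htmlspecialchars($row['loc'], ENT_QUOTES, 'UTF-8') . "</loc>\n";
    $xml .= "  <news:news>\n";
    $xml .= '    <news:publication_date>' . gmdate(DATE_W3C, $row['created']) . "</news:publication_date>\n";
    $xml .= "  </news:news>\n";
    $xml .= "</url>\n";
  }

  $xml .= "</urlset>\n";
  return $xml;
}
```

Inside a Drupal page callback, the rows would come from the node query and the result would be printed after setting the Content-Type header, as in the function above.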

Finally, create a hook_menu function within the same module file to tell Drupal that you have a custom URL which then calls the generator function:

function newsmodule_menu() {
  $items = array();
  // Map the path googlenews.xml to the sitemap generator function.
  $items['googlenews.xml'] = array(
    'title' => 'Google News feed',
    'page callback' => '_newsmodule_getgooglenews',
    'access arguments' => array('access content'),
    'type' => MENU_CALLBACK,
  );
  return $items;
}

Now you should be able to enable the module in Drupal and browse to www.YOURSITEDOMAIN.com/googlenews.xml, where the sitemap will be generated. If all went well, you can submit the Google News sitemap URL to Google Webmaster Tools.
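
Before submitting, it can be worth checking that the generated feed is well-formed and contains the entries you expect. The following is a minimal sketch using PHP's SimpleXML extension; the function name example_parse_news_sitemap is hypothetical, and you would need to fetch the XML from your own googlenews.xml URL yourself.

```php
<?php
// Hypothetical checker for illustration; not part of the module.
// Returns a list of array('loc' => ..., 'date' => ...) entries,
// or FALSE when the string is not well-formed XML.
function example_parse_news_sitemap($xml_string) {
  $news_ns = 'http://www.google.com/schemas/sitemap-news/0.9';

  // Collect parse errors quietly instead of emitting PHP warnings.
  libxml_use_internal_errors(true);
  $doc = simplexml_load_string($xml_string);
  if ($doc === false) {
    return false;
  }

  $entries = array();
  foreach ($doc->url as $url) {
    // Elements in the news namespace must be requested explicitly.
    $news = $url->children($news_ns)->news;
    $entries[] = array(
      'loc' => (string) $url->loc,
      'date' => (string) $news->children($news_ns)->publication_date,
    );
  }
  return $entries;
}
```

An empty result on a site with recent articles usually points at the node query (for example, the content type name) rather than at the XML itself.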

A Google News sitemap module

For those of you who prefer it, I’ve written a slightly more enhanced module than the one described above (it has a configuration screen that allows the content types for the feed to be specified). You can find the zip file of the Google News feed module here. Note: This module is provided ‘as is’ without warranty or support.

    13 Comments »

    1. I’m new to Drupal. I have tried everything as described, but after enabling the module, when I enter the path in the browser, it doesn’t work.

      Comment by amit — January 23, 2009 @ 5:47 pm

    2. Hi Amit,

      If you have any more information about what you are seeing I can certainly try and help.

      Also, for new Drupal users, I’d recommend reading this article:
      http://drupal.org/node/206757

      Comment by Webopius — January 23, 2009 @ 7:33 pm

    3. Following on from feedback and to make installation simpler, I’ve written a complete fully functioning Drupal 6 GoogleNews sitemap module.

      You should see a link above and to the right of the article.

      We have done some testing of this module and are happy to respond to feedback but this module is provided ‘as is’ and we can’t provide any warranty as to whether it will work for you.

      Feel free to use and modify it as you want. A link back to the webopius.com site would be welcome.

      Comment by Webopius — January 24, 2009 @ 8:56 pm

    4. Have you considered setting this up as a project on Drupal.org? I’m sure that many people would love to use this but unless they find this page by Googling for it (how I found it) they may just assume that nobody has done it. You may also find that others will contribute and add new functionality, especially if Google extend or change the interface.

      Comment by Adrian — February 10, 2009 @ 2:45 am

    5. Hi Adrian,

      Yes, we definitely plan to do this soon.
      Thanks for the feedback.

      Adam

      Comment by Webopius — February 10, 2009 @ 7:36 am

    6. Does the sitemap regenerate on cron, do we need to update it manually by visiting http://www.YOURSITENAME.com/googlenews.xml, or, as I assume, does it generate a feed each time that URL is called? I just want to ensure we are doing this correctly.

      Comment by Chad — February 11, 2009 @ 7:00 pm

    7. Hi Chad,
      You are correct – it generates the news feed each time that URL is visited. A performance improvement might be to put a cache wrapper around the query so that the feed is only regenerated at an interval that suits how often your site is updated.

      Having said that, we’re using it on a live news site without cache code and there aren’t currently any issues. We ping Google when we post new articles, Google visits the site and parses the feed, and new content usually appears in Google News within about 30 minutes.

      Comment by Webopius — February 11, 2009 @ 8:59 pm

    8. I love the module, works great. We also run a live site with anywhere from 5 to 20 articles going up every 30 minutes. Do you have any intention of adding categories for Google news? We use a taxonomy field to define the “kind” (category) of news, and I would like to utilize it to better report to Google. Finally, in this reply you mention you ping Google when you post new articles, how and why do you do that?

      Comment by Chad — February 26, 2009 @ 8:12 pm

    9. Hi Chad,

      I like the idea of adding taxonomy to the selection options. When I get a chance, I’ll see if I can update the code to handle this.

      In the meantime, you can try it by enhancing the ‘googlenews_admin_settings’ function (downloadable version of the module in the link at the top of this page).

      Repeat the code used to query and select from node_type, but query the term_data table instead.

      Now, in the SQL that generates the news feed, you can use this selection to restrict the feed to specific node types and categories.

      For pinging Google, we currently use Feedburner. It automatically pings a number of blog/news services for us, and most visit within a few minutes. I guess the fact that Google owns Feedburner may be another reason Google News works well with it!

      Comment by Webopius — February 26, 2009 @ 9:29 pm

    10. Hi. I have the module from CVS. Any pointers on back-porting this to 5.x would be appreciated.

      Comment by David Hart — March 2, 2009 @ 10:02 pm

    11. Hi David,

      I’ve not used Drupal 5 but the Googlenews module is only a few functions and from looking at the Drupal 5 API, it would only require a few changes to the hook_ functions.

      Comment by Webopius — March 3, 2009 @ 8:14 pm

    12. After struggling to beat this into submission for D5, I ended up using some SQL and a little sed for formatting in a bash script. XML sitemap can do this with a view, but it is extremely resource heavy.

      One of these days, I’ll UG to D6.

      Comment by David Hart — March 7, 2009 @ 12:32 am

    13. Hi David, that sounds like a lot of trouble! I was on a course last week and had very little time spare but now I’ve had a chance to look at this for you and a Drupal 5 version of the module is here:

      http://drupal.org/project/googlenews

      Hope this helps you and best of luck with your site.

      Comment by Webopius — March 7, 2009 @ 10:49 am
