BBC RSS reparser

What?

Basically, I wanted full article text and images from the BBC RSS feeds, so I’ve built what is, at the moment, a fairly simple reparser that scrapes the rest of the content and includes it in the feed.

I built it as part of another project, but also so that on the train into work I can read the full stories from the BBC RSS feeds, not just the first line or so, without forking out for a mobile data plan. In the morning, the feed client I use updates over my home wifi before I leave. I am planning to test on other devices but haven’t had an opportunity so far.

How?

It’s not that complicated a script. Essentially, it reads the requested RSS feed, scrapes the target link for each item in the channel and pumps the feed back out with the full text included. You can also choose not to include images if your device has limited storage. The generated XML on its own is around 110 KB, depending on the feed, and the images will increase the total download size considerably if, like me, you cache everything to your mobile device.
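To make the flow above concrete, here is a minimal Python sketch of the read, scrape and re-emit loop. It is not the actual script: the names expand_feed and fetch_article are made up for illustration, and fetch_article stands in for whatever downloads and extracts the article body for a given link.

```python
import re
import xml.etree.ElementTree as ET

# Matches <img ...> tags so they can be dropped for low-storage devices.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def expand_feed(rss_xml, fetch_article, include_images=True):
    """Return rss_xml with each item's <description> replaced by full article HTML.

    fetch_article is a hypothetical callable: link URL -> article HTML string.
    """
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        link = item.findtext("link")
        if not link:
            continue  # nothing to scrape for this item
        html = fetch_article(link.strip())
        if not include_images:
            html = IMG_TAG.sub("", html)
        description = item.find("description")
        if description is None:
            description = ET.SubElement(item, "description")
        description.text = html  # ElementTree escapes the markup on output
    return ET.tostring(root, encoding="unicode")
```

Passing the fetcher in as an argument keeps the sketch testable without touching the network; a real version would plug in an HTTP download plus whatever extraction the BBC pages need.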

At the moment it caches the original RSS feeds for an hour but doesn’t cache the scraped content; this is something I still need to work on. An optional item limit might be useful as well. It’s easy to implement, but I still need to find a spare moment for it!
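For illustration, the one-hour caching could look something like the Python sketch below. The TTLCache class and its get_or_fetch method are hypothetical names, not what the script actually uses, and this version only keeps entries in memory; the same idea would apply to the scraped articles as well as the feeds.

```python
import time


class TTLCache:
    """Keep fetched values for ttl_seconds, then fetch them again."""

    def __init__(self, ttl_seconds=3600, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be tested
        self._store = {}        # key -> (time stored, value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]       # still fresh, skip the download
        value = fetch(key)
        self._store[key] = (now, value)
        return value
```

A caller would wrap its download function, for example cache.get_or_fetch(feed_url, download), where download is whatever function retrieves the URL.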

Using it…

The link below will let you build your own feed based on a BBC News RSS feed. I’ve tested the available feeds and believe that, so far, they produce satisfactory, valid output.

I’ve tested it so far in readers including Egress and Bloglines. The implementation is still a little messy; please remember it’s still a work in progress!

Update! 29th Nov
  • The script is being hit a great deal more than I expected, indicating that a) I need to optimise it a little more for efficiency and speed, and b) there is a demand for full feeds (no surprises there!).
  • I’ll be updating the script over the next 24 hours to cache the article texts. This will a) increase speed, a lot, and b) let me filter the tags and article contents more comprehensively, removing forms and cleaning up the rather dirty markup that ends up in the RSS > Item > Description part of the feed. I still need to think about how long to cache the articles for, but it should be done by early Thursday morning.
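
As a rough idea of the clean-up filter mentioned above, here is a Python sketch that strips forms (and scripts) out of scraped markup using only the standard library. The names Cleaner and clean_article and the choice of tags to drop are assumptions for illustration, not the filter the script will actually use.

```python
from html import escape
from html.parser import HTMLParser

DROP = {"form", "script"}            # elements removed together with their contents
VOID = {"br", "img", "hr", "input"}  # elements that never get a closing tag


class Cleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            rendered = "".join(
                ' %s="%s"' % (k, escape(v or "", quote=True)) for k, v in attrs
            )
            self.out.append("<%s%s>" % (tag, rendered))

    def handle_endtag(self, tag):
        if tag in DROP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag not in VOID:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))


def clean_article(html):
    """Return html without <form>/<script> elements, with text re-escaped."""
    parser = Cleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

One limitation of this sketch: a form that is never closed in the source page will swallow everything after it, so a real filter would want a fallback for badly nested markup.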


One Response to “BBC RSS reparser”

  1. [...] Hate the way the BBC gives you only half the news when you subscribe to their news feeds? I do. So does Duncan Barnes. And he has made a script to mend it. The script will provide full feeds from the front page or any of the subsections. [...]
