
Nuxeo Developers Blog/News from the Open Source ECM trenches

Merging RSS and Atom feeds from various sources


I have a lot of Python RSS/Atom feeds in my aggregator, and entries are
duplicated all over the place.

I couldn’t find any tool out there that would merge entries from several
sources in a smart way, by trying to detect duplicates.

I wrote a little script, extending Mark Pilgrim’s feedparser (which we use
in CPSRSS), to merge several sources, using the difflib module and the RSS
rendering we have in CPSBlog.

It calculates the diff ratio on the title and content of each entry to
decide whether it’s the same entry. When the ratio is <= 0.2, it’s the
same entry (hopefully :) )
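
To make the idea concrete, here is a minimal sketch of that comparison, not the actual script. It assumes entries are plain dicts with `title` and `content` keys, and it assumes the "diff ratio" is a distance, computed as 1 minus difflib's `SequenceMatcher.ratio()` (which is a similarity score, 1.0 for identical strings). The names `diff_ratio`, `same_entry` and `merge_entries` are made up for the illustration.

```python
from difflib import SequenceMatcher

# Threshold quoted in the post: a diff ratio of 0.2 or less means "same entry".
THRESHOLD = 0.2


def diff_ratio(a, b):
    """Return 0.0 for identical strings, up to 1.0 for unrelated ones.

    Assumption: the post's "diff ratio" is a distance, so we take
    1 - SequenceMatcher.ratio() (ratio() itself measures similarity).
    """
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def same_entry(first, second, threshold=THRESHOLD):
    """Compare two entries on their title and content taken together."""
    text_a = first["title"] + "\n" + first["content"]
    text_b = second["title"] + "\n" + second["content"]
    return diff_ratio(text_a, text_b) <= threshold


def merge_entries(*sources, threshold=THRESHOLD):
    """Merge several lists of entries, keeping the first copy of each duplicate."""
    merged = []
    for entries in sources:
        for entry in entries:
            # Keep the entry only if nothing already kept looks like it.
            if not any(same_entry(entry, kept, threshold) for kept in merged):
                merged.append(entry)
    return merged
```

Each new entry is compared against every entry already kept, so the cost grows quadratically with the number of entries. That is fine for a handful of feeds, less so for hundreds.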

Here’s an example run on these:

The result is here
(it’s a one-shot XML file, made today, so it’s not a real feed;
 it is still readable by any client, though).

Now I’ve been told that this was pretty useless, and that I would be
better off cleaning up my feeds and doing more interesting stuff in my
spare time.

But I can’t help it: every time I see a feed related to Python, I just add
it to my client :’). So for an unorganized person like me, a personal
CPSRSS website with this merging capability, where I can drop tons of
feeds, would be perfect.

(Post originally written by Tarek Ziadé on the old Nuxeo blogs.)

October 16th, 2005 at 10:18 am
