HighDots Forums  

Removing duplicate entries/stories from an RSS feed?

#1  gaikokujinkyofusho@gmail.com - 12-06-2006, 10:21 AM

Hi, I have been enjoying being able to subscribe to RSS feeds
(http://kinja.com/user/thedigestibleaggie) for a while and have come up
with a fairly nice list of feeds, but I have run into an annoying
(though not critical) problem: duplicate stories. Apparently some of
the sites I subscribe to overlap, so I get the same story more than once.
Does anyone know of some sort of filter (software or online
service) that can remove duplicate stories? Any help or suggestions
would really be appreciated!

Cheers

-Gaiko


#2  Paul Lutus - 12-06-2006, 01:25 PM

gaikokujinkyofusho (AT) gmail (DOT) com wrote:

> Hi, I have been enjoying being able to subscribe to RSS feeds
> (http://kinja.com/user/thedigestibleaggie) for a while and have come up
> with a fairly nice list of feeds, but I have run into an annoying
> (though not critical) problem: duplicate stories. Apparently some of
> the sites I subscribe to overlap, so I get the same story more than once.
> Does anyone know of some sort of filter (software or online
> service) that can remove duplicate stories? Any help or suggestions
> would really be appreciated!

Write a script in a language that supports associative arrays (as do Java,
Perl, Ruby, Python, and even JavaScript). Build a unique key for each RSS
feed item out of elements of that item, and use it as the key into the
associative array. As you fill the array, any item whose key is already
present is a duplicate and can be dropped.
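
A minimal sketch of this approach in Python, where a dict plays the role of the associative array. The function name `dedupe_items`, the dict-per-item representation, and the choice of title plus link as the key are all assumptions made for illustration; the post does not say which fields to use.

```python
def dedupe_items(items):
    """Keep the first item seen for each generated key.

    `items` is assumed to be a list of dicts with "title" and "link"
    entries, collected from all feeds (hypothetical representation).
    """
    seen = {}  # the associative array: generated key -> first item with that key
    for item in items:
        # Hypothetical key: normalized title plus link.
        key = (
            item.get("title", "").strip().lower(),
            item.get("link", "").strip(),
        )
        if key not in seen:
            seen[key] = item
    # Python dicts preserve insertion order, so feed order is kept.
    return list(seen.values())
```

Only items that produce exactly the same key are treated as duplicates, so the result depends entirely on which fields go into the key.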

Unfortunately, it is rare for two RSS feed items to be truly identical.
Often, they tell the same story with small differences in wording (to avoid
accusations of plagiarism) and of course the URL is normally different.

Without some complex coding to detect items that are almost the same, the
above method will remove only genuinely identical items from different RSS
feeds.
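
As one possible illustration of detecting items that are "almost the same", the sketch below compares normalized titles with `difflib.SequenceMatcher` from the Python standard library. This is a swapped-in technique, not something specified above; the function names and the 0.8 threshold are arbitrary assumptions, and comparing titles alone will miss stories that are retitled entirely.

```python
from difflib import SequenceMatcher


def normalize(title):
    # Lowercase and collapse whitespace so trivial differences do not matter.
    return " ".join(title.lower().split())


def is_near_duplicate(a, b, threshold=0.8):
    # threshold is an illustrative value, not a recommendation.
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


def dedupe_fuzzy(items, threshold=0.8):
    """Keep an item only if its title is not close to one already kept."""
    kept = []
    for item in items:
        if not any(
            is_near_duplicate(item["title"], other["title"], threshold)
            for other in kept
        ):
            kept.append(item)
    return kept
```

Every new item is compared against every kept item, so the cost grows quadratically with the number of stories; that is probably acceptable for a personal feed list but not for a large aggregator.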

--
Paul Lutus
http://www.arachnoid.com


#3  Andy Dingley - 12-06-2006, 02:46 PM

Paul Lutus wrote:

> Unfortunately, it is rare for two RSS feed items to be truly identical.

They don't need to be - they have a link URL embedded in them and
that's perfectly adequate for disambiguation.

You might even have a feed with a deliberate isPermaLink attribute on
each item's guid element, for exactly that purpose.
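
A rough sketch of keying on the URL, assuming plain RSS 2.0 feeds (no XML namespaces) that have already been downloaded as strings. The function names are hypothetical. It uses the guid when it is a permalink (in RSS 2.0 a missing isPermaLink attribute means "true") and falls back to the link element otherwise.

```python
import xml.etree.ElementTree as ET


def item_key(item):
    """Return the URL to dedupe on: a permalink guid, else the link."""
    guid = item.find("guid")
    if guid is not None and guid.text:
        # In RSS 2.0, isPermaLink defaults to "true" when omitted.
        if guid.get("isPermaLink", "true").lower() == "true":
            return guid.text.strip()
    link = item.findtext("link")
    return link.strip() if link else None


def unique_items(feed_xml_strings):
    """Return <item> elements from all feeds, skipping repeated URLs."""
    seen = set()
    unique = []
    for xml_text in feed_xml_strings:
        root = ET.fromstring(xml_text)
        for item in root.iter("item"):
            key = item_key(item)
            # Items with no usable URL are kept, since they cannot be compared.
            if key is None or key not in seen:
                unique.append(item)
                if key is not None:
                    seen.add(key)
    return unique
```

This only catches a story that appears in several feeds under the same URL; two sites covering the same story at different URLs will still both show up.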


