Find out which data in an RSS feed is new compared with the entries already in the database, then put the new entries in the db. How?

Generate a hash code for each item in the new RSS feed and check it against the database.


Let's say I have a database and an RSS feed. I have to find out which data in the RSS feed isn't already in the database. How would you approach this problem?

Tags: xml, rss. Asked Oct 4 '08 at 20:31 by lol_wut.

You just beat me to it :) – David Robbins Oct 4 '08 at 20:40.

First you have to uniquely identify each item. This is problematic because some sites use the guid element and some sites don't, and for some items the link element never changes and for some it does. I think that the general rule of thumb is that if an item has a guid you use that as the key, otherwise you use the link as the key and hope.
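A minimal Python sketch of the "guid if present, otherwise link" rule, using only the standard library's `xml.etree.ElementTree`. It assumes a plain RSS 2.0 feed without namespaced elements; `item_key` is a hypothetical helper name and the feed is made-up sample data.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def item_key(item: ET.Element) -> Optional[str]:
    """Return the key for an RSS <item>: its <guid> if present, else its <link>."""
    for tag in ("guid", "link"):
        text = item.findtext(tag)
        if text and text.strip():
            return text.strip()
    return None  # neither element present: the item cannot be keyed this way


# Made-up sample feed: the first item has a guid, the second only a link.
FEED = """<rss version="2.0"><channel>
  <item><guid>tag:example.com,2008:1</guid><link>http://example.com/a</link></item>
  <item><link>http://example.com/b</link></item>
</channel></rss>"""

root = ET.fromstring(FEED)
keys = [item_key(item) for item in root.iter("item")]
print(keys)  # ['tag:example.com,2008:1', 'http://example.com/b']
```

Items that return `None` are exactly the "hope" case: with no guid and no link, some other field (or a hash) has to stand in as the key.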

Once you've established the key for an item, you can (probably) determine whether the item you're looking at has been updated by examining the pubDate element, which ought to be updated if the story gets updated. This approach will handle most cases, though as with everything related to RSS it breaks down if the feed provider isn't behaving properly.
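A rough sketch of the pubDate check, assuming the pubDate previously stored for the item's key has already been read back from the database as a datetime. `parse_pubdate` and `is_updated` are hypothetical names; `email.utils.parsedate_to_datetime` from the standard library handles the RFC 822 date format RSS uses.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_pubdate(item: ET.Element) -> Optional[datetime]:
    """Parse an item's RFC 822 <pubDate> into an aware UTC datetime, or None."""
    text = item.findtext("pubDate")
    if not text:
        return None
    try:
        dt = parsedate_to_datetime(text.strip())
    except (TypeError, ValueError):  # malformed date (exception type varies by Python version)
        return None
    if dt.tzinfo is None:  # assumption: treat a missing/unknown zone as UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def is_updated(stored: Optional[datetime], item: ET.Element) -> bool:
    """True if the item's pubDate is later than the pubDate stored for its key."""
    current = parse_pubdate(item)
    if stored is None or current is None:
        return False  # no dates to compare: assume unchanged
    return current > stored


old = ET.fromstring("<item><pubDate>Sat, 04 Oct 2008 20:31:00 +0000</pubDate></item>")
new = ET.fromstring("<item><pubDate>Sun, 05 Oct 2008 09:00:00 +0200</pubDate></item>")
stored = parse_pubdate(old)  # in practice this would be read back from the database
print(is_updated(stored, new))  # True
```

Normalizing everything to UTC matters because feeds mix time zones; comparing the raw strings would give wrong answers.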

Most RSS feeds include a date with each story, so query the database for the date of the latest stored story, pull the latest stories from the RSS feed, and compare dates. It also depends on whether this is for one particular feed or for something that has to work with many feeds. If it's supposed to work for all feeds, use a hashing approach instead: create a hash of the title and date and use that as a unique identifier.
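The date comparison could be sketched like this with the standard library's `sqlite3`. The `stories` table and its `pub_date` column are hypothetical; the sketch assumes dates are stored as UTC ISO-8601 text so that `MAX()` orders them correctly.

```python
import sqlite3
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import List, Optional


def to_utc(text: str) -> Optional[datetime]:
    """Parse an RFC 822 pubDate into an aware UTC datetime (None if malformed)."""
    try:
        dt = parsedate_to_datetime(text.strip())
    except (TypeError, ValueError):
        return None
    if dt.tzinfo is None:  # assumption: no zone means UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def new_items_by_date(conn: sqlite3.Connection, root: ET.Element) -> List[ET.Element]:
    """Return feed items dated after the newest story already in the database."""
    (latest_text,) = conn.execute("SELECT MAX(pub_date) FROM stories").fetchone()
    latest = datetime.fromisoformat(latest_text) if latest_text else None
    fresh = []
    for item in root.iter("item"):
        when = to_utc(item.findtext("pubDate") or "")
        if when is None:
            continue  # undated or unparseable items can't be compared this way
        if latest is None or when > latest:
            fresh.append(item)
    return fresh


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stories (link TEXT, pub_date TEXT)")  # hypothetical schema
conn.execute(
    "INSERT INTO stories VALUES (?, ?)",
    ("http://example.com/old", to_utc("Sat, 04 Oct 2008 20:31:00 +0000").isoformat()),
)
feed = ET.fromstring("""<rss version="2.0"><channel>
  <item><link>http://example.com/old</link><pubDate>Sat, 04 Oct 2008 20:31:00 +0000</pubDate></item>
  <item><link>http://example.com/new</link><pubDate>Sun, 05 Oct 2008 09:00:00 +0200</pubDate></item>
</channel></rss>""")
print([item.findtext("link") for item in new_items_by_date(conn, feed)])
# ['http://example.com/new']
```

This only works if a feed's dates are reliable and increase over time, which is why the answer suggests hashing when many different feeds are involved.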

Read a unique field from each item in the RSS feed, then check whether that item is already in the db, inserting it if it isn't. Run this logic in a loop over all items.
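A minimal sketch of that loop against an in-memory SQLite database. The answer doesn't say which field is unique, so this assumes `<guid>` with a fallback to `<link>`; the `items` table, its columns, and `store_new_items` are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import List


def store_new_items(conn: sqlite3.Connection, root: ET.Element) -> List[str]:
    """Insert feed items whose unique key is not in the database yet; return their keys."""
    added = []
    for item in root.iter("item"):
        # Assumption: <guid>, falling back to <link>, is the unique field.
        key = (item.findtext("guid") or item.findtext("link") or "").strip()
        if not key:
            continue  # nothing to identify the item by
        row = conn.execute("SELECT 1 FROM items WHERE item_key = ?", (key,)).fetchone()
        if row is None:  # not seen before: store it
            conn.execute(
                "INSERT INTO items (item_key, title) VALUES (?, ?)",
                (key, item.findtext("title", default="")),
            )
            added.append(key)
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_key TEXT PRIMARY KEY, title TEXT)")  # hypothetical schema
feed = ET.fromstring("""<rss version="2.0"><channel>
  <item><guid>id-1</guid><title>First</title></item>
  <item><link>http://example.com/2</link><title>Second</title></item>
</channel></rss>""")
print(store_new_items(conn, feed))  # ['id-1', 'http://example.com/2']
print(store_new_items(conn, feed))  # [] (everything is already stored)
```

Declaring the key column `PRIMARY KEY` also lets the database reject duplicates if two polls ever overlap.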

Offhand, a few suggestions:

- Compute a checksum for each item in the feed and store the result in the database. Compare the stored results with each new file/stream from the RSS source.
- Hash the title, date and time of each item and store the hash in the database. Compare it with each refreshed RSS stream.
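A small sketch of the "hash the title, date and time" suggestion using `hashlib.sha256`. A Python set stands in for the database column of stored hashes; `item_hash` and `unseen_items` are hypothetical names, and the feeds are made-up sample data.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import List, Set


def item_hash(item: ET.Element) -> str:
    """SHA-256 of an item's title and pubDate (the fields suggested for hashing)."""
    title = (item.findtext("title") or "").strip()
    date = (item.findtext("pubDate") or "").strip()
    # A separator keeps ("ab", "c") and ("a", "bc") from hashing the same.
    return hashlib.sha256(f"{title}\x1f{date}".encode("utf-8")).hexdigest()


def unseen_items(root: ET.Element, stored_hashes: Set[str]) -> List[ET.Element]:
    """Items in a refreshed feed whose hash is not among the stored hashes."""
    return [item for item in root.iter("item") if item_hash(item) not in stored_hashes]


first = ET.fromstring("""<rss version="2.0"><channel>
  <item><title>Hello</title><pubDate>Sat, 04 Oct 2008 20:31:00 +0000</pubDate></item>
</channel></rss>""")
refreshed = ET.fromstring("""<rss version="2.0"><channel>
  <item><title>Hello</title><pubDate>Sat, 04 Oct 2008 20:31:00 +0000</pubDate></item>
  <item><title>World</title><pubDate>Sun, 05 Oct 2008 09:00:00 +0000</pubDate></item>
</channel></rss>""")

stored = {item_hash(item) for item in first.iter("item")}  # stands in for the db column
print([item.findtext("title") for item in unseen_items(refreshed, stored)])  # ['World']
```

One consequence of this design: if a publisher edits a story's title or date, its hash changes and the edited story is reported as new.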

