Natural Language/Text Mining and Reddit/social news site?

I have found that the best way to mine data on sites like Reddit or Digg is to start with the developer API they provide. Typically you have a focused interest in a topic or trend, and the established public interface is the most reliable way to get that data. You can also parse the sites' feeds; combining the two uncovers about 90% of what you would want to know.
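As a rough illustration of the feed-parsing half of that advice, here is a minimal sketch that pulls titles and links out of an RSS 2.0 document and counts the words in the titles. The sample feed, the function names, and the choice of plain RSS 2.0 are all assumptions made for the example; a real site's feed may use Atom or extra namespaces and would need adjusting.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample feed; a real one would be downloaded from the site.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example social news feed</title>
    <item>
      <title>Python text mining tips</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Text mining with Python and feeds</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Return a list of (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


def term_counts(titles):
    """Count lowercase word occurrences across all titles."""
    words = []
    for title in titles:
        words.extend(re.findall(r"[a-z']+", title.lower()))
    return Counter(words)


if __name__ == "__main__":
    items = parse_items(SAMPLE_RSS)
    counts = term_counts(title for title, _ in items)
    for word, n in counts.most_common(3):
        print(word, n)
```

Word counts over titles are only a starting point, but they are often enough to see which topics dominate a feed before investing in anything heavier.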

If you want to do deep research on data not available through an API, then you should be prepared to spend a significant amount of time writing custom wrappers around a tool like cURL. If you have the budget, you can also contact the site and ask whether they offer paid research data on users.
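For a sense of what such a wrapper involves, here is a minimal sketch using Python's standard `urllib.request` in place of cURL. The class name `PoliteFetcher`, the user-agent string, and the two-second delay are hypothetical choices for the example, not anything a particular site requires; check the site's terms and robots.txt before crawling.

```python
import time
import urllib.request

# Hypothetical identifier; use something that lets the site contact you.
USER_AGENT = "research-crawler-sketch/0.1"


class PoliteFetcher:
    """Fetch pages one at a time, pausing between requests."""

    def __init__(self, min_interval=2.0, timeout=10.0,
                 opener=urllib.request.urlopen,
                 sleep=time.sleep, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between requests (assumed)
        self.timeout = timeout
        self.opener = opener              # injectable so it can be faked in tests
        self.sleep = sleep
        self.clock = clock
        self._last = None                 # time the previous request finished

    def fetch(self, url):
        """Return the raw response body for url as bytes."""
        if self._last is not None:
            wait = self.min_interval - (self.clock() - self._last)
            if wait > 0:
                self.sleep(wait)
        request = urllib.request.Request(
            url, headers={"User-Agent": USER_AGENT}
        )
        with self.opener(request, timeout=self.timeout) as response:
            body = response.read()
        self._last = self.clock()
        return body
```

Typical use would be `PoliteFetcher().fetch(url)` in a loop over the pages you care about. Retries, error handling, and HTML parsing are left out here, and those are where most of the time goes.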

I'd start with the RSS feeds, and after that I might use Nutch to crawl further; what to actually do with the data is more your call.

These are good ideas. I can get the data, but what applications can be built around it?

I can't really give you an answer, but what I can give you is a way to a solution: you have to find the angle that you relate to or that piques your interest. A good paper is one that people get drawn into because it reaches them in some way. As for me, when I think of WWII, I think of the Holocaust and the effect it had on the survivors, their families, and those who stood by and did nothing until it was too late.
