Hadoop custom split of TextFile?

You should create your own variation on TextInputFormat. In it, write a new RecordReader that skips lines until it sees the start of a logical line.
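To make the skipping idea concrete, here is a minimal sketch in plain Java with no Hadoop dependency, so it runs on its own. The class name `LogicalLineReader`, the `isRecordStart` predicate, and the `>` marker in the usage note are all hypothetical, since the question does not say what marks the start of a logical line. In a real job this logic would sit inside a RecordReader subclass returned by your TextInputFormat variation, and that reader would also have to read past the end of its split to finish the last record it started.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the core loop of a custom RecordReader.
public class LogicalLineReader {
    private final BufferedReader in;
    private final Predicate<String> isRecordStart; // assumption: caller knows how a record begins
    private String pending; // first physical line of the next logical record, or null at end

    public LogicalLineReader(BufferedReader in, Predicate<String> isRecordStart) throws IOException {
        this.in = in;
        this.isRecordStart = isRecordStart;
        // Skip the tail of a record that began before this chunk of input.
        String line = in.readLine();
        while (line != null && !isRecordStart.test(line)) {
            line = in.readLine();
        }
        pending = line;
    }

    // Returns the next logical record, or null when the input is exhausted.
    public String next() throws IOException {
        if (pending == null) {
            return null;
        }
        StringBuilder record = new StringBuilder(pending);
        String line = in.readLine();
        // Keep appending physical lines until the next record start shows up.
        while (line != null && !isRecordStart.test(line)) {
            record.append('\n').append(line);
            line = in.readLine();
        }
        pending = line;
        return record.toString();
    }

    // Convenience helper for trying the reader on an in-memory string.
    public static List<String> readAll(String text, Predicate<String> isRecordStart) {
        try {
            LogicalLineReader reader =
                new LogicalLineReader(new BufferedReader(new StringReader(text)), isRecordStart);
            List<String> records = new ArrayList<>();
            for (String rec = reader.next(); rec != null; rec = reader.next()) {
                records.add(rec);
            }
            return records;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `LogicalLineReader.readAll("tail of previous\n>a\nb\n>c\n", s -> s.startsWith(">"))` drops the leading partial record and yields two records, `">a\nb"` and `">c"`.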

This is more elegant than what I did. I built a local iterable that gave me a logical line and used a RecordReader to transmit the entire document as a ByteWritable. Thanks for the tip!

– dvk Jun 17 at 4:51.

Preprocess the input file to remove the newlines. What is your goal in creating the SequenceFile?

I can't find the question that was asked earlier, but you just have to iterate over your lines in a simple MapReduce job and append them to a StringBuilder. When a new record begins, flush the StringBuilder to the context. The trick is to declare the StringBuilder as a field of your mapper class and not as a local variable, so that it survives between calls to map().

Here it is: Processing paragraphs in text files as single records with Hadoop.
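The StringBuilder approach described above might look roughly like this. It is a sketch in plain Java rather than a real Hadoop Mapper, so that it runs on its own: `ParagraphAssembler` is a hypothetical name, a `Consumer<String>` stands in for the Hadoop context, and the rule that a blank line ends a record is an assumption borrowed from the paragraph question linked above. The `cleanup` method mirrors Hadoop's `Mapper.cleanup`, which is the natural place to flush the last record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a Mapper that joins physical lines into records.
public class ParagraphAssembler {
    // A field, not a local variable: it must survive across map() calls.
    private final StringBuilder buffer = new StringBuilder();

    // Called once per physical line, like Mapper.map().
    public void map(String line, Consumer<String> context) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            // Assumed record boundary: a blank line ends the current record.
            flush(context);
        } else {
            if (buffer.length() > 0) {
                buffer.append(' ');
            }
            buffer.append(trimmed);
        }
    }

    // Called once after the last line, like Mapper.cleanup().
    public void cleanup(Consumer<String> context) {
        flush(context);
    }

    // Emits the buffered record, if any, and resets the buffer.
    private void flush(Consumer<String> context) {
        if (buffer.length() > 0) {
            context.accept(buffer.toString());
            buffer.setLength(0);
        }
    }

    // Convenience helper that feeds a list of lines through map() and cleanup().
    public static List<String> run(List<String> lines) {
        ParagraphAssembler mapper = new ParagraphAssembler();
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            mapper.map(line, out::add);
        }
        mapper.cleanup(out::add);
        return out;
    }
}
```

Feeding it the lines `"a"`, `"b"`, `""`, `""`, `"c"` produces the two records `"a b"` and `"c"`. Note that this only works cleanly when a record does not straddle two input splits, because each mapper sees only the lines of its own split.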

