Hadoop custom split of TextFile?

You should create your own variation on TextInputFormat. In it, write a new RecordReader that skips lines until it sees the start of a logical line.
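The skipping logic described above can be sketched in plain Java, without the Hadoop classes, to show the idea. This is only a rough illustration under assumptions: `LogicalLineReader` and the `isRecordStart` predicate are hypothetical names, and how a logical line is recognised depends on your data. In a real RecordReader the same loop would sit behind `nextKeyValue()`, and the reader would also have to read past the end of its split to finish the last record.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Predicate;

// Hypothetical helper: groups physical lines into logical lines.
class LogicalLineReader {
    private final BufferedReader in;
    // Assumption: the caller can tell from a line alone whether it starts a record.
    private final Predicate<String> isRecordStart;
    // First line of the next record, already read from the stream.
    private String pending;

    LogicalLineReader(BufferedReader in, Predicate<String> isRecordStart) {
        this.in = in;
        this.isRecordStart = isRecordStart;
        // Skip the tail of a record that started before this reader's position.
        String line = readLine();
        while (line != null && !isRecordStart.test(line)) {
            line = readLine();
        }
        pending = line;
    }

    // Returns the next logical line, or null when the input is exhausted.
    String next() {
        if (pending == null) {
            return null;
        }
        StringBuilder record = new StringBuilder(pending);
        String line;
        // Append continuation lines until the next record start or end of input.
        while ((line = readLine()) != null && !isRecordStart.test(line)) {
            record.append('\n').append(line);
        }
        pending = line;
        return record.toString();
    }

    private String readLine() {
        try {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Wrapping the stream in a `BufferedReader` and calling `next()` until it returns null yields one logical line per call; any leading lines that belong to an earlier record are dropped by the constructor.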

This is more elegant than what I did. I built a local iterable that gave me one logical line at a time and used a RecordReader to transmit the entire document as a ByteWritable. Thanks for the tip!

– dvk Jun 17 at 4:51.

Preprocess the input file to remove the newlines. What is your goal in creating the SequenceFile?

I can't find the question that was asked earlier, but you just have to iterate over your lines in a simple MapReduce job and collect them in a StringBuilder. Flush the StringBuilder to the context whenever you want to begin a new record. The trick is to declare the StringBuilder as a field of your mapper class and not as a local variable.
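A minimal sketch of that buffering pattern, written in plain Java so it runs without Hadoop. The names are hypothetical: `ParagraphAccumulator` stands in for the mapper class, the `Consumer<String>` stands in for writing to the context, and treating a blank line as the record boundary is an assumption chosen for illustration. In an actual job the final flush would belong in the mapper's `cleanup` method.

```java
import java.util.function.Consumer;

// Hypothetical stand-in for a mapper that merges physical lines into records.
class ParagraphAccumulator {
    // A field, not a local variable: it has to survive across map() calls.
    private final StringBuilder buffer = new StringBuilder();

    // Called once per input line; 'out' plays the role of context.write.
    void map(String line, Consumer<String> out) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            // Assumption: a blank line marks the start of a new record.
            flush(out);
            return;
        }
        if (buffer.length() > 0) {
            buffer.append(' ');
        }
        buffer.append(trimmed);
    }

    // Called once after the last line so the final record is not lost.
    void cleanup(Consumer<String> out) {
        flush(out);
    }

    private void flush(Consumer<String> out) {
        if (buffer.length() > 0) {
            out.accept(buffer.toString());
            buffer.setLength(0); // reuse the same builder for the next record
        }
    }
}
```

Note that a record straddling an input split boundary would still be cut in two by this approach, since each mapper only sees the lines of its own split.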

Here it is: Processing paragraphs in text files as single records with Hadoop.

