Hadoop is not efficient for very small jobs, since JVM startup, process initialization, and other overhead dominate the run time. This can be mitigated to some extent by enabling JVM reuse: hadoop.apache.org/common/docs/r0.20.2/ma... There is also some ongoing work on this in Apache Hadoop: https://issues.apache.org/jira/browse/MAPREDUCE-1220 Not sure in which release this will be included or what the current state of the JIRA is.
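For reference, a minimal sketch of turning on JVM reuse with the old (0.20-era) mapred API; the driver class name here is made up for illustration, and the same effect can be achieved by setting the mapred.job.reuse.jvm.num.tasks property in the job configuration:

    import org.apache.hadoop.mapred.JobConf;

    // Hypothetical driver class, for illustration only.
    public class JvmReuseExample {
        public static void main(String[] args) {
            JobConf conf = new JobConf(JvmReuseExample.class);
            // Reuse each task JVM for an unlimited number of tasks of the same job (-1 = no limit);
            // the default of 1 spawns a fresh JVM per task, which hurts very small jobs.
            conf.setNumTasksToExecutePerJvm(-1);
            // Equivalent configuration property: mapred.job.reuse.jvm.num.tasks = -1
            // ... set mapper/reducer, input/output paths, and submit the job as usual.
        }
    }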
Got it, thanks. – ruby-boy Aug 5 at 14:24
This is not unusual. Hadoop comes into its own with large datasets; what you are seeing is probably Hadoop's initial startup time.
I just can't believe it ~ – ruby-boy Aug 5 at 7:55
Think about it: it might take 30 s to set up the platform, but that is neither here nor there when you are processing gigabytes or terabytes of data. It's not designed for small amounts of data. – Adrian Mouat Aug 5 at 8:00
OK, this time I ran it with 15 files, about 28.9K in total, and it took 1 min 11 sec! And this time, is it still the setup? – ruby-boy Aug 5 at 8:10
Real testing starts with sizes over multiple gigabytes; I started my tests with files that were at least 20 GiB, because below that size it's not a real challenge for Hadoop. Are you working in pseudo-distributed mode, or have you set up a small cluster with at least three data nodes and one name node? – khmarbaise Aug 5 at 8:23