Showing posts from March, 2016

Big Data Overview

Could you define beauty? Big Data is the same: the term is itself a definition. What you can ask instead is what its characteristics are. Big Data has n "V" dimensions, where n often changes. Laney (2001) suggested Volume, Variety, and Velocity as the 3 Vs, then IBM added Veracity ("realism") as the fourth V, and later Oracle introduced Value.

So how would we process this Big Data? I use Hadoop and wish to learn Spark. Hadoop is an open-source framework used for analyzing big chunks of data. It is divided into 2 modules: a MapReduce module and a file system module, HDFS. Hadoop divides the data into small chunks, processes each chunk on its own, then combines the results again (the divide-and-conquer principle we used in merge sort). Each chunk needs a core and memory to run on.

As a start, I need to define the location of my data. My data will reside in the Hadoop file system (HDFS): fs.defaultFS : hdfs://rserver:9000/ then I define my resources " number o