
Posts

Showing posts from 2016

digging

Open the SVG image in a browser and use the arrow keys to navigate. When you say digging, the first thought for most would be planting a tree. How about digging in DATA?

First: Hadoop is a framework for processing large chunks of data, consisting of 2 modules. HDFS: the Hadoop Distributed File System, "for managing files". Map-Reduce: Hadoop's methodology for processing data, where a big chunk of data is divided into smaller chunks, each directed to the map function to extract the needed data, then to the reduce function where the actual processing we need takes place. Hadoop works on the whole data in one go, so it is considered batch processing.

Second: the Hadoop ecosystem. It would be annoying if, each time you wished to do a task, you had to write Java code for the map function and the reduce function, compile the code, etc. The Hadoop ecosystem provides us with tools that can do this for us. PIG: a scripting language "that is translated in the background to a
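The map/reduce flow described above can be sketched in plain Python, without Hadoop itself. This is a minimal illustration, not Hadoop's Java API: the word-count task, the sample chunks, and the in-memory shuffle step are all illustrative assumptions.

```python
# Minimal sketch of the map -> shuffle -> reduce flow described above,
# using a word count over two small text "chunks" as the example task.
from itertools import groupby
from operator import itemgetter

def map_fn(chunk):
    # Map phase: extract the needed data -- emit (word, 1) for every word.
    return [(word.lower(), 1) for word in chunk.split()]

def reduce_fn(key, values):
    # Reduce phase: the actual processing -- sum the counts per word.
    return key, sum(values)

def map_reduce(chunks):
    # Shuffle/sort: group mapped pairs by key before reducing,
    # mimicking what Hadoop does between the two phases.
    mapped = [pair for chunk in chunks for pair in map_fn(chunk)]
    mapped.sort(key=itemgetter(0))
    return dict(reduce_fn(k, (v for _, v in group))
                for k, group in groupby(mapped, key=itemgetter(0)))

counts = map_reduce(["big data big", "data big chunks"])
print(counts)  # {'big': 3, 'chunks': 1, 'data': 2}
```

In real Hadoop the chunks live on HDFS and the map and reduce functions run on separate cluster nodes; the structure of the computation, however, is the same.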

Big Data Overview

Could you define beauty? So it is with Big Data: it is itself a definition. You could, however, ask about its characteristics. Big data has n Vs as dimensions, where n often changes: Laney (2001) suggested Volume, Variety, and Velocity as the 3 Vs, then IBM added Veracity ("realism") as the fourth V, and later Oracle introduced Value.

So how would we process this Big Data? I use Hadoop and wish to learn Spark. Hadoop is an open-source framework used for analyzing big chunks of data; it is divided into 2 modules: a map-reduce module and a file system module (HDFS). Hadoop divides data into small chunks, starts processing each chunk on its own, then combines the chunks again (the divide-and-conquer principle we used in merge sort); each chunk needs a core and memory to run on.

As a start I need to define the location of my data: where would my data reside? Data would reside in the Hadoop file system (HDFS): fs.defaultFS : hdfs://rserver:9000/ then I define my resources "number o
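The `fs.defaultFS` property mentioned above is normally set in Hadoop's `core-site.xml` configuration file. A minimal sketch, assuming the same `rserver:9000` NameNode address the post quotes:

```xml
<?xml version="1.0"?>
<!-- core-site.xml: tells Hadoop clients where the HDFS NameNode lives.
     The host and port below are the ones mentioned in the post. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://rserver:9000/</value>
  </property>
</configuration>
```

With this in place, any path like `/data/input` passed to Hadoop is resolved against `hdfs://rserver:9000/` rather than the local filesystem.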