Data aggregation is one of the ways we come to understand the diversity of our data, and SQL (Structured Query Language) is the easiest tool for doing so. Below is how SQL syntax maps to Pig and Spark. The Spark snippets assume mathClass is an RDD[Array[String]] of already-split text lines.

SQL structure:

What to retrieve: state which columns of the data structure to display.

SQL: SELECT student, age FROM mathClass
Pig: namedMathClass = FOREACH mathClass GENERATE (chararray)$0 AS student, (int)$2 AS age;
Spark: val namedMathClass = mathClass.map(row => (row(0), row(2).toInt))

Whether a row is added to the data set or not (the "condition"):

SQL: WHERE age > 10
Pig: greater_10 = FILTER namedMathClass BY age > 10;
Spark: val greater_10 = namedMathClass.filter(row => row._2 > 10)

How to aggregate: group similar rows together in one bag, then apply the aggregate function to each bag.

SQL: SELECT age, COUNT(student) FROM mathClass GROUP BY age
Pig: groupAge = GROUP namedMathClass BY age;
Iterate_...
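To try the three steps without a cluster, here is a minimal sketch in plain Scala. Spark's RDD API borrows its map, filter and groupBy names from Scala's collections, so a local Seq can stand in for mathClass. The sample rows, the object name MathClassExample and the middle "class" column are hypothetical, chosen only so that student sits at position 0 and age at position 2, matching the $0/$2 fields used above. On a real RDD the same calls apply, although for counting, reduceByKey or countByKey usually avoids shuffling whole groups.

```scala
object MathClassExample {
  // Hypothetical sample data: (student, class, age) as split text fields,
  // so student is $0 and age is $2 as in the Pig example.
  val mathClass: Seq[Array[String]] = Seq(
    Array("Ann", "math", "12"),
    Array("Bob", "math", "9"),
    Array("Cid", "math", "12"),
    Array("Dee", "math", "11")
  )

  // SELECT student, age FROM mathClass
  val namedMathClass: Seq[(String, Int)] =
    mathClass.map(row => (row(0), row(2).toInt))

  // ... WHERE age > 10
  val greater10: Seq[(String, Int)] =
    namedMathClass.filter { case (_, age) => age > 10 }

  // SELECT age, COUNT(student) FROM mathClass GROUP BY age
  // groupBy builds one "bag" (a Seq of rows) per age; size counts the students in it.
  val countByAge: Map[Int, Int] =
    namedMathClass
      .groupBy { case (_, age) => age }
      .map { case (age, bag) => (age, bag.size) }

  def main(args: Array[String]): Unit =
    // Sort by age so the printed result is deterministic.
    countByAge.toSeq.sortBy(_._1).foreach { case (age, n) => println(s"$age\t$n") }
}
```

The explicit map over the grouped result mirrors the Pig pattern of grouping first and then iterating over each bag to apply the aggregate function.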