Big Data

Big Data is not just about velocity, volume, veracity and variety. It is about how you identify the right information from data that is growing exponentially, and use it to add business value.

Hadoop

  • Apache Hadoop is an open-source project that offers a new way to store and process big data. Hadoop is a framework for storing, analysing and accessing large amounts of data quickly and cost-effectively across clusters of commodity hardware. Web 2.0 companies such as Yahoo! and Facebook use Hadoop to store and manage their huge data sets.
  • Hadoop scales from a single server to thousands of machines, providing a low-cost yet dependable solution to data management problems.
  • The Hadoop ecosystem includes: MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, Spark.
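The MapReduce model mentioned above splits a job into a map phase (emit key-value pairs), a shuffle (group values by key) and a reduce phase (aggregate each group). A minimal pure-Python sketch of the classic word-count job, illustrating the model without a Hadoop cluster:

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    # Shuffle: group all values by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: sum the counts emitted for one word.
    return key, sum(values)

lines = ["hadoop stores big data", "hadoop processes big data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["hadoop"], counts["big"])  # → 2 2
```

In a real cluster the map and reduce functions run in parallel on many nodes and the shuffle moves data over the network; the logic per record is the same as in this sketch.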

Application

  • Loading large sets of structured, semi-structured and unstructured data from UNIX systems, NoSQL stores and a variety of other sources
  • Reports for BI (business intelligence)
  • MapReduce jobs for data cleaning and pre-processing
  • Data visualization
  • Big Data Management
  • Big Data Analytics
  • Data architecture including data ingestion pipeline design
  • Data modeling and data mining
  • Machine learning and advanced data processing
  • Optimizing ETL workflows
  • Real-time queries over Big Data
  • Data Serialization
  • Data Analytics
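The data-cleaning and pre-processing bullets above follow the same map-then-reduce pattern: a map step that normalises or drops each raw record, and a reduce step that merges records per key. A minimal pure-Python sketch, with hypothetical field names (`name`, `age`) chosen only for illustration:

```python
def clean_record(raw):
    # Map step: normalise one raw CSV-style record; return None to drop malformed input.
    parts = [field.strip() for field in raw.split(",")]
    if len(parts) != 2 or not parts[1].isdigit():
        return None
    name, age = parts
    return name.lower(), int(age)

def dedupe(records):
    # Reduce step: keep the last value seen for each key.
    merged = {}
    for key, value in records:
        merged[key] = value
    return merged

raw_rows = ["Alice, 34", "BOB,29", "broken row", "alice,35"]
cleaned = [r for r in (clean_record(row) for row in raw_rows) if r is not None]
print(dedupe(cleaned))  # → {'alice': 35, 'bob': 29}
```

In an actual MapReduce or Spark job the cleaning function would run per record across the cluster and the merge would happen in the reducers, but the record-level logic is the same.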

Area of Expertise

  • Real-time analytics
  • Processing: MapReduce
  • Query engine: Hive, Impala
  • ETL: Pig
  • Resource Manager: YARN, Mesos
  • Big Data Analytics
  • Distribution: Cloudera, Hortonworks, Apache
  • Data Integration: Flume, Sqoop
  • NoSQL: HBase, MongoDB
  • Security: Ranger, Sentry, Kerberos