
Big Data

Big Data is not just about velocity, volume, veracity and variety. It is about how you identify the right information from data that is growing exponentially, and use it to add business value.

Hadoop

Apache Hadoop is an open source framework for storing, analysing and accessing large amounts of data quickly and cost-effectively on clusters of commodity hardware. Its design was inspired by Google's MapReduce and Google File System papers, and companies such as Yahoo and Facebook have used Hadoop to store and manage their huge data sets.
Hadoop scales from a single server to thousands of machines and provides a low-cost yet dependable solution to data management problems.
The Hadoop ecosystem includes: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, Spark.
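
As a rough illustration of the MapReduce model that Hadoop implements, the sketch below counts words in plain Python. It does not use any Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are assumptions chosen for this example, and a real cluster would run the map and reduce steps in parallel across machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each line is mapped independently and each key is reduced independently, both phases can be split across many machines, which is what lets the same job run on one server or on thousands.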

Application

Loading large sets of structured, semi-structured and unstructured data from UNIX systems, NoSQL stores and a variety of portfolios
Reports for business intelligence (BI)
MapReduce jobs for data cleaning and pre-processing
Data visualization
Big Data Management
Big Data Analytics
Data architecture including data ingestion pipeline design
Data modeling and data mining
Machine learning and advanced data processing
Optimizing ETL workflows
Real time queries over Big Data
Data Stream Processing
Data Serialization
Data Analytics
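
To make the data cleaning and pre-processing item above more concrete, here is a minimal sketch of a map-only cleaning step written in the style of a Hadoop Streaming mapper, which reads raw lines from an input stream and writes cleaned lines to an output stream. The three-field record layout (user_id, country, amount) and the names clean_record and run_mapper are hypothetical, chosen only for illustration.

```python
EXPECTED_FIELDS = 3  # hypothetical layout: user_id, country, amount


def clean_record(line):
    """Return a normalized tab-separated record, or None if the line is malformed."""
    fields = [field.strip() for field in line.rstrip("\n").split("\t")]
    if len(fields) != EXPECTED_FIELDS or not fields[0]:
        return None
    user_id, country, amount = fields
    try:
        value = float(amount)
    except ValueError:
        return None  # non-numeric amount: drop the record
    return "\t".join([user_id, country.upper(), f"{value:.2f}"])


def run_mapper(stream_in, stream_out):
    """Read raw lines, write cleaned lines, and return the number dropped."""
    dropped = 0
    for line in stream_in:
        cleaned = clean_record(line)
        if cleaned is None:
            dropped += 1
        else:
            stream_out.write(cleaned + "\n")
    return dropped
```

When used as a streaming mapper, the script would end with a call such as run_mapper(sys.stdin, sys.stdout) after importing sys. Keeping the per-record logic in clean_record makes it easy to test without a cluster.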

Areas Of Expertise

Real-time analytics: Spark
Processing: MapReduce
Query engine: Hive, Impala
ETL: Pig
Resource Manager: YARN, Mesos
Distribution: Cloudera, Hortonworks, Apache
Data Integration: Flume, Sqoop
NoSQL: HBase, MongoDB
Security: Ranger, Sentry, Kerberos