info@northeastss.com

Big Data

Big Data is not just about velocity, volume, veracity and variety. It is about identifying the right information in data that is growing exponentially, and using it to add business value.

Hadoop

  • Apache Hadoop is an open source project that offers a new way to store and process big data. Hadoop is a framework for storing, analysing and accessing large amounts of data quickly and cost-effectively on clusters of commodity hardware. Web companies such as Facebook and Yahoo have used Hadoop to store and manage their huge data sets, and its design was inspired by Google's published MapReduce and Google File System papers.
  • Hadoop scales from a single server to thousands of machines and provides a low-cost yet dependable solution to data management problems.
  • The Hadoop ecosystem includes: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, Spark.
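
As a rough illustration of the processing model behind MapReduce, the sketch below runs the map, shuffle and reduce steps in a single Python process. It does not use Hadoop itself; the function names (map_phase, shuffle, reduce_phase, run_job) and the sample input are hypothetical and chosen only for this example. On a real cluster the framework distributes these steps across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for each cleaned word in one input line."""
    for word in record.lower().split():
        word = word.strip(".,;:!?")  # light cleaning: drop edge punctuation
        if word:
            yield word, 1


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce over an iterable of text lines."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input for illustration only.
    print(run_job(["Big data, big value.", "Data grows fast"]))
```

The map step handles each record independently and the reduce step handles each key independently, which is what allows this kind of job to be spread over many machines.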

Application

  • Loading large sets of structured, semi-structured and unstructured data from UNIX systems, NoSQL stores and a variety of portfolios
  • Reports for BI
  • MapReduce jobs for data cleaning and pre-processing
  • Data visualization
  • Big Data Management
  • Big Data Analytics
  • Data architecture including data ingestion pipeline design
  • Data modeling and data mining
  • Machine learning and advanced data processing
  • Optimizing ETL workflows
  • Real time queries over Big Data
  • Data Stream Processing
  • Data Serialization
  • Data Analytics

Areas Of Expertise

  • Real-time analytics: Spark
  • Processing: MapReduce
  • Query Engine: Hive, Impala
  • ETL: Pig
  • Resource Manager: YARN, Mesos
  • Big Data Analytics
  • Distribution: Cloudera, Hortonworks, Apache
  • Data Integration: Flume, Sqoop
  • NoSQL: HBase, MongoDB
  • Security: Ranger, Sentry, Kerberos