Big Data Management

DataShepra helps you manage high-volume data that arrives from many sources and in different formats by collecting, storing, and processing it efficiently.

  • Real-time data processing and analytics
  • Batch processing
  • Managing data in different formats: structured, semi-structured, and unstructured
  • Scalable, high-performance data management systems
  • Apache Spark: Fast, large-scale data processing engine for real-time analytics, batch processing, interactive analytics, and graph processing. Runs on cluster resource managers such as Mesos and YARN.
  • Hadoop: Distributed storage and processing platform for large data sets. Processing jobs run on the YARN cluster resource manager using the MapReduce batch processing framework.
  • Apache Flink: Distributed stream and batch processing engine. Provides data distribution, communication, and fault tolerance for distributed computations over data streams.
  • Storm: Distributed real-time computation system that reliably processes unbounded streams of data.
  • Data ingestion and retrieval: Continuous ingestion and querying of data with Hadoop ecosystem and related technologies such as Flume, Sqoop, Kafka, Hive, Drill, and Elasticsearch.
  • NoSQL databases: Storage and retrieval of unstructured and semi-structured data with NoSQL databases such as MongoDB, Cassandra, and HBase.
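
To make the difference between batch and real-time processing in the list above more concrete, here is a minimal, purely conceptual sketch in plain Python. It does not use the Hadoop, Spark, Flink, or Storm APIs. The function names (map_phase, shuffle_phase, reduce_phase, batch_word_count, streaming_word_count) and the sample input are hypothetical and chosen only for illustration. The batch version follows the map, shuffle, and reduce steps of the MapReduce model over a complete data set, while the streaming version updates its counts as each record arrives.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: collapse each group of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def batch_word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Batch style: the whole data set is available before processing."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


def streaming_word_count(lines: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Streaming style: yield a snapshot of running counts per record."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
        yield dict(counts)


if __name__ == "__main__":
    # Hypothetical sample input, standing in for a large data set or stream.
    sample = ["spark flink storm", "spark hadoop", "flink spark"]

    print("batch result:", batch_word_count(sample))
    for step, snapshot in enumerate(streaming_word_count(sample), start=1):
        print(f"after record {step}:", snapshot)
```

In a real deployment the map and reduce steps would run in parallel across a cluster and the stream would be unbounded. This sketch only shows that the batch job produces one result at the end, while the streaming job produces an up-to-date result after every record.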