DataShepra helps manage high-volume data generated from various sources and in different formats by collecting, storing, and processing it efficiently.
- Real-time data processing and analytics
- Batch processing
- Management of data in different formats: structured, semi-structured, and unstructured
- Scalable, high-performance data management systems
- Apache Spark: Fast, large-scale data processing engine for real-time analytics, batch processing, interactive analytics, and graph processing. Runs on cluster managers such as Mesos and YARN.
- Hadoop: Distributed storage and processing platform for large data sets. Data is stored in HDFS and processed in batches with the MapReduce framework, while YARN manages cluster resources.
- Apache Flink: Distributed stream and batch processing engine. Provides data distribution, communication, and fault tolerance for distributed computations over data streams.
- Apache Storm: Distributed real-time computation system that reliably processes unbounded streams of data.
- Data ingestion and retrieval: Continuous ingestion and querying of data with Hadoop ecosystem and related technologies such as Flume, Sqoop, Kafka, Hive, Drill, and Elasticsearch.
- NoSQL databases: Storage and retrieval of unstructured and semi-structured data with NoSQL databases such as MongoDB, Cassandra, and HBase.
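To make the batch processing model behind Hadoop MapReduce more concrete, here is a minimal, single-process Python sketch of a word count. It imitates the map, shuffle, and reduce phases in plain Python. It is not Hadoop's API, and the function names are hypothetical. In a real cluster, map and reduce tasks run in parallel on many nodes and the shuffle moves intermediate data over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical, single-process illustration of the MapReduce model (not Hadoop's API).


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key before reducing."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big value", "data at scale"]
    print(word_count(docs))
    # → {'big': 2, 'data': 2, 'value': 1, 'at': 1, 'scale': 1}
```

Because each map call sees only one line and each reduce call sees only one key, both phases can be distributed independently; only the shuffle needs to bring matching keys together.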
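Stream processors such as Flink and Storm compute results continuously over unbounded data, often grouped into time windows. The following plain-Python sketch, which does not use either engine's API, sums values per key in fixed-size (tumbling) windows. It assumes events arrive in timestamp order; production engines also handle late and out-of-order events, for example with watermarks in Flink. The event layout and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical event layout: (timestamp in seconds, key, value).
Event = Tuple[int, str, int]


def tumbling_window_sums(
    events: Iterable[Event], window_size: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (window_start, per-key sums) each time a window closes.

    Simplifying assumption: events arrive in non-decreasing timestamp order.
    Windows that receive no events are not emitted.
    """
    current_start: Optional[int] = None
    sums: Dict[str, int] = defaultdict(int)
    for ts, key, value in events:
        start = ts - ts % window_size  # align the timestamp to its window
        if current_start is not None and start != current_start:
            yield current_start, dict(sums)  # the previous window is complete
            sums = defaultdict(int)
        current_start = start
        sums[key] += value
    if current_start is not None:
        yield current_start, dict(sums)  # flush the last open window
```

For example, with `window_size=5` the events `(0, "clicks", 1)`, `(3, "views", 2)`, `(5, "clicks", 1)`, `(7, "clicks", 4)`, `(12, "views", 1)` produce `(0, {"clicks": 1, "views": 2})`, `(5, {"clicks": 5})`, and `(10, {"views": 1})`. Because the function is a generator, it can consume an endless iterator and emit each window as soon as the first event of the next window arrives.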
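The distinction between structured, semi-structured, and unstructured data can be illustrated with a small loader that turns each kind into Python dictionaries. This is a hypothetical sketch using only the standard library: the format labels `csv`, `jsonl`, and `text` and the `load` function are assumptions made for this example.

```python
import csv
import io
import json
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Hypothetical loaders; the format labels and function names are illustrative only.


def from_csv(text: str) -> List[Record]:
    """Structured: every row has the columns declared in the header."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def from_json_lines(text: str) -> List[Record]:
    """Semi-structured: one JSON object per line; fields may differ per record."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def from_free_text(text: str) -> List[Record]:
    """Unstructured: no schema, so keep the raw text plus a derived feature."""
    return [
        {"text": line, "word_count": len(line.split())}
        for line in text.splitlines()
        if line.strip()
    ]


_LOADERS: Dict[str, Callable[[str], List[Record]]] = {
    "csv": from_csv,
    "jsonl": from_json_lines,
    "text": from_free_text,
}


def load(fmt: str, text: str) -> List[Record]:
    """Dispatch to the loader for the given format label."""
    try:
        loader = _LOADERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    return loader(text)
```

Note that CSV values come back as strings (`{"id": "1", "name": "alice"}`), while JSON preserves types and nesting (`{"id": 3, "tags": ["x"]}`). Document-oriented NoSQL stores such as MongoDB are designed for the second kind of record, where fields can vary from one record to the next.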