Big Data

Overview

Our expertise in advanced analytics is backed by strong capabilities in data engineering for both traditional and big data needs. Our engineers are experts in state-of-the-art technologies and know how to tailor a solution to each client's needs and constraints. At Vitrana, Big Data is no longer just a term for batch processing of large data sets; it means querying very large data sets with low latency and running in-memory analytics.

Our Quality

We are pioneers in lambda architecture and have built systems that excel at handling data that flows in continuously and piles up. We know how to work with data at rest and data in motion, and we use frameworks that can store and manage huge data sets. We have a solution that can query more than 100 million records, carved out of a multi-terabyte data warehouse, in real time with a turnaround time of 5-6 seconds using in-memory data processing.
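
As a rough illustration of the lambda pattern, the minimal Python sketch below keeps a batch view computed from data at rest and a speed view updated by data in motion, and merges the two at query time. The class and method names are hypothetical and chosen only for this example; it is a toy model of the idea, not a description of any production system.

```python
from collections import Counter


class LambdaView:
    """Toy lambda architecture: a batch view and a speed view merged at query time."""

    def __init__(self):
        self.master = []             # append-only master dataset (data at rest)
        self.batch_view = Counter()  # counts precomputed from the master dataset
        self.speed_view = Counter()  # counts for events newer than the last batch run

    def ingest(self, key):
        """Record a new event (data in motion) in the master dataset and the speed view."""
        self.master.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        """Recompute the batch view from all stored data, then clear the speed view."""
        self.batch_view = Counter(self.master)
        self.speed_view = Counter()

    def query(self, key):
        """Serve a count by combining the batch view with the speed view."""
        return self.batch_view[key] + self.speed_view[key]


view = LambdaView()
for event in ["a", "b", "a"]:
    view.ingest(event)
view.run_batch()       # batch layer catches up with everything stored so far
view.ingest("a")       # a fresh event is visible immediately through the speed view
print(view.query("a"))
```

In a real deployment the batch recomputation would typically run on a distributed engine and the speed view would be fed from a message stream; the merge-at-query step is the part this sketch is meant to show.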

Tools We Use

Some tools that we work with are:

  • Apache Spark
  • Apache Cassandra
  • Apache Kafka
  • SQL Server 2016
  • Redshift
  • Elasticsearch
  • SQL Warehouse
  • DynamoDB / DocumentDB
  • MongoDB / CouchDB

Implementation

One of the largest pharma companies, located on the West Coast, wanted to revamp its existing data warehouse. This was a big challenge: the company was already using one of the most cutting-edge solutions, which met all of its current requirements but could not meet its future needs, because a new product launch was expected to grow the warehouse's data volume fourfold. The existing solution also required a proprietary license and was not well suited to processing that volume of data.

We implemented a lambda solution built on Spark, a state-of-the-art distributed in-memory computing system. It met all of their requirements, both current and future, and system performance improved fourfold. It was also a major cost gain: Apache Spark is open source and runs on commodity hardware, which brought their cost down to a mere one tenth of the earlier solution's.

As a result:

  • The organization cut its costs by more than 86% on the new platform.
  • Processing became four times faster; for example, the latency of one report dropped from 4 minutes to 1 minute.
  • The environment can sustain up to 20 times their current needs, so future growth is covered.