Case Study: Advanced Analytics on top of a legacy on-prem DWH

Introduction

In this post we’re going to describe an advanced analytics project we recently delivered for a large international company. The main goal of the project was to improve the management of our client’s retail data (sales data coming from their shops) by adding an additional advanced analytics platform on top of the existing infrastructure.

Business Challenge

Prior to this project all retail data was only managed by a large, traditional, monolith on-prem SQL data warehouse. Data coming from this legacy data warehouse is aggregated and presented in pre-made group-level dashboards which are limited and highly dependent on report developers. Accessing more detailed data or data outside of these dashboards, or creating custom reports based on it would require a lot of effort.

This rigid infrastructure was not suitable for advanced analytics which requires dynamic data and an option for self-service business intelligence (BI). Furthermore, our client has other BI solutions that would also require this type of dynamic data.

Solution – Advanced Analytics Platform with Azure Synapse and Databricks

We built a new, cloud-based (Azure Synapse-based) data warehouse on top of the existing on-premise data warehouse and connected it to the existing Databricks (DBX) Platform. We extract the raw data from the on-prem data warehouse on a daily basis and make it accessible in our new Advanced Analytics Platform. 

The new platform consists of two separate sub-platforms:

  1. A Databricks (DBX) Platform for development and Business Value Realization solutions (for self-service development)
  2. A Synapse Platform for transactional level analysis (for self-service analysis)

The Databricks Platform is a self-service development platform that uses Python, Machine Learning and Artificial Intelligence technology. It’s mainly meant for internal developers for forecasting and creating various data science solutions, and for external vendors (developers) who deliver custom solutions to cover their own business needs.

While the Synapse platform’s main goal is to enable self-service data analysis. Users can directly connect to the raw data from the data lake, execute custom queries, see the data on a transactional level and even connect to the data from Excel or Power BI.

More dynamic data

Compared to the aggregated dashboards (built on the on-prem data warehouse), in the new Synapse-based data warehouse all the data is available on transactional level. This means data like: cart size (how many items are bought together), all transactions, transactional data on item level (which items are bought together, discounts, sales quantity, etc) all meta-and master data (shop details, like opening hours, plot size, geo coordinates etc.) Most of this data was previously NOT available in the pre-made dashboards built on the legacy on-prem system, and now it is available in the Synapse environment, in a more dynamic set-up and to a much larger audience. 

Easier access, larger audience

With this new advanced analytics platform, business analysts can also access all the data and create their own reports, even from Excel. This setup also opens the door to create an additional data layer for third party companies (like vendors who are selling their -typically beverage- products in these shops), which can even be sold as a service, to create an additional stream of revenue.

Scalability and reduced infrastructure costs

Another huge advantage is that the new platform allows our client to dynamically scale compute resources up or down depending on their needs. This means a significantly better solution in terms of cost effectiveness.

Found this post interesting?

Subscribe to our newsletter to receive updates of similar post and news!

Follow us for more news and technical insights!

Please provide your basic info to view the Demo

Download Whitepaper on Rapid Smart Excel Add-In