Near real-time Data Warehousing

Recently one of our existing Rapid Analytics clients formulated a new request. They wanted to be able to have data updates more often than the standard daily updates. To enable this we needed to extend the functionality of our data warehouse framework – Abylon Rapid Analytics.. (From now on this is an out-of-the-box feature of Rapid Analytics)

First, we needed to address a performance issue, because if we transfer data every hour or even every minute, then in the case of huge data tables (more than 100 million number of records) the distinctions, whether the incoming new data does not already exist (INSERT) or is already in our database (UPDATE), may take too long and we would not be able to fit the data update timeframe. (Previously, with the one data upload per day, this process took 20 minutes; hence it did not really affect the whole process.)

To narrow/scale down the target database, we started passing data warehouse filter predicates (filtering parameters) with our framework, which makes it easier to manage large data tables.

Next, we turned our attention to the tabular model. In case of the SSAS (SQL Server Analysis Services) tabular models the smallest processing unit is a partition. So, in order to fit into the new update time-frame, we implemented a dynamic partition management approach. The underlying logic can be customized in an SQL stored procedure, taking into account the associated business logic.

Right now, we have customers where data can arrive to the fact tables only in predefined, specific time-frames (last 5-10-20 days can be updated over time/as time goes by), while for other customers we don’t set up such restrictions, so even 2-3 years old data can be changed retroactively.

In the first case, we break down the old partitions into years, the last 12 months into months, and the last 5/10/20 days into daily partitions, and then dynamically create the new daily partitions as time goes by and MERGE the daily partition that fall out of the 5/10/20 day time-frame into the corresponding monthly partition. (For example if the 4th of May falls out of the 10-day time-frame, the corresponding data will be placed in the monthly partition – in this case the partition for May 2021)

In the second case, we can identify the partitions affected during the data transfer/data upload and reprocess only the partitions where data has changed. This approach is ideal when, apart from exceptional cases, only a few of the partitions are updated during a given data load.

With the new version of Abylon Rapid Analytics our clients can stick to the standard daily update method but they also have the option to update their data warehouse on a more, or even much more regular basis.

If you’re interested in Abylon Rapid Analytics, check out the product page or contact us for more information or an on-line demo!

Found this post interesting?

Subscribe to our newsletter to receive updates of similar post and news!

Follow us for more news and technical insights!

Please provide your name and email address to download the whitepaper

Please provide your basic info to view the Demo

Download Whitepaper on Rapid Smart Excel Add-In