When it’s NOT the right time to build a data warehouse

I was recently asked by one of our clients whether they should start building a data warehouse at one of the companies they acquired or not, and if yes, what are the arguments in favor of building it. This conversation inspired me to write this article.

(Note: I used to work as an economics and business professional before switching to IT and Business Intelligence. In this article I try to build a bridge between economics and IT, so apologies to all the IT professionals if, for the sake of non-IT readers, I simplify or even over-simplify some of the things mentioned here.)

When is NOT the right time to start building a data warehouse?

If you just started creating reports, building a reporting system

If you are just starting to put together your first real reports, it is very likely that even you do not know yet from which data and exactly how you should come up with the numbers you need to present to management. In this case, it is not worthwhile to start dealing with data warehousing or Business Intelligence yet. Most likely, in the process of reporting and analyzing the data, you will already get to a point, when:

 

  • You will unify and simplify, for example by developing a common data dictionary
  • You will do some OLTP-based development, for example unify the data collection (eg by implementing or expanding an ERP system) and move it to a common platform
  • The content of the reports may not be accurate yet, so the purpose of the reports themselves will change constantly.

If you have already started creating reports, but for now a few (let’s say 3-4 complex or 5-6 simple) reports is enough to cover all the tasks required by your company.

All the topics seems to be covered by a few reports and they are flexible enough to manage possible changes, and lastly, changes in the basic data do not need to be immediately synchronized with other systems.

If the size of the data is not very large (and at least one of the previous two scenarios are true)

If the number, size and complexity of the data used in the reports (eg the number of records or rows in a table) is not so large that a more serious optimization (or reporting system upgrade) would be necessary in the period between two reports.

If the frequency of data requests and the size of data related to them is NOT overloading the source systems (eg an ERP system) during the reporting

In terms of technology, BI systems have been brought to life by the fact that systems primarily designed for data entry/input (so-called OLTP systems, such as ERP systems, which have been optimized to make data entry easier for users) have not been optimized to serve a large number of queries. If you do not overload the source systems, i.e there is no significant slowdown during the data entry or data query process, then there is no urgent reason yet to start thinking about data warehouses or BI systems.

When SHOULD YOU start thinking about building data warehouses?

If you are drowning in the sea of Excel and / or Power BI reports

If you create a lot of reports, that vary in complexity, utilized technology (e.g. PowerQuery, Excel transformations, Power BI data modeling), and it is becoming harder and harder to effectively debug them, furthermore, it is hard to synchronize the reports with changes to the source system precisely because of their complexity, you should consider moving toward a standardized solution.

If you need a historical aspect to your corporate data

Sometimes, an OLTP system (e.g., an ERP system) is not suited to put data in a historical perspective, e.g. you can’t simply check when an item number was in stock and when it wasn’t or check which team a former employee was part of or when did he leave the company.

If you need a stable, fast connection for data of different quality, source and depth

Some reports require you to compile data consisting of millions of records (millions of rows) from different source systems (e.g., an SQL database and an Excel, or mass data available through a REST API, etc.). In such cases, especially if process speed is requirement, it is unavoidable to use a more powerful SQL database customized specifically for reporting needs.

The size of the data, the time allowed for report creation, or the frequency of the reports requires it

You process large amounts of data? Or you need reporting with very short time intervals (eg hourly report update)? In both cases you may start to think about investing in a data warehouse.

You need to run queries frequently and / or they are so big in size that they put a heavy load on the source system engine (e.g. a database engine)

In such cases, it is logical to start looking for a suitable data warehouse solution. Also, if the frequency of the reports is a pain point, you might check out our “Near Real-time data warehouse” blog article, that describes in detail how Abylon Rapid Analytics clears this hurdle.

Author of the post:

Zsolt Kreisz - BI Developer at Abylon Consulting
Linkedin Profile

Found this post interesting?

Subscribe to our newsletter to receive updates of similar post and news!

Follow us for more news and technical insights!

Please provide your name and email address to download the whitepaper

Please provide your basic info to view the Demo

Download Whitepaper on Rapid Smart Excel Add-In