Solving a DISTINCTCOUNT Problem with SCD2 in Power BI

Introduction to DISTINCTCOUNT in Power BI

The DISTINCTCOUNT function in Power BI is a pivotal tool for calculating the unique count of values within a column. This functionality becomes crucial in scenarios requiring the enumeration of unique entities, such as tallying distinct clients from a dataset. DISTINCTCOUNT shines in data analysis, especially when dealing with datasets where simple aggregations like “group by” are inadequate.

The Challange

Faced with the task of creating a Power BI report for a bank client, the objective was to display the amount of money invested in each bank product and the distinct number of clients per product. Ideally, a straightforward “group by” would suffice—if each customer engaged with only one product. However, the banking industry often sees customers utilizing multiple products, presenting a complex problem for data aggregation. 

The challenge was further compounded by the limitations of using PowerPivot and Excel for modeling. These tools, while powerful, have their constraints, primarily regarding memory capacity, which makes importing massive datasets unfeasible. Additionally, the client was interested in analyzing product combinations and different timelines, adding another layer of complexity to the task.

The Solution

After initial attempts to import millions of rows into the model proved unsuccessful, I decided to employ a Slowly Changing Dimension Type 2 (SCD2) format for the model table, followed by the use of Data Analysis Expressions (DAX).

Implementing SCD2 Format for DISTINCTCOUNT

The adoption of SCD2 format significantly improved data efficiency. Traditional methods would record multiple entries for a customer’s product use over time. SCD2, however, consolidates this into a single record, delineated by a start (From_date) and an end date (To_date), thereby compressing the data and enhancing model performance. 

This SCD2 formatted data was then imported as a fact table into the model, linked to the Product dimension but not to the Date dimension to streamline the analysis.

Crafting the DISTINCTCOUNT DAX Measure

To tackle the distinct count challenge, a specific DAX measure was devised, incorporating three variables:

  • VAR min_dat = MIN(Dim_Date[Date]) for the minimum date,
  • VAR max_dat = MAX(Dim_Date[Date]) for the maximum date, and
  • VAR tbl = FILTER(Fact_ClientsSCD2, AND(SCD2_From<= min_dat, SCD2_To >= max_dat) to filter records within the specified timeframe.

The final step involves calculating the distinct count of client IDs for the given period using: CALCULATE(DISTINCTCOUNT(Fact_ClientsSCD2[ClientID]), tbl)

This approach not only effectively solved the initial DISTINCTCOUNT challenge but also offered a scalable and efficient method for analyzing product engagement over various timelines, delivering substantial business value to the client.

To address the distinct count challenge and further elaborate on the application of DISTINCTCOUNT in a SCD2 context, we drew inspiration from this article that explores the nuances of calculating the distinct count of customers in a SCD type 2 scenario using DAX, offering valuable insights into the complexities and solutions related to this topic.

Leveraging these insights, a specific DAX measure was devised, incorporating three variables:

  • VAR min_dat = MIN(Dim_Date[Date]) for the minimum date,
  • VAR max_dat = MAX(Dim_Date[Date]) for the maximum date, and
  • VAR tbl = FILTER(Fact_ClientsSCD2, AND(SCD2_From<= min_dat, SCD2_To >= max_dat) to filter records within the specified timeframe.

The final step involves calculating the distinct count of client IDs for the given period using: CALCULATE(DISTINCTCOUNT(Fact_ClientsSCD2[ClientID]), tbl)

This approach not only effectively solved the initial DISTINCTCOUNT challenge but also offered a scalable and efficient method for analyzing product engagement over various timelines, delivering substantial business value to the client.

Conclusion: Leveraging DISTINCTCOUNT and SCD2 in Power BI

Utilizing the DISTINCTCOUNT function, in conjunction with SCD2 data modeling and a strategic DAX measure, presents a robust solution for navigating complex datasets in Power BI. This methodology simplifies data management and enables in-depth insights into customer interactions and product efficacy, critical for informed decision-making within the banking sector.

If you liked this post, check out some of our other technical blogposts:

Author of the post:

Zsolt Kreisz - Financial Controller in the past, now BI Developer at Abylon Consulting.
Linkedin Profile

Found this post interesting?

Subscribe to our newsletter to receive updates of similar post and news!

Follow us for more news and technical insights!

Please provide your name and email address to download the whitepaper

Please provide your basic info to view the Demo

Download Whitepaper on Rapid Smart Excel Add-In