Data Masking in Power BI in 2025
Introduction to Data Masking in Power BI
Data masking is a crucial concept in data security, particularly in environments where sensitive information is handled.
For those who don’t know what “Data Masking” is: it is the process of hiding sensitive information from a specific audience. In a data pipeline, this can be done in various ways, such as replacing data with fictional but realistic values, encryption, or scrambling, so that the original information is not recognizable but the data remains clean and usable for further analysis.
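To make these techniques concrete, here is a minimal Python sketch of three of them: partial redaction, replacement with a stable pseudonym, and digit scrambling. It is an illustration only, not a Power BI feature; the function names, the sample record, and the salt value are all made up for this example, and a real pipeline would need proper key management and a review of which technique fits which column.

```python
import hashlib
import random


def redact_email(email: str) -> str:
    """Keep the first character and the domain, hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}{'*' * max(len(local) - 1, 0)}@{domain}"


def pseudonymize(value: str, salt: str) -> str:
    """Replace a value with a stable token so grouping and joins still work."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return f"ID_{digest[:10]}"


def scramble_digits(value: str, seed: int) -> str:
    """Swap every digit for a random one, keeping length and separators."""
    rng = random.Random(seed)
    return "".join(str(rng.randint(0, 9)) if ch.isdigit() else ch for ch in value)


# Hypothetical sample record, invented for this sketch.
record = {"name": "Jane Doe", "email": "jane.doe@example.com", "phone": "+36 30 123 4567"}

masked = {
    "name": pseudonymize(record["name"], salt="demo-salt"),
    "email": redact_email(record["email"]),
    "phone": scramble_digits(record["phone"], seed=42),
}
print(masked)
```

The pseudonym is deterministic for a given salt, so the same person always maps to the same token and counts per person still work, while the scrambled phone number keeps its shape so format checks downstream do not break.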
This two-part blogpost first discusses the basics of Data Masking in Power BI, then explores data masking solutions using a one-to-one approach within Power BI, covering both Direct Query and Import mode semantic models, along with tools and creative techniques for implementing these solutions.
In Power BI, data masking is implemented through techniques that ensure the protection of sensitive data while allowing users to gain valuable insights.
To understand how data masking works, we should first go “back to the source”.
How does Power BI get the “data to visualize”?
Answering this question is important, because source-level access and source-level data masking affect data masking in Power BI in different ways.
(For those who already work with Power BI, there is nothing new in this section.)
In Power BI there are three main ways to get data from the source. Let’s look at their characteristics and their pros and cons.
- Direct Query:
  - In this mode, data remains in the source database and is queried in real time.
  - Each time you interact with a report (like filtering or drilling down), Power BI sends queries to the source to retrieve up-to-date data.
  - Pros: Real-time data access and no data duplication in Power BI.
  - Cons: Performance can vary depending on the source database and network; some Power BI features may be limited.
- Import Mode:
  - Data is loaded into Power BI’s in-memory storage. Once imported, all reports and visuals retrieve their data from this in-memory store.
  - The data can be refreshed at scheduled intervals to keep it current.
  - Pros: Fast performance for analytics and full use of Power BI’s features; better suited for complex computations and visualizations.
  - Cons: Data may become outdated between refreshes, and large datasets can take a long time to load initially.
- Direct Lake:
  - This mode is designed to work with large datasets stored as Delta tables in the Microsoft Fabric data lake (OneLake).
  - It allows you to work with data stored in the lake without needing to import it completely into Power BI’s memory.
  - Pros: Combines the benefits of Direct Query (near-real-time access) with some performance advantages of Import Mode, particularly for large datasets.
  - Cons: Requires a structured approach to manage the data and set up proper storage.
Each mode has its specific use cases, and the choice depends on your reporting needs, data latency requirements, and performance considerations.
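The freshness trade-off between Direct Query and Import Mode described above can be sketched in a few lines of Python. This is a toy simulation, not Power BI code: the `Source`, `DirectQueryModel`, and `ImportModel` classes and the `order_count` value are hypothetical names invented for the illustration, and Direct Lake is left out for brevity.

```python
class Source:
    """Stand-in for the source database: rows that can change over time."""

    def __init__(self):
        self.rows = {"order_count": 100}
        self.queries = 0  # how many times the source was hit

    def query(self):
        self.queries += 1
        return dict(self.rows)


class DirectQueryModel:
    """Every read is forwarded to the source."""

    def __init__(self, source):
        self.source = source

    def read(self):
        return self.source.query()


class ImportModel:
    """Reads are served from a snapshot taken at the last refresh."""

    def __init__(self, source):
        self.source = source
        self.snapshot = {}
        self.refresh()

    def refresh(self):
        self.snapshot = self.source.query()

    def read(self):
        return dict(self.snapshot)


source = Source()
direct = DirectQueryModel(source)
imported = ImportModel(source)      # initial load: one query against the source

source.rows["order_count"] = 120    # the source changes after the import

print(direct.read())     # reflects the change immediately
print(imported.read())   # still shows the value from the last refresh
imported.refresh()
print(imported.read())   # current again after the refresh
```

The query counter shows the other side of the trade-off: the imported model touches the source only on refresh, while the Direct Query model hits it on every read.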
The report builder:
- The user who has access to the source data
- Prepares the semantic model and report
- Publishes the report and manages it on the Power BI service
The report user:
- The user who receives some kind of access to the created Power BI report or semantic model
- Does not necessarily have access to the data source
Source Data Masking Inheritance
One of the most frequent questions we hear when talking about Power BI with someone who is just starting to use the tool is:
“Do my ERP/source access rights work in Power BI?”
Going through the connection types, we can say that in the case of Direct Query, the access setup and data masking applied at the source are inherited by Power BI.
For this to work, both the report builder and the report user must authenticate with their own data source credentials (for example through single sign-on). If the connection uses one stored credential for everyone, all users see what that account is allowed to see.
In the case of Import Mode, the only access setup and data masking inherited by the semantic model are those that apply to the report builder.
Basically, the report user works with the data stored in the semantic model, as it was retrieved by the report builder.
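The difference in inheritance can be illustrated with a small Python simulation. It is a sketch under stated assumptions, not Power BI or database code: the user names, the sample rows, and the rule that only `report_builder` sees real salaries are all hypothetical, and the Direct Query case assumes each viewer authenticates against the source with their own identity.

```python
# Made-up sample data standing in for a source table.
SOURCE_ROWS = [
    {"employee": "Anna", "salary": 5200},
    {"employee": "Bence", "salary": 4800},
]

# Hypothetical source-side masking rule: only these users see real salaries.
UNMASKED_USERS = {"report_builder"}


def query_source(user: str) -> list:
    """Return rows as the source would for this user, masking salary if needed."""
    if user in UNMASKED_USERS:
        return [dict(row) for row in SOURCE_ROWS]
    return [{**row, "salary": None} for row in SOURCE_ROWS]


def direct_query_view(viewer: str) -> list:
    """Direct Query with per-user authentication: the query runs as the viewer."""
    return query_source(viewer)


def build_import_snapshot(builder: str) -> list:
    """Import Mode: data is fetched once under the builder's identity."""
    return query_source(builder)


def import_view(snapshot: list, viewer: str) -> list:
    """Every viewer reads the stored snapshot; the viewer's identity is not used."""
    return snapshot


snapshot = build_import_snapshot("report_builder")

print(direct_query_view("report_user"))      # salaries masked by the source
print(import_view(snapshot, "report_user"))  # salaries as the builder saw them
```

In the imported case the source-side rule was evaluated only once, for the builder, so the report user ends up seeing unmasked salaries even though the source would have masked them for that user.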
As for Direct Lake, the Microsoft Fabric data lake (OneLake) acts as the “source”. As mentioned previously, this gives performance close to Import mode, but with the data fetching style of Direct Query.
Conclusion
Data masking is a crucial practice in safeguarding sensitive information within organizations. By replacing real data with fictional but realistic substitutes, it ensures that confidential information remains protected from unauthorized access. This not only helps in complying with data privacy regulations but also mitigates the risk of data breaches and cyberattacks. Ultimately, data masking is essential for maintaining data integrity, protecting privacy, and fostering trust in an increasingly data-driven world.
Data Masking can be implemented at various steps of our data pipeline and, as described above, can be “further spiced” even on the data visualization layer, such as Power BI.
Hopefully, this blog post also made clear that with Data Masking in Power BI we should consider carefully what we want to achieve, because there are limitations, and change management or even simple maintenance is not always easy.
Closing these thoughts with the most important advice: document everything!
Coming in Part II: Data Masking Solutions in Power BI
The second part of this blogpost (which will be published in the coming weeks) will take a closer look at the available native and non-native data masking solutions in Power BI.
Follow our LinkedIn page or sign up for our newsletter so you won’t miss the second part.
Author of the post:
Oliver Vetesi - Engagement Manager at Abylon Consulting