Abstract
Power BI is a powerful business intelligence platform widely used across enterprises for interactive reporting and data visualization. However, its built-in usage metrics retain only the most recent 30 days of data, restricting long-term monitoring and governance. This paper proposes a scalable, automated pipeline that extends the retention of Power BI usage metrics beyond 30 days. By leveraging Azure Databricks, ADLS, and the Power BI REST APIs, we design an end-to-end historical metrics pipeline that also improves refresh automation and data reliability.
1. Introduction
Power BI presents two notable challenges: dataset refresh limitations and a 30-day retention window for usage metrics. This paper addresses both by proposing an automated pipeline built on Databricks, ADLS, and the Power BI REST APIs that retains and visualizes usage data over months or years, enabling better governance, adoption analysis, and optimization.
2. Related Work
Previous implementations have focused either on triggering refreshes or on manually querying usage APIs. Our solution integrates usage tracking, historical storage, SCD Type 2 modeling, and automated refresh logic into a single, production-grade pipeline informed by DataOps practices and enterprise data warehouse standards.
3. System Architecture
Key components of the system include:
- Power BI REST API
- Azure Data Factory (ADF)
- Azure Key Vault
- Azure Databricks
- ADLS Gen2 (Delta Lake)
- Power BI Service (for visualization)

4. Methodology
4.1 Azure Key Vault – Securing Sensitive Information
The ClientID, ClientSecret, and WorkspaceID are stored securely in Azure Key Vault. ADF retrieves them at runtime through a Web activity, so no credentials are hardcoded in the pipeline.
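In the deployed pipeline, ADF's Web activity calls Key Vault's REST endpoint directly; the Python sketch below shows an equivalent retrieval using the Azure SDK. The vault URL and secret names are illustrative placeholders, not values from the paper.

```python
# Sketch: fetching pipeline credentials from Azure Key Vault.
# Vault URL and secret names are illustrative placeholders.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

vault_url = "https://<your-vault-name>.vault.azure.net"
secrets = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())

client_id = secrets.get_secret("pbi-client-id").value
client_secret = secrets.get_secret("pbi-client-secret").value
workspace_id = secrets.get_secret("pbi-workspace-id").value
```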
4.2 Azure Data Factory – Orchestration Engine
ADF executes a three-step orchestration:
- Step 1: Fetch credentials from Key Vault
- Step 2: Acquire Bearer Token via Azure AD
- Step 3: Pass parameters to Databricks Notebook for processing
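A minimal Python sketch of the token acquisition in Step 2, assuming the standard OAuth 2.0 client-credentials grant against Azure AD; the tenant ID and credential values shown are placeholders that the pipeline would retrieve from Key Vault in Step 1.

```python
# Sketch: acquiring an Azure AD Bearer token for the Power BI API
# via the client-credentials grant. Placeholder values shown; in the
# pipeline these come from Key Vault (Step 1).
import requests

tenant_id = "<tenant-id>"
client_id = "<client-id>"
client_secret = "<client-secret>"

resp = requests.post(
    f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",
    data={
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://analysis.windows.net/powerbi/api/.default",
    },
)
resp.raise_for_status()
bearer_token = resp.json()["access_token"]
```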
4.3 Databricks – API Calls and Transformation
Databricks handles the data ingestion and transformation logic:
- Uses the acquired Bearer token to call the Power BI REST API for usage metrics
- Parses JSON responses and converts them into Spark DataFrames
- Applies SCD Type 2 logic to maintain a complete change history
- Writes the transformed results to a Delta table in ADLS
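The paper does not name the exact usage-metrics endpoint; the admin Activity Events API is one common source for this data, so the sketch below assumes it. The date window is illustrative, `bearer_token` is the token acquired in Section 4.2, and `spark` is the session Databricks predefines in a notebook.

```python
# Sketch: ingesting one day of usage events in a Databricks notebook.
# The endpoint choice (admin Activity Events) is an assumption;
# bearer_token comes from the Section 4.2 step.
import requests

headers = {"Authorization": f"Bearer {bearer_token}"}
url = (
    "https://api.powerbi.com/v1.0/myorg/admin/activityevents"
    "?startDateTime='2024-06-01T00:00:00'&endDateTime='2024-06-01T23:59:59'"
)

events = []
while url:  # follow continuation links until the window is exhausted
    body = requests.get(url, headers=headers).json()
    events.extend(body.get("activityEventEntities", []))
    url = body.get("continuationUri")

# Convert the parsed JSON records into a Spark DataFrame.
usage_df = spark.createDataFrame(events)
```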
4.4 Delta Lake – Scalable Historical Storage
The Delta Lake table serves as a versioned historical store for Power BI usage data, preserving every change as an SCD Type 2 row. It enables analytics across any date range, removing the 30-day cap.
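A minimal sketch of the SCD Type 2 write, assuming the `usage_df` DataFrame from Section 4.3; the table path and column names (`report_id`, `view_count`, `is_current`, `valid_from`, `valid_to`) are illustrative assumptions, not values from the paper.

```python
# Sketch: SCD Type 2 upkeep on the Delta table. Path and column
# names are illustrative assumptions.
from delta.tables import DeltaTable
from pyspark.sql import functions as F

path = "abfss://metrics@<storageaccount>.dfs.core.windows.net/powerbi_usage"
target = DeltaTable.forPath(spark, path)
updates = usage_df.withColumn("valid_from", F.current_timestamp())

# Step 1: close out current rows whose tracked attribute changed.
(target.alias("t")
    .merge(updates.alias("s"),
           "t.report_id = s.report_id AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.view_count <> s.view_count",
        set={"is_current": "false", "valid_to": "s.valid_from"})
    .execute())

# Step 2: append the incoming rows as the new current versions
# (a production job would first anti-join to keep only new/changed rows).
(updates
    .withColumn("is_current", F.lit(True))
    .withColumn("valid_to", F.lit(None).cast("timestamp"))
    .write.format("delta").mode("append").save(path))
```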
4.5 Power BI Dashboard – Long-Term Visualization
The final dataset is connected to Power BI dashboards, enabling:
- Monthly/yearly trend analysis
- User engagement over time
- Identification of unused reports
- Governance KPIs for adoption and capacity planning
Refresh triggers, job status checks, and retries are handled by Databricks jobs scheduled to run daily, as sketched below.
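A sketch of the refresh trigger and status check that the scheduled job would issue, using the documented "Refresh Dataset In Group" and "Get Refresh History In Group" REST calls; the workspace and dataset IDs are placeholders, and `bearer_token` is the token from Section 4.2.

```python
# Sketch: triggering a dataset refresh and checking its status from
# the scheduled Databricks job. IDs are placeholders; bearer_token is
# the token acquired in Section 4.2.
import requests

workspace_id = "<workspace-id>"
dataset_id = "<dataset-id>"
refresh_url = (
    f"https://api.powerbi.com/v1.0/myorg/groups/{workspace_id}"
    f"/datasets/{dataset_id}/refreshes"
)
headers = {"Authorization": f"Bearer {bearer_token}"}

# Trigger the refresh; the service answers 202 Accepted on success.
requests.post(refresh_url, headers=headers).raise_for_status()

# Poll the most recent refresh entry for job status / retry decisions.
latest = requests.get(refresh_url + "?$top=1", headers=headers).json()
print(latest["value"][0]["status"])  # "Unknown" while running, then "Completed" or "Failed"
```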
5. Results and Benefits
| Metric | Before | After |
| --- | --- | --- |
| Avg. Refresh Time (mins) | 45 | 18 |
| Success Rate (%) | 85 | 98 |
| Data Retention (days) | 30 | Unlimited |
| Refresh Frequency (per day) | 1 | 3 |
6. Conclusion and Future Work
This paper presents a reliable and scalable pipeline that overcomes Power BI’s default retention limitation and automates historical usage tracking. Future enhancements may include anomaly detection, user segmentation, workspace-wide scaling, and Azure Monitor integration for enterprise monitoring.
References
- Microsoft. Power BI REST API documentation.
- Delta Lake documentation. The Linux Foundation.
- Microsoft. Azure Databricks documentation.
- Kimball, R., & Ross, M. (2013). The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling (3rd ed.). Wiley.
Author: Abhishek Tyagi
Email: mechatronics.abhishek@gmail.com