The Complete Guide to Direct Lake in Power BI
Quick Summary: Direct Lake is Microsoft Fabric’s innovative storage mode that combines the performance of Import mode with the data freshness of DirectQuery, enabling fast interactive analysis on massive datasets stored in OneLake without full data imports.
What is Direct Lake?
Direct Lake is a revolutionary storage mode in Power BI that fundamentally changes how large datasets are processed and analyzed. Instead of importing data into Power BI (which creates duplicates and storage overhead) or querying live databases directly (which can be slow), Direct Lake connects directly to data stored in OneLake – Microsoft Fabric’s unified data lake.
Think of it as having the best of both worlds: Import mode performance with DirectQuery freshness, all while eliminating the need for massive data refreshes that consume time and resources.
Key Characteristics
- No Full Data Copy: Data remains in OneLake as optimized Delta tables (Parquet files)
- On-Demand Loading: Only the specific columns needed for queries are loaded into memory
- Built for Scale: Works with datasets larger than available memory
- Automatic Updates: Syncs with source data changes through lightweight “framing”
- Enterprise-Ready: Designed for IT-driven projects with big data lakes and warehouses
How Direct Lake Works: The Technical Breakdown
Understanding the mechanics behind Direct Lake helps explain why it’s such a game-changer for Power BI performance at scale.
Core Architecture
Direct Lake operates on a simple but powerful principle: query data where it lives. Here’s how it works:
- Data Storage: All data resides in OneLake as Delta tables (optimized Parquet files with transaction logs)
- Query Processing: The VertiPaq engine (same as Import mode) processes queries
- Selective Loading: Only the specific columns required for a query are loaded into memory
- Data Preparation: All ETL and data transformation happens in OneLake using Spark, T-SQL, or pipelines
- Automatic Synchronization: Changes in source data are automatically detected and reflected
Pro Tip: The “framing” process in Direct Lake only updates metadata about what data is available, not the actual data itself. This makes refreshes incredibly fast compared to traditional Import mode.
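To make the framing idea concrete, here is a minimal notebook sketch of the flow: new rows land in OneLake as an ordinary Delta append, and a subsequent refresh of the semantic model only re-points (re-frames) it at the latest Delta version. The table name (sales), the model name (Sales Model), and the use of the semantic-link (sempy) library in a Fabric notebook are illustrative assumptions, not specifics from this guide.

```python
# Minimal sketch (Fabric notebook, PySpark + semantic-link). Names are placeholders.
import sempy.fabric as fabric

# 1) New data lands in OneLake as an ordinary Delta append -- no Power BI refresh involved yet.
new_rows = spark.createDataFrame(
    [(1001, "2024-06-01", 250.0)], ["OrderID", "OrderDate", "Amount"])
new_rows.write.format("delta").mode("append").saveAsTable("sales")

# 2) Re-framing the semantic model only updates its pointers to the latest Delta version;
#    no data is copied into the model, which is why this completes quickly.
fabric.refresh_dataset(dataset="Sales Model", refresh_type="full")
```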
Direct Lake vs. Import vs. DirectQuery: A Detailed Comparison
Choosing the right storage mode is crucial for Power BI performance. Here’s how Direct Lake stacks up against traditional options:
| Feature | Import Mode | DirectQuery | Direct Lake |
|---|---|---|---|
| Data Copy | Full copy into Power BI | No copy, queries source directly | No full copy, loads needed data only |
| Performance | Fast (in-memory processing) | Slower (depends on source database) | Fast (Import-like performance) |
| Refresh Requirements | Heavy & time-consuming | No refresh needed | Light “framing” refresh |
| Data Latency | Can be outdated until refresh | Always latest data | Near real-time (auto updates) |
| Memory Usage | High (entire dataset) | Low (only query results) | Moderate (only needed columns) |
| Best For | Small to medium datasets | Real-time reporting on live data | Large datasets in data lakes |
When to Choose Each Mode
Import Mode: Still excellent for small to medium datasets, self-service analytics, and scenarios where you need Power Query transformations within Power BI.
DirectQuery: Essential when you need absolutely real-time data from operational databases or when data volumes exceed what can reasonably be imported.
Direct Lake: The ideal choice for Fabric/OneLake environments with large datasets where you need both performance and data freshness without the overhead of full refreshes.
Key Benefits of Direct Lake
Direct Lake isn’t just another storage mode – it’s a paradigm shift for enterprise Power BI deployments. Here’s why:
Blazing Fast Queries
Experience Import mode-like performance while querying data directly from OneLake. The VertiPaq engine processes queries in-memory for lightning-fast responses.
No More Big Refreshes
Say goodbye to hour-long refresh cycles. Direct Lake uses lightweight “framing” that updates only metadata, making refreshes quick and cost-effective.
Scales to Massive Datasets
Work with datasets larger than available memory. Direct Lake loads only the columns needed for each query, enabling analysis on terabytes of data.
Always Fresh Data
Automatic synchronization with OneLake means your reports always reflect the latest data without manual intervention.
Enterprise-Ready Architecture
Built for IT-driven projects with centralized data governance, security, and management in Fabric/OneLake environments.
Cost Effective
Reduce storage costs by eliminating data duplication and minimize compute costs with efficient query processing.
Essential Direct Lake Concepts
To effectively work with Direct Lake, you need to understand these fundamental concepts:
1. Column Loading (Transcoding)
Direct Lake doesn’t load entire tables into memory. Instead, it uses transcoding to load only the specific columns needed for a query from Parquet files into the VertiPaq engine. This selective loading is key to its memory efficiency.
2. Framing Refresh
Unlike traditional refreshes that move all data, framing only updates metadata about what data is available in the Delta tables. It’s like updating the table of contents without rewriting the entire book.
3. Automatic Updates
Direct Lake models automatically stay synchronized with changes in OneLake. When data is added, modified, or deleted in the source Delta tables, the semantic model detects these changes and updates accordingly.
4. DirectQuery Fallback
In certain scenarios (like when using SQL views or encountering security rules), Direct Lake may fall back to DirectQuery mode. While this ensures queries still work, performance will be slower, so it’s best to avoid fallback when possible.
Important: DirectQuery fallback can significantly impact performance. Always monitor your queries to ensure they’re running in true Direct Lake mode for optimal speed.
Security in Direct Lake
Security is paramount in enterprise deployments. Direct Lake offers flexible security options:
Authentication Methods
- Single Sign-On (SSO): Default method where users access data with their own identities. Users need appropriate permissions on the underlying Fabric items.
- Fixed Identity: Uses a shared service account for all users. Ideal when you want to centralize security management or when users shouldn’t directly access source data.
Security Layers
Direct Lake supports multiple security approaches depending on your needs:
| Security Type | Description | Performance Impact | Best For |
|---|---|---|---|
| Row-Level Security (RLS) | Controls which rows users can see based on their identity | Minimal if implemented in semantic model | Sales territories, department views |
| Object-Level Security (OLS) | Controls visibility of entire tables or columns | Can cause query errors in visuals | Hiding sensitive tables/columns |
| Column-Level Security (CLS) | Controls access to specific columns | Similar to OLS – can cause issues | Protecting sensitive columns like salaries |
Best Practice: Implement RLS at the semantic model level with fixed identity for best performance. Avoid OLS/CLS when possible as they can cause report errors.
Capacity Requirements and Licensing
Direct Lake requires specific Fabric capacities and licenses:
- Fabric Capacity License: Requires Fabric capacity SKUs (F-series). No support for Power BI Pro/PPU standalone
- Memory Considerations: Larger SKUs provide more memory, supporting larger data volumes and more concurrent users
- Performance Scaling: If capacity limits are exceeded, queries may slow down or fall back to DirectQuery
- Region Requirements: Semantic model and lakehouse must be in the same Azure region
Current Limitations (As of Preview)
While powerful, Direct Lake has some limitations to be aware of:
- Data Type Restrictions: No support for complex data types like binary or GUID
- Composite Model Limitations: Limited support for combining Direct Lake with other storage modes
- Desktop/Web Restrictions: Some Power BI Desktop and web modeling features not yet available
- Geographic Constraints: Model and lakehouse must reside in the same Azure region
- Preview Features: Some capabilities are still in preview and may change
Understanding and Optimizing Direct Lake Query Performance
Direct Lake performance isn’t just about the semantic model design – it’s deeply connected to how your Delta tables are structured and maintained in OneLake. Let’s dive into the factors that make or break Direct Lake performance.
Core Principle: Well-optimized Delta tables = Faster query execution. Your semantic model is only as fast as the data it reads from OneLake.
Fundamental Building Blocks
To understand performance optimization, you need to grasp these key concepts:
- Delta Tables in OneLake: Store data in Parquet files with a transaction log (Delta log) for ACID compliance
- Parquet Files: Columnar storage format where data is organized in row groups and column chunks
- V-Order Optimization: Microsoft’s proprietary optimization that rearranges data in Parquet files for better compression and faster queries
- VertiPaq & VertiScan: The storage engine that loads and processes data in memory
- Dictionary Encoding: Stores unique values as integers to save space and accelerate queries
Key Factors Affecting Direct Lake Performance
1. V-Order Compression
V-Order is Microsoft’s secret sauce for Parquet optimization. It provides:
- Better compression ratios (smaller file sizes)
- Faster data loading into memory
- Ability to run queries directly on compressed data
Implementation: Enable V-Order when writing Delta tables: spark.conf.set("spark.sql.parquet.vorder.enabled", "true")
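As a hedged sketch, this is how the session-level setting above is typically combined with a Delta write in a Fabric notebook; the generated DataFrame and the demo_vorder table name are placeholders.

```python
from pyspark.sql import functions as F

# Enable V-Order for subsequent Parquet/Delta writes in this Spark session
# (property name as used elsewhere in this guide).
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")

# Placeholder data; in practice this would be your prepared ETL output.
df = spark.range(0, 1_000_000).withColumn("Category", (F.col("id") % 10).cast("string"))

# The resulting Parquet files are written V-Ordered.
df.write.format("delta").mode("overwrite").saveAsTable("demo_vorder")
```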
2. Data Type Optimization
Choose data types wisely:
- Use integers instead of strings where possible (faster processing, better compression)
- Avoid storing numbers as text (increases storage, slows queries)
- Use appropriate date/time types rather than strings
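A short PySpark sketch of these conversions, using hypothetical column names (Quantity, Price, OrderDate) and a placeholder output table:

```python
from pyspark.sql import functions as F

# Placeholder input where everything arrived as text.
raw_df = spark.createDataFrame(
    [("3", "19.99", "2024-06-01")], ["Quantity", "Price", "OrderDate"])

typed_df = (raw_df
    .withColumn("Quantity", F.col("Quantity").cast("int"))           # integer instead of string
    .withColumn("Price", F.col("Price").cast("decimal(18,2)"))       # numeric, not text
    .withColumn("OrderDate", F.to_date("OrderDate", "yyyy-MM-dd")))  # real date type

typed_df.write.format("delta").mode("overwrite").saveAsTable("sales_typed")
```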
3. Segment Size Management
VertiPaq divides data into segments for processing. Optimal configuration:
- Ideal Range: 1M–16M rows per segment
- Avoid: Too many small segments (slows queries)
- Also Avoid: Extremely large segments (memory pressure)
4. Column Cardinality Considerations
Cardinality (number of unique values) significantly impacts performance:
- High Cardinality: Too many unique values slows performance
- Manageable Values: Keep unique values reasonable (avoid IDs with millions of unique entries as dimension keys)
- Low-Cardinality Benefits: Excellent dictionary encoding efficiency
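If you want to check cardinality before modeling, a quick approximate-distinct-count pass over the Delta table works well. This is an illustrative sketch (the sales table name is a placeholder), not a required step:

```python
from pyspark.sql import functions as F

df = spark.read.table("sales")

# One approximate distinct count per column; very high counts flag columns to rethink
# (for example, GUID-like keys used as dimension attributes).
cardinality = df.agg(*[F.approx_count_distinct(c).alias(c) for c in df.columns])
cardinality.show(truncate=False)
```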
5. Memory Residency States
Direct Lake data moves through different memory states:
| State | Description | Performance | Typical Scenario |
|---|---|---|---|
| Cold | No data in memory, must load from Parquet | Slowest | First query after restart |
| Semiwarm | Only updated parts need reloading | Moderate | After incremental updates |
| Warm | Data already in memory | Fast | Recent queries |
| Hot | Data in memory + query cache | Fastest | Frequently queried data |
Bootstrapping: The Cold Start Process
When a Direct Lake model first loads (cold state), it goes through bootstrapping:
- Dictionary Merging: Combines Parquet dictionaries into a global VertiPaq dictionary
- Column Loading: Transcodes column chunks into column segments
- Index Building: Creates join indexes for table relationships
- Deletion Handling: Applies deletion vectors to ignore deleted rows
Performance Note: Queries in cold state are slower but improve dramatically as the model warms up. Consider “warming” critical models with initial queries.
Delta Table Best Practices for Direct Lake
Following these guidelines will ensure optimal Direct Lake performance:
What You SHOULD Do
- Use optimal row groups: Target 1M–16M rows per row group
- Keep Parquet file count reasonable: Fewer larger files (but balanced for parallel reads)
- Regular Spark Optimize: Run regularly to merge small files
- Low-cardinality partitioning: Use columns like Month, Region, Department (under 200 unique values)
- Append-only updates: Perfect for incremental framing (see the sketch after this list)
- Enable V-Order: Always use V-Order compression for Fabric workloads
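Putting the partitioning and append-only guidance together, a minimal sketch might look like this (fact_sales and Month are placeholder names):

```python
# Placeholder incremental batch; in practice this comes from your ETL pipeline.
incremental = spark.createDataFrame(
    [(5001, "2024-06", 120.0)], ["OrderID", "Month", "Amount"])

(incremental.write
    .format("delta")
    .mode("append")           # append-only pattern -> cheap incremental framing
    .partitionBy("Month")     # low-cardinality partition column (well under 200 values)
    .saveAsTable("fact_sales"))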
What You SHOULD Avoid
- Too many small files: Creates excessive column segments, slowing queries
- Overwrite updates: Resets Delta log, forces full reload
- High-cardinality partitioning: Daily partitions with 1000+ values cause fragmentation
- Using views: Causes DirectQuery fallback (slower performance)
- Complex data types: Binary, GUID, and other unsupported types
Update Patterns and Their Performance Impact
1. Batch Updates Without Partitioning
Performance: Poor
Why: Deleting small chunks affects most files, causing widespread rewrites
2. Batch Updates With Partitioning
Performance: Good
Why: Deletes only touch partitioned files, minimizing impact
Tip: Keep partition columns under 200 unique values
3. Incremental Loading (Append-Only)
Performance: Excellent
Why: Perfect alignment with incremental framing
Best For: Fact tables, transaction data, logs
4. Stream Processing (Real-Time)
Performance: Variable
Challenge: Can create too many small files
Solution: Regular optimization (Spark Optimize) to merge files
Essential Maintenance Operations
Regular maintenance keeps your Direct Lake environment performing optimally:
1. Vacuuming
Cleans up old Parquet files no longer referenced by the Delta log.
Important: Don’t vacuum away Parquet files that the semantic model’s current framing still references; keep those commit versions available until the next model refresh completes.
2. Spark Optimize
Merges small Parquet files into larger, more efficient files.
Schedule: Run regularly (e.g., weekends or during low-usage periods)
Benefit: Improves query speed and reduces metadata overhead
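Both maintenance operations above can be run as plain Spark SQL from a Fabric notebook or a scheduled notebook job. A minimal sketch, with a placeholder table name and an example 7-day retention window:

```python
# Merge small Parquet files into larger, more efficient ones.
spark.sql("OPTIMIZE fact_sales")

# Remove unreferenced files older than 7 days (168 hours is the Delta default retention).
spark.sql("VACUUM fact_sales RETAIN 168 HOURS")
```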
3. Delta Analyzer Tool
Use to monitor and analyze:
– Row group sizes
– File counts
– Update history
– Partition effectiveness
Ideal Direct Lake Setup Summary
- Row Group Size: 1M–16M rows
- Partitioning: Low cardinality columns (<200 unique values)
- Updates: Append-only pattern
- Optimization: Regular Spark Optimize runs
- Memory: Keep models in warm/hot state for frequent queries
- Monitoring: Regular performance checks
Remember the Performance Formula:
Direct Lake Performance = Good Semantic Model + Efficient Delta Tables + Proper Updates + Memory Residency
Monitoring and Troubleshooting Performance
Tools for Performance Analysis
| Tool | Purpose | Best For |
|---|---|---|
| Performance Analyzer (Power BI Desktop) | Quick checks of query execution mode | Identifying DirectQuery fallback |
| SQL Server Profiler (SSMS) | Detailed trace of query events | Deep performance analysis |
| DAX Studio (Community Tool) | Advanced query performance analysis | Query optimization |
| Azure Log Analytics | Collect and analyze telemetry | Enterprise monitoring |
| Delta Analyzer | Delta table health checks | Table optimization |
Analyzing Query Processing
To check if queries run in Direct Lake or fall back to DirectQuery:
Method 1: Performance Analyzer (Quick Check)
- Open Power BI Desktop → New Report
- Get Data → Power BI semantic models
- Select your Direct Lake model
- Add a visual to the report
- View → Performance Analyzer → Start recording
- Refresh visual and expand results
Interpretation: If you see DirectQuery activity, fallback is occurring. Pure Direct Lake shows only VertiPaq events.
Method 2: SQL Server Profiler (Detailed Analysis)
- Open SQL Server Profiler (included with SSMS)
- File → New Trace → Connect to Analysis Services
- Server name: Your Power BI workspace URL
- Trace Properties → Events Selection → Show all events
- Select Query Processing events:
- DirectQuery_Begin / DirectQuery_End
- VertiPaq_SE_Query_Begin / VertiPaq_SE_Query_End
- VertiPaq_SE_Query_Cache_Match / Cache_Miss
- Run trace and interact with Power BI report
Interpretation: VertiPaq events indicate Direct Lake processing. DirectQuery events indicate fallback to slower mode.
Common Performance Issues and Solutions
Issue: DirectQuery Fallback
Causes:
- Using SQL views instead of tables
- RLS implemented at SQL endpoint level
- Memory pressure exceeding limits
- Unsupported data types or functions
Solutions:
- Replace views with materialized tables
- Implement RLS at semantic model level
- Monitor and increase capacity if needed
- Check for unsupported features
Issue: Slow Cold Queries
Causes: First query after model load or refresh
Solutions:
- Implement cache warming (run initial queries proactively)
- Schedule refreshes during off-hours
- Keep frequently used models in warm state
Issue: Memory Pressure
Causes: Too much data loaded into memory
Solutions:
- Optimize column selection (avoid SELECT *)
- Implement aggregation tables
- Upgrade to larger capacity SKU
- Set the Direct Lake behavior to DirectLakeOnly if you prefer queries to fail fast rather than silently fall back to DirectQuery while you address the memory pressure
Key Takeaway: Direct Lake delivers Import mode performance with DirectQuery freshness, but requires proper Delta table optimization and monitoring to achieve its full potential.
Working with Direct Lake in Power BI Desktop
Power BI Desktop offers powerful capabilities for working with Direct Lake semantic models, with some unique considerations compared to traditional Import or DirectQuery models.
What Makes Desktop Different for Direct Lake?
When working with Direct Lake tables in Power BI Desktop, you’re engaging in remote modeling:
- No Local Engine: The local Power BI engine cannot run Direct Lake models
- Fabric Connection Required: Desktop connects to your Fabric workspace for all modeling operations
- Live Editing: Changes apply immediately to the cloud semantic model
- Local Metadata: You can save a local copy (metadata + reports) in PBIP files
Power BI Project (PBIP) vs. Traditional PBIX
| Aspect | Import Tables (PBIX) | Direct Lake Tables (PBIP) |
|---|---|---|
| Data Storage | Downloaded to your local PC | Stays in OneLake (cloud) |
| Performance | Depends on local PC specs | High (optimized for OneLake) |
| File Contents | Contains data + metadata | Metadata only (no data) |
| File Size | Can be very large | Small (KB range) |
| Version Control | Challenging (binary format) | Git-friendly (text-based) |
Creating and Editing Direct Lake Semantic Models
Creating a New Semantic Model
- Open Power BI Desktop
- Select OneLake catalog from Get Data options
- Choose a Fabric item (Lakehouse or Warehouse)
- Connect and authenticate
- Name your semantic model and select workspace
- Choose tables to include → Click OK
Note: The model is created directly in your Fabric workspace, not on your local machine.
Adding Additional Tables
To add more tables to an existing Direct Lake semantic model:
- While in live editing mode, open OneLake catalog again
- Pick another Fabric item (Lakehouse or Warehouse)
- Select the additional tables needed
- Click OK to add them to your semantic model
Editing Existing Semantic Models
Three methods to edit Direct Lake semantic models:
Method 1: From Power BI Desktop
- Open Power BI Desktop → OneLake catalog
- Select your semantic model
- Click Connect → Edit
- You’re now in live editing mode
Method 2: From PBIP File
- Open a previously exported PBIP file
- Power BI Desktop connects to the remote semantic model
- Begin editing with live connection
Method 3: From Fabric Portal
- Navigate to your semantic model in Fabric
- Select “Edit in Desktop” from context menu
- Desktop opens with live connection
Key Differences in Live Editing Experience
Working with Direct Lake models in Desktop has some unique characteristics:
- Report View: Not available unless you are working from a PBIP file
- Table View: Only available for calculated tables
- Auto-Save: Changes save directly to Fabric (no Save button)
- Collaboration: Multiple editors supported with automatic sync, but conflicts may require rework
- No Local Publishing: Cannot publish directly from Desktop → use Git integration or APIs
Refresh Behavior in Desktop
Understanding refresh is crucial for Direct Lake management:
- Refresh Button: Performs schema refresh (checks for column/table changes)
- Scheduled Refresh: In Fabric, only re-frames tables (updates metadata)
- Schema Changes: Source changes (column renames) may drop columns unless updated via TMDL
- Automatic Updates: Enabled by default to sync with Delta table changes
Power BI Project (PBIP) Deep Dive
What is PBIP?
Power BI Project files (PBIP) are the recommended way to work with Direct Lake semantic models:
- Local Storage: Stores metadata and reports locally
- Remote Connection: Connects to Fabric workspace for live editing
- Git Integration: Perfect for version control with text-based files
- Deployment Pipelines: Enables safe movement from dev → test → prod
Exporting a Power BI Project
- In Power BI Desktop: File > Export > Power BI Project
- Choose save location (default: %USERPROFILE%\Microsoft Fabric\repos\[Workspace Name])
- Export includes:
- Semantic model definition files (the *.SemanticModel folder)
- Report definition files (the *.Report folder)
- Project and configuration files
- Open new Desktop instance → Open exported PBIP to continue editing
Editing and Saving in PBIP
- Local Editing: Edit reports locally within PBIP structure
- Save Button: Saves metadata + reports to PBIP folder
- Publish Disabled: Cannot publish directly from Desktop
- Deployment: Use Fabric Git Integration for publishing
Advanced Views and Capabilities
TMDL View
Allows direct editing of the semantic model script in a text-based format. Essential for:
- Bulk model changes
- Migration between data sources
- Advanced customization
DAX Query View
Run and test DAX calculations directly against your semantic model. Both views are only saved when working with PBIP files.
Migrating Between Direct Lake Sources
To migrate from Direct Lake on SQL to Direct Lake on OneLake:
- Use TMDL view to update the connection string
- Change the source from the SQL endpoint to a OneLake connection
- Note: Direct Lake on OneLake models can combine tables from multiple Fabric items, whereas Direct Lake on SQL is tied to a single SQL analytics endpoint
- Always back up the model before migration
Requirements and Limitations
Essential Requirements
- XMLA Endpoint: Must be enabled in workspace
- Write Permission: On semantic model
- Viewer Permission: On Lakehouse/Warehouse
- Fabric License: Not available for free license users
- Same Region: Model and lakehouse must be in same Azure region
Current Limitations (Preview)
- No Direct Publishing: Cannot publish PBIP from Desktop → must use Fabric Git or APIs
- RLS Validation: Cannot validate Row-Level Security roles in Desktop
- No Sign-Out: Cannot sign out during live editing sessions
- Feature Gaps: Some features like barcode filters and external sharing not yet supported
- Preview Status: Features may change before general availability
Direct Lake Web Modeling (Preview)
Microsoft Fabric now enables building and editing semantic models directly in your browser through web modeling.
Two Paths to Web Modeling
Direct Lake on OneLake
Creation Path:
Create → OneLake catalog → New semantic model from Lakehouse
Best For:
Lakehouse-centric architectures
Direct Lake on SQL
Creation Path:
SQL analytics endpoint → Reporting → New semantic model
Best For:
Warehouse-centric architectures
Key Point: Both paths allow editing in the browser AND switching to Power BI Desktop for deeper work.
Creating Semantic Models in the Web
For OneLake Models:
- Go to Create → OneLake catalog → Fabric item
- Or: Open Lakehouse → New semantic model
- Enter model name
- Choose Fabric workspace
- Select tables → Click OK
- Model opens in web modeling editor
For SQL Models:
- Open SQL analytics endpoint or Warehouse
- Go to Reporting → New semantic model
- Enter model name
- Choose Fabric workspace
- Select tables → Click OK
- Model opens in web modeling editor
Troubleshooting: If editor doesn’t open, check your browser’s pop-up blocker settings.
Editing Existing Models in Web
- Go to Fabric Portal
- Find your model via Home, OneLake catalog, or search
- Select model → Open details page or context menu
- Choose “Open data model”
- Switch from view mode to editing mode (top-right)
- Optional: Select “Edit in Desktop” for Power BI Desktop
Editing Tables for Direct Lake Semantic Models
The “Edit tables” dialog is your control center for managing which Direct Lake tables your semantic model includes.
Purpose of Edit Tables
- Add/Remove Tables: Include or exclude tables from your model
- Source Requirement: Tables must come from one Fabric item (Lakehouse or Warehouse)
- Schema Sync: Apply column changes from data source
Edit Tables Dialog Components
- Title: Indicates create vs edit mode
- Info Links: Documentation and help
- Workspace Links: View source Fabric item
- Search Box: Find tables/views quickly
- Filters: Narrow by schema or object type
- Reload: Refresh schema (needs write permission)
- Tree View: Schema → Type → Table/View names
- Checkboxes: Select/unselect tables
- Confirm/Cancel: Apply or discard changes
Important Considerations
- Name Changes: You can rename tables/columns in the semantic model, but Edit tables shows original source names
- Source Renames: If upstream renames a table/column, the model looks for the old name → the table disappears after sync
- Renamed Tables: Show as unchecked in Edit tables → must re-add them
- Preservation: Measures can be moved, but relationships and column properties must be reapplied
Opening Edit Tables
Web Modeling (Browser):
- Ribbon button → Edit tables
- Right-click table → Edit tables
- If disabled: Try selecting only one table
Power BI Desktop:
- Right-click table → Edit tables
Relationships in Direct Lake Mode
Creating and managing relationships in Direct Lake has some unique characteristics:
- No Data Preview: Unlike Import/DirectQuery, no automatic data validation
- Cardinality Guessing: Based on row counts (bigger table = “many” side)
- Cross-Filter Direction: Always single by default (can change manually)
- Validation Methods: Use DAX queries in DAX view or create visuals using both tables
Creating Relationships
- Drag & Drop: Column → column in Model view
- Manage Relationships: Ribbon button
- New Relationship: Data pane → Model Explorer
Editing Relationships
- Select relationship line in diagram → edit in Properties pane
- Double-click line → open relationship editor
Limitations to Note
- ❌ No data preview in relationship dialog
- ❌ No auto validation of cardinality
- ❌ No auto validation of cross-filter direction
- ✅ Full control over syncing schemas and renaming
- ✅ Ability to create relationships manually
Summary: Edit tables in Direct Lake gives you complete control over which tables/views are in your semantic model, with full schema sync and relationship management capabilitiesβjust without the automatic previews and validations of Import/DirectQuery.
Creating a Lakehouse for Direct Lake
Let’s walk through the complete process of setting up a Lakehouse, adding Delta tables, and building a Direct Lake semantic model in Microsoft Fabric.
Prerequisites: Before starting, ensure you understand Direct Lake basics and have appropriate Fabric permissions. Direct Lake works best with Delta tables using V-Order compression.
Step 1: Create a Lakehouse
- In your Microsoft Fabric workspace, navigate to: New > More options > Data Engineering > Lakehouse
- In the New lakehouse dialog:
- Enter a name (letters, numbers, underscores only)
- Click Create
- Verify the lakehouse is created and opens successfully
Step 2: Create Delta Tables with V-Order Compression
Direct Lake requires Delta tables. Here’s how to create optimized tables:
Best Practice: Always use V-ORDER compression for faster Direct Lake performance.
- In your lakehouse: Select Open notebook > New notebook
- Copy and run this code to access Azure storage:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "holidaydatacontainer"
blob_relative_path = "Processed"
blob_sas_token = r""  # public sample container, so no SAS token is required
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
blob_sas_token
)
print('Remote blob path: ' + wasbs_path)
Expected: Output showing remote blob path
- Read Parquet file into DataFrame:
df = spark.read.parquet(wasbs_path)
df.printSchema()  # printSchema() already prints the schema; no outer print() needed
Expected: Schema of the data displayed
- Save as Delta table with V-ORDER compression:
# Save as delta table with V-ORDER compression
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")
df.write.format("delta").saveAsTable("holidays")
Verification: Wait for SPARK jobs to complete, then refresh the Tables node to confirm your table exists.
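Optionally, you can sanity-check the new table from the same notebook; this is an illustrative extra step using standard Delta SQL:

```python
# Confirm the new table is a healthy Delta table (format, file count, size on disk).
display(spark.sql("DESCRIBE DETAIL holidays"))

# Quick row count as a smoke test.
spark.sql("SELECT COUNT(*) AS row_count FROM holidays").show()
```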
Step 3: Create Your Direct Lake Semantic Model
- In your lakehouse: Select New semantic model
- In the dialog:
- Choose which tables to include
- Select Confirm
- A Direct Lake model is automatically created (same name as lakehouse)
- Open the data model in Power BI Web modeling:
- Add relationships between tables
- Add DAX measures for calculations
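Optionally, you can confirm the new model and list its tables from a notebook using the semantic-link (sempy) library; this is an illustrative check, and the model name is whatever you chose above:

```python
import sempy.fabric as fabric

# Semantic models in the current workspace (the new Direct Lake model should appear here).
print(fabric.list_datasets())

# Tables inside the model; replace the placeholder with your model's actual name.
print(fabric.list_tables(dataset="<your lakehouse name>"))
```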
Next Steps After Creation
- Build Reports: Create Power BI reports on your semantic model
- Composite Models: Combine with other data sources if needed
- XMLA Endpoints: Enable for programmatic access
- Security: Implement RLS/OLS as required
Key Takeaways:
- Lakehouse = Central storage for structured/unstructured data in Fabric
- Delta tables = Required for Direct Lake, optimize with V-ORDER compression
- Semantic model = Makes lakehouse data usable in Power BI
- Direct Lake = Fast in-memory queries without data duplication
Developing Direct Lake Semantic Models
What is a Direct Lake Semantic Model?
A semantic model is Power BI’s data modeling layer that defines:
- Tables and their relationships
- Measures and calculations
- Security rules (RLS/OLS)
- Business logic and semantics
Direct Lake mode makes this layer high-performance by querying large data directly from OneLake without importing everything.
Creation Methods
You can create Direct Lake semantic models through multiple interfaces:
| Method | Best For | Notes |
|---|---|---|
| Power BI Desktop | Power users, developers | Full modeling capabilities |
| Browser (Web Modeling) | Quick edits, administrators | Direct from Lakehouse/Warehouse |
| SSMS (19.1+) | Database professionals | Via XMLA endpoint |
| Fabric Notebooks | Programmatic creation | Automation, CI/CD |
| Community Tools | Advanced scenarios | Tabular Editor, DAX Studio |
Model Tables Best Practices
Critical: Avoid using views in Direct Lake modelsβthey cause queries to fall back to DirectQuery mode (slower performance).
Optimal table structure includes:
- Essential Columns: For filtering, grouping, sorting, summarizing
- Relationship Columns: Support table relationships
- Avoid Unnecessary Columns: Columns that aren’t used for filtering, grouping, or measures add storage and memory overhead without improving queries
- No Dynamic Data Masking (DDM): Not supported in Direct Lake
Supported Table Types
- Direct Lake tables (from SQL analytics endpoint)
- Import tables (stored on OneLake)
- Calculated tables (but NOT referencing Direct Lake tables)
- Calculation groups
Enforcing Data-Access Rules
Different users often need different data views. Direct Lake supports multiple security approaches:
1. Object-Level Security (OLS)
Purpose: Control visibility of tables/columns
Example: Hide “Salary” column from non-HR users
Setup: SQL analytics endpoint or semantic model (via Tabular Editor)
Caution: Can cause report errors in visuals
2. Row-Level Security (RLS)
Purpose: Control which rows users can see
Example: Salespeople only see their region’s sales
Setup Options:
- SQL analytics endpoint (causes DirectQuery fallback)
- Semantic model (web modeling or third-party tools)
How Queries Are Evaluated
Direct Lake follows this decision path for each query (sketched in code after the list):
- If the query hits a restricted OLS column/table → Error
- If restricted by SQL endpoint CLS → Error
- If the query uses a view or a table with RLS at the endpoint → DirectQuery fallback
- If the query exceeds memory limits → DirectQuery fallback
- Otherwise → In-memory cache (fastest)
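The evaluation order can be summarized as conceptual pseudocode; this is purely illustrative and not an actual engine API:

```python
def plan_direct_lake_query(hits_restricted_ols_or_cls: bool,
                           uses_view_or_endpoint_rls: bool,
                           exceeds_memory_limits: bool) -> str:
    """Illustrative pseudocode only; mirrors the documented evaluation order."""
    if hits_restricted_ols_or_cls:
        return "error"                    # OLS / endpoint CLS restriction
    if uses_view_or_endpoint_rls:
        return "directquery_fallback"     # views or RLS at the SQL endpoint
    if exceeds_memory_limits:
        return "directquery_fallback"     # capacity memory pressure
    return "direct_lake_in_memory"        # fastest path
```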
Source Item Permissions
Access depends on connection type:
| Connection Type | Account Used | Permissions Needed |
|---|---|---|
| SSO (Default) | Report consumer’s identity | Read + ReadData on source item |
| Fixed Identity | Shared service account | Single account needs permissions |
Security Implementation Options
| Where Rules Are Applied | Performance Impact | Best For |
|---|---|---|
| Semantic model only | Best (in-memory) | Most scenarios; use fixed identity |
| SQL endpoint only | Slower (DirectQuery fallback) | When users also access warehouse directly |
| Both layers | Complex, variable | Advanced security requirements |
Best Practice: Implement RLS in the semantic model with fixed identity for optimal performance. Avoid OLS/CLS when possible as they can cause visualization errors.
XMLA Endpoint for Advanced Management
Direct Lake semantic models support XMLA write operations, enabling:
- Customization & Scripting: Via SSMS, PowerShell, or custom tools
- CI/CD Integration: With GitHub or Azure DevOps
- Automation: Refresh schedules, model updates
- Debugging: Advanced troubleshooting capabilities
Important: After creating new tables via XMLA, you must refresh the model. Otherwise, queries will fall back to DirectQuery.
Direct Lake Model Metadata
- Compatibility Level: 1604
- Partition Mode: directLake
- Schema Source: SQL analytics endpoint
- Data Source: OneLake (direct loading)
Managing Direct Lake Semantic Models
Post-Publication Checklist
After publishing a Direct Lake semantic model, complete these essential tasks:
- Cloud Connection Setup: Configure secure data connectivity
- Security Role Management: Define RLS membership
- Fabric Item Permissions: Assign Read/Build access
- Refresh Configuration: Set up data updates
- Data Discovery: Enable endorsement for visibility
Cloud Connection Types
Default Cloud Connection
Authentication: SSO (user’s identity)
Best For: When all users have direct data access
Permissions: Users need Read/ReadData on source
Sharable Cloud Connection
Authentication: Fixed identity (service principal)
Best For: Centralized security management
Permissions: Single account needs access
Authentication Methods
- OAuth 2.0: User account-based (not recommended for production)
- Service Principal: Recommended for enterprise:
- Secure, not tied to individual users
- Supports secret/certificate rotation
- Centralized management
Permission Requirements
| User Scenario | Permission Needed | Notes |
|---|---|---|
| View reports only | Read (report + semantic model) | Basic consumption |
| Create new reports | Build (semantic model) | Self-service BI |
| Manage model (refresh, settings) | Owner | Administration |
| Query with fixed identity | No lakehouse permission needed | Service account has access |
| Query with SSO | Read/ReadData on lakehouse | User needs direct access |
Testing Tip: Always test permissions thoroughly before deploying to production users.
Refresh Strategies for Direct Lake
Refreshing updates your model with new data from source Delta tables:
Refresh Methods
- Manual Refresh: Via Fabric portal or SSMS scripts
- Scheduled Refresh: Automatic at defined intervals
- Programmatic Refresh: Triggered via API or ETL pipelines
- Automatic Updates: Enabled by default; detects Delta table changes
Pro Tip: Disable automatic updates if you need ETL jobs to complete before exposing new data to users.
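For the programmatic option, a common pattern is to call the Power BI REST API’s refresh endpoint at the end of an ETL pipeline. A hedged sketch follows; the workspace ID, dataset ID, and token value are placeholders, and in practice you would acquire the token with a service principal:

```python
import requests

WORKSPACE_ID = "<workspace-guid>"                      # placeholder
DATASET_ID = "<semantic-model-guid>"                   # placeholder
ACCESS_TOKEN = "<bearer token with dataset write scope>"  # placeholder; obtain via service principal

url = (f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}"
       f"/datasets/{DATASET_ID}/refreshes")

# 202 Accepted means the (framing) refresh was queued.
resp = requests.post(url, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
resp.raise_for_status()
```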
Cache Management
After a refresh, the memory cache may be cleared → initial queries can be slow.
Cache Warming Strategy: Run critical queries immediately after refresh to load important data into memory. Only needed if query delays are unacceptable for users.
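One way to script cache warming is to run a handful of representative DAX queries right after the refresh, for example with the semantic-link (sempy) library from a Fabric notebook. The model, table, and measure names below are placeholders:

```python
import sempy.fabric as fabric

warmup_queries = [
    """EVALUATE SUMMARIZECOLUMNS('Date'[Year], "Sales", [Total Sales])""",
    "EVALUATE TOPN(100, 'Product')",
]

# Each query loads the columns it touches into memory, moving them toward the warm/hot state.
for q in warmup_queries:
    fabric.evaluate_dax(dataset="Sales Model", dax_string=q)
```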
Direct Lake Behavior Property
Control how queries are processed:
| Setting | Behavior | Use Case |
|---|---|---|
| Automatic (Default) | Falls back to DirectQuery if needed | General use, ensures queries work |
| DirectLakeOnly | Strict in-memory only; no fallback | Performance-critical scenarios |
| DirectQueryOnly | Always uses DirectQuery | Testing, troubleshooting |
Recommendation: Use DirectLakeOnly for production if you want guaranteed in-memory performance and have addressed all potential fallback causes.
Specifying a Fixed Identity for Direct Lake
By default, Direct Lake uses SSO (user identity). For centralized security, you can configure a fixed identity:
Step-by-Step Configuration
- Access Model Settings:
- Open your Direct Lake model settings
- Expand “Gateway and cloud connections”
- Note the SQL Server data source pointing to your lakehouse/warehouse
- Create New Connection:
- In “Maps to” listbox, click “Create a connection”
- New connection pane opens with pre-filled details
- Enter meaningful name (e.g., “FixedIdentityConnection”)
- Set Authentication Method:
- Choose: OAuth 2.0 (user account) or Service Principal (recommended)
- Enter credentials for the fixed identity
- Disable SSO:
- Go to Single sign-on section
- Ensure “SSO via Microsoft Entra ID for DirectQuery queries” is NOT selected
- Complete Setup:
- Configure any additional parameters
- Click “Create”
- Verify Configuration:
- Confirm data source now maps to non-SSO connection
- Test model connectivity
Quick Recap: Model Settings → Gateway and cloud connections → Create connection → Choose authentication → Disable SSO → Create & Verify.
Building Reports with Direct Lake Semantic Models
Understanding the Semantic Model Role
Think of the semantic model as the “brain” of your Power BI reports:
- Organizes data for reporting, exploration, and DAX queries
- With Direct Lake, reads directly from OneLake Delta tables
- Maintains cache for recently used data
- Automatically falls back to DirectQuery if needed
Creating Reports in Power BI Desktop
- Open Power BI Desktop
- Select: OneLake catalog or Get data → Power BI semantic models
- Choose your Direct Lake semantic model → Connect
- You’re now live-connected → start building visuals
- Save locally → Publish to workspace to share
Note: You can create additional measures in your report without modifying the underlying semantic model.
Creating Reports in Power BI Service / Fabric Portal
Multiple browser-based options:
- Semantic model menu (…) → Create report
- Home → New report → Pick published semantic model
- OneLake catalog → Data & Semantic model → Create report
- Semantic model details page → Create blank report
- Web modeling (File → Create new report)
Additional Reporting Options
Explore
Quick, interactive analysis
Paginated Reports
Pixel-perfect, printable reports
DAX Queries
Advanced calculations
Excel + Power BI
Refreshable PivotTables
Consumption Scenarios and Permissions
Scenario 1: View Report Only
Approach: Use Fixed Identity
Options:
- Publish as App (report-only permissions)
- Give Viewer role on report + semantic model
- Separate workspace with Viewer role
Scenario 2: View + Create Own Reports
Approach: Use Fixed Identity
Permissions: Build permission on semantic model + Viewer on report
Scenario 3: View + Create + SQL Endpoint + Delta Access
Approach: Use SSO
Permissions Required:
- Build access on semantic model
- ReadAll permission on Fabric item
- Direct lakehouse/warehouse access
Scenario 4: View + Edit Everything
Approach: Use SSO
Permissions: Higher workspace roles (Contributor, Member, Admin)
Analogy:
- Semantic model = Kitchen (where food is prepared)
- Reports = Meals served (visuals you consume)
- Permissions = Access rules (who can eat, cook, or change recipes)
Final Key Takeaways
Direct Lake: The Future of Power BI at Scale
- Performance: Import-mode speed with DirectQuery freshness
- Scalability: Works with datasets larger than memory
- Efficiency: No full data copies, lightweight refreshes
- Freshness: Near real-time data synchronization
- Cost-Effective: Reduced storage and compute requirements
- Enterprise-Ready: Built for Fabric/OneLake ecosystems
Direct Lake represents a fundamental shift in how Power BI handles large datasets, combining the best aspects of traditional storage modes while eliminating their limitations. For organizations investing in Microsoft Fabric and OneLake, Direct Lake is the clear path forward for scalable, high-performance business intelligence.
Getting Started Recommendation: Begin with a pilot project to validate Direct Lake fits your use case. Focus on Delta table optimization, monitor for DirectQuery fallback, and gradually expand as you gain experience.