The Complete Guide to Direct Lake in Power BI
Quick Summary: Direct Lake is Microsoft Fabric’s innovative storage mode that combines the performance of Import mode with the data freshness of DirectQuery, enabling fast interactive analysis on massive datasets stored in OneLake without full data imports.
What is Direct Lake?
Direct Lake is a revolutionary storage mode in Power BI that fundamentally changes how large datasets are processed and analyzed. Instead of importing data into Power BI (which creates duplicates and storage overhead) or querying live databases directly (which can be slow), Direct Lake connects directly to data stored in OneLake – Microsoft Fabric’s unified data lake.
Think of it as having the best of both worlds: Import mode performance with DirectQuery freshness, all while eliminating the need for massive data refreshes that consume time and resources.
Key Characteristics
- No Full Data Copy: Data remains in OneLake as optimized Delta tables (Parquet files)
- On-Demand Loading: Only the specific columns needed for queries are loaded into memory
- Built for Scale: Works with datasets larger than available memory
- Automatic Updates: Syncs with source data changes through lightweight “framing”
- Enterprise-Ready: Designed for IT-driven projects with big data lakes and warehouses
How Direct Lake Works: The Technical Breakdown
Understanding the mechanics behind Direct Lake helps explain why it’s such a game-changer for Power BI performance at scale.
Core Architecture
Direct Lake operates on a simple but powerful principle: query data where it lives. Here’s how it works:
- Data Storage: All data resides in OneLake as Delta tables (optimized Parquet files with transaction logs)
- Query Processing: The VertiPaq engine (same as Import mode) processes queries
- Selective Loading: Only the specific columns required for a query are loaded into memory
- Data Preparation: All ETL and data transformation happens in OneLake using Spark, T-SQL, or pipelines
- Automatic Synchronization: Changes in source data are automatically detected and reflected
Pro Tip: The “framing” process in Direct Lake only updates metadata about what data is available, not the actual data itself. This makes refreshes incredibly fast compared to traditional Import mode.
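To make the framing idea concrete, here is a minimal notebook sketch of the flow: new rows land in OneLake as an ordinary Delta append, and a subsequent refresh of the semantic model only re-points (re-frames) it at the latest Delta version. The table name (sales), the model name (Sales Model), and the use of the semantic-link (sempy) library in a Fabric notebook are illustrative assumptions, not specifics from this guide.

```python
# Minimal sketch (Fabric notebook, PySpark + semantic-link). Names are placeholders.
import sempy.fabric as fabric

# 1) New data lands in OneLake as an ordinary Delta append -- no Power BI refresh involved yet.
new_rows = spark.createDataFrame(
    [(1001, "2024-06-01", 250.0)], ["OrderID", "OrderDate", "Amount"])
new_rows.write.format("delta").mode("append").saveAsTable("sales")

# 2) Re-framing the semantic model only updates its pointers to the latest Delta version;
#    no data is copied into the model, which is why this completes quickly.
fabric.refresh_dataset(dataset="Sales Model", refresh_type="full")
```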
Direct Lake vs. Import vs. DirectQuery: A Detailed Comparison
Choosing the right storage mode is crucial for Power BI performance. Here’s how Direct Lake stacks up against traditional options:
| Feature | Import Mode | DirectQuery | Direct Lake |
|---|---|---|---|
| Data Copy | Full copy into Power BI | No copy, queries source directly | No full copy, loads needed data only |
| Performance | Fast (in-memory processing) | Slower (depends on source database) | Fast (Import-like performance) |
| Refresh Requirements | Heavy & time-consuming | No refresh needed | Light “framing” refresh |
| Data Latency | Can be outdated until refresh | Always latest data | Near real-time (auto updates) |
| Memory Usage | High (entire dataset) | Low (only query results) | Moderate (only needed columns) |
| Best For | Small to medium datasets | Real-time reporting on live data | Large datasets in data lakes |
When to Choose Each Mode
Import Mode: Still excellent for small to medium datasets, self-service analytics, and scenarios where you need Power Query transformations within Power BI.
DirectQuery: Essential when you need absolutely real-time data from operational databases or when data volumes exceed what can reasonably be imported.
Direct Lake: The ideal choice for Fabric/OneLake environments with large datasets where you need both performance and data freshness without the overhead of full refreshes.
Key Benefits of Direct Lake
Direct Lake isn’t just another storage mode – it’s a paradigm shift for enterprise Power BI deployments. Here’s why:
Blazing Fast Queries
Experience Import mode-like performance while querying data directly from OneLake. The VertiPaq engine processes queries in-memory for lightning-fast responses.
No More Big Refreshes
Say goodbye to hour-long refresh cycles. Direct Lake uses lightweight “framing” that updates only metadata, making refreshes quick and cost-effective.
Scales to Massive Datasets
Work with datasets larger than available memory. Direct Lake loads only the columns needed for each query, enabling analysis on terabytes of data.
Always Fresh Data
Automatic synchronization with OneLake means your reports always reflect the latest data without manual intervention.
Enterprise-Ready Architecture
Built for IT-driven projects with centralized data governance, security, and management in Fabric/OneLake environments.
Cost Effective
Reduce storage costs by eliminating data duplication and minimize compute costs with efficient query processing.
Essential Direct Lake Concepts
To effectively work with Direct Lake, you need to understand these fundamental concepts:
1. Column Loading (Transcoding)
Direct Lake doesn’t load entire tables into memory. Instead, it uses transcoding to load only the specific columns needed for a query from Parquet files into the VertiPaq engine. This selective loading is key to its memory efficiency.
2. Framing Refresh
Unlike traditional refreshes that move all data, framing only updates metadata about what data is available in the Delta tables. It’s like updating the table of contents without rewriting the entire book.
3. Automatic Updates
Direct Lake models automatically stay synchronized with changes in OneLake. When data is added, modified, or deleted in the source Delta tables, the semantic model detects these changes and updates accordingly.
4. DirectQuery Fallback
In certain scenarios (like when using SQL views or encountering security rules), Direct Lake may fall back to DirectQuery mode. While this ensures queries still work, performance will be slower, so it’s best to avoid fallback when possible.
Important: DirectQuery fallback can significantly impact performance. Always monitor your queries to ensure they’re running in true Direct Lake mode for optimal speed.
Security in Direct Lake
Security is paramount in enterprise deployments. Direct Lake offers flexible security options:
Authentication Methods
- Single Sign-On (SSO): Default method where users access data with their own identities. Users need appropriate permissions on the underlying Fabric items.
- Fixed Identity: Uses a shared service account for all users. Ideal when you want to centralize security management or when users shouldn’t directly access source data.
Security Layers
Direct Lake supports multiple security approaches depending on your needs:
| Security Type | Description | Performance Impact | Best For |
|---|---|---|---|
| Row-Level Security (RLS) | Controls which rows users can see based on their identity | Minimal if implemented in semantic model | Sales territories, department views |
| Object-Level Security (OLS) | Controls visibility of entire tables or columns | Can cause query errors in visuals | Hiding sensitive tables/columns |
| Column-Level Security (CLS) | Controls access to specific columns | Similar to OLS – can cause issues | Protecting sensitive columns like salaries |
Best Practice: Implement RLS at the semantic model level with fixed identity for best performance. Avoid OLS/CLS when possible as they can cause report errors.
Capacity Requirements and Licensing
Direct Lake requires specific Fabric capacities and licenses:
- Fabric Capacity License: Requires Fabric capacity SKUs (F-series). No support for Power BI Pro/PPU standalone
- Memory Considerations: Larger SKUs provide more memory, supporting larger data volumes and more concurrent users
- Performance Scaling: If capacity limits are exceeded, queries may slow down or fall back to DirectQuery
- Region Requirements: Semantic model and lakehouse must be in the same Azure region
Current Limitations (As of Preview)
While powerful, Direct Lake has some limitations to be aware of:
- Data Type Restrictions: No support for complex data types like binary or GUID
- Composite Model Limitations: Limited support for combining Direct Lake with other storage modes
- Desktop/Web Restrictions: Some Power BI Desktop and web modeling features not yet available
- Geographic Constraints: Model and lakehouse must reside in the same Azure region
- Preview Features: Some capabilities are still in preview and may change
Understanding and Optimizing Direct Lake Query Performance
Direct Lake performance isn’t just about the semantic model design – it’s deeply connected to how your Delta tables are structured and maintained in OneLake. Let’s dive into the factors that make or break Direct Lake performance.
Core Principle: Well-optimized Delta tables = Faster query execution. Your semantic model is only as fast as the data it reads from OneLake.
Fundamental Building Blocks
To understand performance optimization, you need to grasp these key concepts:
- Delta Tables in OneLake: Store data in Parquet files with a transaction log (Delta log) for ACID compliance
- Parquet Files: Columnar storage format where data is organized in row groups and column chunks
- V-Order Optimization: Microsoft’s proprietary optimization that rearranges data in Parquet files for better compression and faster queries
- VertiPaq & VertiScan: The storage engine that loads and processes data in memory
- Dictionary Encoding: Stores unique values as integers to save space and accelerate queries
Key Factors Affecting Direct Lake Performance
1. V-Order Compression
V-Order is Microsoft’s secret sauce for Parquet optimization. It provides:
- Better compression ratios (smaller file sizes)
- Faster data loading into memory
- Ability to run queries directly on compressed data
Implementation: Enable V-Order when writing Delta tables: spark.conf.set("spark.sql.parquet.vorder.enabled", "true")
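As a hedged sketch, this is how the session-level setting above is typically combined with a Delta write in a Fabric notebook; the generated DataFrame and the demo_vorder table name are placeholders.

```python
from pyspark.sql import functions as F

# Enable V-Order for subsequent Parquet/Delta writes in this Spark session
# (property name as used elsewhere in this guide).
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")

# Placeholder data; in practice this would be your prepared ETL output.
df = spark.range(0, 1_000_000).withColumn("Category", (F.col("id") % 10).cast("string"))

# The resulting Parquet files are written V-Ordered.
df.write.format("delta").mode("overwrite").saveAsTable("demo_vorder")
```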
2. Data Type Optimization
Choose data types wisely:
- Use integers instead of strings where possible (faster processing, better compression)
- Avoid storing numbers as text (increases storage, slows queries)
- Use appropriate date/time types rather than strings
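A short PySpark sketch of these conversions, using hypothetical column names (Quantity, Price, OrderDate) and a placeholder output table:

```python
from pyspark.sql import functions as F

# Placeholder input where everything arrived as text.
raw_df = spark.createDataFrame(
    [("3", "19.99", "2024-06-01")], ["Quantity", "Price", "OrderDate"])

typed_df = (raw_df
    .withColumn("Quantity", F.col("Quantity").cast("int"))           # integer instead of string
    .withColumn("Price", F.col("Price").cast("decimal(18,2)"))       # numeric, not text
    .withColumn("OrderDate", F.to_date("OrderDate", "yyyy-MM-dd")))  # real date type

typed_df.write.format("delta").mode("overwrite").saveAsTable("sales_typed")
```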
3. Segment Size Management
VertiPaq divides data into segments for processing. Optimal configuration:
- Ideal Range: 1M–16M rows per segment
- Avoid: Too many small segments (slows queries)
- Also Avoid: Extremely large segments (memory pressure)
4. Column Cardinality Considerations
Cardinality (number of unique values) significantly impacts performance:
- High Cardinality: Too many unique values slows performance
- Manageable Values: Keep unique values reasonable (avoid IDs with millions of unique entries as dimension keys)
- Low-Cardinality Benefits: Excellent dictionary encoding efficiency
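If you want to check cardinality before modeling, a quick approximate-distinct-count pass over the Delta table works well. This is an illustrative sketch (the sales table name is a placeholder), not a required step:

```python
from pyspark.sql import functions as F

df = spark.read.table("sales")

# One approximate distinct count per column; very high counts flag columns to rethink
# (for example, GUID-like keys used as dimension attributes).
cardinality = df.agg(*[F.approx_count_distinct(c).alias(c) for c in df.columns])
cardinality.show(truncate=False)
```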
5. Memory Residency States
Direct Lake data moves through different memory states:
| State | Description | Performance | Typical Scenario |
|---|---|---|---|
| Cold | No data in memory, must load from Parquet | Slowest | First query after restart |
| Semiwarm | Only updated parts need reloading | Moderate | After incremental updates |
| Warm | Data already in memory | Fast | Recent queries |
| Hot | Data in memory + query cache | Fastest | Frequently queried data |
Bootstrapping: The Cold Start Process
When a Direct Lake model first loads (cold state), it goes through bootstrapping:
- Dictionary Merging: Combines Parquet dictionaries into a global VertiPaq dictionary
- Column Loading: Transcodes column chunks into column segments
- Index Building: Creates join indexes for table relationships
- Deletion Handling: Applies deletion vectors to ignore deleted rows
Performance Note: Queries in cold state are slower but improve dramatically as the model warms up. Consider “warming” critical models with initial queries.
Delta Table Best Practices for Direct Lake
Following these guidelines will ensure optimal Direct Lake performance:
What You SHOULD Do
- Use optimal row groups: Target 1M–16M rows per row group
- Keep Parquet file count reasonable: Fewer larger files (but balanced for parallel reads)
- Regular Spark Optimize: Run regularly to merge small files
- Low-cardinality partitioning: Use columns like Month, Region, Department (under 200 unique values)
- Append-only updates: Perfect for incremental framing (see the sketch after this list)
- Enable V-Order: Always use V-Order compression for Fabric workloads
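Putting the partitioning and append-only guidance together, a minimal sketch might look like this (fact_sales and Month are placeholder names):

```python
# Placeholder incremental batch; in practice this comes from your ETL pipeline.
incremental = spark.createDataFrame(
    [(5001, "2024-06", 120.0)], ["OrderID", "Month", "Amount"])

(incremental.write
    .format("delta")
    .mode("append")           # append-only pattern -> cheap incremental framing
    .partitionBy("Month")     # low-cardinality partition column (well under 200 values)
    .saveAsTable("fact_sales"))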
What You SHOULD Avoid
- Too many small files: Creates excessive column segments, slowing queries
- Overwrite updates: Resets Delta log, forces full reload
- High-cardinality partitioning: Daily partitions with 1000+ values cause fragmentation
- Using views: Causes DirectQuery fallback (slower performance)
- Complex data types: Binary, GUID, and other unsupported types
Update Patterns and Their Performance Impact
1. Batch Updates Without Partitioning
Performance: Poor
Why: Deleting small chunks affects most files, causing widespread rewrites
2. Batch Updates With Partitioning
Performance: Good
Why: Deletes only touch partitioned files, minimizing impact
Tip: Keep partition columns under 200 unique values
3. Incremental Loading (Append-Only)
Performance: Excellent
Why: Perfect alignment with incremental framing
Best For: Fact tables, transaction data, logs
4. Stream Processing (Real-Time)
Performance: Variable
Challenge: Can create too many small files
Solution: Regular optimization (Spark Optimize) to merge files
Essential Maintenance Operations
Regular maintenance keeps your Direct Lake environment performing optimally:
1. Vacuuming
Cleans up old Parquet files no longer referenced by the Delta log.
Important: Don’t vacuum away Parquet files that the semantic model’s current framing still references; keep those commit versions available until the next model refresh completes.
2. Spark Optimize
Merges small Parquet files into larger, more efficient files.
Schedule: Run regularly (e.g., weekends or during low-usage periods)
Benefit: Improves query speed and reduces metadata overhead
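Both maintenance operations above can be run as plain Spark SQL from a Fabric notebook or a scheduled notebook job. A minimal sketch, with a placeholder table name and an example 7-day retention window:

```python
# Merge small Parquet files into larger, more efficient ones.
spark.sql("OPTIMIZE fact_sales")

# Remove unreferenced files older than 7 days (168 hours is the Delta default retention).
spark.sql("VACUUM fact_sales RETAIN 168 HOURS")
```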
3. Delta Analyzer Tool
Use to monitor and analyze:
– Row group sizes
– File counts
– Update history
– Partition effectiveness
Ideal Direct Lake Setup Summary
- Row Group Size: 1M–16M rows
- Partitioning: Low cardinality columns (<200 unique values)
- Updates: Append-only pattern
- Optimization: Regular Spark Optimize runs
- Memory: Keep models in warm/hot state for frequent queries
- Monitoring: Regular performance checks
Remember the Performance Formula:
Direct Lake Performance = Good Semantic Model + Efficient Delta Tables + Proper Updates + Memory Residency
Monitoring and Troubleshooting Performance
Tools for Performance Analysis
| Tool | Purpose | Best For |
|---|---|---|
| Performance Analyzer (Power BI Desktop) | Quick checks of query execution mode | Identifying DirectQuery fallback |
| SQL Server Profiler (SSMS) | Detailed trace of query events | Deep performance analysis |
| DAX Studio (Community Tool) | Advanced query performance analysis | Query optimization |
| Azure Log Analytics | Collect and analyze telemetry | Enterprise monitoring |
| Delta Analyzer | Delta table health checks | Table optimization |
Analyzing Query Processing
To check if queries run in Direct Lake or fall back to DirectQuery:
Method 1: Performance Analyzer (Quick Check)
- Open Power BI Desktop → New Report
- Get Data → Power BI semantic models
- Select your Direct Lake model
- Add a visual to the report
- View → Performance Analyzer → Start recording
- Refresh visual and expand results
Interpretation: If you see DirectQuery activity, fallback is occurring. Pure Direct Lake shows only VertiPaq events.
Method 2: SQL Server Profiler (Detailed Analysis)
- Open SQL Server Profiler (included with SSMS)
- File → New Trace → Connect to Analysis Services
- Server name: Your Power BI workspace URL
- Trace Properties → Events Selection → Show all events
- Select Query Processing events:
- DirectQuery_Begin / DirectQuery_End
- VertiPaq_SE_Query_Begin / VertiPaq_SE_Query_End
- VertiPaq_SE_Query_Cache_Match / Cache_Miss
- Run trace and interact with Power BI report
Interpretation: VertiPaq events indicate Direct Lake processing. DirectQuery events indicate fallback to slower mode.
Common Performance Issues and Solutions
Issue: DirectQuery Fallback
Causes:
- Using SQL views instead of tables
- RLS implemented at SQL endpoint level
- Memory pressure exceeding limits
- Unsupported data types or functions
Solutions:
- Replace views with materialized tables
- Implement RLS at semantic model level
- Monitor and increase capacity if needed
- Check for unsupported features
Issue: Slow Cold Queries
Causes: First query after model load or refresh
Solutions:
- Implement cache warming (run initial queries proactively)
- Schedule refreshes during off-hours
- Keep frequently used models in warm state
Issue: Memory Pressure
Causes: Too much data loaded into memory
Solutions:
- Optimize column selection (avoid SELECT *)
- Implement aggregation tables
- Upgrade to larger capacity SKU
- Set the Direct Lake behavior to DirectLakeOnly if you prefer queries to fail fast rather than silently fall back to DirectQuery while you address the memory pressure
Key Takeaway: Direct Lake delivers Import mode performance with DirectQuery freshness, but requires proper Delta table optimization and monitoring to achieve its full potential.
Working with Direct Lake in Power BI Desktop
Power BI Desktop offers powerful capabilities for working with Direct Lake semantic models, with some unique considerations compared to traditional Import or DirectQuery models.
What Makes Desktop Different for Direct Lake?
When working with Direct Lake tables in Power BI Desktop, you’re engaging in remote modeling:
- No Local Engine: The local Power BI engine cannot run Direct Lake models
- Fabric Connection Required: Desktop connects to your Fabric workspace for all modeling operations
- Live Editing: Changes apply immediately to the cloud semantic model
- Local Metadata: You can save a local copy (metadata + reports) in PBIP files
Power BI Project (PBIP) vs. Traditional PBIX
| Aspect | Import Tables (PBIX) | Direct Lake Tables (PBIP) |
|---|---|---|
| Data Storage | Downloaded to your local PC | Stays in OneLake (cloud) |
| Performance | Depends on local PC specs | High (optimized for OneLake) |
| File Contents | Contains data + metadata | Metadata only (no data) |
| File Size | Can be very large | Small (KB range) |
| Version Control | Challenging (binary format) | Git-friendly (text-based) |
Creating and Editing Direct Lake Semantic Models
Creating a New Semantic Model
- Open Power BI Desktop
- Select OneLake catalog from Get Data options
- Choose a Fabric item (Lakehouse or Warehouse)
- Connect and authenticate
- Name your semantic model and select workspace
- Choose tables to include → Click OK
Note: The model is created directly in your Fabric workspace, not on your local machine.
Adding Additional Tables
To add more tables to an existing Direct Lake semantic model:
- While in live editing mode, open OneLake catalog again
- Pick another Fabric item (Lakehouse or Warehouse)
- Select the additional tables needed
- Click OK to add them to your semantic model
Editing Existing Semantic Models
Three methods to edit Direct Lake semantic models:
Method 1: From Power BI Desktop
- Open Power BI Desktop → OneLake catalog
- Select your semantic model
- Click Connect → Edit
- You’re now in live editing mode
Method 2: From PBIP File
- Open a previously exported PBIP file
- Power BI Desktop connects to the remote semantic model
- Begin editing with live connection
Method 3: From Fabric Portal
- Navigate to your semantic model in Fabric
- Select “Edit in Desktop” from context menu
- Desktop opens with live connection
Key Differences in Live Editing Experience
Working with Direct Lake models in Desktop has some unique characteristics:
- Report View: Not available unless you are working from a PBIP file
- Table View: Only available for calculated tables
- Auto-Save: Changes save directly to Fabric (no Save button)
- Collaboration: Multiple editors supported with automatic sync, but conflicts may require rework
- No Local Publishing: Cannot publish directly from Desktop → use Git integration or APIs
Refresh Behavior in Desktop
Understanding refresh is crucial for Direct Lake management:
- Refresh Button: Performs schema refresh (checks for column/table changes)
- Scheduled Refresh: In Fabric, only re-frames tables (updates metadata)
- Schema Changes: Source changes (column renames) may drop columns unless updated via TMDL
- Automatic Updates: Enabled by default to sync with Delta table changes
Power BI Project (PBIP) Deep Dive
What is PBIP?
Power BI Project files (PBIP) are the recommended way to work with Direct Lake semantic models:
- Local Storage: Stores metadata and reports locally
- Remote Connection: Connects to Fabric workspace for live editing
- Git Integration: Perfect for version control with text-based files
- Deployment Pipelines: Enables safe movement from dev → test → prod
Exporting a Power BI Project
- In Power BI Desktop: File > Export > Power BI Project
- Choose save location (default: %USERPROFILE%\Microsoft Fabric\repos\[Workspace Name])
- Export includes:
- Semantic model definition files (the *.SemanticModel folder)
- Report definition files (the *.Report folder)
- Project and configuration files
- Open new Desktop instance → Open exported PBIP to continue editing
Editing and Saving in PBIP
- Local Editing: Edit reports locally within PBIP structure
- Save Button: Saves metadata + reports to PBIP folder
- Publish Disabled: Cannot publish directly from Desktop
- Deployment: Use Fabric Git Integration for publishing
Advanced Views and Capabilities
TMDL View
Allows direct editing of the semantic model script in a text-based format. Essential for:
- Bulk model changes
- Migration between data sources
- Advanced customization
DAX Query View
Run and test DAX calculations directly against your semantic model. Both views are only saved when working with PBIP files.
Migrating Between Direct Lake Sources
To migrate from Direct Lake on SQL to Direct Lake on OneLake:
- Use TMDL view to update the connection string
- Change the source from the SQL endpoint to a OneLake connection
- Note: Direct Lake on OneLake models can combine tables from multiple Fabric items, whereas Direct Lake on SQL is tied to a single SQL analytics endpoint
- Always back up the model before migration
Requirements and Limitations
Essential Requirements
- XMLA Endpoint: Must be enabled in workspace
- Write Permission: On semantic model
- Viewer Permission: On Lakehouse/Warehouse
- Fabric License: Not available for free license users
- Same Region: Model and lakehouse must be in same Azure region
Current Limitations (Preview)
- No Direct Publishing: Cannot publish PBIP from Desktop → must use Fabric Git or APIs
- RLS Validation: Cannot validate Row-Level Security roles in Desktop
- No Sign-Out: Cannot sign out during live editing sessions
- Feature Gaps: Some features like barcode filters and external sharing not yet supported
- Preview Status: Features may change before general availability
Direct Lake Web Modeling (Preview)
Microsoft Fabric now enables building and editing semantic models directly in your browser through web modeling.
Two Paths to Web Modeling
Direct Lake on OneLake
Creation Path:
Create → OneLake catalog → New semantic model from Lakehouse
Best For:
Lakehouse-centric architectures
Direct Lake on SQL
Creation Path:
SQL analytics endpoint → Reporting → New semantic model
Best For:
Warehouse-centric architectures
Key Point: Both paths allow editing in the browser AND switching to Power BI Desktop for deeper work.
Creating Semantic Models in the Web
For OneLake Models:
- Go to Create → OneLake catalog → Fabric item
- Or: Open Lakehouse → New semantic model
- Enter model name
- Choose Fabric workspace
- Select tables → Click OK
- Model opens in web modeling editor
For SQL Models:
- Open SQL analytics endpoint or Warehouse
- Go to Reporting → New semantic model
- Enter model name
- Choose Fabric workspace
- Select tables → Click OK
- Model opens in web modeling editor
Troubleshooting: If editor doesn’t open, check your browser’s pop-up blocker settings.
Editing Existing Models in Web
- Go to Fabric Portal
- Find your model via Home, OneLake catalog, or search
- Select model → Open details page or context menu
- Choose “Open data model”
- Switch from view mode to editing mode (top-right)
- Optional: Select “Edit in Desktop” for Power BI Desktop
Editing Tables for Direct Lake Semantic Models
The “Edit tables” dialog is your control center for managing which Direct Lake tables your semantic model includes.
Purpose of Edit Tables
- Add/Remove Tables: Include or exclude tables from your model
- Source Requirement: Tables must come from one Fabric item (Lakehouse or Warehouse)
- Schema Sync: Apply column changes from data source
Edit Tables Dialog Components
- Title: Indicates create vs edit mode
- Info Links: Documentation and help
- Workspace Links: View source Fabric item
- Search Box: Find tables/views quickly
- Filters: Narrow by schema or object type
- Reload: Refresh schema (needs write permission)
- Tree View: Schema → Type → Table/View names
- Checkboxes: Select/unselect tables
- Confirm/Cancel: Apply or discard changes
Important Considerations
- Name Changes: You can rename tables/columns in the semantic model, but Edit tables shows original source names
- Source Renames: If upstream renames a table/column, the model looks for the old name → the table disappears after sync
- Renamed Tables: Show as unchecked in Edit tables → must re-add them
- Preservation: Measures can be moved, but relationships and column properties must be reapplied
Opening Edit Tables
Web Modeling (Browser):
- Ribbon button → Edit tables
- Right-click table → Edit tables
- If disabled: Try selecting only one table
Power BI Desktop:
- Right-click table → Edit tables
Relationships in Direct Lake Mode
Creating and managing relationships in Direct Lake has some unique characteristics:
- No Data Preview: Unlike Import/DirectQuery, no automatic data validation
- Cardinality Guessing: Based on row counts (bigger table = “many” side)
- Cross-Filter Direction: Always single by default (can change manually)
- Validation Methods: Use DAX queries in DAX view or create visuals using both tables
Creating Relationships
- Drag & Drop: Column → column in Model view
- Manage Relationships: Ribbon button
- New Relationship: Data pane → Model Explorer
Editing Relationships
- Select relationship line in diagram → edit in Properties pane
- Double-click line → open relationship editor
Limitations to Note
- ❌ No data preview in relationship dialog
- ❌ No auto validation of cardinality
- ❌ No auto validation of cross-filter direction
- ✅ Full control over syncing schemas and renaming
- ✅ Ability to create relationships manually
Summary: Edit tables in Direct Lake gives you complete control over which tables/views are in your semantic model, with full schema sync and relationship management capabilitiesβjust without the automatic previews and validations of Import/DirectQuery.
Creating a Lakehouse for Direct Lake
Let’s walk through the complete process of setting up a Lakehouse, adding Delta tables, and building a Direct Lake semantic model in Microsoft Fabric.
Prerequisites: Before starting, ensure you understand Direct Lake basics and have appropriate Fabric permissions. Direct Lake works best with Delta tables using V-Order compression.
Step 1: Create a Lakehouse
- In your Microsoft Fabric workspace, navigate to: New > More options > Data Engineering > Lakehouse
- In the New lakehouse dialog:
- Enter a name (letters, numbers, underscores only)
- Click Create
- Verify the lakehouse is created and opens successfully
Step 2: Create Delta Tables with V-Order Compression
Direct Lake requires Delta tables. Here’s how to create optimized tables:
Best Practice: Always use V-ORDER compression for faster Direct Lake performance.
- In your lakehouse: Select Open notebook > New notebook
- Copy and run this code to access Azure storage:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "holidaydatacontainer"
blob_relative_path = "Processed"
blob_sas_token = r""  # public sample container, so no SAS token is required
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
blob_sas_token
)
print('Remote blob path: ' + wasbs_path)
Expected: Output showing remote blob path
- Read Parquet file into DataFrame:
df = spark.read.parquet(wasbs_path)
df.printSchema()  # printSchema() already prints the schema; no outer print() needed
Expected: Schema of the data displayed
- Save as Delta table with V-ORDER compression:
# Save as delta table with V-ORDER compression
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")
df.write.format("delta").saveAsTable("holidays")
Verification: Wait for SPARK jobs to complete, then refresh the Tables node to confirm your table exists.
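Optionally, you can sanity-check the new table from the same notebook; this is an illustrative extra step using standard Delta SQL:

```python
# Confirm the new table is a healthy Delta table (format, file count, size on disk).
display(spark.sql("DESCRIBE DETAIL holidays"))

# Quick row count as a smoke test.
spark.sql("SELECT COUNT(*) AS row_count FROM holidays").show()
```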
Step 3: Create Your Direct Lake Semantic Model
- In your lakehouse: Select New semantic model
- In the dialog:
- Choose which tables to include
- Select Confirm
- A Direct Lake model is automatically created (same name as lakehouse)
- Open the data model in Power BI Web modeling:
- Add relationships between tables
- Add DAX measures for calculations
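Optionally, you can confirm the new model and list its tables from a notebook using the semantic-link (sempy) library; this is an illustrative check, and the model name is whatever you chose above:

```python
import sempy.fabric as fabric

# Semantic models in the current workspace (the new Direct Lake model should appear here).
print(fabric.list_datasets())

# Tables inside the model; replace the placeholder with your model's actual name.
print(fabric.list_tables(dataset="<your lakehouse name>"))
```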
Next Steps After Creation
- Build Reports: Create Power BI reports on your semantic model
- Composite Models: Combine with other data sources if needed
- XMLA Endpoints: Enable for programmatic access
- Security: Implement RLS/OLS as required
Key Takeaways:
- Lakehouse = Central storage for structured/unstructured data in Fabric
- Delta tables = Required for Direct Lake, optimize with V-ORDER compression
- Semantic model = Makes lakehouse data usable in Power BI
- Direct Lake = Fast in-memory queries without data duplication
Developing Direct Lake Semantic Models
What is a Direct Lake Semantic Model?
A semantic model is Power BI’s data modeling layer that defines:
- Tables and their relationships
- Measures and calculations
- Security rules (RLS/OLS)
- Business logic and semantics
Direct Lake mode makes this layer high-performance by querying large data directly from OneLake without importing everything.
Creation Methods
You can create Direct Lake semantic models through multiple interfaces:
| Method | Best For | Notes |
|---|---|---|
| Power BI Desktop | Power users, developers | Full modeling capabilities |
| Browser (Web Modeling) | Quick edits, administrators | Direct from Lakehouse/Warehouse |
| SSMS (19.1+) | Database professionals | Via XMLA endpoint |
| Fabric Notebooks | Programmatic creation | Automation, CI/CD |
| Community Tools | Advanced scenarios | Tabular Editor, DAX Studio |
Model Tables Best Practices
Critical: Avoid using views in Direct Lake modelsβthey cause queries to fall back to DirectQuery mode (slower performance).
Optimal table structure includes:
- Essential Columns: For filtering, grouping, sorting, summarizing
- Relationship Columns: Support table relationships
- Avoid Unnecessary Columns: Columns that aren’t used for filtering, grouping, or measures add storage and memory overhead without improving queries
- No Dynamic Data Masking (DDM): Not supported in Direct Lake
Supported Table Types
- Direct Lake tables (from SQL analytics endpoint)
- Import tables (stored on OneLake)
- Calculated tables (but NOT referencing Direct Lake tables)
- Calculation groups
Enforcing Data-Access Rules
Different users often need different data views. Direct Lake supports multiple security approaches:
1. Object-Level Security (OLS)
Purpose: Control visibility of tables/columns
Example: Hide “Salary” column from non-HR users
Setup: SQL analytics endpoint or semantic model (via Tabular Editor)
Caution: Can cause report errors in visuals
2. Row-Level Security (RLS)
Purpose: Control which rows users can see
Example: Salespeople only see their region’s sales
Setup Options:
- SQL analytics endpoint (causes DirectQuery fallback)
- Semantic model (web modeling or third-party tools)
How Queries Are Evaluated
Direct Lake follows this decision path for each query (sketched in code after the list):
- If the query hits a restricted OLS column/table → Error
- If restricted by SQL endpoint CLS → Error
- If the query uses a view or a table with RLS at the endpoint → DirectQuery fallback
- If the query exceeds memory limits → DirectQuery fallback
- Otherwise → In-memory cache (fastest)
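The evaluation order can be summarized as conceptual pseudocode; this is purely illustrative and not an actual engine API:

```python
def plan_direct_lake_query(hits_restricted_ols_or_cls: bool,
                           uses_view_or_endpoint_rls: bool,
                           exceeds_memory_limits: bool) -> str:
    """Illustrative pseudocode only; mirrors the documented evaluation order."""
    if hits_restricted_ols_or_cls:
        return "error"                    # OLS / endpoint CLS restriction
    if uses_view_or_endpoint_rls:
        return "directquery_fallback"     # views or RLS at the SQL endpoint
    if exceeds_memory_limits:
        return "directquery_fallback"     # capacity memory pressure
    return "direct_lake_in_memory"        # fastest path
```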
Source Item Permissions
Access depends on connection type:
| Connection Type | Account Used | Permissions Needed |
|---|---|---|
| SSO (Default) | Report consumer’s identity | Read + ReadData on source item |
| Fixed Identity | Shared service account | Single account needs permissions |
Security Implementation Options
| Where Rules Are Applied | Performance Impact | Best For |
|---|---|---|
| Semantic model only | Best (in-memory) | Most scenarios; use fixed identity |
| SQL endpoint only | Slower (DirectQuery fallback) | When users also access warehouse directly |
| Both layers | Complex, variable | Advanced security requirements |
Best Practice: Implement RLS in the semantic model with fixed identity for optimal performance. Avoid OLS/CLS when possible as they can cause visualization errors.
XMLA Endpoint for Advanced Management
Direct Lake semantic models support XMLA write operations, enabling:
- Customization & Scripting: Via SSMS, PowerShell, or custom tools
- CI/CD Integration: With GitHub or Azure DevOps
- Automation: Refresh schedules, model updates
- Debugging: Advanced troubleshooting capabilities
Important: After creating new tables via XMLA, you must refresh the model. Otherwise, queries will fall back to DirectQuery.
Direct Lake Model Metadata
- Compatibility Level: 1604
- Partition Mode: directLake
- Schema Source: SQL analytics endpoint
- Data Source: OneLake (direct loading)
Managing Direct Lake Semantic Models
Post-Publication Checklist
After publishing a Direct Lake semantic model, complete these essential tasks:
- Cloud Connection Setup: Configure secure data connectivity
- Security Role Management: Define RLS membership
- Fabric Item Permissions: Assign Read/Build access
- Refresh Configuration: Set up data updates
- Data Discovery: Enable endorsement for visibility
Cloud Connection Types
Default Cloud Connection
Authentication: SSO (user’s identity)
Best For: When all users have direct data access
Permissions: Users need Read/ReadData on source
Sharable Cloud Connection
Authentication: Fixed identity (service principal)
Best For: Centralized security management
Permissions: Single account needs access
Authentication Methods
- OAuth 2.0: User account-based (not recommended for production)
- Service Principal: Recommended for enterprise:
- Secure, not tied to individual users
- Supports secret/certificate rotation
- Centralized management
Permission Requirements
| User Scenario | Permission Needed | Notes |
|---|---|---|
| View reports only | Read (report + semantic model) | Basic consumption |
| Create new reports | Build (semantic model) | Self-service BI |
| Manage model (refresh, settings) | Owner | Administration |
| Query with fixed identity | No lakehouse permission needed | Service account has access |
| Query with SSO | Read/ReadData on lakehouse | User needs direct access |
Testing Tip: Always test permissions thoroughly before deploying to production users.
Refresh Strategies for Direct Lake
Refreshing updates your model with new data from source Delta tables:
Refresh Methods
- Manual Refresh: Via Fabric portal or SSMS scripts
- Scheduled Refresh: Automatic at defined intervals
- Programmatic Refresh: Triggered via API or ETL pipelines
- Automatic Updates: Enabled by default; detects Delta table changes
Pro Tip: Disable automatic updates if you need ETL jobs to complete before exposing new data to users.
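For the programmatic option, a common pattern is to call the Power BI REST API’s refresh endpoint at the end of an ETL pipeline. A hedged sketch follows; the workspace ID, dataset ID, and token value are placeholders, and in practice you would acquire the token with a service principal:

```python
import requests

WORKSPACE_ID = "<workspace-guid>"                      # placeholder
DATASET_ID = "<semantic-model-guid>"                   # placeholder
ACCESS_TOKEN = "<bearer token with dataset write scope>"  # placeholder; obtain via service principal

url = (f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}"
       f"/datasets/{DATASET_ID}/refreshes")

# 202 Accepted means the (framing) refresh was queued.
resp = requests.post(url, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
resp.raise_for_status()
```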
Cache Management
After a refresh, the memory cache may be cleared → initial queries can be slow.
Cache Warming Strategy: Run critical queries immediately after refresh to load important data into memory. Only needed if query delays are unacceptable for users.
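One way to script cache warming is to run a handful of representative DAX queries right after the refresh, for example with the semantic-link (sempy) library from a Fabric notebook. The model, table, and measure names below are placeholders:

```python
import sempy.fabric as fabric

warmup_queries = [
    """EVALUATE SUMMARIZECOLUMNS('Date'[Year], "Sales", [Total Sales])""",
    "EVALUATE TOPN(100, 'Product')",
]

# Each query loads the columns it touches into memory, moving them toward the warm/hot state.
for q in warmup_queries:
    fabric.evaluate_dax(dataset="Sales Model", dax_string=q)
```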
Direct Lake Behavior Property
Control how queries are processed:
| Setting | Behavior | Use Case |
|---|---|---|
| Automatic (Default) | Falls back to DirectQuery if needed | General use, ensures queries work |
| DirectLakeOnly | Strict in-memory only; no fallback | Performance-critical scenarios |
| DirectQueryOnly | Always uses DirectQuery | Testing, troubleshooting |
Recommendation: Use DirectLakeOnly for production if you want guaranteed in-memory performance and have addressed all potential fallback causes.
Specifying a Fixed Identity for Direct Lake
By default, Direct Lake uses SSO (user identity). For centralized security, you can configure a fixed identity:
Step-by-Step Configuration
- Access Model Settings:
- Open your Direct Lake model settings
- Expand “Gateway and cloud connections”
- Note the SQL Server data source pointing to your lakehouse/warehouse
- Create New Connection:
- In “Maps to” listbox, click “Create a connection”
- New connection pane opens with pre-filled details
- Enter meaningful name (e.g., “FixedIdentityConnection”)
- Set Authentication Method:
- Choose: OAuth 2.0 (user account) or Service Principal (recommended)
- Enter credentials for the fixed identity
- Disable SSO:
- Go to Single sign-on section
- Ensure “SSO via Microsoft Entra ID for DirectQuery queries” is NOT selected
- Complete Setup:
- Configure any additional parameters
- Click “Create”
- Verify Configuration:
- Confirm data source now maps to non-SSO connection
- Test model connectivity
Quick Recap: Model Settings → Gateway and cloud connections → Create connection → Choose authentication → Disable SSO → Create & Verify.
Building Reports with Direct Lake Semantic Models
Understanding the Semantic Model Role
Think of the semantic model as the “brain” of your Power BI reports:
- Organizes data for reporting, exploration, and DAX queries
- With Direct Lake, reads directly from OneLake Delta tables
- Maintains cache for recently used data
- Automatically falls back to DirectQuery if needed
Creating Reports in Power BI Desktop
- Open Power BI Desktop
- Select: OneLake catalog or Get data → Power BI semantic models
- Choose your Direct Lake semantic model → Connect
- You’re now live-connected → start building visuals
- Save locally → Publish to workspace to share
Note: You can create additional measures in your report without modifying the underlying semantic model.
Creating Reports in Power BI Service / Fabric Portal
Multiple browser-based options:
- Semantic model menu (…) → Create report
- Home → New report → Pick published semantic model
- OneLake catalog → Data & Semantic model → Create report
- Semantic model details page → Create blank report
- Web modeling (File → Create new report)
Additional Reporting Options
Explore
Quick, interactive analysis
Paginated Reports
Pixel-perfect, printable reports
DAX Queries
Advanced calculations
Excel + Power BI
Refreshable PivotTables
Consumption Scenarios and Permissions
Scenario 1: View Report Only
Approach: Use Fixed Identity
Options:
- Publish as App (report-only permissions)
- Give Viewer role on report + semantic model
- Separate workspace with Viewer role
Scenario 2: View + Create Own Reports
Approach: Use Fixed Identity
Permissions: Build permission on semantic model + Viewer on report
Scenario 3: View + Create + SQL Endpoint + Delta Access
Approach: Use SSO
Permissions Required:
- Build access on semantic model
- ReadAll permission on Fabric item
- Direct lakehouse/warehouse access
Scenario 4: View + Edit Everything
Approach: Use SSO
Permissions: Higher workspace roles (Contributor, Member, Admin)
Analogy:
- Semantic model = Kitchen (where food is prepared)
- Reports = Meals served (visuals you consume)
- Permissions = Access rules (who can eat, cook, or change recipes)
Final Key Takeaways
Direct Lake: The Future of Power BI at Scale
- Performance: Import-mode speed with DirectQuery freshness
- Scalability: Works with datasets larger than memory
- Efficiency: No full data copies, lightweight refreshes
- Freshness: Near real-time data synchronization
- Cost-Effective: Reduced storage and compute requirements
- Enterprise-Ready: Built for Fabric/OneLake ecosystems
Direct Lake represents a fundamental shift in how Power BI handles large datasets, combining the best aspects of traditional storage modes while eliminating their limitations. For organizations investing in Microsoft Fabric and OneLake, Direct Lake is the clear path forward for scalable, high-performance business intelligence.
Getting Started Recommendation: Begin with a pilot project to validate Direct Lake fits your use case. Focus on Delta table optimization, monitor for DirectQuery fallback, and gradually expand as you gain experience.