The Complete Guide to Direct Lake in Power BI

Quick Summary: Direct Lake is Microsoft Fabric’s innovative storage mode that combines the performance of Import mode with the data freshness of DirectQuery, enabling fast interactive analysis on massive datasets stored in OneLake without full data imports.

What is Direct Lake?

Direct Lake is a revolutionary storage mode in Power BI that fundamentally changes how large datasets are processed and analyzed. Instead of importing data into Power BI (which creates duplicates and storage overhead) or querying live databases directly (which can be slow), Direct Lake connects directly to data stored in OneLake – Microsoft Fabric’s unified data lake.

Think of it as having the best of both worlds: Import mode performance with DirectQuery freshness, all while eliminating the need for massive data refreshes that consume time and resources.

Key Characteristics

  • No Full Data Copy: Data remains in OneLake as optimized Delta tables (Parquet files)
  • On-Demand Loading: Only the specific columns needed for queries are loaded into memory
  • Built for Scale: Works with datasets larger than available memory
  • Automatic Updates: Syncs with source data changes through lightweight “framing”
  • Enterprise-Ready: Designed for IT-driven projects with big data lakes and warehouses

How Direct Lake Works: The Technical Breakdown

Understanding the mechanics behind Direct Lake helps you appreciate why it’s such a game-changer for Power BI performance at scale.

Core Architecture

Direct Lake operates on a simple but powerful principle: query data where it lives. Here’s how it works:

  1. Data Storage: All data resides in OneLake as Delta tables (optimized Parquet files with transaction logs)
  2. Query Processing: The VertiPaq engine (same as Import mode) processes queries
  3. Selective Loading: Only the specific columns required for a query are loaded into memory
  4. Data Preparation: All ETL and data transformation happens in OneLake using Spark, T-SQL, or pipelines
  5. Automatic Synchronization: Changes in source data are automatically detected and reflected

Pro Tip: The “framing” process in Direct Lake only updates metadata about what data is available, not the actual data itself. This makes refreshes incredibly fast compared to traditional Import mode.

Direct Lake vs. Import vs. DirectQuery: A Detailed Comparison

Choosing the right storage mode is crucial for Power BI performance. Here’s how Direct Lake stacks up against traditional options:

| Feature | Import Mode | DirectQuery | Direct Lake |
|---|---|---|---|
| Data Copy | Full copy into Power BI | No copy, queries source directly | No full copy, loads needed data only |
| Performance | Fast (in-memory processing) | Slower (depends on source database) | Fast (Import-like performance) |
| Refresh Requirements | Heavy & time-consuming | No refresh needed | Light “framing” refresh |
| Data Latency | Can be outdated until refresh | Always latest data | Near real-time (auto updates) |
| Memory Usage | High (entire dataset) | Low (only query results) | Moderate (only needed columns) |
| Best For | Small to medium datasets | Real-time reporting on live data | Large datasets in data lakes |

When to Choose Each Mode

Import Mode: Still excellent for small to medium datasets, self-service analytics, and scenarios where you need Power Query transformations within Power BI.

DirectQuery: Essential when you need absolutely real-time data from operational databases or when data volumes exceed what can reasonably be imported.

Direct Lake: The ideal choice for Fabric/OneLake environments with large datasets where you need both performance and data freshness without the overhead of full refreshes.

Key Benefits of Direct Lake

Direct Lake isn’t just another storage mode – it’s a paradigm shift for enterprise Power BI deployments. Here’s why:

⚡ Blazing Fast Queries

Experience Import mode-like performance while querying data directly from OneLake. The VertiPaq engine processes queries in-memory for lightning-fast responses.

🔄 No More Big Refreshes

Say goodbye to hour-long refresh cycles. Direct Lake uses lightweight “framing” that updates only metadata, making refreshes quick and cost-effective.

📊 Scales to Massive Datasets

Work with datasets larger than available memory. Direct Lake loads only the columns needed for each query, enabling analysis on terabytes of data.

πŸ• Always Fresh Data

Automatic synchronization with OneLake means your reports always reflect the latest data without manual intervention.

🏢 Enterprise-Ready Architecture

Built for IT-driven projects with centralized data governance, security, and management in Fabric/OneLake environments.

💰 Cost Effective

Reduce storage costs by eliminating data duplication and minimize compute costs with efficient query processing.

Essential Direct Lake Concepts

To effectively work with Direct Lake, you need to understand these fundamental concepts:

1. Column Loading (Transcoding)

Direct Lake doesn’t load entire tables into memory. Instead, it uses transcoding to load only the specific columns needed for a query from Parquet files into the VertiPaq engine. This selective loading is key to its memory efficiency.

2. Framing Refresh

Unlike traditional refreshes that move all data, framing only updates metadata about what data is available in the Delta tables. It’s like updating the table of contents without rewriting the entire book.

3. Automatic Updates

Direct Lake models automatically stay synchronized with changes in OneLake. When data is added, modified, or deleted in the source Delta tables, the semantic model detects these changes and updates accordingly.

4. DirectQuery Fallback

In certain scenarios (like when using SQL views or encountering security rules), Direct Lake may fall back to DirectQuery mode. While this ensures queries still work, performance will be slower, so it’s best to avoid fallback when possible.

Important: DirectQuery fallback can significantly impact performance. Always monitor your queries to ensure they’re running in true Direct Lake mode for optimal speed.

Security in Direct Lake

Security is paramount in enterprise deployments. Direct Lake offers flexible security options:

Authentication Methods

  • Single Sign-On (SSO): Default method where users access data with their own identities. Users need appropriate permissions on the underlying Fabric items.
  • Fixed Identity: Uses a shared service account for all users. Ideal when you want to centralize security management or when users shouldn’t directly access source data.

Security Layers

Direct Lake supports multiple security approaches depending on your needs:

| Security Type | Description | Performance Impact | Best For |
|---|---|---|---|
| Row-Level Security (RLS) | Controls which rows users can see based on their identity | Minimal if implemented in semantic model | Sales territories, department views |
| Object-Level Security (OLS) | Controls visibility of entire tables or columns | Can cause query errors in visuals | Hiding sensitive tables/columns |
| Column-Level Security (CLS) | Controls access to specific columns | Similar to OLS – can cause issues | Protecting sensitive columns like salaries |

Best Practice: Implement RLS at the semantic model level with fixed identity for best performance. Avoid OLS/CLS when possible as they can cause report errors.

Capacity Requirements and Licensing

Direct Lake requires specific Fabric capacities and licenses:

  • Fabric Capacity License: Requires Fabric capacity SKUs (F-series). No support for Power BI Pro/PPU standalone
  • Memory Considerations: Larger SKUs provide more memory, supporting larger data volumes and more concurrent users
  • Performance Scaling: If capacity limits are exceeded, queries may slow down or fall back to DirectQuery
  • Region Requirements: Semantic model and lakehouse must be in the same Azure region

Current Limitations (As of Preview)

While powerful, Direct Lake has some limitations to be aware of:

  • Data Type Restrictions: No support for complex data types like binary or GUID
  • Composite Model Limitations: Limited support for combining Direct Lake with other storage modes
  • Desktop/Web Restrictions: Some Power BI Desktop and web modeling features not yet available
  • Geographic Constraints: Model and lakehouse must reside in the same Azure region
  • Preview Features: Some capabilities are still in preview and may change

Understanding and Optimizing Direct Lake Query Performance

Direct Lake performance isn’t just about the semantic model design – it’s deeply connected to how your Delta tables are structured and maintained in OneLake. Let’s dive into the factors that make or break Direct Lake performance.

Core Principle: Well-optimized Delta tables = Faster query execution. Your semantic model is only as fast as the data it reads from OneLake.

Fundamental Building Blocks

To understand performance optimization, you need to grasp these key concepts:

  • Delta Tables in OneLake: Store data in Parquet files with a transaction log (Delta log) for ACID compliance
  • Parquet Files: Columnar storage format where data is organized in row groups and column chunks
  • V-Order Optimization: Microsoft’s proprietary optimization that rearranges data in Parquet files for better compression and faster queries
  • VertiPaq & VertiScan: The storage engine that loads and processes data in memory
  • Dictionary Encoding: Stores unique values as integers to save space and accelerate queries

Key Factors Affecting Direct Lake Performance

1. V-Order Compression

V-Order is Microsoft’s secret sauce for Parquet optimization. It provides:

  • Better compression ratios (smaller file sizes)
  • Faster data loading into memory
  • Ability to run queries directly on compressed data

Implementation: Enable V-Order when writing Delta tables: spark.conf.set("spark.sql.parquet.vorder.enabled", "true")
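
For example, here is a minimal PySpark sketch (table and path names are illustrative) that enables V-Order for the session and writes a DataFrame as a Delta table the semantic model can read:

# Illustrative sketch: enable V-Order for this Spark session, then write a Delta table
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")

df_sales = spark.read.parquet("Files/raw/sales")   # assumed source path in the lakehouse Files area
df_sales.write.format("delta").mode("overwrite").saveAsTable("sales")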

2. Data Type Optimization

Choose data types wisely:

  • Use integers instead of strings where possible (faster processing, better compression)
  • Avoid storing numbers as text (increases storage, slows queries)
  • Use appropriate date/time types rather than strings
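
As a hedged sketch (the staging table and column names are made up), casting text-typed columns to proper numeric and date types before writing the Delta table looks like this:

from pyspark.sql import functions as F

# Illustrative cleanup: assume the staging table arrived with everything typed as string
df_sales = spark.table("staging_sales")
df_typed = (df_sales
    .withColumn("quantity", F.col("quantity").cast("int"))                 # number stored as text -> integer
    .withColumn("unit_price", F.col("unit_price").cast("decimal(18,2)"))   # money stored as text -> decimal
    .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd")))      # date stored as text -> date type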

3. Segment Size Management

VertiPaq divides data into segments for processing. Optimal configuration:

  • Ideal Range: 1M – 16M rows per segment
  • Avoid: Too many small segments (slows queries)
  • Also Avoid: Extremely large segments (memory pressure)

4. Column Cardinality Considerations

Cardinality (number of unique values) significantly impacts performance:

  • High Cardinality: Too many unique values slows performance
  • Manageable Values: Keep unique values reasonable (avoid IDs with millions of unique entries as dimension keys)
  • Low-Cardinality Benefits: Excellent dictionary encoding efficiency
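
A quick way to gauge cardinality before choosing keys is a distinct-count check in a notebook (table and column names are illustrative):

from pyspark.sql import functions as F

# Count distinct values per candidate column to spot high-cardinality troublemakers
spark.table("sales").select(
    F.countDistinct("customer_id").alias("customer_id_values"),
    F.countDistinct("region").alias("region_values"),
).show()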

5. Memory Residency States

Direct Lake data moves through different memory states:

| State | Description | Performance | Typical Scenario |
|---|---|---|---|
| Cold | No data in memory, must load from Parquet | Slowest | First query after restart |
| Semiwarm | Only updated parts need reloading | Moderate | After incremental updates |
| Warm | Data already in memory | Fast | Recent queries |
| Hot | Data in memory + query cache | Fastest | Frequently queried data |

Bootstrapping: The Cold Start Process

When a Direct Lake model first loads (cold state), it goes through bootstrapping:

  1. Dictionary Merging: Combines Parquet dictionaries into a global VertiPaq dictionary
  2. Column Loading: Transcodes column chunks into column segments
  3. Index Building: Creates join indexes for table relationships
  4. Deletion Handling: Applies deletion vectors to ignore deleted rows

Performance Note: Queries in cold state are slower but improve dramatically as the model warms up. Consider “warming” critical models with initial queries.

Delta Table Best Practices for Direct Lake

Following these guidelines will ensure optimal Direct Lake performance:

✅ What You SHOULD Do

  • Use optimal row groups: Target 1M – 16M rows per row group
  • Keep Parquet file count reasonable: Fewer larger files (but balanced for parallel reads)
  • Regular Spark Optimize: Run regularly to merge small files
  • Low-cardinality partitioning: Use columns like Month, Region, Department (under 200 unique values)
  • Append-only updates: Perfect for incremental framing
  • Enable V-Order: Always use V-Order compression for Fabric workloads

❌ What You SHOULD Avoid

  • Too many small files: Creates excessive column segments, slowing queries
  • Overwrite updates: Resets Delta log, forces full reload
  • High-cardinality partitioning: Daily partitions with 1000+ values cause fragmentation
  • Using views: Causes DirectQuery fallback (slower performance)
  • Complex data types: Binary, GUID, and other unsupported types

Update Patterns and Their Performance Impact

1. Batch Updates Without Partitioning

Performance: Poor
Why: Deleting small chunks affects most files, causing widespread rewrites

2. Batch Updates With Partitioning

Performance: Good
Why: Deletes only touch partitioned files, minimizing impact
Tip: Keep partition columns under 200 unique values
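
A minimal PySpark sketch of this pattern, assuming a staging table and a Month column with only a handful of distinct values:

# Partition by a low-cardinality column so batch deletes/updates only rewrite a few files
(spark.table("staging_sales")
    .write
    .format("delta")
    .mode("overwrite")
    .partitionBy("Month")          # roughly 12 values, well under the ~200-value guideline
    .saveAsTable("sales_partitioned"))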

3. Incremental Loading (Append-Only)

Performance: Excellent
Why: Perfect alignment with incremental framing
Best For: Fact tables, transaction data, logs

4. Stream Processing (Real-Time)

Performance: Variable
Challenge: Can create too many small files
Solution: Regular optimization (Spark Optimize) to merge files

Essential Maintenance Operations

Regular maintenance keeps your Direct Lake environment performing optimally:

1. Vacuuming

Cleans up old Parquet files no longer referenced by the Delta log.
Important: Ensure commit versions still exist until model refresh completes.
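
In a Fabric notebook this can be a single Spark SQL statement (table name and retention window are illustrative):

# Remove Parquet files no longer referenced by the Delta log, keeping 7 days of history
spark.sql("VACUUM sales RETAIN 168 HOURS")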

2. Spark Optimize

Merges small Parquet files into larger, more efficient files.
Schedule: Run regularly (e.g., weekends or during low-usage periods)
Benefit: Improves query speed and reduces metadata overhead
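
A hedged sketch of the compaction step (table name is illustrative; the VORDER keyword is Fabric’s extension to OPTIMIZE, and plain OPTIMIZE also works):

# Merge small Parquet files into larger ones and re-apply V-Order during compaction
spark.sql("OPTIMIZE sales VORDER")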

3. Delta Analyzer Tool

Use to monitor and analyze:
– Row group sizes
– File counts
– Update history
– Partition effectiveness
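
If you just want a quick look without the full Delta Analyzer, the Delta Lake API exposes similar statistics from a notebook (table name is illustrative):

from delta.tables import DeltaTable

# File count, total size, and partition columns for a quick health check
dt = DeltaTable.forName(spark, "sales")
dt.detail().select("numFiles", "sizeInBytes", "partitionColumns").show(truncate=False)

# Recent update history: which operations ran and when
dt.history(10).select("version", "timestamp", "operation").show(truncate=False)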

Ideal Direct Lake Setup Summary

  • Row Group Size: 1M – 16M rows
  • Partitioning: Low cardinality columns (<200 unique values)
  • Updates: Append-only pattern
  • Optimization: Regular Spark Optimize runs
  • Memory: Keep models in warm/hot state for frequent queries
  • Monitoring: Regular performance checks

Remember the Performance Formula:
Direct Lake Performance = Good Semantic Model + Efficient Delta Tables + Proper Updates + Memory Residency

Monitoring and Troubleshooting Performance

Tools for Performance Analysis

| Tool | Purpose | Best For |
|---|---|---|
| Performance Analyzer (Power BI Desktop) | Quick checks of query execution mode | Identifying DirectQuery fallback |
| SQL Server Profiler (SSMS) | Detailed trace of query events | Deep performance analysis |
| DAX Studio (community tool) | Advanced query performance analysis | Query optimization |
| Azure Log Analytics | Collect and analyze telemetry | Enterprise monitoring |
| Delta Analyzer | Delta table health checks | Table optimization |

Analyzing Query Processing

To check if queries run in Direct Lake or fall back to DirectQuery:

Method 1: Performance Analyzer (Quick Check)

  1. Open Power BI Desktop → New Report
  2. Get Data → Power BI semantic models
  3. Select your Direct Lake model
  4. Add a visual to the report
  5. View → Performance Analyzer → Start recording
  6. Refresh visual and expand results

Interpretation: If you see DirectQuery activity, fallback is occurring. Pure Direct Lake shows only VertiPaq events.

Method 2: SQL Server Profiler (Detailed Analysis)

  1. Open SQL Server Profiler (included with SSMS)
  2. File → New Trace → Connect to Analysis Services
  3. Server name: Your Power BI workspace URL
  4. Trace Properties → Events Selection → Show all events
  5. Select Query Processing events:
    • DirectQuery_Begin / DirectQuery_End
    • VertiPaq_SE_Query_Begin / VertiPaq_SE_Query_End
    • VertiPaq_SE_Query_Cache_Match / Cache_Miss
  6. Run trace and interact with Power BI report

Interpretation: VertiPaq events indicate Direct Lake processing. DirectQuery events indicate fallback to slower mode.

Common Performance Issues and Solutions

Issue: DirectQuery Fallback

Causes:

  • Using SQL views instead of tables
  • RLS implemented at SQL endpoint level
  • Memory pressure exceeding limits
  • Unsupported data types or functions
Solutions:
  • Replace views with materialized tables
  • Implement RLS at semantic model level
  • Monitor and increase capacity if needed
  • Check for unsupported features

Issue: Slow Cold Queries

Causes: First query after model load or refresh
Solutions:

  • Implement cache warming (run initial queries proactively)
  • Schedule refreshes during off-hours
  • Keep frequently used models in warm state

Issue: Memory Pressure

Causes: Too much data loaded into memory
Solutions:

  • Optimize column selection (avoid SELECT *)
  • Implement aggregation tables
  • Upgrade to larger capacity SKU
  • Use DirectLakeOnly mode if you prefer memory-pressure fallbacks to surface as errors rather than silently degrade to DirectQuery

Key Takeaway: Direct Lake delivers Import mode performance with DirectQuery freshness, but requires proper Delta table optimization and monitoring to achieve its full potential.

Working with Direct Lake in Power BI Desktop

Power BI Desktop offers powerful capabilities for working with Direct Lake semantic models, with some unique considerations compared to traditional Import or DirectQuery models.

What Makes Desktop Different for Direct Lake?

When working with Direct Lake tables in Power BI Desktop, you’re engaging in remote modeling:

  • No Local Engine: The local Power BI engine cannot run Direct Lake models
  • Fabric Connection Required: Desktop connects to your Fabric workspace for all modeling operations
  • Live Editing: Changes apply immediately to the cloud semantic model
  • Local Metadata: You can save a local copy (metadata + reports) in PBIP files

Power BI Project (PBIP) vs. Traditional PBIX

| Aspect | Import Tables (PBIX) | Direct Lake Tables (PBIP) |
|---|---|---|
| Data Storage | Downloaded to your local PC | Stays in OneLake (cloud) |
| Performance | Depends on local PC specs | High (optimized for OneLake) |
| File Contents | Contains data + metadata | Metadata only (no data) |
| File Size | Can be very large | Small (KB range) |
| Version Control | Challenging (binary format) | Git-friendly (text-based) |

Creating and Editing Direct Lake Semantic Models

Creating a New Semantic Model

  1. Open Power BI Desktop
  2. Select OneLake catalog from Get Data options
  3. Choose a Fabric item (Lakehouse or Warehouse)
  4. Connect and authenticate
  5. Name your semantic model and select workspace
  6. Choose tables to include → Click OK

Note: The model is created directly in your Fabric workspace, not on your local machine.

Adding Additional Tables

To add more tables to an existing Direct Lake semantic model:

  1. While in live editing mode, open OneLake catalog again
  2. Pick another Fabric item (Lakehouse or Warehouse)
  3. Select the additional tables needed
  4. Click OK to add them to your semantic model

Editing Existing Semantic Models

Three methods to edit Direct Lake semantic models:

Method 1: From Power BI Desktop

  1. Open Power BI Desktop → OneLake catalog
  2. Select your semantic model
  3. Click Connect → Edit
  4. You’re now in live editing mode

Method 2: From PBIP File

  1. Open a previously exported PBIP file
  2. Power BI Desktop connects to the remote semantic model
  3. Begin editing with live connection

Method 3: From Fabric Portal

  1. Navigate to your semantic model in Fabric
  2. Select “Edit in Desktop” from context menu
  3. Desktop opens with live connection

Key Differences in Live Editing Experience

Working with Direct Lake models in Desktop has some unique characteristics:

  • Report View: Not available unless you are working from a PBIP file
  • Table View: Only available for calculated tables
  • Auto-Save: Changes save directly to Fabric (no Save button)
  • Collaboration: Multiple editors supported with automatic sync, but conflicts may require rework
  • No Local Publishing: Cannot publish directly from Desktop → use Git integration or APIs

Refresh Behavior in Desktop

Understanding refresh is crucial for Direct Lake management:

  • Refresh Button: Performs schema refresh (checks for column/table changes)
  • Scheduled Refresh: In Fabric, only re-frames tables (updates metadata)
  • Schema Changes: Source changes (column renames) may drop columns unless updated via TMDL
  • Automatic Updates: Enabled by default to sync with Delta table changes

Power BI Project (PBIP) Deep Dive

What is PBIP?

Power BI Project files (PBIP) are the recommended way to work with Direct Lake semantic models:

  • Local Storage: Stores metadata and reports locally
  • Remote Connection: Connects to Fabric workspace for live editing
  • Git Integration: Perfect for version control with text-based files
  • Deployment Pipelines: Enables safe movement from dev → test → prod

Exporting a Power BI Project

  1. In Power BI Desktop: File > Export > Power BI Project
  2. Choose save location (default: %USERPROFILE%\Microsoft Fabric\repos\[Workspace Name])
  3. Export includes:
    • Semantic model definition files
    • Report definition files
    • Configuration files
  4. Open a new Desktop instance → Open the exported PBIP to continue editing

Editing and Saving in PBIP

  • Local Editing: Edit reports locally within PBIP structure
  • Save Button: Saves metadata + reports to PBIP folder
  • Publish Disabled: Cannot publish directly from Desktop
  • Deployment: Use Fabric Git Integration for publishing

Advanced Views and Capabilities

TMDL View

Allows direct editing of the semantic model script in a text-based format. Essential for:

  • Bulk model changes
  • Migration between data sources
  • Advanced customization

DAX Query View

Run and test DAX calculations directly against your semantic model. Both views are only saved when working with PBIP files.

Migrating Between Direct Lake Sources

To migrate from Direct Lake on SQL to Direct Lake on OneLake:

  1. Use TMDL view to update connection string
  2. Change from SQL endpoint → OneLake connection
  3. Note: Direct Lake on OneLake can combine tables from multiple Fabric items, whereas Direct Lake on SQL is tied to a single SQL analytics endpoint
  4. Always backup before migration

Requirements and Limitations

Essential Requirements

  • XMLA Endpoint: Must be enabled in workspace
  • Write Permission: On semantic model
  • Viewer Permission: On Lakehouse/Warehouse
  • Fabric License: Not available for free license users
  • Same Region: Model and lakehouse must be in same Azure region

Current Limitations (Preview)

  • No Direct Publishing: Cannot publish PBIP from Desktop → must use Fabric Git or APIs
  • RLS Validation: Cannot validate Row-Level Security roles in Desktop
  • No Sign-Out: Cannot sign out during live editing sessions
  • Feature Gaps: Some features like barcode filters and external sharing not yet supported
  • Preview Status: Features may change before general availability

Direct Lake Web Modeling (Preview)

Microsoft Fabric now enables building and editing semantic models directly in your browser through web modeling.

Two Paths to Web Modeling

🚀 Direct Lake on OneLake

Creation Path:
Create → OneLake catalog → New semantic model from Lakehouse

Best For:
Lakehouse-centric architectures

🏢 Direct Lake on SQL

Creation Path:
SQL analytics endpoint → Reporting → New semantic model

Best For:
Warehouse-centric architectures

Key Point: Both paths allow editing in the browser AND switching to Power BI Desktop for deeper work.

Creating Semantic Models in the Web

For OneLake Models:

  1. Go to Create → OneLake catalog → Fabric item
  2. Or: Open Lakehouse → New semantic model
  3. Enter model name
  4. Choose Fabric workspace
  5. Select tables → Click OK
  6. Model opens in web modeling editor

For SQL Models:

  1. Open SQL analytics endpoint or Warehouse
  2. Go to Reporting → New semantic model
  3. Enter model name
  4. Choose Fabric workspace
  5. Select tables → Click OK
  6. Model opens in web modeling editor

Troubleshooting: If editor doesn’t open, check your browser’s pop-up blocker settings.

Editing Existing Models in Web

  1. Go to Fabric Portal
  2. Find your model via Home, OneLake catalog, or search
  3. Select model → Open details page or context menu
  4. Choose “Open data model”
  5. Switch from view mode to editing mode (top-right)
  6. Optional: Select “Edit in Desktop” for Power BI Desktop

Editing Tables for Direct Lake Semantic Models

The “Edit tables” dialog is your control center for managing which Direct Lake tables your semantic model includes.

Purpose of Edit Tables

  • Add/Remove Tables: Include or exclude tables from your model
  • Source Requirement: Tables must come from one Fabric item (Lakehouse or Warehouse)
  • Schema Sync: Apply column changes from data source

Edit Tables Dialog Components

  • Title: Indicates create vs edit mode
  • Info Links: Documentation and help
  • Workspace Links: View source Fabric item
  • Search Box: Find tables/views quickly
  • Filters: Narrow by schema or object type
  • Reload: Refresh schema (needs write permission)
  • Tree View: Schema → Type → Table/View names
  • Checkboxes: Select/unselect tables
  • Confirm/Cancel: Apply or discard changes

Important Considerations

  • Name Changes: You can rename tables/columns in the semantic model, but Edit tables shows original source names
  • Source Renames: If upstream renames a table/column, the model looks for the old name → table disappears after sync
  • Renamed Tables: Show as unchecked in Edit tables → must re-add them
  • Preservation: Measures can be moved, but relationships and column properties must be reapplied

Opening Edit Tables

Web Modeling (Browser):

  • Ribbon button → Edit tables
  • Right-click table → Edit tables
  • If disabled: Try selecting only one table

Power BI Desktop:

  • Right-click table → Edit tables

Relationships in Direct Lake Mode

Creating and managing relationships in Direct Lake has some unique characteristics:

  • No Data Preview: Unlike Import/DirectQuery, no automatic data validation
  • Cardinality Guessing: Based on row counts (bigger table = “many” side)
  • Cross-Filter Direction: Always single by default (can change manually)
  • Validation Methods: Use DAX queries in DAX view or create visuals using both tables

Creating Relationships

  • Drag & Drop: Column → column in Model view
  • Manage Relationships: Ribbon button
  • New Relationship: Data pane → Model Explorer

Editing Relationships

  • Select relationship line in diagram → edit in Properties pane
  • Double-click line → open relationship editor

Limitations to Note

  • ❌ No data preview in relationship dialog
  • ❌ No auto validation of cardinality
  • ❌ No auto validation of cross-filter direction
  • ✅ Full control over syncing schemas and renaming
  • ✅ Ability to create relationships manually

Summary: Edit tables in Direct Lake gives you complete control over which tables/views are in your semantic model, with full schema sync and relationship management capabilities – just without the automatic previews and validations of Import/DirectQuery.

Creating a Lakehouse for Direct Lake

Let’s walk through the complete process of setting up a Lakehouse, adding Delta tables, and building a Direct Lake semantic model in Microsoft Fabric.

Prerequisites: Before starting, ensure you understand Direct Lake basics and have appropriate Fabric permissions. Direct Lake works best with Delta tables using V-Order compression.

Step 1: Create a Lakehouse

  1. In your Microsoft Fabric workspace, navigate to: New > More options > Data Engineering > Lakehouse
  2. In the New lakehouse dialog:
    • Enter a name (letters, numbers, underscores only)
    • Click Create
  3. Verify the lakehouse is created and opens successfully

Step 2: Create Delta Tables with V-Order Compression

Direct Lake requires Delta tables. Here’s how to create optimized tables:

Best Practice: Always use V-ORDER compression for faster Direct Lake performance.

  1. In your lakehouse: Select Open notebook > New notebook
  2. Copy and run this code to access Azure storage:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "holidaydatacontainer"
blob_relative_path = "Processed"
blob_sas_token = r""  # public open dataset – no SAS token required
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
    'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
    blob_sas_token
)
print('Remote blob path: ' + wasbs_path)
    

Expected: Output showing remote blob path

  3. Read Parquet file into DataFrame:
df = spark.read.parquet(wasbs_path)
df.printSchema()  # printSchema() prints directly; no outer print() needed
    

Expected: Schema of the data displayed

  4. Save as Delta table with V-ORDER compression:
# Save as delta table with V-ORDER compression
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")
df.write.format("delta").saveAsTable("holidays")
    

Verification: Wait for SPARK jobs to complete, then refresh the Tables node to confirm your table exists.

Step 3: Create Your Direct Lake Semantic Model

  1. In your lakehouse: Select New semantic model
  2. In the dialog:
    • Choose which tables to include
    • Select Confirm
  3. A Direct Lake model is automatically created (same name as lakehouse)
  4. Open the data model in Power BI Web modeling:
    • Add relationships between tables
    • Add DAX measures for calculations

Next Steps After Creation

  • Build Reports: Create Power BI reports on your semantic model
  • Composite Models: Combine with other data sources if needed
  • XMLA Endpoints: Enable for programmatic access
  • Security: Implement RLS/OLS as required

Key Takeaways:
• Lakehouse = Central storage for structured/unstructured data in Fabric
• Delta tables = Required for Direct Lake, optimize with V-ORDER compression
• Semantic model = Makes lakehouse data usable in Power BI
• Direct Lake = Fast in-memory queries without data duplication

Developing Direct Lake Semantic Models

What is a Direct Lake Semantic Model?

A semantic model is Power BI’s data modeling layer that defines:

  • Tables and their relationships
  • Measures and calculations
  • Security rules (RLS/OLS)
  • Business logic and semantics

Direct Lake mode makes this layer high-performance by querying large data directly from OneLake without importing everything.

Creation Methods

You can create Direct Lake semantic models through multiple interfaces:

| Method | Best For | Notes |
|---|---|---|
| Power BI Desktop | Power users, developers | Full modeling capabilities |
| Browser (Web Modeling) | Quick edits, administrators | Direct from Lakehouse/Warehouse |
| SSMS (19.1+) | Database professionals | Via XMLA endpoint |
| Fabric Notebooks | Programmatic creation | Automation, CI/CD |
| Community Tools | Advanced scenarios | Tabular Editor, DAX Studio |

Model Tables Best Practices

Critical: Avoid using views in Direct Lake modelsβ€”they cause queries to fall back to DirectQuery mode (slower performance).

Optimal table structure includes:

  • Essential Columns: For filtering, grouping, sorting, summarizing
  • Relationship Columns: Support table relationships
  • Avoid Unnecessary Columns: Extra columns add memory and storage overhead without improving queries
  • No Dynamic Data Masking (DDM): Not supported in Direct Lake

Supported Table Types

  • Direct Lake tables (from SQL analytics endpoint)
  • Import tables (stored on OneLake)
  • Calculated tables (but NOT referencing Direct Lake tables)
  • Calculation groups

Enforcing Data-Access Rules

Different users often need different data views. Direct Lake supports multiple security approaches:

1. Object-Level Security (OLS)

Purpose: Control visibility of tables/columns
Example: Hide “Salary” column from non-HR users
Setup: SQL analytics endpoint or semantic model (via Tabular Editor)
Caution: Can cause report errors in visuals

2. Row-Level Security (RLS)

Purpose: Control which rows users can see
Example: Salespeople only see their region’s sales
Setup Options:

  • SQL analytics endpoint (causes DirectQuery fallback)
  • Semantic model (web modeling or third-party tools)
Performance: Best in semantic model with fixed identity

How Queries Are Evaluated

Direct Lake follows this decision path for each query:

  1. If query hits restricted OLS column/table → Error
  2. If restricted by SQL endpoint CLS → Error
  3. If query uses a view or table with RLS at endpoint → DirectQuery fallback
  4. If query exceeds memory limits → DirectQuery fallback
  5. Otherwise → In-memory cache (fastest)

Source Item Permissions

Access depends on connection type:

| Connection Type | Account Used | Permissions Needed |
|---|---|---|
| SSO (default) | Report consumer’s identity | Read + ReadData on source item |
| Fixed Identity | Shared service account | Single account needs permissions |

Security Implementation Options

| Where Rules Are Applied | Performance Impact | Best For |
|---|---|---|
| Semantic model only | Best (in-memory) | Most scenarios; use fixed identity |
| SQL endpoint only | Slower (DirectQuery fallback) | When users also access warehouse directly |
| Both layers | Complex, variable | Advanced security requirements |

Best Practice: Implement RLS in the semantic model with fixed identity for optimal performance. Avoid OLS/CLS when possible as they can cause visualization errors.

XMLA Endpoint for Advanced Management

Direct Lake semantic models support XMLA write operations, enabling:

  • Customization & Scripting: Via SSMS, PowerShell, or custom tools
  • CI/CD Integration: With GitHub or Azure DevOps
  • Automation: Refresh schedules, model updates
  • Debugging: Advanced troubleshooting capabilities

Important: After creating new tables via XMLA, you must refresh the model. Otherwise, queries will fall back to DirectQuery.

Direct Lake Model Metadata

  • Compatibility Level: 1604
  • Partition Mode: directLake
  • Schema Source: SQL analytics endpoint
  • Data Source: OneLake (direct loading)

Managing Direct Lake Semantic Models

Post-Publication Checklist

After publishing a Direct Lake semantic model, complete these essential tasks:

  1. Cloud Connection Setup: Configure secure data connectivity
  2. Security Role Management: Define RLS membership
  3. Fabric Item Permissions: Assign Read/Build access
  4. Refresh Configuration: Set up data updates
  5. Data Discovery: Enable endorsement for visibility

Cloud Connection Types

πŸ” Default Cloud Connection

Authentication: SSO (user’s identity)
Best For: When all users have direct data access
Permissions: Users need Read/ReadData on source

🏢 Sharable Cloud Connection

Authentication: Fixed identity (service principal)
Best For: Centralized security management
Permissions: Single account needs access

Authentication Methods

  • OAuth 2.0: User account-based (not recommended for production)
  • Service Principal: Recommended for enterprise:
    • Secure, not tied to individual users
    • Supports secret/certificate rotation
    • Centralized management

Permission Requirements

| User Scenario | Permission Needed | Notes |
|---|---|---|
| View reports only | Read (report + semantic model) | Basic consumption |
| Create new reports | Build (semantic model) | Self-service BI |
| Manage model (refresh, settings) | Owner | Administration |
| Query with fixed identity | No lakehouse permission needed | Service account has access |
| Query with SSO | Read/ReadData on lakehouse | User needs direct access |

Testing Tip: Always test permissions thoroughly before deploying to production users.

Refresh Strategies for Direct Lake

Refreshing updates your model with new data from source Delta tables:

Refresh Methods

  • Manual Refresh: Via Fabric portal or SSMS scripts
  • Scheduled Refresh: Automatic at defined intervals
  • Programmatic Refresh: Triggered via API or ETL pipelines
  • Automatic Updates: Enabled by default; detects Delta table changes

Pro Tip: Disable automatic updates if you need ETL jobs to complete before exposing new data to users.
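
For programmatic refreshes, one option is the Semantic Link library from a Fabric notebook; a hedged sketch (model and workspace names are made up):

import sempy.fabric as fabric

# Trigger a refresh of the Direct Lake model once upstream ETL has finished
fabric.refresh_dataset(
    dataset="Sales Direct Lake Model",   # illustrative model name
    workspace="Analytics",               # illustrative workspace
    refresh_type="full",
)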

Cache Management

After refresh, memory cache may be cleared → initial queries can be slow.

Cache Warming Strategy: Run critical queries immediately after refresh to load important data into memory. Only needed if query delays are unacceptable for users.
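
One way to warm the cache is to run a representative DAX query right after the refresh, for example via Semantic Link in a notebook (model, workspace, table, and measure names are illustrative):

import sempy.fabric as fabric

# Touch the columns your key visuals use so they are transcoded into memory before users arrive
warm_up_dax = """
EVALUATE
SUMMARIZECOLUMNS('Date'[Year], "Total Sales", [Total Sales])
"""
fabric.evaluate_dax(dataset="Sales Direct Lake Model", dax_string=warm_up_dax, workspace="Analytics")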

Direct Lake Behavior Property

Control how queries are processed:

| Setting | Behavior | Use Case |
|---|---|---|
| Automatic (default) | Falls back to DirectQuery if needed | General use, ensures queries work |
| DirectLakeOnly | Strict in-memory only; no fallback | Performance-critical scenarios |
| DirectQueryOnly | Always uses DirectQuery | Testing, troubleshooting |

Recommendation: Use DirectLakeOnly for production if you want guaranteed in-memory performance and have addressed all potential fallback causes.

Specifying a Fixed Identity for Direct Lake

By default, Direct Lake uses SSO (user identity). For centralized security, you can configure a fixed identity:

Step-by-Step Configuration

  1. Access Model Settings:
    • Open your Direct Lake model settings
    • Expand “Gateway and cloud connections”
    • Note the SQL Server data source pointing to your lakehouse/warehouse
  2. Create New Connection:
    • In “Maps to” listbox, click “Create a connection”
    • New connection pane opens with pre-filled details
    • Enter meaningful name (e.g., “FixedIdentityConnection”)
  3. Set Authentication Method:
    • Choose: OAuth 2.0 (user account) or Service Principal (recommended)
    • Enter credentials for the fixed identity
  4. Disable SSO:
    • Go to Single sign-on section
    • Ensure “SSO via Microsoft Entra ID for DirectQuery queries” is NOT selected
  5. Complete Setup:
    • Configure any additional parameters
    • Click “Create”
  6. Verify Configuration:
    • Confirm data source now maps to non-SSO connection
    • Test model connectivity

Quick Recap: Model Settings → Gateway and cloud connections → Create connection → Choose authentication → Disable SSO → Create & Verify.

Building Reports with Direct Lake Semantic Models

Understanding the Semantic Model Role

Think of the semantic model as the “brain” of your Power BI reports:

  • Organizes data for reporting, exploration, and DAX queries
  • With Direct Lake, reads directly from OneLake Delta tables
  • Maintains cache for recently used data
  • Automatically falls back to DirectQuery if needed

Creating Reports in Power BI Desktop

  1. Open Power BI Desktop
  2. Select: OneLake catalog or Get data → Power BI semantic models
  3. Choose your Direct Lake semantic model → Connect
  4. You’re now live-connected → start building visuals
  5. Save locally → Publish to workspace to share

Note: You can create additional measures in your report without modifying the underlying semantic model.

Creating Reports in Power BI Service / Fabric Portal

Multiple browser-based options:

  • Semantic model menu (…) → Create report
  • Home → New report → Pick published semantic model
  • OneLake catalog → Data & Semantic model → Create report
  • Semantic model details page → Create blank report
  • Web modeling (File → Create new report)

Additional Reporting Options

πŸ” Explore

Quick, interactive analysis

📄 Paginated Reports

Pixel-perfect, printable reports

📊 DAX Queries

Advanced calculations

📈 Excel + Power BI

Refreshable PivotTables

Consumption Scenarios and Permissions

Scenario 1: View Report Only

Approach: Use Fixed Identity
Options:

  • Publish as App (report-only permissions)
  • Give Viewer role on report + semantic model
  • Separate workspace with Viewer role

Scenario 2: View + Create Own Reports

Approach: Use Fixed Identity
Permissions: Build permission on semantic model + Viewer on report

Scenario 3: View + Create + SQL Endpoint + Delta Access

Approach: Use SSO
Permissions Required:

  • Build access on semantic model
  • ReadAll permission on Fabric item
  • Direct lakehouse/warehouse access

Scenario 4: View + Edit Everything

Approach: Use SSO
Permissions: Higher workspace roles (Contributor, Member, Admin)

Analogy:
• Semantic model = Kitchen (where food is prepared)
• Reports = Meals served (visuals you consume)
• Permissions = Access rules (who can eat, cook, or change recipes)

Final Key Takeaways

Direct Lake: The Future of Power BI at Scale

  • ✅ Performance: Import-mode speed with DirectQuery freshness
  • ✅ Scalability: Works with datasets larger than memory
  • ✅ Efficiency: No full data copies, lightweight refreshes
  • ✅ Freshness: Near real-time data synchronization
  • ✅ Cost-Effective: Reduced storage and compute requirements
  • ✅ Enterprise-Ready: Built for Fabric/OneLake ecosystems

Direct Lake represents a fundamental shift in how Power BI handles large datasets, combining the best aspects of traditional storage modes while eliminating their limitations. For organizations investing in Microsoft Fabric and OneLake, Direct Lake is the clear path forward for scalable, high-performance business intelligence.

Getting Started Recommendation: Begin with a pilot project to validate Direct Lake fits your use case. Focus on Delta table optimization, monitor for DirectQuery fallback, and gradually expand as you gain experience.