Microsoft Fabric Decision Guide

Choosing the Right Data Store and Integration Strategy

A comprehensive guide to selecting the optimal data storage, movement, and transformation approaches in Microsoft Fabric’s unified OneLake environment

Introduction to Microsoft Fabric Data Stores

Microsoft Fabric provides a unified analytics platform with multiple data store options, all integrated within OneLake. Your choice depends on three key factors: data type, specific use case, and team skillsets. This guide will help you navigate these options and make informed decisions for your organization.

OneLake: The Unified Foundation

All Fabric data stores leverage OneLake as the underlying storage layer, providing a single, unified location for all your data. This eliminates data silos and simplifies data management across different workloads.
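To make that concrete, here is a minimal, hedged PySpark sketch of reading a Delta table directly through its OneLake path from a Fabric notebook (where authentication to OneLake is handled by the environment). The workspace, lakehouse, and table names are placeholders; confirm the exact URI for your own workspace.

```python
# Minimal sketch (assumed names): read a Delta table via its OneLake ABFS path.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # Fabric notebooks already provide `spark`

# Placeholder workspace/lakehouse/table names; adjust to your environment.
onelake_path = (
    "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/"
    "MyLakehouse.Lakehouse/Tables/sales_orders"
)

df = spark.read.format("delta").load(onelake_path)
df.show(5)
```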

Fabric Data Stores: Overview and Ideal Use Cases

Use Case / Workload | Fabric Data Store | Data in OneLake | Primary Users
Streaming event data (time-series, JSON, activity logs) | Eventhouse | ✓ Yes | App developers, Data scientists, Data engineers
AI, NoSQL, vector search | Cosmos DB in Fabric (Preview) | ✓ Yes | AI developers, App developers
Transactional/OLTP database | SQL Database in Fabric (Preview) | ✓ Yes | AI developers, App developers, DB developers
Enterprise Data Warehouse (BI, OLAP, full SQL support) | Fabric Data Warehouse | ✓ Yes | Data warehouse developers, Architects, Engineers
Big data, ML, un/semi/structured data | Lakehouse | ✓ Yes | Data engineers, Data scientists

Quick Decision Guide

  • Streaming, logs, telemetry → Choose Eventhouse
  • 🤖 AI/NoSQL/Vector search → Choose Cosmos DB (Preview)
  • 💳 Transactional apps (OLTP) → Choose SQL Database (Preview)
  • 📊 SQL-based analytics/BI → Choose Data Warehouse
  • 📁 Big data, ML, flexible formats → Choose Lakehouse

Personas and Skillsets: Matching Tools to Your Team

Eventhouse Users

Personas: App Dev, Data Scientist, Data Engineer

Skills/Tools: No code, KQL, SQL

Best for: Real-time analytics, time-series data, log analysis

Cosmos DB Users

Personas: AI Dev, App Dev

Skills/Tools: NoSQL, REST APIs, JavaScript

Best for: AI applications, vector search, document databases

SQL Database Users

Personas: AI Dev, App Dev, DB Dev/Admin

Skills/Tools: SQL DB concepts, SSMS, VS Code, T-SQL

Best for: Transactional applications, ACID compliance

Data Warehouse Users

Personas: Data Warehouse Dev, Architect, Engineer

Skills/Tools: Data warehousing, star schema, T-SQL

Best for: Enterprise BI, structured analytics

Lakehouse Users

Personas: Data Engineer, Data Scientist

Skills/Tools: PySpark, Notebooks, Delta Lake, Spark SQL

Best for: Big data processing, machine learning, flexible data formats
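As a hedged illustration of that toolset, the sketch below assumes a Fabric notebook with a default lakehouse attached; the file, table, and column names are invented for the example.

```python
# Sketch: PySpark for cleanup, Delta for storage, Spark SQL for querying.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# "Files/..." resolves against the attached default lakehouse (assumed setup).
raw = spark.read.option("header", True).csv("Files/raw/events.csv")

cleaned = (
    raw.dropna(subset=["event_id"])
       .withColumn("event_date", F.to_date("event_timestamp"))
)

# Save as a managed Delta table so it appears under the lakehouse Tables area.
cleaned.write.format("delta").mode("overwrite").saveAsTable("events_clean")

# Query the same table back with Spark SQL.
spark.sql(
    "SELECT event_date, COUNT(*) AS event_count FROM events_clean GROUP BY event_date"
).show()
```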

Real-World Scenarios: Choosing the Right Data Store

Scenario 1: Susan (Data Warehouse Specialist)

Background: SQL expert, team uses SQL tools exclusively

Needs: Multi-table transactions, SQL-based analytics, enterprise reporting

Choice: Fabric Data Warehouse

Why it works: Uses T-SQL, supports cross-database queries (including Lakehouse tables), and provides an easy transition from SQL Server with familiar tools (SSMS, VS Code).

Key Benefit: Full ACID compliance and enterprise-grade performance for structured BI analytics.
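As a hedged sketch of what that looks like in practice, the snippet below connects to a Warehouse SQL endpoint with pyodbc and joins a Warehouse table to a Lakehouse table using three-part names. The server address, database, table, and column names are placeholders, and the authentication option shown is just one possibility for your tenant.

```python
import pyodbc

# Placeholder SQL endpoint and database; copy the real values from the Fabric portal.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-warehouse-sql-endpoint>;"
    "Database=SalesWarehouse;"
    "Authentication=ActiveDirectoryInteractive;"
    "Encrypt=yes;"
)

query = """
SELECT r.region_name, SUM(o.amount) AS total_amount
FROM SalesWarehouse.dbo.dim_region AS r
JOIN MyLakehouse.dbo.fact_orders AS o   -- cross-database reference to a Lakehouse table
    ON r.region_id = o.region_id
GROUP BY r.region_name;
"""

for region_name, total_amount in conn.cursor().execute(query):
    print(region_name, total_amount)
```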

Scenario 2: Rob (Data Engineer)

Background: Data engineer with team skilled in both PySpark and T-SQL

Needs: Handle terabytes of data, flexible data formats, big data processing

Choice: Lakehouse

Why it works: Data engineers can use Spark for processing while business users query with T-SQL through the SQL endpoint.

Key Benefit: Supports both structured & unstructured data, ideal for big data and ML workloads.

Scenario 3: Daisy (Business Analyst)

Background: Business analyst, Power BI user

Needs: Analyze billions of rows, time-series & geospatial analytics, fast dashboard response

Choice: Eventhouse

Why it works: Scalable, real-time queries that integrate seamlessly with Power BI (DirectQuery mode).

Key Benefit: Advanced analytics (time-series, geo-spatial) with fast problem detection capabilities.
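As a hedged sketch of the kind of time-series query involved, the snippet below uses the azure-kusto-data Python client against an Eventhouse KQL database; the query URI, database, table, and column names are all placeholders.

```python
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# Placeholder query URI; copy the real one from the Eventhouse item in Fabric.
cluster_uri = "https://<your-eventhouse-query-uri>"
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster_uri)
client = KustoClient(kcsb)

# Hourly average temperature per device over the last day (illustrative schema).
kql = """
DeviceTelemetry
| where Timestamp > ago(1d)
| summarize avg_temp = avg(Temperature) by DeviceId, bin(Timestamp, 1h)
| order by Timestamp asc
"""

response = client.execute("TelemetryDB", kql)
for row in response.primary_results[0]:
    print(row["DeviceId"], row["Timestamp"], row["avg_temp"])
```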

Scenario 4: Kirby (Application Architect)

Background: App architect specializing in .NET and transactional systems

Needs: High concurrency, ACID compliance, relational integrity, performance tuning

Choice: SQL Database in Fabric

Why it works: Full OLTP features with auto-scaling, transaction isolation levels, and auto index tuning.

Key Benefit: Seamless integration with other Fabric workloads (Spark, Warehouse, Eventhouse) and ability to import existing SQL data without transformation.
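As a hedged sketch of the transactional pattern, the snippet below uses pyodbc with an explicit transaction so both statements commit or roll back together. Connection details, tables, and columns are placeholders, and the authentication option may differ in your environment.

```python
import pyodbc

# Placeholder endpoint and database for SQL database in Fabric.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-sql-database-endpoint>;"
    "Database=OrdersDb;"
    "Authentication=ActiveDirectoryInteractive;"
    "Encrypt=yes;",
    autocommit=False,  # manage the transaction explicitly
)
cur = conn.cursor()

try:
    cur.execute(
        "INSERT INTO dbo.Orders (OrderId, CustomerId, Amount) VALUES (?, ?, ?)",
        1001, 42, 99.90,
    )
    cur.execute(
        "UPDATE dbo.Customers SET LastOrderId = ? WHERE CustomerId = ?",
        1001, 42,
    )
    conn.commit()      # both statements succeed together
except pyodbc.DatabaseError:
    conn.rollback()    # or neither is applied
    raise
```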

Warehouse vs. Lakehouse: Detailed Comparison

Microsoft Fabric provides two enterprise-scale storage options, both using Delta format in OneLake and included with Power BI Premium/Fabric capacities.

Decision Criteria: When to Choose Which

  • Development Approach: Spark → Lakehouse | T-SQL → Warehouse
  • Warehousing Needs: Need multi-table transactions? → Warehouse | Don’t need transactions → Lakehouse
  • Data Complexity: Unknown data type → Lakehouse | Unstructured + Structured → Lakehouse | Structured only → Warehouse

Fabric Data Warehouse: Enterprise Structured Analytics

Best for: Enterprise-scale structured data and BI reporting

Key Warehouse Features

  • ACID compliant, multi-table transactions (see the sketch after this list)
  • No setup/configuration (“no knobs” performance)
  • Intuitive UI (no-code/low-code/T-SQL)
  • Data stored in OneLake (Delta format)
  • Full SQL support (DQL, DML, DDL)
  • Virtual warehouses, cross-database querying
  • Governance, security, enterprise-grade performance
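As a hedged illustration of the multi-table transaction support listed above, the batch below moves staged rows into a fact table atomically. It could be run from SSMS, VS Code, or any T-SQL client; it is shown here through pyodbc, with placeholder endpoint, database, and table names.

```python
import pyodbc

# Placeholder Warehouse SQL endpoint, database, and table names.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-warehouse-sql-endpoint>;"
    "Database=SalesWarehouse;"
    "Authentication=ActiveDirectoryInteractive;"
    "Encrypt=yes;",
    autocommit=True,  # the T-SQL batch below manages its own transaction
)

batch = """
BEGIN TRANSACTION;
    INSERT INTO dbo.fact_orders (order_id, region_id, amount)
    SELECT order_id, region_id, amount FROM dbo.staging_orders;

    DELETE FROM dbo.staging_orders;
COMMIT TRANSACTION;
"""

conn.cursor().execute(batch)  # both tables change together, or not at all
```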

Lakehouse: Flexible Analytics on All Data Types

Best for: Flexible analytics on structured + unstructured data

Key Lakehouse Features

  • Stores all data types (structured/unstructured) in one location
  • Scales to large volumes efficiently
  • Unified Delta format ingestion
  • Automatic table discovery & file-to-table conversion
  • SQL analytics endpoint (read-only, limited SQL)
  • Ideal for data engineers, exploration, staging, and medallion architecture

Warehouse vs. Lakehouse SQL Endpoint Comparison

Aspect | Warehouse | Lakehouse SQL Endpoint
Capabilities | Full ACID transactions, enterprise data warehouse | Read-only SQL endpoint for Lakehouse
Developer Profile | SQL developers, citizen developers | Data engineers, SQL developers
Data Loading | SQL, pipelines, dataflows | Spark, pipelines, dataflows, shortcuts
Delta Support | Read/write Delta tables | Read-only Delta tables
Storage | Delta format in OneLake | Delta format in OneLake
Use Case | Enterprise BI, structured SQL analytics | Exploring/querying Delta, staging/archive, medallion architecture
Development Experience | Full T-SQL UI, rich tooling support | Limited T-SQL (views, TVFs, queries only)
T-SQL Capabilities | Full DQL, DML, DDL | Full DQL, no DML, limited DDL

Quick Guidance

  • Choose Warehouse if → You need transactions, structured BI analytics, or SQL-first development.
  • Choose Lakehouse if → You need flexibility, support for unstructured data, or Spark-first workloads.
  • Best Practice → Use both together: Lakehouse for ingestion + exploration, Warehouse for enterprise reporting.

Data Movement Strategies in Microsoft Fabric

Microsoft Fabric provides three main approaches to move data into your analytics environment, each with different capabilities and complexity levels.

🔄 Mirroring

Simplicity: Very simple, free

Best for: Real-time analytics, continuous replication

Key Feature: CDC continuous replication to OneLake

📦 Copy Job

Simplicity: Easy + advanced options

Best for: Data migration, consolidation, backups

Key Feature: Batch + incremental copying

⚙️ Copy Activity (Pipelines)

Simplicity: Complex but powerful

Best for: Complex ETL/ELT with transformations

Key Feature: Full pipeline orchestration

Detailed Feature Comparison

Feature / Option | Mirroring | Copy Job | Copy Activity
Sources | Databases + Open Mirroring | All sources & formats | All sources
Destinations | OneLake (read-only tabular) | All destinations | All destinations
Ease of Use | Very simple | Easy + advanced | Complex but powerful
Custom Scheduling | ✗ Not available | ✓ Available | ✓ Available
Table & Column Management | ✗ Not available | ✓ Available | ✓ Available
Append/Upsert/Override | ✗ Not available | ✓ Available | ✓ Available
Continuous Replication (CDC) | ✓ Available | ✓ Available | ✗ Not available
Batch / Incremental Copy | ✗ Not available | ✓ Available | ✓ Available
Custom SQL Queries | ✗ Not available | ✓ Available | ✓ Available
Cost | Free | Paid | Paid
Use Cases | Real-time analytics | Data migration, consolidation, backups | Complex ETL workflows

Data Movement Scenarios

Scenario 1: James (Finance Manager)

Needs: Real-time dashboards from Azure SQL Database without impacting production

Challenge: Requires automatic replication with minimal setup and no cost

Solution: Mirroring

Why it works: CDC continuous replication provides read-only data in OneLake without affecting operational systems. Free and simple to set up.

Scenario 2: Lisa (Business Analyst)

Needs: Copy shipment data from Snowflake with custom scheduling and upsert handling

Challenge: Requires incremental + batch loads every 4 hours with table/column mapping

Solution: Copy Job

Why it works: Flexible, easy setup with advanced options for scheduling and data management. Supports watermark-based incremental copying.

Scenario 3: David (Data Engineer)

Needs: Complex ETL from Oracle with transformations and multiple destinations

Challenge: Requires custom SQL queries, dependency handling, and comprehensive monitoring

Solution: Copy Activity in Pipelines

Why it works: Full pipeline-based orchestration with transformations, custom SQL, and advanced workflow capabilities.

Simplified Rule of Thumb

  • Use Mirroring if you want simple, real-time replication for analytics
  • Use Copy Job if you need flexibility (scheduling, incremental, upsert) without full pipelines
  • Use Copy Activity in Pipelines if you need complex orchestration, transformations, and advanced workflows

Data Integration and Transformation Tools

Beyond data movement, Fabric provides tools for orchestration and transformation. Your choice depends on your goal, skillset, workload type, and transformation complexity.

Orchestration Strategies

Method | Use Case | Best For | Skill Level | Code Requirement
Pipelines | Low-code orchestration | Workflows, scheduling | Integrator, Analyst | No/Low code
Apache Airflow Jobs | Code-first orchestration | Python DAGs, DevOps workflows | Python users | Code-first

Transformation Strategies

Method | Use Case | Best For | Skill Level | Code Requirement | Transformation Power
Notebooks | Code-first data prep | Complex ML/analytics | Data Scientist | Code-first | High
Dataflow Gen 2 | No-code data prep | Data cleaning, profiling | Analyst/Engineer | No/Low code | High

Data Integration Tool Selection Guide

Tool | Best For | Developer Persona | Key Strengths
Copy Activity | Data migration, ingestion, lightweight transformations | Data engineer, integrator | Petabyte-scale ingestion, scheduled, minimal coding, many connectors
Copy Job | Incremental copy, CDC, replication, migration | Business analyst, engineer, integrator | Wizard-driven, initial full load + incremental updates, simple scheduling
Dataflow Gen 2 | Ingestion + heavy transformation, data wrangling | Engineers, integrators, analysts | 300+ transformations, multiple outputs, Power Query experience
Eventstream | Real-time event ingestion + routing | Engineer, scientist, developer | Works with CDC, Kafka, pub/sub; medium–high volumes; no schedule needed
Spark | Advanced transformation, processing at scale | Data engineer, integrator | Supports all data types; full coding control; custom transformations

Decision Cheat-Sheet

  • Need simple migration/ingestion → Copy Activity
  • Incremental CDC ingestion → Copy Job
  • Transformations & wrangling → Dataflow Gen 2
  • Real-time streaming data → Eventstream
  • Advanced analytics & coding → Spark

Integration Scenarios and Solutions

Hanna (DBA, Finance)

Near real-time SQL replication

Solution: Mirroring

Why: Simple, free, continuous replication to OneLake for real-time analytics

Charlie (Retail)

CDC + bulk migration

Solution: Copy Job

Why: Flexible, supports both batch and incremental copying with CDC

Rukmina (Manufacturing)

Medallion migration, high volume

Solution: Copy Activity in Pipelines

Why: Complex ETL/ELT with transformations and multiple steps

Julian (Analyst)

Workflow orchestration, low-code

Solution: Pipelines

Why: Visual workflow creation, scheduling, no-code approach

Darshan (Data Scientist)

Python, ML workflows

Solution: Apache Airflow Jobs

Why: Code-first orchestration with Python DAGs

René (Researcher)

Large datasets, ML, code-first

Solution: Notebooks

Why: Spark-based code for complex data preparation and ML

Ako (Healthcare Analyst)

Data prep, Power Query style

Solution: Dataflow Gen 2

Why: No-code Power Query interface for data cleaning and transformation

Getting Started: Next Steps

Implementation Roadmap

  • Mirroring → Set up replication to OneLake (Delta tables) for real-time analytics
  • Copy Job → Use wizard-driven setup for bulk + CDC data ingestion
  • Copy Activity → Build complex ingestion workflows in Pipelines
  • Pipelines → Create visual workflows for orchestration and scheduling
  • Apache Airflow Jobs → Author Python DAGs for code-first orchestration
  • Notebooks → Write Spark-based code for advanced data preparation
  • Dataflow Gen 2 → Use Power Query-style no-code interface for transformations

Quick Implementation Tips

  • Use Mirroring for real-time replication from operational databases
  • Use Copy Job / Copy Activity for batch or incremental data ingestion
  • Use Pipelines / Airflow for workflow orchestration and scheduling
  • Use Notebooks / Dataflow Gen 2 for data transformation and preparation
  • Consider using Lakehouse and Warehouse together for comprehensive analytics
  • Leverage OneLake as your single source of truth across all Fabric workloads

Conclusion

Microsoft Fabric provides a comprehensive, unified analytics platform with multiple data store options, all integrated through OneLake. Your optimal architecture depends on three key factors:

  1. Data Characteristics: Structured vs. unstructured, volume, velocity
  2. Use Case Requirements: Real-time analytics, transactions, BI reporting, ML
  3. Team Skillsets: SQL vs. Spark, code-first vs. low-code preferences

By following the decision guidelines in this post, you can select the right combination of Fabric data stores and integration tools for your specific needs. Remember that these components are designed to work together seamlessly, allowing you to build end-to-end analytics solutions that leverage the strengths of each option.

The key advantage of Microsoft Fabric is its unified approach—all data stores use OneLake, all tools integrate seamlessly, and all workloads can be managed from a single platform. This eliminates the complexity of managing multiple disjointed systems while providing the flexibility to choose the right tool for each job.

Final Recommendation

Start with your business requirements and user personas, then work backward to select the appropriate Fabric components. Most organizations will benefit from using multiple Fabric workloads together—for example, Lakehouse for data ingestion and exploration, Warehouse for enterprise reporting, and Eventhouse for real-time analytics. The unified nature of Fabric makes these combinations not only possible but practical and efficient.