Course Outline
Introduction, Objectives, and Migration Strategy
- Course goals, alignment with participant profiles, and success criteria
- High-level migration approaches and risk considerations
- Setting up workspaces, repositories, and lab datasets
Day 1 — Migration Fundamentals and Architecture
- Lakehouse concepts, Delta Lake overview, and Databricks architecture
- Differences between SMP and MPP and their implications for migration
- Medallion (Bronze→Silver→Gold) design and Unity Catalog overview
Day 1 Lab — Translating a Stored Procedure
- Hands-on migration of a sample stored procedure to a notebook
- Mapping temp tables and cursors to DataFrame transformations
- Validation and comparison with original output
Day 2 — Advanced Delta Lake & Incremental Loading
- ACID transactions, commit logs, versioning, and time travel features
- Auto Loader, MERGE INTO patterns, upserts, and schema evolution
- OPTIMIZE, VACUUM, Z-ORDER, partitioning, and storage tuning
Day 2 Lab — Incremental Ingestion & Optimization
- Implementing Auto Loader ingestion and MERGE workflows
- Applying OPTIMIZE, Z-ORDER, and VACUUM; validating results
- Measuring read/write performance improvements
Day 3 — SQL in Databricks, Performance & Debugging
- Analytical SQL features: window functions, higher-order functions, JSON/array handling
- Reading the Spark UI, DAGs, shuffles, stages, tasks, and bottleneck diagnosis
- Query tuning patterns: broadcast joins, hints, caching, and spill reduction
Day 3 Lab — SQL Refactoring & Performance Tuning
- Refactor a heavy SQL process into optimized Spark SQL
- Use Spark UI traces to identify and fix skew and shuffle issues
- Benchmark before/after and document tuning steps
Day 4 — Tactical PySpark: Replacing Procedural Logic
- Spark execution model: driver, executors, lazy evaluation, and partitioning strategies
- Transforming loops and cursors into vectorized DataFrame operations
- Modularization, UDFs/pandas UDFs, widgets, and reusable libraries
Day 4 Lab — Refactoring Procedural Scripts
- Refactor a procedural ETL script into modular PySpark notebooks
- Introduce parametrization, unit-style tests, and reusable functions
- Code review and best-practice checklist application
Day 5 — Orchestration, End-to-End Pipeline & Best Practices
- Databricks Workflows: job design, task dependencies, triggers, and error handling
- Designing incremental Medallion pipelines with quality rules and schema validation
- Integration with Git (GitHub/Azure DevOps), CI, and testing strategies for PySpark logic
Day 5 Lab — Build a Complete End-to-End Pipeline
- Assemble Bronze→Silver→Gold pipeline orchestrated with Workflows
- Implement logging, auditing, retries, and automated validations
- Run full pipeline, validate outputs, and prepare deployment notes
Operationalization, Governance, and Production Readiness
- Unity Catalog governance, lineage, and access-control best practices
- Cost management, cluster sizing, autoscaling, and job concurrency patterns
- Deployment checklists, rollback strategies, and runbook creation
Final Review, Knowledge Transfer, and Next Steps
- Participant presentations of migration work and lessons learned
- Gap analysis, recommended follow-up activities, and training materials handoff
- References, further learning paths, and support options
Requirements
- A solid understanding of data engineering concepts
- Experience with SQL and stored procedures (Synapse / SQL Server)
- Familiarity with ETL orchestration concepts (ADF or similar tools)
Target Audience
- Technology managers with a data engineering background
- Data engineers looking to transition procedural OLAP logic to Lakehouse patterns
- Platform engineers responsible for driving Databricks adoption
Duration: 35 hours