
Data Engineering Design Patterns
Amit Kulkarni
This audiobook is narrated by a digital voice.
Location:
United States
Description:
Data engineering has gained even more relevance than before, and data engineering patterns are key to the successful implementation of data engineering projects. This book enables a data engineer not only to become familiar with data engineering patterns but also to understand their application in real-world use cases.
This book presents a comprehensive collection of data engineering patterns, each illustrated with relevant enterprise use cases to highlight its value and simplicity. It showcases both open-source and cloud technologies, guiding readers in building data systems for on-premises and cloud environments. The book covers patterns for data ingestion, transformation, storage, and serving, while also offering insights into performance engineering for data pipelines.
Once the fundamental data engineering patterns are understood, the focus shifts to patterns that help build high-performance, low-latency data systems. The book covers data caching, partitioning, replication, and how to select the technology stack for building out the patterns it describes.
By the end of the book, readers will have a deep understanding of various data engineering use cases and will be able to map the appropriate patterns to address them. They will also be equipped to choose the right technical stack for implementing these patterns, enabling them to create robust and efficient data systems in a secure and cost-effective manner.
WHAT YOU WILL LEARN
● Key data engineering patterns.
● Data ingestion and processing patterns.
● Modern architectures like Lambda.
● Time-tested data patterns of ETL and ELT.
● Modern data systems like data lake and medallion architectures.
● Domain-specific patterns, plus patterns for data orchestration, observability, and security.
● Overcoming performance challenges in building complex data systems.
Duration - 11h 58m.
Author - Amit Kulkarni.
Narrator - Digital Voice Madison G.
Published Date - Friday, 31 January 2025.
Copyright - © 2026 BPB.
Language:
English
Title Page
Duration:00:00:16
Copyright Page
Duration:00:01:21
Dedication Page
Duration:00:00:22
About the Authors
Duration:00:01:34
About the Reviewers
Duration:00:03:09
Acknowledgements
Duration:00:00:45
Preface
Duration:00:22:52
Table of Contents
Duration:00:17:21
1. Understanding Data Engineering
Duration:00:00:04
Introduction
Duration:00:00:37
Structure
Duration:00:00:15
Objectives
Duration:00:00:21
Data engineering’s role in modern data systems
Duration:00:01:47
Core concepts of data engineering
Duration:00:01:07
Data processing and ingestion
Duration:00:01:19
Data storage and serving
Duration:00:01:58
Data orchestration and governance
Duration:00:01:35
Lifecycle of data
Duration:00:02:17
Conclusion
Duration:00:00:40
Questions
Duration:00:00:32
2. Data Engineering Patterns, Terminologies, and Technical Stack
Duration:00:00:06
Understanding data engineering patterns
Duration:00:00:32
Importance of data engineering patterns
Duration:00:01:13
Examples of data engineering patterns
Duration:00:00:42
Real-time data ingestion
Duration:00:01:58
Caching
Duration:00:01:41
Effective use of patterns
Duration:00:02:28
Data processing and ingestion patterns
Duration:00:00:27
Batch ingestion and processing
Duration:00:02:43
Real-time ingestion and processing
Duration:00:02:53
Micro-batching
Duration:00:00:51
Lambda architecture
Duration:00:02:04
ETL and ELT
Duration:00:02:41
Data storage and processing patterns
Duration:00:00:38
Databases and transactional data
Duration:00:02:08
Data warehouse for data analytics
Duration:00:01:37
Data lake and medallion architecture
Duration:00:02:05
Data replication and partitioning
Duration:00:01:54
Hot vs. cold storage
Duration:00:01:03
Data caching and low-latency serving
Duration:00:01:14
Data search patterns
Duration:00:00:56
Domain specific patterns
Duration:00:01:22
Miscellaneous patterns
Duration:00:00:30
Data security patterns
Duration:00:02:12
Data observability and monitoring patterns
Duration:00:01:33
Idempotency and deduplication patterns
Duration:00:02:47
Data orchestration patterns
Duration:00:00:50
3. Batch Ingestion and Processing
Duration:00:00:04
Use cases for batch systems
Duration:00:00:15
ETL pipelines for a data warehouse
Duration:00:02:08
Data archival pipelines
Duration:00:01:35
Building precomputed aggregates for BI
Duration:00:02:02
Training ML models
Duration:00:00:44
Designing batch processing and ingestion system
Duration:00:05:56
Technologies for batch systems
Duration:00:01:39
Real-world examples
Duration:00:00:24
Batch processing in banking
Duration:00:01:21
Batch processing in retail media networks
Duration:00:01:21
4. Real-time Ingestion and Processing
Duration:00:00:04
Use cases for real-time systems
Duration:00:00:37
Pipelines for real-time analytics
Duration:00:02:38
Change data capture for high availability
Duration:00:02:26
Real-time ML scoring
Duration:00:03:05
Designing a real-time system
Duration:00:04:00
Technologies for real-time systems
Duration:00:01:56
Payment fraud detection
Duration:00:01:56
Gaming
Duration:00:01:40
5. Micro-batching
Duration:00:00:03
Use cases for micro-batching
Duration:00:00:32
Data ingestion into data lake
Duration:00:04:43
Near real-time data analysis
Duration:00:01:58
Data quality validations
Duration:00:02:27
Designing micro-batching system
Duration:00:06:18
Technologies for micro-batching systems
Duration:00:03:12
Vehicle tracking in logistics
Duration:00:02:25
IoT
Duration:00:03:06
6. Lambda Architecture
Duration:00:00:04
Use cases for Lambda architecture pattern
Duration:00:00:33
Machine learning model creation and scoring
Duration:00:04:04
Real-time data analysis with historical bias
Duration:00:04:19
Designing system with a Lambda pattern
Duration:00:00:20
Speed layer
Duration:00:00:41
Batch layer
Duration:00:06:03
Serving layer
Duration:00:00:59
Technologies for Lambda systems
Duration:00:03:57
Fintech
Duration:00:03:30
Kafka setup instructions
Duration:00:03:49
7. ETL and ELT
Duration:00:00:04
Use cases for ETL and ELT patterns
Duration:00:00:44
ETL in data warehousing
Duration:00:02:33
ELT in clickstream analysis
Duration:00:02:21
Designing ETL and ELT system
Duration:00:00:51
Forward population using ELT
Duration:00:04:50
Backward population using ETL
Duration:00:06:44
Technologies for ETL and ELT systems
Duration:00:03:52
Banking
Duration:00:03:12
8. Data Fundamentals
Duration:00:00:04
E-commerce application example
Duration:00:02:28
Overview of data modeling
Duration:00:02:24
Structured data and tabular data representation
Duration:00:01:42
Semi-structured data and JSON data format
Duration:00:01:46
JSON data format
Duration:00:01:28
Structured vs. semi-structured data model
Duration:00:03:40
Unstructured data and binary data format
Duration:00:01:28
Transactional and analytical data
Duration:00:02:36
Exercises
Duration:00:00:38
9. Databases and Transactional Data
Duration:00:00:05
Understanding relational databases
Duration:00:00:51
Introduction to distributed databases
Duration:00:01:09
Database views
Duration:00:03:56
Primary and secondary indexes
Duration:00:00:38
Primary indexes
Duration:00:01:31
Secondary indexes
Duration:00:02:53
Importance of index key order in secondary indexes
Duration:00:03:49
ACID transactions in traditional RDBMS
Duration:00:05:41
Transactions in distributed databases
Duration:00:02:51
Durability in MongoDB
Duration:00:01:56
Write to majority with journaling enabled
Duration:00:01:01
Write to all replica sets
Duration:00:01:11
Eventual consistency in DynamoDB
Duration:00:02:33
10. Data Warehouse and Data Analytics
Duration:00:00:05
Data analytics and business intelligence
Duration:00:03:05
Data warehouse
Duration:00:02:54
Differences between database and data warehouse
Duration:00:00:20
Types of data workload
Duration:00:01:16
Data serving latency
Duration:00:01:05
Recent data vs. historical data
Duration:00:01:24
Raw data vs. filtered and processed data
Duration:00:01:23
Data storage format
Duration:00:01:26
Database vs. data warehouse
Duration:00:00:30
Features of data warehouse
Duration:00:00:23
Materialized views
Duration:00:04:53
Refreshing materialized views
Duration:00:01:33
Database views vs. materialized views
Duration:00:00:29
Columnar storage format
Duration:00:02:26
Example of row-oriented and columnar storage formats
Duration:00:02:52
Star schema and Snowflake schema
Duration:00:03:05
Choice between star and Snowflake schemas
Duration:00:00:41
11. Data Lake and Medallion Architecture
Duration:00:00:05
Travel aggregator example
Duration:00:02:44
Differences between data warehouse and data lake
Duration:00:00:20
Data lake architecture
Duration:00:02:28
Organizing data in data lake
Duration:00:06:59
Use of extract-load-transform pattern
Duration:00:06:36
Medallion architecture
Duration:00:00:55
Transforming data from bronze layer
Duration:00:05:51
Transforming data from silver layer
Duration:00:07:03
Putting it all together
Duration:00:01:14
Benefits of medallion architecture
Duration:00:00:15
Separation of concerns
Duration:00:00:48
Reusability of data pipeline
Duration:00:01:29
Importance of bronze layer in medallion architecture
Duration:00:01:13
12. Data Replication and Partitioning
Duration:00:00:05
Faults and fault tolerance
Duration:00:02:45
Basics of data replication
Duration:00:03:03
Types of data replication
Duration:00:05:56
Configuring more than one replica
Duration:00:02:38
Reading the data from the replicas
Duration:00:02:11
Cross datacenter replication
Duration:00:02:50
Bi-directional XDCR and conflict resolution
Duration:00:02:04
Data partitioning
Duration:00:01:27
Hash partitioning
Duration:00:02:08
Range partitioning
Duration:00:02:45
Other popular partitioning schemes
Duration:00:01:02
Scatter and gather operations
Duration:00:02:09
13. Hot Versus Cold Data Storage
Duration:00:00:04
Identifying hot, warm, and cold data
Duration:00:00:40
Data access frequency
Duration:00:01:49
Data recency
Duration:00:01:06
Visualizing hot, warm, and cold data segregation
Duration:00:03:13
Introduction to data caching
Duration:00:01:48
Data archival
Duration:00:02:47
Defining data lifecycle using AWS S3
Duration:00:04:03
Accessing archived data
Duration:00:02:17
Comparing storage classes
Duration:00:00:15
14. Data Caching and Low Latency Serving
Duration:00:00:05
Online movie database
Duration:00:01:41
User authentication service
Duration:00:02:32
Populating the cache
Duration:00:01:19
Using local Memcached for caching
Duration:00:02:05
Reading from the cache
Duration:00:02:21
Quality of data caching and cache eviction policies
Duration:00:03:01
Cache staleness, invalidation, and expiry
Duration:00:00:50
Cache invalidation or cleanup
Duration:00:01:51
Cache expiry
Duration:00:01:10
Caching of pre-processed data
Duration:00:01:00
Data prefetching
Duration:00:03:40
Caching on laptops and mobile devices
Duration:00:01:49
15. Data Search Patterns
Duration:00:00:04
Full text search
Duration:00:03:46
Benefits of pre-processing
Duration:00:00:58
Full text search example
Duration:00:03:35
Advanced features of full text search
Duration:00:03:49
Vector search
Duration:00:01:25
Introduction to vector
Duration:00:00:43
Vector similarity search
Duration:00:01:19
Vector databases and vector indexes
Duration:00:00:58
Using vector database
Duration:00:01:10
Vector search example
Duration:00:02:06
Quality of vector search results
Duration:00:01:02