
Building Multimodal Generative AI and Agentic Applications
Indrajit Kar
This audiobook is narrated by a digital voice.
Location:
United States
Description:
This audiobook is narrated by a digital voice.
DESCRIPTION
Generative AI and agentic AI are reshaping how we interact with data, enabling intelligent systems that can reason, generate, and autonomously act across multiple modalities. From text and images to voice and structured data, these technologies are increasingly essential in enterprise and research applications.
This book offers a complete roadmap to mastering multimodal generative AI and agentic AI systems. It covers foundational concepts, vision-language models, retrieval-augmented generation, human-in-the-loop and multi-agent workflows, text-to-SQL, OCR, and hybrid AI integrations. Each chapter combines theory, practical guidance, code implementations, and real-world case studies, helping readers understand architectures, pipelines, and production-grade deployments.
By the end of this book, readers will be capable of designing, implementing, and scaling robust multimodal and agentic AI systems. They will gain hands-on expertise in reasoning, generation, retrieval, agent orchestration, and Ops, equipping them to build production-ready AI applications and excel in their roles.
WHAT YOU WILL LEARN
● Understand multimodal generative AI and agentic AI systems.
● Architect RAG, vector DBs, embeddings, cross-encoders, and core agentic planning.
● Build retrieval-augmented generation workflows efficiently.
● Implement human-in-the-loop and multi-agent pipelines.
● Apply text-to-SQL for real-time data queries.
● Develop OCR solutions for images and documents.
● Integrate traditional ML models with GenAI workflows.
● Deploy production-grade AI with monitoring and observability.
Duration - 17h 22m.
Author - Indrajit Kar.
Narrator - Digital Voice Madison G.
Published Date - Thursday, 02 January 2025.
Copyright - © 2026 BPB ©.
Language:
English
Title Page
Duration:00:00:18
Copyright Page
Duration:00:01:21
Dedication Page
Duration:00:00:06
About the Author
Duration:00:02:26
About the Reviewers
Duration:00:03:25
Acknowledgement
Duration:00:00:43
Preface
Duration:00:15:39
Table of Contents
Duration:00:22:49
1. Introducing New Age Generative AI
Duration:00:00:05
Introduction
Duration:00:01:44
Structure
Duration:00:00:38
Objectives
Duration:00:00:52
Overview of generative AI
Duration:00:04:16
Retrieval system
Duration:00:02:30
Sparse retrieval
Duration:00:00:39
Dense retrieval
Duration:00:04:29
Generation system
Duration:00:02:11
Types of generation systems
Duration:00:02:51
Autoregressive generation
Duration:00:01:08
Prompting strategies
Duration:00:00:20
Understanding where generation systems excel
Duration:00:00:31
Combining retrieval and generation
Duration:00:00:41
Retrieval-augmented generation
Duration:00:00:54
RAG working
Duration:00:00:35
Architecture of a basic RAG pipeline
Duration:00:00:47
Types of RAG architectures
Duration:00:00:58
Iterative RAG
Duration:00:00:19
Vector databases and RAG
Duration:00:00:42
Prompt engineering for RAG
Duration:00:00:40
Advanced RAG techniques
Duration:00:01:12
Applications of RAG
Duration:00:00:46
Orchestration in AI systems
Duration:00:00:41
Orchestration in RAG systems
Duration:00:01:20
Orchestration in agentic systems
Duration:00:02:15
Tokens in AI systems
Duration:00:04:08
Vector database
Duration:00:02:58
Understanding vector databases
Duration:00:00:38
Indexing algorithms in vector databases
Duration:00:01:42
Search algorithms in vector databases
Duration:00:01:42
Embeddings and embedding models
Duration:00:01:10
Importance of vector databases for RAG and agentic systems
Duration:00:01:46
Reranking
Duration:00:01:12
Bi-encoders vs. cross-encoders
Duration:00:02:03
Cross-encoders for reranking
Duration:00:01:40
Guardrails
Duration:00:01:07
Types of guardrails
Duration:00:01:06
Methods of applying guardrails
Duration:00:00:48
Without guardrails
Duration:00:00:44
Industry examples of guardrail solutions
Duration:00:03:00
Agents
Duration:00:04:47
Agentic RAG vs. non-agentic RAG
Duration:00:01:44
Model Context Protocols
Duration:00:02:34
Conclusion
Duration:00:01:43
2. Deep Dive into Multimodal Systems
Duration:00:00:04
Understanding vision-language models
Duration:00:01:00
Categories of vision-language models
Duration:00:04:41
Core architectural components of vision-language models
Duration:00:05:36
Challenges in vision-language models
Duration:00:03:06
Multimodal GenAI system
Duration:00:06:41
Multimodal vector embedding
Duration:00:03:45
Multimodal vector database
Duration:00:01:50
Collections
Duration:00:00:43
Points and point IDs
Duration:00:00:42
Vectors
Duration:00:00:42
Payload
Duration:00:00:35
Storage and vector store
Duration:00:00:56
Indexing
Duration:00:05:01
Implementation comparisons
Duration:00:00:27
Single collection, partitioned via payload
Duration:00:01:41
Multiple collections with global indexing
Duration:00:02:23
Multimodal generative AI systems vs. VLMs
Duration:00:00:39
Vision-language models
Duration:00:01:24
Multimodal generative AI systems
Duration:00:01:44
Using vision-language models
Duration:00:01:02
Using multimodal generative AI systems
Duration:00:01:37
Real-world example comparison
Duration:00:00:52
Output-based classification of multimodal systems
Duration:00:01:12
Text-to-image systems
Duration:00:02:27
Image-to-text systems
Duration:00:01:45
Text and image systems
Duration:00:02:20
Text-only to specifications and image systems
Duration:00:02:28
Text-to-SQL systems
Duration:00:01:58
Text-to-code systems
Duration:00:03:42
3. Implementing Unimodal Local GenAI System
Duration:00:00:05
GPU in today’s generative AI systems
Duration:00:04:12
Using a local GPU
Duration:00:06:14
Architectural components
Duration:00:01:32
About Ollama
Duration:00:01:08
Alternatives to Ollama
Duration:00:04:12
Generate a PDF document with Ollama
Duration:00:04:58
RAG implementation
Duration:00:04:47
Load and chunk the PDF document
Duration:00:01:53
Alternative chunking strategies in LangChain
Duration:00:03:18
Creating embeddings with metadata
Duration:00:02:37
Using them in code
Duration:00:00:48
Hybrid search with semantic and keyword
Duration:00:02:32
Other retrievers you can use
Duration:00:02:48
Conversation memory buffer
Duration:00:01:27
LLM configuration natural language generation
Duration:00:01:05
ReAct prompt template
Duration:00:01:33
Building the conversational QA chain
Duration:00:01:30
User chat loop
Duration:00:02:50
Challenges in RAG
Duration:00:04:43
4. Implementing Unimodal API-based GenAI Systems
Duration:00:00:06
Getting started with OpenAI APIs and models
Duration:00:00:59
OpenAI as a company
Duration:00:00:48
Overview of the OpenAI API
Duration:00:00:54
Core API endpoints
Duration:00:00:35
Major OpenAI models
Duration:00:01:52
Accessing OpenAI models
Duration:00:00:55
Choosing the right model
Duration:00:00:37
Best practices for beginners
Duration:00:01:22
From OpenAI to agentic AI
Duration:00:01:20
OpenAI’s agentic API ecosystem
Duration:00:00:40
Responses API
Duration:00:00:55
Agents SDK
Duration:00:01:21
Operator
Duration:00:00:45
Codex
Duration:00:00:44
Assistants API
Duration:00:00:34
Multi-document query
Duration:00:04:24
Implementing modular RAG with OpenAI
Duration:00:00:46
Main controller
Duration:00:01:16
Configuration
Duration:00:01:14
Embedding initialization
Duration:00:01:03
Vector store setup
Duration:00:01:51
Metadata tagging
Duration:00:00:55
Document loading and chunking
Duration:00:01:51
Hybrid retriever
Duration:00:00:21
Enforce metadata-based filtering during retrieval
Duration:00:01:43
Language model
Duration:00:01:04
Prompt template
Duration:00:01:17
RAG chain assembly
Duration:00:02:15
Conversational memory
Duration:00:00:58
Dependencies
Duration:00:01:29
To do
Duration:00:01:17
5. Implementing Agentic GenAI Systems with Human-in-the-loop
Duration:00:00:06
Architecting agentic GenAI systems
Duration:00:01:20
Parallel pattern
Duration:00:01:27
Sequential pattern
Duration:00:01:07
Loop pattern
Duration:00:01:09
Router pattern
Duration:00:01:06
Aggregator pattern
Duration:00:01:02
Network pattern
Duration:00:01:06
Hierarchical pattern
Duration:00:01:01
Human-in-the-loop pattern
Duration:00:00:58
Shared tools pattern
Duration:00:00:52
Database with tools pattern
Duration:00:00:54
Memory transformation using tools
Duration:00:01:03
Planner-executor pattern
Duration:00:01:01
Critic or validator pattern
Duration:00:00:59
Negotiator pattern
Duration:00:01:11
Multimodal agent pattern
Duration:00:00:59
Voting or consensus pattern
Duration:00:00:55
Supervisor-subordinate pattern
Duration:00:01:20
Watchdog or recovery pattern
Duration:00:01:00
Temporal planner pattern
Duration:00:03:32
Human-in-the-loop
Duration:00:04:21
End-to-end human-in-the-loop RAG workflow
Duration:00:01:23
From HITL to multi-agent human-in-the-loop RAG
Duration:00:05:47
Agentic AI vs. AI agents
Duration:00:03:48
6. Two and Multi-stage GenAI Systems
Duration:00:00:05
Concepts of interactions in dense retrievals
Duration:00:00:39
No interaction
Duration:00:00:56
Full interaction
Duration:00:00:54
Late interaction
Duration:00:01:59
Multi-vector representations
Duration:00:03:50
Differentiation from late interaction architectures
Duration:00:02:26
Role of interaction models in two-stage RAG systems
Duration:00:00:42
Interaction in the retrieval phase
Duration:00:00:50
Reranking with various interaction models
Duration:00:03:21
Integration into two-stage RAG architectures
Duration:00:01:09
Two-stage RAG architecture
Duration:00:00:35
Stage one dense retrievals
Duration:00:00:54
Stage-two, reranking for semantic precision
Duration:00:00:56
The strategic role of two-stage design
Duration:00:01:14
Two-stage RAG vs. late interaction
Duration:00:00:32
Capabilities of ColBERT and ColPali
Duration:00:01:01
Use of two-stage RAG
Duration:00:01:11
Multi-stage RAG
Duration:00:00:33
Beyond two-stage systems
Duration:00:00:43
Components of multi-stage RAG
Duration:00:01:36
Benefits of multi-stage RAG
Duration:00:00:54
Types of multi-stage RAG
Duration:00:05:02
Grading mechanisms
Duration:00:03:33
Challenges and considerations
Duration:00:00:47
Token utilization in multi-stage RAG systems
Duration:00:01:58
Grading types
Duration:00:11:57
Implementation of multi-stage RAG workflow with routing
Duration:00:02:17
7. Building a Bidirectional Multimodal Retrieval System
Duration:00:00:06
Integration and design implications
Duration:00:01:07
Understanding a multimodal retrieval system
Duration:00:01:44
Technical architecture
Duration:00:07:27
Applications and implications
Duration:00:00:41
Code implementation and explanation
Duration:00:00:46
Requirement
Duration:00:05:57
Frontend
Duration:00:08:25
Data directory
Duration:00:00:48
The retrieval system
Duration:00:01:14
Loaders
Duration:00:04:29
Embedding utils
Duration:00:02:05