Chapter 02 · Beginner
Apache Spark Architecture Overview
Driver, Executors, Cluster Manager, and Cores
Every Spark job involves three roles. Think of it like a restaurant: a head chef (Driver), a kitchen manager (Cluster Manager), and the line cooks (Executors).
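The Driver is your program. It builds the plan, decides what to do, and coordinates everything. It normally does not process the data itself; data reaches the Driver only when you explicitly bring results back, for example with collect().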
Executors are where real work happens. They receive tasks from the Driver, process their partition of data, and report back.
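The split between planning and doing can be sketched in plain Python. This is a toy model, not Spark's API: the names plan_tasks, run_task, and run_job are made up for illustration, and a thread pool stands in for the Executors. The "Driver" only describes partitions as row ranges, and each task produces and processes its own rows.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_tasks(n_rows, num_partitions):
    """Driver side: describe partitions as (start, stop) ranges.

    Only metadata is produced here; no row data is touched.
    May return fewer than num_partitions ranges when rows do not divide evenly.
    """
    step = max(1, -(-n_rows // num_partitions))  # ceiling division, at least 1
    return [(start, min(start + step, n_rows)) for start in range(0, n_rows, step)]


def run_task(partition):
    """Executor side: process one partition and report a small result."""
    start, stop = partition
    return sum(x * x for x in range(start, stop))


def run_job(n_rows, num_partitions=4, total_cores=2):
    tasks = plan_tasks(n_rows, num_partitions)        # the Driver plans
    with ThreadPoolExecutor(max_workers=total_cores) as pool:
        partial_results = list(pool.map(run_task, tasks))  # Executors run tasks
    return sum(partial_results)                       # the Driver combines the reports


if __name__ == "__main__":
    print(plan_tasks(10, 4))  # [(0, 3), (3, 6), (6, 9), (9, 10)]
    print(run_job(10))        # 285
```

Only the small per-partition sums travel back to the coordinating code, which mirrors how Executors report results to the Driver.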
Spark Architecture
01
Driver
Plans the job, builds the DAG, schedules tasks. Runs on the machine that submits the job (client mode) or on a node inside the cluster (cluster mode).
02
Cluster Manager
Allocates resources (CPU, RAM) across the cluster. Can be YARN, Kubernetes, or Spark Standalone.
03
Executor
Runs tasks assigned by the Driver. Each task processes one partition, so an executor works through one or more partitions.
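As a rough illustration of how the Cluster Manager is chosen, the sketch below builds spark-submit argument lists in which only the --master value differs. The host names, ports, the script name my_job.py, and the resource values are placeholders, not real endpoints or recommendations. Real deployments usually need extra manager-specific settings as well (for example a container image on Kubernetes).

```python
# Hypothetical master URLs; hosts and ports are placeholders.
MASTER_URLS = {
    "yarn": "yarn",
    "kubernetes": "k8s://https://k8s-apiserver.example.com:6443",
    "standalone": "spark://spark-master.example.com:7077",
}


def spark_submit_command(cluster_manager, app="my_job.py",
                         executor_cores=4, executor_memory="8g"):
    """Build a spark-submit argument list for the chosen cluster manager."""
    return [
        "spark-submit",
        "--master", MASTER_URLS[cluster_manager],   # the only part that changes
        "--executor-cores", str(executor_cores),
        "--executor-memory", executor_memory,
        app,                                        # the application stays the same
    ]


if __name__ == "__main__":
    for name in MASTER_URLS:
        print(" ".join(spark_submit_command(name)))
```

The application file is identical in every command; only the resource-allocation target changes.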
Key Concepts
01
Driver Program
Your SparkSession lives here. All transformations are planned here but executed elsewhere.
02
Executor Cores
Each executor has one or more cores. Each core runs one task at a time, so an executor with 4 cores can run up to 4 tasks in parallel.
03
Cluster Manager
Decoupled from your application code. You can swap YARN for Kubernetes by changing the master URL and deployment settings, without changing your Spark code.
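Because each core runs one task at a time, the number of tasks that can run at once follows from simple arithmetic. The sketch below is a back-of-the-envelope helper, not a Spark API; the function names and the cluster sizes are made up for illustration. It assumes each task needs one core unless cpus_per_task says otherwise.

```python
import math


def parallel_slots(num_executors, cores_per_executor, cpus_per_task=1):
    """Maximum number of tasks that can run at the same time."""
    return num_executors * (cores_per_executor // cpus_per_task)


def task_waves(num_partitions, slots):
    """How many rounds are needed to run one task per partition."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    return math.ceil(num_partitions / slots)


if __name__ == "__main__":
    slots = parallel_slots(num_executors=5, cores_per_executor=4)
    print(slots)                   # 20
    print(task_waves(200, slots))  # 10
```

With 5 executors of 4 cores each, 20 tasks run at once, so a stage with 200 partitions finishes in about 10 rounds of tasks, assuming tasks take similar time.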
Driver Plans, Executors Run
Core Concept
The Driver plans. The Executors act. The Cluster Manager makes sure everyone has enough resources.