Chapter 03 (Beginner)

The Spark Execution Model

Lazy Evaluation, Transformations vs. Actions, and Lineage

When you write df.filter(...).groupBy(...), no computation runs. Spark does not read a single byte of data. It just takes notes about what you asked for.

This is lazy evaluation. Spark builds a plan, a directed acyclic graph (DAG) of operations, and waits. Only when you call an action such as .count() or .show() does Spark actually execute the plan.
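The idea can be sketched without Spark at all. The following is a toy model in plain Python, not Spark's implementation: the class `LazyFrame`, the helper `read_numbers`, and the `reads` list are all hypothetical names invented for illustration. Transformations only append a step to a recorded plan, and the source is read only when an action is called.

```python
class LazyFrame:
    """Toy stand-in for a DataFrame: records steps, runs them only on an action."""

    def __init__(self, source, steps=()):
        self.source = source  # callable that returns the rows when invoked
        self.steps = steps    # recorded transformations (the "plan")

    def filter(self, predicate):
        # Transformation: return a new LazyFrame with one more recorded step.
        return LazyFrame(self.source, self.steps + (("filter", predicate),))

    def select(self, fn):
        # Transformation: also just recorded, nothing is computed here.
        return LazyFrame(self.source, self.steps + (("select", fn),))

    def collect(self):
        # Action: only now is the source read and the plan executed.
        rows = self.source()
        for kind, fn in self.steps:
            rows = filter(fn, rows) if kind == "filter" else map(fn, rows)
        return list(rows)

    def count(self):
        # Action: executes the plan and returns the number of rows.
        return len(self.collect())


reads = []  # one entry is appended each time the source is actually read


def read_numbers():
    reads.append("read")
    return iter(range(10))


plan = LazyFrame(read_numbers).filter(lambda x: x % 2 == 0).select(lambda x: x * 10)
reads_before_action = len(reads)  # the plan exists, but nothing has been read
result = plan.collect()           # the action triggers the read
```

Building `plan` leaves `reads` empty; only `collect()` touches the source. A real Spark plan is far richer, but the split between recording and running is the same.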

Why lazy? Because Spark sees the entire plan before running it, it can optimize the plan as a whole. It might reorder filters, push predicates down to the data source, or skip entire files. You get these optimizations for free.
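To see why knowing the filter in advance helps, here is a small plain-Python sketch of file skipping. It is an illustration under assumptions, not Spark's optimizer: the `files` list, its min/max statistics, and the `run` function are hypothetical. When the filter is known before any file is opened, files whose value range cannot match are never read.

```python
# Hypothetical file metadata: (name, min_value, max_value, rows)
files = [
    ("part-0", 0, 99, list(range(0, 100))),
    ("part-1", 100, 199, list(range(100, 200))),
    ("part-2", 200, 299, list(range(200, 300))),
]


def run(threshold, use_stats):
    """Count rows with value >= threshold, optionally skipping files via min/max stats."""
    files_read = 0
    matches = 0
    for name, lo, hi, rows in files:
        if use_stats and hi < threshold:
            continue  # no row in this file can match, so it is never opened
        files_read += 1
        matches += sum(1 for v in rows if v >= threshold)
    return matches, files_read


naive = run(250, use_stats=False)   # reads every file
pruned = run(250, use_stats=True)   # same answer, fewer files read
```

Both calls return the same match count, but the pruned run opens one file instead of three. An eager engine that read the data before seeing the filter would have no chance to make this decision.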

Key Concepts
01
Transformations
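filter(), select(), groupBy(), join(): these build the plan. They are lazy: each returns a new object (a DataFrame, or for groupBy() a grouped object that becomes a DataFrame once you aggregate it) instead of running anything.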
02
Actions
count(), show(), collect(), and writes such as write.save(): these trigger execution. Only at this point does Spark execute the plan.
03
Lineage Graph
Spark remembers how every DataFrame was derived. If a partition is lost, Spark can recompute that partition from its lineage.
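Lineage-based recovery can also be sketched in plain Python. This is a toy model, not Spark's internals: the `Dataset` class, its `cache` dictionary (standing in for partitions held in memory), and `source_data` are hypothetical. Each derived dataset keeps a function describing how to rebuild any one partition from its parent, so a lost partition can be recomputed on its own.

```python
class Dataset:
    """Toy partitioned dataset that remembers how each partition is derived."""

    def __init__(self, num_partitions, compute):
        self.num_partitions = num_partitions
        self.compute = compute  # lineage: partition index -> list of rows
        self.cache = {}         # materialized partitions (may be lost)

    def map(self, fn):
        parent = self
        # The child stores only the recipe, expressed in terms of its parent.
        return Dataset(
            self.num_partitions,
            lambda i: [fn(x) for x in parent.partition(i)],
        )

    def partition(self, i):
        # Recompute from lineage if this partition is missing.
        if i not in self.cache:
            self.cache[i] = self.compute(i)
        return self.cache[i]


source_data = [[1, 2], [3, 4], [5, 6]]
base = Dataset(3, lambda i: list(source_data[i]))
squared = base.map(lambda x: x * x)

first = [squared.partition(i) for i in range(3)]  # materialize all partitions
del squared.cache[1]                              # simulate losing partition 1
recovered = squared.partition(1)                  # rebuilt from lineage alone
```

Only partition 1 is recomputed; partitions 0 and 2 stay as they were. That is the practical benefit of tracking lineage per partition instead of keeping copies of the data.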
Core Concept: Transformations Plan, Actions Execute

Nothing runs until you call an action. Every transformation before that is a step in the recipe, not the cooking.