Using SQL in the Data Lake: The Journey Begins!
Effectively managing Spark sessions in Jupyter notebooks is crucial for good resource utilization. If you have ever seen warnings like these, you have run into the problem this post addresses:
Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
These warnings appear because multiple Spark sessions remain active when you switch between notebooks: each live session keeps its Spark UI port occupied, so every new session has to fall back to the next port. Here are the best practices for handling Spark sessions.
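Conceptually, the Spark UI's port fallback works like the stdlib sketch below (this is an illustration of the bind-retry behaviour, not Spark's actual code): it tries a port, and if the bind fails because another session already holds it, it prints the familiar warning and moves on to the next one.

```python
import socket

def bind_ui_port(start_port: int = 4040, max_retries: int = 16) -> int:
    """Mimic the Spark UI's fallback: try start_port, then start_port+1, ...

    Returns the first port that could be bound. The probe socket is closed
    again immediately -- this only demonstrates the fallback logic.
    """
    for offset in range(max_retries):
        port = start_port + offset
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind(("127.0.0.1", port))
                return port  # this port was free
            except OSError:
                # Another (still-running) session holds this port.
                print(f"Service 'SparkUI' could not bind on port {port}. "
                      f"Attempting port {port + 1}.")
    raise OSError(f"no free port in range {start_port}-{start_port + max_retries - 1}")
```

Every Spark session you leave running shifts the next notebook's UI one port further along, which is exactly the warning cascade shown above.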
To prevent multiple Spark UI instances and avoid port conflicts, always stop an existing Spark session before creating a new one:
# 🚨 Check if a Spark session exists and stop it before creating a new one
if 'spark' in locals() and spark is not None:
    spark.stop()

# ✅ Create a new SparkSession
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("local[*]") \
    .appName("YourAppName") \
    .getOrCreate()
When switching between different notebooks, ensure that Spark sessions are properly stopped.
At the end of each notebook, explicitly close the Spark session to release ports and resources:
# Optional: Add a closure method at the end of your notebook
spark.stop()
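If you are worried about forgetting that final call, Python's standard atexit module can act as a safety net. The helper below (`register_spark_cleanup` is an illustrative name, not a PySpark API; the only Spark method it assumes is `session.stop()`) stops the session when the kernel process exits, so it complements rather than replaces an explicit spark.stop():

```python
import atexit

def register_spark_cleanup(session):
    """Safety net: stop the session when the Python process (the Jupyter
    kernel) exits, even if the notebook never calls spark.stop() itself."""
    atexit.register(session.stop)
    return session
```

Usage: `spark = register_spark_cleanup(SparkSession.builder.master("local[*]").appName("YourAppName").getOrCreate())`. Note that this only fires when the kernel shuts down, so stopping the session explicitly when you finish a notebook is still the better habit.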
📌 This step is crucial! If Spark is not stopped, the next notebook will start a new session while the previous one still holds its ports.
✅ Prevents multiple SparkUI instances from running simultaneously
✅ Releases previously occupied ports, avoiding conflicts like:
Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
✅ Ensures clean resource management for Spark
✅ Avoids conflicts when working across multiple Jupyter notebooks
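The whole stop-then-recreate discipline can be bundled into one helper that every notebook calls at the top. In the sketch below, `fresh_session` is a hypothetical helper name (not a PySpark API), and the Spark-specific part is kept behind a `create` callable so the logic can be shown, and tested, without a running cluster; in a notebook, `create` would wrap your `SparkSession.builder...getOrCreate()` call:

```python
from typing import Callable, Optional

def fresh_session(existing: Optional[object], create: Callable[[], object]):
    """Stop an existing session (if any), then build and return a new one.

    `existing` is whatever `spark` currently holds in the notebook (or None);
    `create` is a zero-argument factory, e.g.
    lambda: SparkSession.builder.master("local[*]").appName("App").getOrCreate()
    """
    if existing is not None:
        existing.stop()  # releases the Spark UI port and executor resources
    return create()
```

In a notebook this would look like `spark = fresh_session(globals().get("spark"), lambda: SparkSession.builder.master("local[*]").appName("YourAppName").getOrCreate())`, so rerunning the cell can never leave a second session (and a second Spark UI port) behind.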