Databricks Jobs
Recently I successfully deployed my Python wheel to a Databricks cluster. Here are some tips if you plan to deploy a PySpark project.

pyspark project

My previous Spark project was Scala based, and I used IDEA to compile and test it conveniently.



The Databricks Jobs UI is nice and saves you time when creating a JAR job. This is the official guide: Databricks Wheel Job
What I did:
- Initialize a Python project
```bash
# create a python virtual environment
python -m venv pyspark_venv
# activate your venv
source pyspark_venv/bin/activate
# check your current python
which python
# install python libs
pip install uv ruff pyspark pytest wheel
## if pip fails with a proxy error,
## add your proxy:
## --proxy http://proxy:port
# create your project
uv init --package <your package name>
```
After the uv command completes, a nice Python project is created:
```
pyspark-app
├── README.md
├── pyproject.toml
└── src
    └── pyspark_app
        └── __init__.py
```
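For reference, the pyproject.toml that uv init --package generates looks roughly like this (the exact fields and versions depend on your uv release, so treat this as an approximation):

```toml
[project]
name = "pyspark-app"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = []

[project.scripts]
pyspark-app = "pyspark_app:main"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
```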
- pyspark entry point
  - add one file __main__.py in pyspark_app
  - modify [project.scripts] in pyproject.toml; this is the entry point of the Databricks job (see the sketch after the tree below)
Now the project looks like this:
```
pyspark-app
├── README.md
├── pyproject.toml
└── src
    └── pyspark_app
        ├── __init__.py
        └── __main__.py
```
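A minimal sketch of what the entry point could look like; the main function body here is an assumption, not the original code:

```python
# src/pyspark_app/__main__.py -- hypothetical entry point sketch
from pyspark.sql import SparkSession


def main() -> None:
    # On Databricks, getOrCreate() attaches to the cluster's existing session
    spark = SparkSession.builder.appName("pyspark-app").getOrCreate()
    spark.range(10).show()


if __name__ == "__main__":
    main()
```

The matching [project.scripts] entry in pyproject.toml would then point at this function:

```toml
[project.scripts]
pyspark-app = "pyspark_app.__main__:main"
```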
pytest
Please check that pytest is installed. Let's create a new package named test:
```
pyspark-app
├── README.md
├── pyproject.toml
└── src
    └── pyspark_app
        ├── __init__.py
        ├── __main__.py
        └── test
            ├── __init__.py
            ├── conftest.py
            └── test_spark.py
```
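test_spark.py below relies on an init_spark fixture defined in conftest.py. The original fixture isn't shown, but it could look something like this minimal local-mode sketch:

```python
# conftest.py -- hypothetical fixture sketch; only the name init_spark
# comes from the test below, the rest is an assumed local-mode setup
import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def init_spark():
    # build a local SparkSession once per test session
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("pyspark-app-tests")
        .getOrCreate()
    )
    yield spark
    spark.stop()
```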
test_spark.py:

```python
def test_spark(init_spark):
    spark = init_spark
    df = spark.range(10)
    df.show()
    """ output
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    24/11/01 20:59:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    PASSED [100%]
    +---+
    | id|
    +---+
    |  0|
    |  1|
    |  2|
    |  3|
    |  4|
    |  5|
    |  6|
    |  7|
    |  8|
    |  9|
    +---+
    """
```
Now you can work on your Spark application with tests.
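To run the tests locally, something like this works (the -s flag just makes df.show() output visible):

```bash
pytest -s src/pyspark_app/test
```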
wheel file
The final step is building the wheel file.
```bash
# 1. change your working directory to where pyproject.toml lives
# 2. run the command below (it needs the build package: pip install build)
python -m build --wheel
# the project now looks like this:
pyspark-app
├── README.md
├── build
│   ├── bdist.macosx-12.0-x86_64
│   └── lib
│       └── pyspark_app
│           ├── __init__.py
│           ├── __main__.py
│           └── test
│               ├── __init__.py
│               ├── conftest.py
│               └── test_spark.py
├── dist
│   └── pyspark_app-0.1.0-py3-none-any.whl
├── pyproject.toml
└── src
    ├── pyspark_app
    │   ├── __init__.py
    │   ├── __main__.py
    │   └── test
    │       ├── __init__.py
    │       ├── conftest.py
    │       └── test_spark.py
    └── pyspark_app.egg-info
        ├── PKG-INFO
        ├── SOURCES.txt
        ├── dependency_links.txt
        ├── entry_points.txt
        └── top_level.txt
```
Your wheel file is at dist/pyspark_app-0.1.0-py3-none-any.whl.
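When you create the Databricks job, you point a Python wheel task at the package name and the [project.scripts] entry point. As a rough illustration of what the Jobs UI configures under the hood, a Jobs API task definition could look like this (the workspace path is a placeholder, not a real location):

```json
{
  "task_key": "pyspark-app",
  "python_wheel_task": {
    "package_name": "pyspark_app",
    "entry_point": "pyspark-app"
  },
  "libraries": [
    { "whl": "/Workspace/path/to/pyspark_app-0.1.0-py3-none-any.whl" }
  ]
}
```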
You can view the full project at: Project template