Bin Zhang
Data, AI, and engineering notes

Tags

  • #agents
  • #autogen
  • #cloud
  • #containers
  • #data platform
  • #dataframe
  • #deep learning
  • #etl
  • #functional programming
  • #langchain
  • #langgraph
  • #leadership
  • #linear algebra
  • #llm apps
  • #llm fundamentals
  • #llmops
  • #mcp
  • #mlops
  • #model training
  • #optimization
  • #orchestration
  • #python
  • #rag
  • #spark
  • #sql
  • #streaming
  • #tool use
  • #web development

PySpark Dataframe Transformation

November 15, 2024
DATA ENGINEERING
#spark #python #dataframe

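```python
from collections import namedtuple

from pyspark.sql import SparkSession

# Hypothetical record type: the excerpt does not show how Event is defined.
Event = namedtuple("Event", ["id", "name"])

# "local[*]" runs Spark locally with one worker thread per logical core.
spark = (
    SparkSession.builder.master("local[*]").appName("test").getOrCreate()
)

d = [
    Event(1, "abc"),
    Event(2, "ddd"),
]
```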

Databricks Wheel Job

November 1, 2024
DATA ENGINEERING
#data platform #spark #python

My previous Spark project was Scala-based, and I used IntelliJ IDEA to compile and test it conveniently. 😄😄😄 A Databricks Job, with its nice UI, saves you time...

Spark Dataframe window function

March 1, 2020
DATA ENGINEERING
#spark #dataframe

Scala. Reference: create a DataFrame...

Spark SQL

February 21, 2020
DATA ENGINEERING
#spark #sql #optimization

```txt
--master MASTER_URL --> run mode, e.g. spark://host:port, mesos://host:port, yarn, or local.
```

Spark Optimization

February 21, 2020
DATA ENGINEERING
#spark #optimization

PROCESS_LOCAL: data is in the same JVM as the running code. This is the best locality possible. NODE_LOCAL: data is on the same node. Examples might be in...

Spark Structured Streaming

February 8, 2020
DATA ENGINEERING
#spark #streaming

I recently read a blog post, "Structured Streaming in PySpark", whose example is implemented on the Databricks platform. I then tried to implement it on my local Spark. Some...

© 2026 Bin Zhang. All rights reserved.