Description
Describe the bug
I added the following settings to the Spark configuration:
spark.jars=hdfs:///tmp/spark-2.4-spline-agent-bundle_2.11-2.2.1.jar
spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener
spark.spline.mode=ENABLED
spark.spline.lineageDispatcher=hdfs
spark.spline.lineageDispatcher.hdfs.outputDir=hdfs:///tmp/spline/lineage/
spark.spline.lineageDispatcher.hdfs.fileNamePrefix=lineage_
spark.spline.lineageDispatcher.hdfs.fileBufferSize=4096
spark.spline.lineageDispatcher.hdfs.filePermissions=777
spark.driver.memory=4g
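For anyone trying to reproduce this per job rather than through the cluster-wide configuration, here is a minimal sketch that turns the settings above into `spark-submit --conf` flags. The helper names `parse_conf` and `spark_submit_args` and the script name `write_csv.py` are made up for illustration; only the keys and values come from this report.

```python
# Settings copied verbatim from this report.
SPLINE_CONF = """
spark.jars=hdfs:///tmp/spark-2.4-spline-agent-bundle_2.11-2.2.1.jar
spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener
spark.spline.mode=ENABLED
spark.spline.lineageDispatcher=hdfs
spark.spline.lineageDispatcher.hdfs.outputDir=hdfs:///tmp/spline/lineage/
spark.spline.lineageDispatcher.hdfs.fileNamePrefix=lineage_
spark.spline.lineageDispatcher.hdfs.fileBufferSize=4096
spark.spline.lineageDispatcher.hdfs.filePermissions=777
spark.driver.memory=4g
"""


def parse_conf(text):
    """Parse key=value lines into a dict, skipping blank lines and # comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf


def spark_submit_args(conf, script):
    """Build a spark-submit argument list with one --conf flag per setting."""
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", "{}={}".format(key, value)]
    args.append(script)
    return args


conf = parse_conf(SPLINE_CONF)
# "write_csv.py" is a hypothetical file name for the PySpark script in this report.
args = spark_submit_args(conf, "write_csv.py")
print(" ".join(args))
```

The resulting list can be passed to `subprocess.run(args)` on a machine where `spark-submit` is on the `PATH`.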
Then I ran this code:
from pyspark.sql import SparkSession
spark = (
    SparkSession.builder
    .appName("Write DataFrame to HDFS as CSV")
    .getOrCreate()
)
# Create a sample DataFrame
data = [
(1, "Alice", 28),
(2, "Bob", 24),
(3, "Cathy", 29)
]
columns = ["Id", "Name", "Age"]
df = spark.createDataFrame(data, columns)
df.show()
output_path = "hdfs:///tmp/sample_data"
# Write the DataFrame to HDFS as a single CSV file
(
    df.coalesce(1).write
    .option("header", True)
    .mode("overwrite")
    .csv(output_path)
)
print("DataFrame written to HDFS at {}".format(output_path))
spark.stop()
The log shows that everything completed without errors.
But when I list the lineage output directory:
[[email protected] Scripts]# hdfs dfs -ls /tmp/spline/lineage/
I see no data.
The log is attached.
Versions
Please provide versions of: Spline, Spark and Scala that were in use when the bug happened.
Spline agent: 2.2.1 (spark-2.4-spline-agent-bundle_2.11-2.2.1.jar, as set in spark.jars)
Spark: 2.4.8.7.2.18.0-641
Scala: 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_232)