Spline does not report lineage to HDFS #859

@zacayd

Describe the bug

I set the following in the Spark config:

spark.jars=hdfs:///tmp/spark-2.4-spline-agent-bundle_2.11-2.2.1.jar
spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener
spark.spline.mode=ENABLED
spark.spline.lineageDispatcher=hdfs
spark.spline.lineageDispatcher.hdfs.outputDir=hdfs:///tmp/spline/lineage/
spark.spline.lineageDispatcher.hdfs.fileNamePrefix=lineage_
spark.spline.lineageDispatcher.hdfs.fileBufferSize=4096
spark.spline.lineageDispatcher.hdfs.filePermissions=777
spark.driver.memory=4g
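
For anyone reproducing this, the same settings can also be passed per job on the command line. The following is only a minimal sketch: it renders the properties listed above as `spark-submit --conf` flags. The helper `spark_submit_command` and the script name `write_csv.py` are hypothetical names introduced for illustration; the keys and values are copied from this report as-is, without any claim that the Spline agent accepts all of them.

```python
import shlex

# Settings copied verbatim from the report above.
SPLINE_CONF = {
    "spark.jars": "hdfs:///tmp/spark-2.4-spline-agent-bundle_2.11-2.2.1.jar",
    "spark.sql.queryExecutionListeners":
        "za.co.absa.spline.harvester.listener.SplineQueryExecutionListener",
    "spark.spline.mode": "ENABLED",
    "spark.spline.lineageDispatcher": "hdfs",
    "spark.spline.lineageDispatcher.hdfs.outputDir": "hdfs:///tmp/spline/lineage/",
    "spark.spline.lineageDispatcher.hdfs.fileNamePrefix": "lineage_",
    "spark.spline.lineageDispatcher.hdfs.fileBufferSize": "4096",
    "spark.spline.lineageDispatcher.hdfs.filePermissions": "777",
    "spark.driver.memory": "4g",
}


def spark_submit_command(conf, script):
    """Build a spark-submit argument list with one --conf flag per setting."""
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", "{}={}".format(key, value)]
    args.append(script)
    return args


if __name__ == "__main__":
    # write_csv.py is a hypothetical file name for the PySpark script in this report.
    command = spark_submit_command(SPLINE_CONF, "write_csv.py")
    print(" ".join(shlex.quote(arg) for arg in command))
```

Passing the settings this way makes it easy to confirm in the driver log which values the job actually received.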

I run this code:

from pyspark.sql import SparkSession

# Parentheses allow the builder chain to span several lines
spark = (
    SparkSession.builder
    .appName("Write DataFrame to HDFS as CSV")
    .getOrCreate()
)

# Create a sample DataFrame
data = [
    (1, "Alice", 28),
    (2, "Bob", 24),
    (3, "Cathy", 29),
]
columns = ["Id", "Name", "Age"]
df = spark.createDataFrame(data, columns)
df.show()

# Write the DataFrame to HDFS as a single CSV file with a header row
output_path = "hdfs:///tmp/sample_data"
(
    df.coalesce(1)
    .write
    .option("header", True)
    .mode("overwrite")
    .csv(output_path)
)
print("DataFrame written to HDFS at {}".format(output_path))

spark.stop()

The log shows that everything is OK.

But when I run
hdfs dfs -ls /tmp/spline/lineage/
I see no data.

The log is attached.

Versions

Please provide versions of: Spline, Spark and Scala that were in use when the bug happened.
Spark: 2.4.8.7.2.18.0-641
Scala: 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_232)
Spline: Spark 2.4 Spline Agent Bundle 2.2.1
