Skip to content

marceloneppel/apache-beam-golang-udf

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Apache Beam Golang UDF

Run UDFs (User Defined Functions) on Apache Beam Golang SDK.

Go Modules activation (if necessary):

export GO111MODULE=on

CSV parse example:

# Direct
go run examples/parse/csv/csv.go

# Direct with internet files
go run examples/parse/csv/csv.go --location="http://localhost:8081/"

# Direct with GCS files
go run examples/parse/csv/csv.go --location=gs://apache-beam-golang-udf

# Dataflow with GCS files
go run examples/parse/csv/csv.go --location=gs://apache-beam-golang-udf \
    --max_num_workers=1 \
    --num_workers=1 \
    --project=marcelo-henrique-neppel \
    --runner=dataflow \
    --staging_location=gs://apache-beam-golang-udf/bin \
    --temp_location=gs://apache-beam-golang-udf/temp \
    --worker_harness_container_image=apachebeam/go_sdk:latest \
    --worker_machine_type=n1-standard-1

# Flink with GCS files
./gradlew :runners:flink:1.9:job-server:runShadow # For using the embedded cluster

./gradlew :runners:flink:1.9:job-server:runShadow -PflinkMasterUrl=localhost:8081 # For using a separate cluster

go run examples/parse/csv/csv.go --location=gs://apache-beam-golang-udf \
    --endpoint=localhost:8099 \
    --runner=flink

JSON parse example:

# Direct
go run examples/parse/json/json.go

# Direct with internet files
go run examples/parse/json/json.go --location="http://localhost:8081/"

# Direct with GCS files
go run examples/parse/json/json.go --location=gs://apache-beam-golang-udf

# Dataflow with GCS files
go run examples/parse/json/json.go --location=gs://apache-beam-golang-udf \
    --max_num_workers=1 \
    --num_workers=1 \
    --project=marcelo-henrique-neppel \
    --runner=dataflow \
    --staging_location=gs://apache-beam-golang-udf/bin \
    --temp_location=gs://apache-beam-golang-udf/temp \
    --worker_harness_container_image=apachebeam/go_sdk:latest \
    --worker_machine_type=n1-standard-1

# Flink with GCS files
./gradlew :runners:flink:1.9:job-server:runShadow # For using the embedded cluster

./gradlew :runners:flink:1.9:job-server:runShadow -PflinkMasterUrl=localhost:8081 # For using a separate cluster

go run examples/parse/json/json.go --location=gs://apache-beam-golang-udf \
    --endpoint=localhost:8099 \
    --runner=flink

XML parse example:

# Direct
go run examples/parse/xml/xml.go

# Direct with internet files
go run examples/parse/xml/xml.go --location="http://localhost:8081/"

# Direct with GCS files
go run examples/parse/xml/xml.go --location=gs://apache-beam-golang-udf

# Dataflow with GCS files
go run examples/parse/xml/xml.go --location=gs://apache-beam-golang-udf \
    --max_num_workers=1 \
    --num_workers=1 \
    --project=marcelo-henrique-neppel \
    --runner=dataflow \
    --staging_location=gs://apache-beam-golang-udf/bin \
    --temp_location=gs://apache-beam-golang-udf/temp \
    --worker_harness_container_image=apachebeam/go_sdk:latest \
    --worker_machine_type=n1-standard-1

# Flink with GCS files
./gradlew :runners:flink:1.9:job-server:runShadow # For using the embedded cluster

./gradlew :runners:flink:1.9:job-server:runShadow -PflinkMasterUrl=localhost:8081 # For using a separate cluster

go run examples/parse/xml/xml.go --location=gs://apache-beam-golang-udf \
    --endpoint=localhost:8099 \
    --runner=flink

On examples using GCS files, please upload the example files to one of your buckets first.

About

Run UDFs (User Defined Functions) on Apache Beam Golang SDK.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages