
SPARKC-686 scala 2.13 support #1361


Merged · 11 commits · Jul 28, 2023

Conversation

@SamTheisens (Contributor) commented Jul 11, 2023

Description

How did the Spark Cassandra Connector Work or Not Work Before this Patch

Spark Cassandra Connector did not work with Scala 2.13 before this patch.

General Design of the patch

This is another attempt inspired by #1349. The approach has been as follows:

  • Applied automated Scala collection migration via scalafix
  • Conditionally added dependencies on org.scala-lang.modules.scala-collection-compat and org.scala-lang.modules.scala-parallel-collections for Scala 2.13 (see the sbt sketch after this list)
  • Scala-version-specific implementations of SparkILoop, ParIterable, GenericJavaRowReaderFactory and CanBuildFrom.
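A minimal build.sbt sketch, not taken from this PR's actual build definition, of how such conditional dependencies can be wired up; the artifact versions are illustrative:

```scala
// Illustrative sbt snippet: pull in the compatibility libraries only when the
// build's Scala binary version is 2.13; older Scala versions get no extra deps.
libraryDependencies ++= {
  CrossVersion.partialVersion(scalaVersion.value) match {
    case Some((2, 13)) =>
      Seq(
        "org.scala-lang.modules" %% "scala-collection-compat"    % "2.11.0",
        "org.scala-lang.modules" %% "scala-parallel-collections" % "1.0.4"
      )
    case _ => Seq.empty
  }
}
```

For the version-specific sources, sbt 1.x also picks up src/main/scala-2.12 and src/main/scala-2.13 alongside src/main/scala by default, which is one common place to keep per-version implementations such as the SparkILoop and CanBuildFrom shims mentioned above.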

How Has This Been Tested?

All automated tests have passed in my fork, but the (2.12.11, 3.11.10) build seems to time out intermittently at cluster startup:

java.lang.RuntimeException: The command [ccm, create, ccm_1, -i, 127.12.0., -v, 5.1.24, --dse, --config-dir=/tmp/ccm6059075771339504008] was killed after 10 minutes

Checklist:

  • I have a ticket in the OSS JIRA
  • I have performed a self-review of my own code
  • Locally all tests pass (make sure tests fail without your patch)

Commit notes (branch: feature/SPARKC-686-scala-213-support):

  • "… while keeping compatibility with scala 2.12":
    - Applied automated scala collection migration via scalafix (https://github.com/scala/scala-collection-compat)
    - Conditionally add dependencies on `org.scala-lang.modules.scala-collection-compat` and `org.scala-lang.modules.scala-parallel-collections` for scala 2.13
    - Scala version specific implementations of `SparkILoop`, `ParIterable`, `GenericJavaRowReaderFactory` and `CanBuildFrom`.
  • "… with canonical way to map values"
@SamTheisens changed the title from "Feature/sparkc 686 scala 213 support" to "SPARKC-686 scala 213 support" on Jul 11, 2023
@SamTheisens changed the title from "SPARKC-686 scala 213 support" to "SPARKC-686 scala 2.13 support" on Jul 11, 2023
Further commit notes (same branch):

  • "… because Stream is deprecated and results in a stack overflow on scala 2.13"
  • "… `java.lang.ClassCastException: scala.collection.mutable.ArrayBuffer cannot be cast to scala.collection.immutable.Seq`"
  • "… so we don't need to trawl through the (long) log output to find out which test failed. Annotate only, which doesn't require check permission."
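One commit note above mentions that Stream is deprecated and caused a stack overflow on Scala 2.13. A small illustration of the usual replacement, LazyList, assuming nothing about the connector's actual code (the paging helper below is hypothetical):

```scala
// Illustrative only: scala.Stream is deprecated on 2.13 and LazyList is the
// replacement; unlike Stream, LazyList is lazy in both head and tail.
object LazyListExample {
  // Hypothetical helper: lazily page through rows without materializing all pages.
  def pages(fetch: Int => Seq[String]): LazyList[Seq[String]] =
    LazyList.from(0).map(fetch).takeWhile(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    val rows  = Vector.tabulate(25)(i => s"row-$i")
    val byTen = (page: Int) => rows.slice(page * 10, (page + 1) * 10)
    pages(byTen).foreach(p => println(p.mkString(", ")))
  }
}
```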
@SamTheisens (Contributor, author) commented:

@jtgrabowski The intermittent test failures don't seem to depend on the particular version of Scala or Cassandra used. Rather, I seem to have more success in the early hours in my timezone (Indonesia). Re-running only the failed tests tends to get all the lights green.

Could it be that we run into timing problems due to varying amounts of resources available to GitHub Actions at different times of the day?

[screenshot: CI check results, 2023-07-14 at 15:44]

@jtgrabowski (Contributor):

I don't know what is causing this; the errors seem unrelated to the PR.
I apologize, but I won't be able to take a look until the 24th of July. Let's revisit then.

@SamTheisens (Contributor, author):

> I don't know what is causing this; the errors seem unrelated to the PR. I apologize, but I won't be able to take a look until the 24th of July. Let's revisit then.

No worries! Is there anything I could do in the meantime to help prepare a new release?

@jtgrabowski (Contributor) left a review comment:

This is great. Left minor comments.

@@ -12,7 +11,9 @@ import com.datastax.spark.connector.cql.CassandraConnector
 import com.datastax.spark.connector.embedded.SparkTemplate._
 import com.datastax.spark.connector.rdd.partitioner.EndpointPartition
 import com.datastax.spark.connector.writer.AsyncExecutor
+import spire.ClassTag
@jtgrabowski (Contributor): Use scala.reflect.ClassTag
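A tiny sketch of the suggested standard-library import in use; the helper below is hypothetical and not code from this PR:

```scala
import scala.reflect.ClassTag

// A ClassTag in scope lets the typed pattern recover the erased element type at runtime.
def firstOfType[T: ClassTag](xs: Seq[Any]): Option[T] =
  xs.collectFirst { case t: T => t }
```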

@@ -771,7 +771,7 @@ class TableWriterSpec extends SparkCassandraITFlatSpecBase with DefaultCluster {
     verifyKeyValueTable("key_value")
   }

-  it should "be able to append and prepend elements to a C* list" in {
+  it should "be able to.append and.prepend elements to a C* list" in {
@jtgrabowski (Contributor): Could you remove dots from test name here and others below?


@@ -175,7 +175,7 @@ class GettableDataToMappedTypeConverter[T : TypeTag : ColumnMapper](
       for ((s, _) <- columnMap.setters)
       yield (s, ReflectionUtil.methodParamTypes(targetType, s).head)
     val setterColumnTypes: Map[String, ColumnType[_]] =
-      columnMap.setters.mapValues(columnType)
+      columnMap.setters.map{case (k, v) => (k, columnType(v))}.toMap
@jtgrabowski (Contributor): Just curious, why did you choose .map().toMap over .view.mapValues...?

@SamTheisens (Contributor, author) commented Jul 28, 2023:

IIRC I was struggling to find a syntax that works in both 2.12 and 2.13. It looks like mapValues was actually fine as long as the return type is converted back to a map.
Commit: [SPARKC-686] Replace closure with idiomatic mapValues (branch: feature/SPARKC-686-scala-213-support). The .toMap is necessary for Scala 2.13, as the function returns a `scala.collection.MapView` instead of a Map.
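A minimal sketch of the MapView behaviour the commit note describes, assuming Scala 2.13; the map contents are illustrative and not taken from the connector:

```scala
object MapValuesCompat {
  def main(args: Array[String]): Unit = {
    val setters = Map("setName" -> "name", "setAge" -> "age")

    // Cross-version friendly: map the entries and get a strict Map directly.
    val viaMap: Map[String, Int] =
      setters.map { case (k, v) => (k, v.length) }

    // Scala 2.13 idiom: .view.mapValues returns a scala.collection.MapView,
    // so an explicit .toMap is needed to materialize a strict Map again.
    val viaView: Map[String, Int] =
      setters.view.mapValues(_.length).toMap

    println(viaMap == viaView) // true
  }
}
```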
@jtgrabowski (Contributor) left a review comment:

LGTM! Thank you for this awesome contribution.
Please sign the CLA (https://cla.datastax.com/) if you haven't already.

@SamTheisens (Contributor, author):

> LGTM! Thank you for this awesome contribution. Please sign the CLA (https://cla.datastax.com/) if you haven't already.

Yes, I've signed.

@jtgrabowski merged commit 5a25f7f into apache:master on Jul 28, 2023
@jtgrabowski (Contributor):

Thank you @SamTheisens! A new scc version should be released next week.

@SamTheisens (Contributor, author):

> Thank you @SamTheisens! A new scc version should be released next week.

@jtgrabowski anything I could help out with towards creating a release?

@jtgrabowski (Contributor):

@SamTheisens the release is now done. The artifacts should appear in the public repository shortly. Thanks again!

@SamTheisens (Contributor, author):

> @SamTheisens the release is now done. The artifacts should appear in the public repository shortly. Thanks again!

Awesome! Thanks a lot!

@jesperancinha:

I tested this only today. I was following the other branch. Everything works now. Thank you!

eappere added a commit to eappere/spark-cassandra-connector that referenced this pull request Dec 5, 2024
eappere added a commit to criteo-forks/spark-cassandra-connector that referenced this pull request Dec 6, 2024