Right now the Kotlin Spark API is in beta, and we need your feedback before we open a pull request to the official Apache Spark repository.
Why the Kotlin Spark API? While you can use Kotlin with the existing Apache Spark Java API, the Kotlin Spark API significantly improves the developer experience. For instance, it lets you use Kotlin features such as data classes and lambda expressions.
On top of that, the Kotlin Spark API adds some helpful extension functions.
Use `withCached` to perform arbitrary transformations on a cached Dataset without it being recalculated. The Dataset is unpersisted at the end, so you don't have to worry about doing it yourself.

The Kotlin Spark API also provides unnamed tuples, which you create by calling the `c()` function. It takes a variable number of arguments. You can add these tuples to one another, just like tuples in Python.

Check out the Quick Start Guide to quickly set up all the needed dependencies using Maven or Gradle: https://github.com/JetBrains/kotlin-spark-api/blob/master/docs/quick-start-guide.md
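To make the `withCached` idea concrete, here is a minimal plain-Kotlin sketch of the pattern: cache a dataset, use it several times inside a block, and unpersist it afterwards. It does not use Spark at all. `SketchDataset` and `withCachedSketch` are hypothetical stand-ins for a Spark `Dataset` and for `withCached`, and the choice to return the block's result is our assumption, not a description of the library's signature.

```kotlin
// Hypothetical stand-in for a Spark Dataset (NOT the Spark or Kotlin Spark API).
// It only counts how often its rows are computed and whether they are cached.
class SketchDataset<T>(private val compute: () -> List<T>) {
    var computeCount = 0
        private set
    var isCached = false
        private set
    private var cachedRows: List<T>? = null

    fun cache(): SketchDataset<T> = apply { isCached = true }

    fun unpersist(): SketchDataset<T> = apply {
        isCached = false
        cachedRows = null
    }

    // Materializes the rows: served from the cache if present, otherwise
    // recomputed (and stored when caching is enabled).
    fun collect(): List<T> {
        cachedRows?.let { return it }
        computeCount++
        val rows = compute()
        if (isCached) cachedRows = rows
        return rows
    }

    // Lazy transformations: a derived dataset pulls rows from its parent on demand.
    fun <R> map(f: (T) -> R): SketchDataset<R> = SketchDataset { collect().map(f) }
    fun filter(p: (T) -> Boolean): SketchDataset<T> = SketchDataset { collect().filter(p) }
}

// Hypothetical helper mirroring the behavior described above: cache the dataset,
// run arbitrary work on it, and always unpersist afterwards, even if the block throws.
inline fun <T, R> SketchDataset<T>.withCachedSketch(block: SketchDataset<T>.() -> R): R {
    cache()
    return try {
        block()
    } finally {
        unpersist()
    }
}

fun main() {
    val pairs = SketchDataset { listOf(1, 2, 3, 4, 5) }.map { it to (it + 2) }

    val evens = pairs.withCachedSketch {
        println(collect())                     // first use: rows are computed and cached
        filter { it.first % 2 == 0 }.collect() // second use: served from the cache
    }

    println(evens) // prints [(2, 4), (4, 6)]
    println("pairs computed ${pairs.computeCount} time(s); still cached: ${pairs.isCached}")
}
```

Running `main` shows the pairs being computed once even though they are read twice inside the block, and the cache being released when the block finishes. With the real library you would call `withCached` on an actual `Dataset` instead.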
Check out some code examples to get an idea of what the API looks like: https://github.com/JetBrains/kotlin-spark-api/tree/master/examples/src/main/kotlin/org/jetbrains/spark/api/examples
Try it and share your feedback with us either in the #kotlin-spark channel or via GitHub issues: https://github.com/JetBrains/kotlin-spark-api/issues.