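import org.apache.spark.sql.functions.{count, regexp_extract, regexp_replace, split}
import org.apache.spark.sql.types.IntegerType
import spark.implicits._

// Count movies per release year; the year is parsed from a "(YYYY)" suffix in the title.
spark.read.orc("/home/finkel/Downloads/ml-latest/movies").as[Movie]
  .filter($"title" rlike """"?.*\(\d{4}\)\s*"?""") // keep only titles that contain a "(YYYY)" year
  .withColumn("year", regexp_extract($"title", """\((\d{4})\)\s*"?""", 1).cast(IntegerType))
  .withColumn("title", regexp_replace($"title", """\(\d{4}\)\s*"?""", "")) // strip the year from the title
  .withColumn("genres", split($"genres", "\\|")) // "A|B|C" -> Array("A", "B", "C")
  .as[MovieWithGenresAndYear]
  .groupBy($"year")
  .agg(count($"title").as("movies")) // groupBy already keeps "year"; repeating it in agg duplicates the column
  .show(300, false) // show returns Unit, so the chain is not assigned to a val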