apache-spark
How to aggregate rows based on a condition in PySpark?
I am trying to aggregate some rows in my PySpark DataFrame based on a condition. Here is my DataFrame: ...
2022-12-07 21:37 Category: Q&A
Spark: How does reducing executor cores solve a memory issue?
When I was searching for a memory-related issue in Spark, I came across this article, which suggests reducing the number of cores per executor, but in the same article it's men...
2022-12-07 21:06 Category: Q&A
Write >1 files (limited by size) from a Spark partition
I am fetching an RDBMS table using JDBC with some 10-20 partitions using ROW_NUM. Then, from each of these partitions, I want to process/format the data and write one or more files out to file storage...
2022-12-07 20:04 Category: Q&A
Intermittent "Not enough replicas available for query at consistency ONE (1 required but only 0 alive)" error
To insert data into Cassandra, I use Spark Structured Streaming (Kafka to Cassandra). Inserting data works without problems, but when I load data, an error sometimes occurs: SQL Error: com.datastax.driver.core.exceptions.Un...
2022-12-07 18:26 Category: Q&A
Why does mapPartitions not see my val? - Scala/Spark
I define a val like this: val config = Config(args) val product_type = config.product_type then I send product_type as "AA"...
2022-12-07 17:10 Category: Q&A