
Spark read csv with semicolon inside cell - Stack Overflow


My Spark application reads a CSV file with the following options:

  sparkSession.read
    .format("com.databricks.spark.csv")
    .option("quote", "\ufffd")   // U+FFFD is assumed never to occur in the data, so quoting is effectively off
    .option("delimiter", ";")
    .load("/sample.csv")

I use "\ufffd" as the "quote" value to avoid problems with data values in which a quote character is part of the value itself.

For example, without the ("quote", "\ufffd") option, this sample:

 value1;"value2;value3

will be read as:

+--------+---------------+ 
| value1 | value2;value3 |

instead of:

 +--------+--------+--------+
 | value1 | value2 | value3 |

But when a semicolon appears inside a value (enclosed in quotes), I run into a new problem: the cell is split into separate values.

So, this sample:

value1;"val;ue";value3

will be read as:

+--------+-----+------+--------+
| value1 | "val| ue"  | value3 |
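To see why the two samples pull in opposite directions, here is a simplified model in plain Scala (no Spark needed). splitQuoted is a hypothetical helper, not Spark's CSV parser: it toggles a "quoted" state on every quote character and splits on the delimiter only outside quotes. It deliberately ignores escaped quotes and the other edge cases a real CSV parser handles.

```scala
// Hypothetical, simplified model of quote-aware splitting (NOT Spark's parser).
// The delimiter ends a field only outside a quoted section; the quote character
// toggles that state and is not copied into the field. Escaped quotes are ignored.
def splitQuoted(line: String, delimiter: Char, quote: Char): Vector[String] = {
  val fields = Vector.newBuilder[String]
  val current = new StringBuilder
  var inQuotes = false
  for (c <- line) {
    if (c == quote) {
      inQuotes = !inQuotes               // enter or leave a quoted section
    } else if (c == delimiter && !inQuotes) {
      fields += current.toString         // a real field boundary
      current.clear()
    } else {
      current += c                       // ordinary character (or a quoted delimiter)
    }
  }
  fields += current.toString             // last field, possibly an unterminated quoted one
  fields.result()
}

val strayQuote  = "value1;\"value2;value3"
val quotedDelim = "value1;\"val;ue\";value3"

// quote = '"': the quoted ';' is protected, but the stray quote swallows the rest of the line.
println(splitQuoted(quotedDelim, ';', '"'))      // Vector(value1, val;ue, value3)
println(splitQuoted(strayQuote, ';', '"'))       // Vector(value1, value2;value3)

// quote = '\ufffd' (never present): the stray quote is harmless, but the quoted ';' splits the cell.
println(splitQuoted(strayQuote, ';', '\ufffd'))  // Vector(value1, "value2, value3)
println(splitQuoted(quotedDelim, ';', '\ufffd')) // Vector(value1, "val, ue", value3)
```

Under this model no single quote character handles both samples: with '"' the unmatched quote in the first sample consumes the rest of the line, and with a character that never occurs every semicolon becomes a field boundary.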

 

Is there any way in the Spark API to read a CSV file whose cell values may contain both quotes and semicolons, when the semicolon is set as the delimiter?
