My Spark application reads a CSV file with the following options:
sparkSession.read
.format("com.databricks.spark.csv")
.option("quote", "\ufffd")
.option("delimiter", ";")
.load("/sample.csv")
I use "\ufffd" as a "quote"-value to avoid problem with data values, when quotes appear to be a part of value.
For example, without ("quote", "\ufffd")-option, this sample:
value1;"value2;value3
will be read as:
+--------+---------------+
| value1 | value2;value3 |
+--------+---------------+
instead of:
+--------+--------+--------+
| value1 | value2 | value3 |
+--------+--------+--------+
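Here is a minimal, self-contained reproduction of that default behaviour (just a sketch: it assumes Spark 2.2+, where the built-in csv() reader accepts a Dataset[String], and the app name and local master are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("csv-quote-repro") // placeholder name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// with the default quote character ("), the unterminated quote in the sample
// swallows the second semicolon, so only two columns come back
spark.read
  .option("delimiter", ";")
  .csv(Seq("value1;\"value2;value3").toDS())
  .show(false)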
But when a semicolon is inside a value (wrapped in quotes), I face a new problem: the cell is divided into two values.
So, this sample:
value1;"val;ue";value3
will be read as:
+--------+------+-----+--------+
| value1 | "val | ue" | value3 |
+--------+------+-----+--------+
Is there any way in the Spark API to read a CSV with both quotes and semicolons inside cell values, when the semicolon is set as the delimiter?
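For reference, the workaround and its side effect can be reproduced the same way, reusing the session and implicits from the sketch above (again only a sketch, not my actual job):

// the replacement character as "quote" effectively disables quoting, so the
// semicolon inside "val;ue" is treated as a delimiter and the row is split
// into four columns: value1 | "val | ue" | value3
spark.read
  .option("quote", "\ufffd")
  .option("delimiter", ";")
  .csv(Seq("value1;\"val;ue\";value3").toDS())
  .show(false)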