My Spark application reads a CSV file with the following options:
sparkSession.read
.format("com.databricks.spark.csv")
.option("quote", "\ufffd")
.option("delimiter", ";")
.load("/sample.csv")
I use "\ufffd" as a "quote"-value to avoid problem with data values, when quotes appear to be a part of value.
For example, without ("quote", "\ufffd")-option, this sample:
value1;"value2;value3
will be read as:
+--------+---------------+
| value1 | value2;value3 |
+--------+---------------+
instead of:
+--------+--------+--------+
| value1 | value2 | value3 |
+--------+--------+--------+
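Here is a minimal, self-contained reproduction of that default behaviour (just a sketch: it assumes Spark 2.2+, where the built-in csv() reader accepts a Dataset[String], and the app name and local master are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("csv-quote-repro") // placeholder name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// with the default quote character ("), the unterminated quote in the sample
// swallows the second semicolon, so only two columns come back
spark.read
  .option("delimiter", ";")
  .csv(Seq("value1;\"value2;value3").toDS())
  .show(false)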
But when a semicolon is inside a value (wrapped in quotes), I face a new problem: the cell is divided into two values.
So, this sample:
value1;"val;ue";value3
will be read as:
+--------+------+-----+--------+
| value1 | "val | ue" | value3 |
+--------+------+-----+--------+
Is there any way in the Spark API to read a CSV with both quotes and semicolons inside cell values, when the semicolon is set as the delimiter?
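For reference, the workaround and its side effect can be reproduced the same way, reusing the session and implicits from the sketch above (again only a sketch, not my actual job):

// the replacement character as "quote" effectively disables quoting, so the
// semicolon inside "val;ue" is treated as a delimiter and the row is split
// into four columns: value1 | "val | ue" | value3
spark.read
  .option("quote", "\ufffd")
  .option("delimiter", ";")
  .csv(Seq("value1;\"val;ue\";value3").toDS())
  .show(false)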