I have a CSV file containing some line breaks in values. For example:
"Chiffre","Nom","Descriptif court","Tarifs en clair","Période en clair","Adresse 1","CP","Tel","Site","Facebook","Pictos","@Saveurs","@Famille plus","@Img","Accessible en fauteuil roulant en autonomie","Handicapes","Terrasse","Wifi","Chèque Vacances","Titre Restaurant"
6,"Le Chalet d'en Ô","Cuisine traditionnelle (foie gras, magret myrtilles, croustillant chocolat)
Spécialités savoyardes (fondue, tartiflette.. )
Garanti sans burgers.
Tout est fait maison.Cuisine traditionnelle (foie gras, magret myrtilles, croustillant chocolat)
Spécialités savoyardes (fondue, tartiflette.. )
Garanti sans burgers.
Tout est fait maison.","Menu adulte : de 20 à 30 €
Menu enfant : 10 €.
Suggestion du jour le midi en semaine : entrée,plat, dessert : 20€.Menu adulte : de 20 à 30 €
Menu enfant : 10 €.
Suggestion du jour le midi en semaine : entrée,plat, dessert : 20€.","Midi et soir du jeudi au samedi + midi le dimanche et jours fériés.
Juillet et août, midi et soir du mardi au samedi et midi uniquement dimanche et jours fériés.Midi et soir du jeudi au samedi + midi le dimanche et jours fériés.
Juillet et août, midi et soir du mardi au samedi et midi uniquement dimanche et jours fériés.","Le Cropt - Route de Serraval","74230 Les Clefs","+33 4 50 02 09 00","www.lechaletdeno.com",,"A R J X x",,,,,"A","R","J","X","x"
7,"La Cabane - Pisciculture de Montremont","Bar - restaurant au bord de la rivière dans un cadre champêtre avec sa spécialité la truite.","Menu adulte : de 26 à 35 €.","Juin et septembre : du mercredi au dimanche.
Juillet et août : tous les jours à midi + mercredi à samedi le soir.Juin et septembre : du mercredi au dimanche.
Juillet et août : tous les jours à midi + mercredi à samedi le soir.","Pisciculture de Montremont - 100 impasse des Pesetz","74230 Thônes","+33 4 50 02 00 85","pisciculture-montremont.fr",,"A R X x",,,,,"A","R",,"X","x"
I just need to replace every line break inside the values with a space (or anything else).
I have tried a lot of existing solutions, but using \\n as the search term doesn't seem to work, among other issues.
Any ideas? Thanks.
Sublime Text would suffice, but if another tool is easier, that's no problem.
Edit: \n works fine if it is the only thing in the search term. But I need to match line breaks only between double quotes ("). The best I have so far matches all the text between double quotes:
(?<=")[^"]*
4 Answers
You cannot do that safely with a text editor, because the starting and ending delimiters are identical, so any lookaround-based or \G-based solution won't work well.
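To see why a lookaround is unsafe here, this small Python check (the data string is made up for illustration) runs the question's (?<=")[^"]* pattern with a callback that turns line feeds into spaces. Because a closing quote looks exactly like an opening one, the text between two quoted fields, including the newline that ends a record, also counts as "after a quote".

```python
import re

# Made-up data: two records, the first with a line break inside its quoted value.
data = '"a\nb"\n"c"'

def to_space(m):
    # Replace every line feed inside the matched text with a space.
    return m.group(0).replace("\n", " ")

# The question's lookbehind pattern: the text between a closing quote and the
# next opening quote (here the record separator) is matched as well.
bad = re.sub(r'(?<=")[^"]*', to_space, data)

# Matching whole quoted fields only touches line breaks inside a field.
good = re.sub(r'"[^"]*(?:""[^"]*)*"', to_space, data)

print(repr(bad))   # '"a b" "c"'   (the two records were merged)
print(repr(good))  # '"a b"\n"c"'  (record separator kept)
```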
Use a programming language that supports a callback method/function as the replacement argument of its regex replace method/function. Match any double-quoted substring with
"[^"]*(?:""[^"]*)*"
and remove the line breaks inside each match in the callback. A simplified pattern, if you do not need to care about escaped (doubled) double quotes inside values, is:
"[^"]+"
Details
- " - a double quote
- [^"]* - 0+ chars other than double quotes
- (?: - start of a non-capturing group that is repeated 0+ times:
  - "" - 2 double quotes
  - [^"]* - 0+ chars other than double quotes
- )* - end of the group
- " - a double quote.
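The difference between the full pattern and the simplified "[^"]+" only shows up when a value contains an escaped (doubled) quote. A quick Python check on a made-up row:

```python
import re

FULL = r'"[^"]*(?:""[^"]*)*"'   # allows "" (an escaped quote) inside a value
SIMPLE = r'"[^"]+"'             # assumes values never contain a quote

row = '1,"say ""hi""",x'

print(re.findall(FULL, row))    # ['"say ""hi"""']  one field, as expected
print(re.findall(SIMPLE, row))  # ['"say "', '"hi"']  the field is split apart
```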
Read the whole file into a string s, then use one of the following:
- python (after import re):
s = re.sub(r'"[^"]*(?:""[^"]*)*"', lambda m: m.group(0).replace("\n", ""), s)
- javascript:
s = s.replace(/"[^"]*(?:""[^"]*)*"/g, function(m) { return m.replace(/\n/g, ''); });
- php:
$s = preg_replace_callback('~"[^"]*(?:""[^"]*)*"~', function($m) { return str_replace("\n", "", $m[0]); }, $s);
- c#:
s = Regex.Replace(s, "\"[^\"]*(?:\"\"[^\"]*)*\"", m => m.Value.Replace("\n", ""));
If you need to remove both \r and \n, use this as the callback body instead: m.replace(/[\r\n]+/g, '') in JS, preg_replace('~\R+~', '', $m[0]) in PHP, m.Value.Replace("\r", "").Replace("\n", "") in C# and m.group(0).replace("\r", "").replace("\n", "") in Python.
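With Windows (\r\n) line endings, removing only \n inside a value leaves a stray \r behind. A Python sketch, assuming the whole file is already in a string, that turns each \r\n, lone \r or lone \n inside a quoted value into one space (the function name breaks_to_spaces is just for illustration):

```python
import re

QUOTED = re.compile(r'"[^"]*(?:""[^"]*)*"')
# CRLF is tried first so that it becomes one space, not two.
LINE_BREAK = re.compile(r'\r\n|[\r\n]')

def breaks_to_spaces(text):
    """Replace line breaks inside double-quoted values with a single space each."""
    return QUOTED.sub(lambda m: LINE_BREAK.sub(" ", m.group(0)), text)

sample = '6,"line one\r\nline two"\r\n7,"x"\r\n'
print(repr(breaks_to_spaces(sample)))  # '6,"line one line two"\r\n7,"x"\r\n'
```

When reading the file with open(), pass newline='' so that Python's universal-newline handling does not translate \r\n into \n before the regex sees it.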
In C#, the full solution will look like this:
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
...
var file = "path_to_file";
var path_to_save = "path_to_save";
var contents = string.Empty;
using (var sr = new StreamReader(file, true)) // true: detect the encoding from a byte order mark
{
    contents = sr.ReadToEnd();
}
contents = Regex.Replace(contents, "\"[^\"]*(?:\"\"[^\"]*)*\"",
    m => m.Value.Replace("\n", "").Replace("\r", ""));
using (var sw = new StreamWriter(path_to_save, false, Encoding.UTF8))
{
    sw.Write(contents); // disposing the writer flushes and closes the file
}
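A different technique, if Python is an option: its standard csv module already parses quoted values that span several lines, so no regex over the raw text is needed. In this sketch the function name flatten_csv and the UTF-8 encoding are assumptions. The writer uses its default minimal quoting and \r\n row endings, so the output may quote fewer fields than the input does.

```python
import csv
import re

# Any line-break sequence: CRLF, lone CR, or lone LF.
LINE_BREAK = re.compile(r'\r\n|[\r\n]')

def flatten_csv(src_path, dst_path, encoding='utf-8'):
    """Copy a CSV file, replacing line breaks inside values with spaces.

    The UTF-8 default is an assumption about the input file.
    """
    # newline='' is what the csv module expects: the reader then keeps
    # line breaks that occur inside quoted values as part of the value.
    with open(src_path, newline='', encoding=encoding) as src:
        rows = list(csv.reader(src))
    with open(dst_path, 'w', newline='', encoding=encoding) as dst:
        writer = csv.writer(dst)  # default QUOTE_MINIMAL, rows end with \r\n
        for row in rows:
            writer.writerow([LINE_BREAK.sub(' ', value) for value in row])
```

Usage: flatten_csv('current.csv', 'new.csv').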
Use Notepad++ regex Find-and-Replace:
Find what:
(,"[^"]*?)[\r\n]+
Replace with:
$1
(There is a single space after $1)
Repeatedly click "Replace All" until no more matches are found.
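The repeated clicking is needed because each match consumes the ," that opens a value, so one pass removes at most one run of line breaks per value. A Python sketch of the same loop (the sample text and the function name are made up; Python writes the backreference as \1 where Notepad++ uses $1). Note that the pattern only fires on a value preceded by a comma, so a quoted value in the first column is left untouched.

```python
import re

# Same pattern as the Notepad++ "Find what" field.
PATTERN = re.compile(r'(,"[^"]*?)[\r\n]+')

def replace_all_until_done(text):
    """Apply the replacement repeatedly, like clicking "Replace All" until nothing matches."""
    passes = 0
    while True:
        text, count = PATTERN.subn(r'\1 ', text)
        if count == 0:
            return text, passes
        passes += 1

result, passes = replace_all_until_done('1,"a\nb\nc"\n2,"d"')
print(repr(result), passes)  # '1,"a b c"\n2,"d"' 2
```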
For anyone arriving here looking for a system shell (bash, etc.) answer: you can use the sed answer from Alister in a unix.com thread, or perl.
sed:
sed -n 'H;g;/^[^"]*"[^"]*\("[^"]*"[^"]*\)*$/d; s/^\n//; y/\n/ /; p; s/.*//; h' current.csv > new.csv
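The sed command keeps appending lines to its hold space (H;g) while the accumulated text contains an odd number of double quotes, which is exactly what the /^[^"]*"[^"]*\("[^"]*"[^"]*\)*$/d test matches. Once the quotes balance, it strips the leading newline, turns the remaining newlines into spaces (y/\n/ /), prints the record and clears the hold space. The same idea as a Python sketch, with a hypothetical function name:

```python
def join_quoted_lines(lines):
    """Join physical lines into records until the double quotes are balanced."""
    records, pending = [], []
    for line in lines:
        pending.append(line)
        # An odd quote count means a quoted value is still open.
        # Escaped quotes ("") add 2, so they do not affect the parity.
        if sum(part.count('"') for part in pending) % 2 == 0:
            records.append(" ".join(pending))
            pending = []
    if pending:
        # Unlike the sed command, keep a trailing record whose quotes never close.
        records.append(" ".join(pending))
    return records

print(join_quoted_lines(['6,"a', 'b"', '7,"c"']))  # ['6,"a b"', '7,"c"']
```

Pass the lines without their trailing newline, for example text.splitlines().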
perl:
perl -0777 -pe '
s{".*?"}{
($r = $&) =~ s/\n/ /g;
$r
}gsem' < current.csv > new.csv
A variant that deletes the embedded line breaks instead of replacing them with spaces:
sed -n 'H;g;/^[^"]*"[^"]*\("[^"]*"[^"]*\)*$/d; s/^\n//; s/\n//g; p; s/.*//; h' current.csv > new.csv
Comments:
\n works fine for me. – Martin Schmelzer, Dec 13, 2017 at 15:54
… \n was used and not \r or something? Try [\r\n] – ctwheels, Dec 13, 2017 at 15:54
\n … – The fourth bird, Dec 13, 2017 at 16:06
\n works fine if it is alone. The regex option is checked. No luck with \r\n either. – Nico Prat, Dec 13, 2017 at 16:13
(?<=")\n(?=") for a single newline or (?<=")\n\n(?=") for two newlines. – The fourth bird, Dec 13, 2017 at 16:25