I have data* in the following format in a text file:
x1 y1 z1 x2 y2 z2 x3 y3 z3 data1x data1y data1z data2x data2y data2z data3x data3y data3z
I'm trying to reformat the file so that the output looks like this:
x1 y1 z1 data1x data1y data1z
x2 y2 z2 data2x data2y data2z
x3 y3 z3 data3x data3y data3z
I'm sure there's a smart way to do this using, for example, pandas. I tried pivot and pivot_table, but the output wasn't quite right. Could you please help me?
* These are node coordinates and the corresponding vector field values (all floats). In reality, each row consists of 10 nodes, and there are thousands of rows in the file.
This
    all_data = []
    with open(old_file, "r") as file1:
        skip_first_row = file1.readline()
        for line in file1:
            line_list = line.split()
            all_data.append(line_list[0:3]+line_list[30:33])
            all_data.append(line_list[3:6]+line_list[33:36])
            all_data.append(line_list[6:9]+line_list[36:39])
    with open(new_file, "w") as file2:
        write_headline = file2.write("%s\n" % headline)
        for data in all_data:
            file2.write(' '.join(data) + '\n')
does the job, but I'm sure there's a smarter way!
Another possible solution, based on pandas and numpy:
    from io import StringIO
    import numpy as np
    import pandas as pd

    txt = "x1 y1 z1 x2 y2 z2 x3 y3 z3 data1x data1y data1z data2x data2y data2z data3x data3y data3z"
    pd.DataFrame(np.hstack(
        pd.read_csv(StringIO(txt), sep=r'\s+', header=None).values
        .reshape(-1, 3, 3)))  # replace StringIO(txt) with 'your_file.csv'
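It first reads the space-separated data into a pandas DataFrame using pd.read_csv, with StringIO(txt) as a placeholder for the file input. It then extracts the underlying numpy array using .values and reshapes it into a 3D array of shape (-1, 3, 3), that is, a stack of 3x3 blocks (here one block of coordinates and one block of data). Finally, np.hstack concatenates these blocks horizontally, so each node's coordinates end up next to its data. As written, this matches the single-line, three-node example; a file with several lines or a different node count needs a different shape.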
Output:
        0   1   2       3       4       5
    0  x1  y1  z1  data1x  data1y  data1z
    1  x2  y2  z2  data2x  data2y  data2z
    2  x3  y3  z3  data3x  data3y  data3z
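For a file with many lines (and, as the question mentions, 10 nodes per line), a reshape followed by a transpose may generalise more easily. The following is only a minimal sketch under stated assumptions: the function name `wide_to_long` and the parameters `n_nodes` and `n_dims` are made up for illustration, and it assumes every line holds the coordinates of all nodes first, followed by the field values of all nodes in the same order.

```python
import numpy as np

def wide_to_long(values, n_nodes, n_dims=3):
    """Turn rows of [coords of all nodes | data of all nodes] into one row per node.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    values = np.asarray(values)
    n_rows = values.shape[0]
    # axes: (file row, coords-or-data, node, component)
    blocks = values.reshape(n_rows, 2, n_nodes, n_dims)
    # move the node axis in front of the coords/data axis, then flatten per node
    return blocks.transpose(0, 2, 1, 3).reshape(n_rows * n_nodes, 2 * n_dims)

# Two input lines with three nodes each; integers stand in for the real floats.
demo = np.arange(36).reshape(2, 18)
long_form = wide_to_long(demo, n_nodes=3)
print(long_form)
```

With the real file, the input array could presumably come from something like `np.loadtxt(old_file, skiprows=1)` with `n_nodes=10`, and the result could be written back with `np.savetxt`; both steps are untested against the actual data.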
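You can use itertools.batched (available from Python 3.12) to split each line into batches of three, and then use zip to pair each coordinate batch with the matching data batch:
    from itertools import batched

    all_data = []
    with open(old_file, "r") as inputfile:
        skip_first_row = inputfile.readline()
        for line in inputfile:
            triples = list(batched(line.split(), 3))
            half = len(triples) // 2
            # first half of the triples are coordinates, second half are data
            all_data += [xyz + vec for xyz, vec in zip(triples[:half], triples[half:])]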
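On interpreters older than Python 3.12, where `itertools.batched` is not available, plain slicing can do the same regrouping. This is a minimal sketch rather than a drop-in replacement: `regroup` and `width` are names chosen here for illustration, and it assumes the first half of each line holds the coordinates and the second half holds the data, in the same node order.

```python
def regroup(fields, width=3):
    """Pair the i-th coordinate group with the i-th data group of one line.

    Hypothetical helper: the name and the `width` parameter are illustrative.
    """
    half = len(fields) // 2
    coords, data = fields[:half], fields[half:]
    # step through both halves in chunks of `width` and join matching chunks
    return [coords[i:i + width] + data[i:i + width] for i in range(0, half, width)]

line = ("x1 y1 z1 x2 y2 z2 x3 y3 z3 "
        "data1x data1y data1z data2x data2y data2z data3x data3y data3z")
rows = regroup(line.split())
for row in rows:
    print(" ".join(row))
```

Because the split point is computed from the line length, the same function should also cover the 10-node lines mentioned in the question without hard-coded indices.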