I have a large file that has a lot of timing information. An excerpt looks like
CPU time for df vj and vk 329.45135 sec, wall time 10.42650 sec
CPU time for df vj and vk 331.06361 sec, wall time 10.48211 sec
CPU time for df vj and vk 330.34512 sec, wall time 10.45198 sec
CPU time for df vj and vk 330.43818 sec, wall time 10.46212 sec
CPU time for orbital rotation 1341.99499 sec, wall time 42.54674 sec
CPU time for update CAS DM 12.02945 sec, wall time 0.37361 sec
CPU time for micro iter 1 0.00003 sec, wall time 0.00003 sec
CPU time for density fitting ao2mo pass1 157.41450 sec, wall time 19.02017 sec
CPU time for density fitting papa pass2 11.19426 sec, wall time 0.61816 sec
CPU time for density fitting ppaa pass2 24.55801 sec, wall time 6.68668 sec
CPU time for df vj and vk 171.32896 sec, wall time 5.41600 sec
CPU time for density fitting ao2mo 366.81797 sec, wall time 33.65705 sec
CPU time for update eri 366.82145 sec, wall time 33.66198 sec
CPU time for integral transformation to CAS space 0.00001 sec, wall time 0.00000 sec
I have to calculate the sum of all "df vj and vk" and "density fitting ao2mo" entries, among several other parameters. My core functionality is
total+=sum([float(line.split()[position]) for line in open(file_name).readlines() if parameter in line])
where position depends on whether I am trying to get CPU time or wall time, file_name is the file in which the text is stored, and parameter is the function I am trying to collect data for.
I get 47.23871 for "df vj and vk" and 33.65705 for "density fitting ao2mo".
The question is as follows: the "density fitting ao2mo" time includes the time of the "df vj and vk" line directly above it (the 5.41600 sec line). I would like "df vj and vk" to exclude any line that is immediately followed by a line containing "density fitting ao2mo".
Therefore, I would like the result for "df vj and vk" to be 41.82271. How can I do this?
2 Answers
Are you sure you want to use a one-liner for that? A regular for loop will be easier to write, read and debug. Obscure one-liners are rarely the way to go in Python.
total = 0
prev_line = ""
with open(file_name, "r") as fr:
    for line in fr:
        # Count the previous line only if the current line does not exclude it.
        if (parameter in prev_line) and (excluded_parameter not in line):
            total += float(prev_line.split()[position])
        prev_line = line
# Handle the last line, which has no following line (safe for an empty file).
if parameter in prev_line:
    total += float(prev_line.split()[position])
If you really want to use a comprehension, you can use either a complex combination of walrus operators, or simply use itertools.pairwise from the standard library's itertools module (available since Python 3.10):
from itertools import pairwise
total = sum(float(prev_line.split()[position]) for prev_line, line in pairwise(open(file_name, "r").readlines()) if (parameter in prev_line) and (excluded_parameter not in line))
Doing so, you lose the last line and cannot get its value: line and prev_line are not defined outside of the comprehension, and neither is the list of lines read from the file. There might be a (dirty) way to handle this, of course.
I solved this by checking if the next line has the parameter to be excluded.
The list comprehension method looks like
lines = open(file_name).readlines()
# Test for the last line first so lines[i+1] is never read past the end of the list.
total += sum([float(line.split()[position]) for i, line in enumerate(lines) if (parameter in line) and ((i + 1 == len(lines)) or (excluded_parameter not in lines[i + 1]))])
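As a quick check of the next-line idea against the excerpt from the question, here is a small sketch written as an explicit index loop. The function name sum_unless_followed and the choice of position -2 (the second-to-last token, i.e. the wall time) are assumptions made for this example.

```python
EXCERPT = """\
CPU time for df vj and vk 329.45135 sec, wall time 10.42650 sec
CPU time for df vj and vk 331.06361 sec, wall time 10.48211 sec
CPU time for df vj and vk 330.34512 sec, wall time 10.45198 sec
CPU time for df vj and vk 330.43818 sec, wall time 10.46212 sec
CPU time for orbital rotation 1341.99499 sec, wall time 42.54674 sec
CPU time for update CAS DM 12.02945 sec, wall time 0.37361 sec
CPU time for micro iter 1 0.00003 sec, wall time 0.00003 sec
CPU time for density fitting ao2mo pass1 157.41450 sec, wall time 19.02017 sec
CPU time for density fitting papa pass2 11.19426 sec, wall time 0.61816 sec
CPU time for density fitting ppaa pass2 24.55801 sec, wall time 6.68668 sec
CPU time for df vj and vk 171.32896 sec, wall time 5.41600 sec
CPU time for density fitting ao2mo 366.81797 sec, wall time 33.65705 sec
CPU time for update eri 366.82145 sec, wall time 33.66198 sec
CPU time for integral transformation to CAS space 0.00001 sec, wall time 0.00000 sec
"""


def sum_unless_followed(lines, parameter, excluded_parameter, position):
    """Hypothetical helper: sum matching lines, skipping any match whose
    next line contains `excluded_parameter`."""
    total = 0.0
    for i, line in enumerate(lines):
        if parameter not in line:
            continue
        # Only look ahead when a next line actually exists.
        has_next = i + 1 < len(lines)
        if has_next and excluded_parameter in lines[i + 1]:
            continue
        total += float(line.split()[position])
    return total


lines = EXCERPT.splitlines()
wall = sum_unless_followed(lines, "df vj and vk", "density fitting ao2mo", -2)
print(round(wall, 5))  # 41.82271
```

The 5.41600 sec line is skipped because the line after it contains "density fitting ao2mo", which leaves the four earlier wall times.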
lines. Then loop over the list indexes, so you can access lines[i+1] to check the next line. – Barmar Commented Feb 2 at 3:39