python - Skip reading line if next line has a particular string

I have a large file that has a lot of timing information. An excerpt looks like this:

CPU time for df vj and vk    329.45135 sec, wall time     10.42650 sec
CPU time for df vj and vk    331.06361 sec, wall time     10.48211 sec
CPU time for df vj and vk    330.34512 sec, wall time     10.45198 sec
CPU time for df vj and vk    330.43818 sec, wall time     10.46212 sec
CPU time for orbital rotation   1341.99499 sec, wall time     42.54674 sec
CPU time for update CAS DM     12.02945 sec, wall time      0.37361 sec
CPU time for micro iter  1      0.00003 sec, wall time      0.00003 sec
CPU time for density fitting ao2mo pass1    157.41450 sec, wall time     19.02017 sec
CPU time for density fitting papa pass2     11.19426 sec, wall time      0.61816 sec
CPU time for density fitting ppaa pass2     24.55801 sec, wall time      6.68668 sec
CPU time for df vj and vk    171.32896 sec, wall time      5.41600 sec
CPU time for density fitting ao2mo    366.81797 sec, wall time     33.65705 sec
CPU time for update eri    366.82145 sec, wall time     33.66198 sec
CPU time for integral transformation to CAS space      0.00001 sec, wall time      0.00000 sec

I have to calculate the sums for df vj and vk and for density fitting ao2mo, among several other parameters. My core functionality is:

total+=sum([float(line.split()[position]) for line in open(file_name).readlines() if parameter in line])

where position depends on whether I am trying to get the CPU time or the wall time, file_name is the file in which the text is stored, and parameter is the label of the function I am collecting data for.
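The question does not state which index values position takes. Assuming every line follows the format shown above, indexing from the end of line.split() selects the same column no matter how many words the label has. The constant names below are hypothetical:

```python
# One line from the excerpt in the question
line = "CPU time for df vj and vk    329.45135 sec, wall time     10.42650 sec"

tokens = line.split()
# The tail of every line is: [..., '<cpu>', 'sec,', 'wall', 'time', '<wall>', 'sec']
# so counting from the end does not depend on the label length.
CPU_POSITION = -6   # hypothetical name: index of the CPU time
WALL_POSITION = -2  # hypothetical name: index of the wall time

cpu = float(tokens[CPU_POSITION])
wall = float(tokens[WALL_POSITION])
print(cpu, wall)  # 329.45135 10.4265
```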

Using the wall time, I get 47.23871 for df vj and vk and 33.65705 for density fitting ao2mo.
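As a sanity check, 47.23871 is the plain sum of every wall-time value on a df vj and vk line, including the 5.41600 sec one. A minimal reproduction on a trimmed, in-memory copy of the excerpt, assuming position = -2 selects the wall time:

```python
# Trimmed copy of the excerpt above (only some of the lines)
LOG = """\
CPU time for df vj and vk    329.45135 sec, wall time     10.42650 sec
CPU time for df vj and vk    331.06361 sec, wall time     10.48211 sec
CPU time for df vj and vk    330.34512 sec, wall time     10.45198 sec
CPU time for df vj and vk    330.43818 sec, wall time     10.46212 sec
CPU time for orbital rotation   1341.99499 sec, wall time     42.54674 sec
CPU time for df vj and vk    171.32896 sec, wall time      5.41600 sec
CPU time for density fitting ao2mo    366.81797 sec, wall time     33.65705 sec
"""

parameter = "df vj and vk"
position = -2  # assumed: wall time is the second-to-last token

# Same filter as the one-liner in the question: every line containing parameter
total = sum(float(line.split()[position])
            for line in LOG.splitlines() if parameter in line)
print(round(total, 5))  # 47.23871
```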

The problem is this: the density fitting ao2mo time already includes the df vj and vk line immediately above it (the 5.41600 sec line). I would like the df vj and vk total to exclude any line that is immediately followed by a line containing density fitting ao2mo.

Therefore, I would like the result for df vj and vk to be 41.82271. How can I do this?


Asked Feb 2 at 3:09 by Valay Agarawal; edited Feb 4 at 22:48 by globglogabgalab.

Comment by Barmar (Feb 2 at 3:39): Put all the lines in a list named lines. Then loop over the list indexes, so you can access lines[i+1] to check the next line.
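The index-based approach from this comment can be sketched as follows. It runs on a trimmed, in-memory copy of the excerpt instead of a file; position = -2 (wall time) is an assumption based on the line format:

```python
# Trimmed copy of the excerpt from the question
LOG = """\
CPU time for df vj and vk    329.45135 sec, wall time     10.42650 sec
CPU time for df vj and vk    331.06361 sec, wall time     10.48211 sec
CPU time for df vj and vk    330.34512 sec, wall time     10.45198 sec
CPU time for df vj and vk    330.43818 sec, wall time     10.46212 sec
CPU time for orbital rotation   1341.99499 sec, wall time     42.54674 sec
CPU time for df vj and vk    171.32896 sec, wall time      5.41600 sec
CPU time for density fitting ao2mo    366.81797 sec, wall time     33.65705 sec
CPU time for update eri    366.82145 sec, wall time     33.66198 sec
"""

parameter = "df vj and vk"
excluded_parameter = "density fitting ao2mo"
position = -2  # assumed: wall time column

lines = LOG.splitlines()  # with a real file: open(file_name).readlines()
total = 0.0
for i, line in enumerate(lines):
    if parameter not in line:
        continue
    # Skip this line if a next line exists and contains the excluded label
    if i + 1 < len(lines) and excluded_parameter in lines[i + 1]:
        continue
    total += float(line.split()[position])

print(round(total, 5))  # 41.82271
```

Checking i + 1 < len(lines) before reading lines[i + 1] keeps the last line from raising an IndexError.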


2 Answers


Are you sure you want to use a one-liner for that? A regular for loop is easier to write, read, and debug. Obscure one-liners are rarely the way to go in Python.

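total = 0.0
prev_line = ""

with open(file_name, "r") as fr:
    for line in fr:  # iterate lazily; readlines() is not needed
        # Count the previous line unless the current line is the excluded one
        if (parameter in prev_line) and (excluded_parameter not in line):
            total += float(prev_line.split()[position])
        prev_line = line

# Handle the last line, which has no following line to check.
# Using prev_line (not line) also avoids a NameError on an empty file.
if parameter in prev_line:
    total += float(prev_line.split()[position])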

If you really want to use a comprehension, you can use either a complex combination of walrus operators or, more simply, itertools.pairwise from the standard library (available since Python 3.10):

from itertools import pairwise  # Python 3.10+

with open(file_name, "r") as fr:
    total = sum(float(prev_line.split()[position])
                for prev_line, line in pairwise(fr)
                if (parameter in prev_line) and (excluded_parameter not in line))

Doing so, you lose the last line and cannot get its value, because the line and prev_line variables are not defined outside the comprehension, and neither is the sequence of lines you read from the file. There might be a (dirty) way to handle this, of course.

I solved this by checking whether the next line contains the parameter to be excluded.

The list comprehension version looks like this:

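with open(file_name) as f:
    lines = f.readlines()
# Check the bound before indexing lines[i+1], so the last line cannot raise IndexError
total += sum([float(line.split()[position]) for i, line in enumerate(lines)
              if (parameter in line) and ((i + 1 == len(lines)) or (excluded_parameter not in lines[i + 1]))])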
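Since the question needs totals for several parameters, the same next-line check can be wrapped in a helper that reads the lines once and fills all totals in one pass. This is a sketch: sum_times and the exclusions mapping are hypothetical names, the data is a trimmed copy of the excerpt, and position = -2 (wall time) is assumed:

```python
# Trimmed copy of the excerpt from the question
LOG = """\
CPU time for df vj and vk    329.45135 sec, wall time     10.42650 sec
CPU time for df vj and vk    331.06361 sec, wall time     10.48211 sec
CPU time for df vj and vk    330.34512 sec, wall time     10.45198 sec
CPU time for df vj and vk    330.43818 sec, wall time     10.46212 sec
CPU time for orbital rotation   1341.99499 sec, wall time     42.54674 sec
CPU time for df vj and vk    171.32896 sec, wall time      5.41600 sec
CPU time for density fitting ao2mo    366.81797 sec, wall time     33.65705 sec
CPU time for update eri    366.82145 sec, wall time     33.66198 sec
"""


def sum_times(lines, parameters, position, exclusions):
    """Hypothetical helper: total the time at `position` for each parameter,
    skipping a line when the next line contains exclusions[parameter]."""
    totals = dict.fromkeys(parameters, 0.0)
    for i, line in enumerate(lines):
        # The last line has no successor; "" never contains an excluded label
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        for parameter in parameters:
            excluded = exclusions.get(parameter)
            if parameter in line and not (excluded and excluded in next_line):
                totals[parameter] += float(line.split()[position])
    return totals


totals = sum_times(
    LOG.splitlines(),
    ["df vj and vk", "density fitting ao2mo", "update eri"],
    position=-2,  # assumed: wall time column
    exclusions={"df vj and vk": "density fitting ao2mo"},
)
for name, value in totals.items():
    print(f"{name}: {value:.5f}")
```

Reading the file once and passing the list of lines avoids reopening it for every parameter.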


python - Skip reading line if next line has a particular string - Stack Overflow
