from sys import argv
script, input_file = argv
hello = open(input_file)
read1 = hello.readline()
print("Line1: ", read1, end = "")
line1 = len(read1) + 1
print(line1)
read2 = hello.readline()
print("Line2: ", read2, end = "")
line2 = len(read1 + read2) + 2
print(line2)
read3 = hello.readline()
print("Line3: ", read3, end = "")
line3 = len(read1 + read2 + read3) + 3
print(line3)
read4 = hello.readline()
print("Line4: ", read4)
beep = input("""I give you the option to choose the line you print:
1. Type: first line
2. Type: second line
3. Type: third line
4. Type: fourth line
""")
if (beep == "first line"):
    choice = 0
elif (beep == "second line"):
    choice = line1
elif (beep == "third line"):
    choice = line2
else:
    choice = line3
def runner(f):
    hello.seek(f)
    green = hello.readline()
    print("The line reads: ", green)
runner(choice)
I want to print a particular line from a text file, as chosen by the user. As I understand it, seek() takes an offset in bytes and moves the read pointer to that position. For example, if a text file has 4 lines and each line is 10 bytes long (including the newline character), then an offset of 11 corresponds to the beginning of the second line. To move the read position to the beginning of the fourth line, I would pass 31 (the sum of the sizes of the previous lines + 1) to seek().
But in the code above I pass the sum of the sizes of the previous 3 lines + 3 to reach the beginning of the fourth line (instead of the sum + 1, which makes more sense to me), and the code still runs perfectly fine.
In fact, when I replace the +3 (or +2) with +1, the required line is not printed.
Asked Mar 10 at 22:20 by Govind Sharma; edited Mar 11 at 9:06 by khelwood.
Comment: This question is similar to: How to jump to a particular line in a huge text file?. If you believe it's different, please edit the question, make it clear how it's different and/or how the answers on that question are not helpful for your problem. – JonSG, Mar 11 at 14:09
3 Answers
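As mentioned in a comment, you can't assume that a newline is just one byte in the file. On Windows it's two bytes, CR followed by LF. Python's text mode converts that pair to a single "\n" when reading, so len() of each line is one less than the line's size in the file, which is why the question's code needed an extra 1 for every line already read.
Use hello.tell() to get the file position before reading each line, rather than adding up line lengths and adding 1 for each line.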
from sys import argv
script, input_file = argv
hello = open(input_file)
read1 = hello.readline()
print("Line1: ", read1, end = "")
line1 = hello.tell()  # position where line 2 starts
print(line1)
read2 = hello.readline()
print("Line2: ", read2, end = "")
line2 = hello.tell()  # position where line 3 starts
print(line2)
read3 = hello.readline()
print("Line3: ", read3, end = "")
line3 = hello.tell()  # position where line 4 starts
print(line3)
read4 = hello.readline()
print("Line4: ", read4)
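To see where the extra bytes come from, here is a minimal sketch that compares what len() reports in text mode with the number of bytes each line occupies on disk. The sample file and the helper name text_vs_byte_lengths are made up for illustration; the file is written in binary mode so that it has CR LF endings on any platform.

```python
import os
import tempfile

def text_vs_byte_lengths(path):
    """Return (text_length, byte_length) pairs, one per line of the file."""
    with open(path) as text_file:          # text mode: "\r\n" is read as "\n"
        text_lengths = [len(line) for line in text_file]
    with open(path, "rb") as binary_file:  # binary mode: bytes are left untouched
        byte_lengths = [len(line) for line in binary_file]
    return list(zip(text_lengths, byte_lengths))

# Hypothetical sample file with Windows-style line endings.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"alpha\r\nbeta\r\ngamma\r\n")
    sample_path = tmp.name

pairs = text_vs_byte_lengths(sample_path)
for number, (chars, size) in enumerate(pairs, start=1):
    print(f"line {number}: {chars} characters in text mode, {size} bytes on disk")

os.remove(sample_path)
```

Each line is one byte longer on disk than len() suggests, so the shortfall grows by one for every line already read. Also note that for text files the Python documentation treats the value returned by tell() as an opaque number, so it is best passed straight back to seek() rather than used in arithmetic.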
To jump to a specific line in a text file by computing an offset for seek(), all lines in the file must be the same length. Otherwise, you need to either read all the lines and index into the resulting list (e.g., with readlines()) or read one line at a time until you reach the one you're interested in.
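The "one line at a time" option could look roughly like this. It is only a sketch: read_line_at is a hypothetical helper name and the sample file is created just for the demonstration.

```python
import os
import tempfile
from itertools import islice

def read_line_at(path, lineno):
    """Return line `lineno` (zero-based), or "" if the file has fewer lines."""
    with open(path) as f:
        # islice skips the first `lineno` lines without storing them
        return next(islice(f, lineno, None), "")

# Hypothetical sample file for the demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo\nthree\n")
    sample_path = tmp.name

second = read_line_at(sample_path, 1)    # the second line, newline included
missing = read_line_at(sample_path, 10)  # past the end of the file

print(repr(second), repr(missing))
os.remove(sample_path)
```

This never needs byte offsets, so line endings and line lengths do not matter; the cost is that every call reads the file from the start.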
However, if you know that all lines are the same length, then the only issue with computing offsets is whether each line ends with "\n" or "\r\n" (Windows).
You could do this:
from pathlib import Path

FILENAME = Path("foo.txt")

# detect Windows formatted text lines
# newline="" turns off newline translation, otherwise "\r" is never seen
def is_windows(path: Path) -> bool:
    prev = ""
    with path.open(newline="") as f:
        while (c := f.read(1)):
            if c == "\n":
                return prev == "\r"
            prev = c
    return False

# get the length of the first line
# bear in mind that if the underlying file uses "\r\n" line terminators
# ...readline will convert "\r\n" to "\n"
def line_length(path: Path) -> int:
    with path.open() as f:
        return len(f.readline())

# lineno is base zero
def get_line(path: Path, lineno: int) -> str:
    length = line_length(path)
    if is_windows(path):
        length += 1
    with path.open() as f:
        f.seek(lineno * length)
        return f.readline()

i = 0
while (line := get_line(FILENAME, i)):
    print(line, end="")
    i += 1
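If the lines are not all the same length, another option is to record where each line starts and seek to that position later. The following is a sketch under a few assumptions: the file is opened in binary mode so that offsets are plain byte counts, the text is UTF-8, and build_line_index and read_indexed_line are hypothetical helper names.

```python
import os
import tempfile

def build_line_index(path):
    """Return the byte offset at which each line of the file starts."""
    offsets = []
    position = 0
    with open(path, "rb") as f:
        for raw_line in f:
            offsets.append(position)
            position += len(raw_line)  # counts "\r\n" as two bytes
    return offsets

def read_indexed_line(path, offsets, lineno, encoding="utf-8"):
    """Seek straight to line `lineno` (zero-based) and return it without its line ending."""
    with open(path, "rb") as f:
        f.seek(offsets[lineno])
        return f.readline().decode(encoding).rstrip("\r\n")

# Hypothetical sample file with mixed line endings and line lengths.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first\r\nsecond line\r\nx\n")
    sample_path = tmp.name

offsets = build_line_index(sample_path)
chosen = read_indexed_line(sample_path, offsets, 1)
print(offsets, repr(chosen))

os.remove(sample_path)
```

Building the index still reads the whole file once, so this only pays off when the same file is queried many times.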
Barmar's answer is probably the best one for the question as asked. Stepping back, though: since the data set is tiny, you'd be better off reading all the lines into a list in memory and then printing from that list:
from sys import argv
script, input_file = argv
with open(input_file) as f:
    lines = f.readlines()
# number the lines from 1, as in the question's output
for i, l in enumerate(lines, 1):
    print(f"Line{i}: {l}", end="")
beep = input("""I give you the option to choose the line you print:
1. Type: first line
2. Type: second line
3. Type: third line
4. Type: fourth line
""")
# match requires Python 3.10 or later
match beep:
    case "first line":
        idx = 0
    case "second line":
        idx = 1
    case "third line":
        idx = 2
    case _:
        idx = 3
print("The line reads:", lines[idx])
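On Python versions without match, or to reject unexpected input instead of falling through to the fourth line, a dictionary lookup is one possible variation. This is only a sketch; CHOICES and pick_line are hypothetical names, and returning None for bad input is a design choice that differs from the code above.

```python
# Map each accepted answer to a zero-based index into the list of lines.
CHOICES = {
    "first line": 0,
    "second line": 1,
    "third line": 2,
    "fourth line": 3,
}

def pick_line(lines, answer):
    """Return the chosen line, or None if the answer is unknown or the file is too short."""
    idx = CHOICES.get(answer.strip().lower())
    if idx is None or idx >= len(lines):
        return None
    return lines[idx]

sample_lines = ["alpha\n", "beta\n"]
print(pick_line(sample_lines, "Second Line "))  # matched after trimming and lowercasing
print(pick_line(sample_lines, "fourth line"))   # the sample has only two lines
```

Checking the index against len(lines) also avoids an IndexError when the file has fewer than four lines.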