I have a DataFrame with a column of file paths:
file
df = /a/b/c/d/e/f/g/h/i/j/k/l/m/n/a.c
/a/b/c/d/e/x/b.c
I am using the logic below to remove the redundant part of the path:
df["file"]= df["file"].str.extract(r"(?:/[\w\.-]+){7}/(.+)")
My output is:
df = /a/b/c/d/e/f/g/...
/a/b/c/d/e/x/b.c
I want the file name a.c to be displayed in full, and the output to show where a.c is located by keeping at least the last 3 subdirectories and the first few (for example, 5) subdirectories:
/a/b/c/d/..../l/m/n/a.c
How can this be parsed correctly?
asked Jan 20 at 10:30 by user1846251
Comment: This will only print the last file name. I am more interested in printing the first 3 or 4 path components and the last 3 or 4 when the file path is long. – user1846251, Jan 20 at 10:39
2 Answers
Using pathlib, you can parse the path as a filesystem path:
>>> import pathlib
>>> path = pathlib.Path("/a/b/c/d/e/f/g/h/i/j/k/l/m/n/a.c")
Now you can access it as an object:
>>> path.parts[0:4]
('/', 'a', 'b', 'c')
>>> path.parts[-4:]
('l', 'm', 'n', 'a.c')
And with that, you can format the path as you wish (not the most elegant code, but you probably catch my drift):
>>> import os
>>> f"{os.path.join(*path.parts[0:4])}...{os.path.join(*path.parts[-5:])}"
'/a/b/c...k/l/m/n/a.c'
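The slicing idea above can be wrapped in a small helper. This is only a sketch: the name shorten_path and its head, tail, and marker parameters are made up for illustration, and it assumes POSIX-style paths (hence PurePosixPath, which parses the string without touching the filesystem).

```python
from pathlib import PurePosixPath


def shorten_path(path: str, head: int = 5, tail: int = 3, marker: str = "...") -> str:
    """Keep the first `head` directories, the last `tail` directories and the file name."""
    parts = PurePosixPath(path).parts
    # For an absolute path, parts[0] is the root "/", so leave it out of the count.
    root, names = (parts[0], parts[1:]) if parts and parts[0] == "/" else ("", parts)
    keep_end = tail + 1  # the last `tail` directories plus the file name itself
    if len(names) <= head + keep_end:
        return path  # short enough: nothing would be elided
    return root + "/".join(names[:head] + (marker,) + names[-keep_end:])


print(shorten_path("/a/b/c/d/e/f/g/h/i/j/k/l/m/n/a.c"))  # /a/b/c/d/e/.../l/m/n/a.c
print(shorten_path("/a/b/c/d/e/x/b.c"))                  # /a/b/c/d/e/x/b.c
```

On a DataFrame this could be applied with something like df["file"].map(shorten_path), assuming the column holds plain strings.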
You might want to use str.replace:
df['new_file'] = df['file'].str.replace(r'^(/(?:[^/]+/){5}).*((?:/[^/]+){4})$',
r'\1...\2', regex=True)
Output:
                               file                  new_file
0  /a/b/c/d/e/f/g/h/i/j/k/l/m/n/a.c  /a/b/c/d/e/.../l/m/n/a.c
1                  /a/b/c/d/e/x/b.c          /a/b/c/d/e/x/b.c
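If the number of leading and trailing components needs to vary, the same pattern can be generated from two counts. The sketch below uses only the standard re module so it runs on its own; build_pattern, head, and tail are hypothetical names, not part of pandas. Group 1 captures the first head directories, group 2 captures the last tail directories plus the file name, and a path too short to match is left untouched.

```python
import re


def build_pattern(head: int = 5, tail: int = 3) -> re.Pattern:
    """Group 1: first `head` directories. Group 2: last `tail` directories + file name."""
    # Doubled braces are literal braces in an f-string, so this yields e.g. {5} and {4}.
    return re.compile(rf"^(/(?:[^/]+/){{{head}}}).*((?:/[^/]+){{{tail + 1}}})$")


pattern = build_pattern(head=5, tail=3)
for p in ("/a/b/c/d/e/f/g/h/i/j/k/l/m/n/a.c", "/a/b/c/d/e/x/b.c"):
    # No match means sub() returns the string unchanged.
    print(pattern.sub(r"\1...\2", p))
```

With the defaults this reproduces the pattern used above. The pattern string (pattern.pattern) should be usable as the first argument of df['file'].str.replace(..., regex=True) in place of the hard-coded one.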