
python - Parse the path directory and file location upto last x paths in pandas - Stack Overflow


I have a file path location:

     file
df = /a/b/c/d/e/f/g/h/i/j/k/l/m/n/a.c
     /a/b/c/d/e/x/b.c

I am using the logic below to strip the redundant part of the path:

df["file"] = df["file"].str.extract(r"(?:/[\w\.-]+){7}/(.+)")

My output is

df = /a/b/c/d/e/f/g/...
     /a/b/c/d/e/x/b.c

I want the file name a.c to be displayed in full, and I want the output to show where a.c lives by keeping at least the last 3 subdirectories and the first few (for example 5) subdirectories:

/a/b/c/d/..../l/m/n/a.c

How can I parse this correctly?

asked Jan 20 at 10:30 by user1846251
  • This will only print the last file name. I am more interested in printing the first 3 or 4 path components and the last 3 or 4 when the file path is long. – user1846251, commented Jan 20 at 10:39

2 Answers


Using pathlib, you can parse the path as a filesystem path:

>>> import pathlib
>>> path = pathlib.Path("/a/b/c/d/e/f/g/h/i/j/k/l/m/n/a.c")

Now you can access it as an object:

>>> path.parts[0:4]
('/', 'a', 'b', 'c')
>>> path.parts[-4:]
('l', 'm', 'n', 'a.c')

And with that, you can format the path as you wish (not the most elegant code, but you probably catch my drift):

>>> import os
>>> f"{os.path.join(*path.parts[0:4])}...{os.path.join(*path.parts[-5:])}"
'/a/b/c...k/l/m/n/a.c'
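
To get from that one-liner to the format asked for in the question, the slicing could be wrapped in a small helper. The following is only a sketch: the name `shorten_path` and the `head`/`tail` parameters are made up for illustration, it assumes POSIX-style paths and `head >= 1`, and it leaves a path untouched when nothing would be dropped.

```python
from pathlib import PurePosixPath


def shorten_path(path: str, head: int = 5, tail: int = 3) -> str:
    """Keep the first `head` directories and the last `tail` directories plus the file name.

    Hypothetical helper, not part of pathlib or pandas. Assumes head >= 1.
    """
    parts = PurePosixPath(path).parts
    # For an absolute path, parts[0] is the root "/", which is not counted as a directory.
    if parts and parts[0] == "/":
        root, names = "/", parts[1:]
    else:
        root, names = "", parts
    # Leave the path alone when no component would actually be dropped.
    if len(names) <= head + tail + 1:
        return path
    kept_head = "/".join(names[:head])
    kept_tail = "/".join(names[-(tail + 1):])  # tail directories + file name
    return f"{root}{kept_head}/.../{kept_tail}"


print(shorten_path("/a/b/c/d/e/f/g/h/i/j/k/l/m/n/a.c"))
print(shorten_path("/a/b/c/d/e/x/b.c"))
```

On a DataFrame this would be applied per row, for example with `df["file"].map(shorten_path)`. That is slower than a vectorised string method on large frames, but easier to adjust.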

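You might want to use str.replace, capturing the first five directories and the last three directories plus the file name, and replacing everything in between with "...":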

df['new_file'] = df['file'].str.replace(r'^(/(?:[^/]+/){5}).*((?:/[^/]+){4})$',
                                        r'\1...\2', regex=True)

Output:

                               file                  new_file
0  /a/b/c/d/e/f/g/h/i/j/k/l/m/n/a.c  /a/b/c/d/e/.../l/m/n/a.c
1                  /a/b/c/d/e/x/b.c          /a/b/c/d/e/x/b.c
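
If the number of leading and trailing components needs to vary, the same pattern can be generated from two counts. This is a sketch under assumptions: `build_pattern` is a hypothetical helper, and it is checked here with the standard `re` module rather than pandas so that it runs on its own.

```python
import re


def build_pattern(head: int = 5, tail: int = 3) -> str:
    """Build the regex used above for a given number of leading/trailing directories.

    Hypothetical helper for illustration only.
    Group 1: the leading "/" plus `head` directories, each followed by "/".
    Group 2: the last `tail` directories plus the file name, each preceded by "/".
    """
    return rf"^(/(?:[^/]+/){{{head}}}).*((?:/[^/]+){{{tail + 1}}})$"


pattern = build_pattern(head=5, tail=3)

for p in ["/a/b/c/d/e/f/g/h/i/j/k/l/m/n/a.c", "/a/b/c/d/e/x/b.c"]:
    # Paths too short to match the pattern are returned unchanged by re.sub.
    print(re.sub(pattern, r"\1...\2", p))
```

The resulting string can be passed to `df['file'].str.replace(pattern, r'\1...\2', regex=True)` in place of the hard-coded pattern. Short paths fall through unchanged because the regex does not match them.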

