python - String cleaning removing consecutive value and put comma in the end - Stack Overflow

I have this string from an email I'm scraping:

TICKET\xa0\xa0 STATE\xa0\xa0\xa0\xa0 ACCOUNT IDENTIFIER\xa0\xa0\xa0 FILE DIRECTORY\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 CODE

My objectives are the following:

  1. Remove the \xa0 characters
  2. Separate each group of words with a comma

This is my ideal result:

TICKET,STATE,ACCOUNT IDENTIFIER,FILE DIRECTORY,CODE

Instead, here's what I ended up getting:

#code
my_string.replace(' ', ',').replace('\xa0', '')

#result
TICKET,STATE,ACCOUNT,IDENTIFIER,FILE,DIRECTORY,CODE

I was thinking of using a regex; however, I have no idea how to implement the logic.

Asked Nov 20, 2024 at 20:07 by Maku; edited Nov 20, 2024 at 20:17 by Wiktor Stribiżew.

1 Answer

The character that separates the items you care about is \xa0, so you can split on it first and then keep only the elements that contain something other than whitespace:

my_string = "TICKET\xa0\xa0 STATE\xa0\xa0\xa0\xa0 ACCOUNT IDENTIFIER\xa0\xa0\xa0 FILE DIRECTORY\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 CODE"

print(", ".join(x.strip() for x in my_string.split("\xa0") if x.strip()))
# Output: TICKET, STATE, ACCOUNT IDENTIFIER, FILE DIRECTORY, CODE
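Since the question mentions regex, here is a minimal sketch of the same idea using `re.split`. It assumes that every group boundary contains at least one \xa0 and that words inside a group are separated only by ordinary spaces. It also joins with a bare "," instead of ", " to match the comma-only separator the question asks for. `to_csv_header` is just a hypothetical helper name for illustration.

```python
import re

# Sample line from the question: groups are separated by runs of
# non-breaking spaces (\xa0), each followed by an ordinary space.
my_string = "TICKET\xa0\xa0 STATE\xa0\xa0\xa0\xa0 ACCOUNT IDENTIFIER\xa0\xa0\xa0 FILE DIRECTORY\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 CODE"


def to_csv_header(text: str) -> str:
    # Split on each run of \xa0 together with any whitespace around it.
    # The plain space inside "ACCOUNT IDENTIFIER" is not next to a \xa0,
    # so it is left alone.
    parts = re.split(r"\s*\xa0+\s*", text)
    # Drop empty pieces produced by a leading or trailing \xa0 run.
    return ",".join(part for part in parts if part)


print(to_csv_header(my_string))
# TICKET,STATE,ACCOUNT IDENTIFIER,FILE DIRECTORY,CODE
```

If some groups were ever separated only by ordinary spaces, this pattern would not split them, so it relies on the \xa0 runs being present in the scraped email.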