I have a Markdown file, README.md, that contains HTML elements, such as an <img> tag with id and src attributes. I want to update the attributes of this HTML element programmatically using Python.
For example, I want to update the src attribute of a tag with id="updatable". I’ve successfully extracted the src attribute using BeautifulSoup, but I’m stuck when trying to update it.
Here’s my current approach:
    import markdown
    from bs4 import BeautifulSoup
    import chardet

    def get_encoding_type(file_path):
        # Guess the file's encoding from its first 1024 bytes
        with open(file_path, 'rb') as f:
            sample = f.read(1024)
        cur_encoding = chardet.detect(sample)['encoding']
        return cur_encoding

    with open("README.md", "r", encoding=get_encoding_type("README.md"), errors='ignore') as f:
        markdown_content = f.read()

    # Render the Markdown to HTML, then parse the HTML
    html_content = markdown.markdown(markdown_content)
    parser = BeautifulSoup(html_content, "html.parser")
    img_id = parser.find("img", {'id': 'updatable'})
    try:
        img_source = img_id['src']
        print(img_source)
    except (TypeError, KeyError):
        # TypeError: find() returned None (no such tag); KeyError: the tag has no src
        print("No image found")
So far, I’ve managed to extract the src attribute, but I’m unsure how to update it.
Asked Jan 18 at 15:41 by rakin235 (edited Jan 19 at 16:50 by CPlus)
1 Answer
You could simply assign the new value to the src attribute:
    soup.find("img", {'id': 'updatable'})['src'] = 'new_value'
Example:
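    from bs4 import BeautifulSoup

    html = '<html><body><img id="updatable" src="old_value"/></body></html>'
    # Name the parser explicitly so the result does not depend on which parsers are installed
    soup = BeautifulSoup(html, "html.parser")
    img = soup.find("img", {'id': 'updatable'})
    # Tag attributes behave like dictionary items, so assignment updates them in place
    img['src'] = img['src'].replace('old', 'new')
    print(soup)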
Output:
    <html><body><img id="updatable" src="new_value"/></body></html>
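If the tag might be missing, the same assignment can be wrapped in a small helper. The following is only a sketch: `set_img_src` is a hypothetical name (not part of BeautifulSoup), and it assumes you want an error rather than a silent no-op when no tag matches.

```python
from bs4 import BeautifulSoup


def set_img_src(html, element_id, new_src):
    """Return html with the src of the <img> whose id is element_id set to new_src.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    img = soup.find("img", id=element_id)
    if img is None:
        # find() returns None when nothing matches, so fail with a clear message
        raise ValueError(f"no <img> with id={element_id!r}")
    img["src"] = new_src  # item assignment updates (or creates) the attribute
    return str(soup)      # serialize the modified tree back to a string


updated = set_img_src('<p><img id="updatable" src="old.png"/></p>', "updatable", "new.png")
print(updated)
```

Note that in the question's code the string being parsed is the output of markdown.markdown(), which is HTML. Serializing the soup therefore gives you HTML, not Markdown, so writing that string back to README.md would replace the Markdown source with its rendered form.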