Good afternoon, everyone. I have a problem with a Python script. I need to remove the namespace prefixes from the following input:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<enviOperacaoRessarcimento xmlns=";>
<versao>2.00</versao>
<dadosDeclaracao>
<cnpjRaiz>0000000</cnpjRaiz>
The problem is that the expected output needs to keep this attribute on the root element:
xmlns=";
but the actual result is:
<?xml version='1.0' encoding='utf-8'?>
<enviOperacaoRessarcimento>
<versao>2.00</versao>
<dadosDeclaracao>
<cnpjRaiz>0000000</cnpjRaiz>
In other words, when I remove the namespaces, the root element also loses its xmlns="; attribute.
The script is as follows:
import re
import xml.etree.ElementTree as ET  # assumed: ET is the standard-library ElementTree

def remove_namespace(xml_str):
    #xml_str = re.sub(r' xmlns="[^"]+"', '', xml_str, count=1)
    xml_str = re.sub(r'ns0:|ns0','', xml_str, count=1)
    return xml_str

for index, row in df.iterrows():
    try:
        # Remove namespaces from the XML data
        clean_xml = remove_namespace(row['xml'])
        # Parse the cleaned XML data
        root = ET.fromstring(clean_xml)
        print(root)
        # Write the XML data to the specified output file
        tree = ET.ElementTree(root)
        tree.write(row['fullpath'], xml_declaration=True, encoding='utf-8', method="xml")
        print(root)
        # Update status column and print message to Alteryx result window
        df.at[index, 'status'] = 'Successful'
    except Exception as e:
        # Update status column with error message
        df.at[index, 'status'] = f'error: {str(e)}'
Could someone help me?
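
One direction that might be worth trying, assuming ET here is the standard-library xml.etree.ElementTree: that module invents prefixes such as ns0 when it serializes a namespaced tree, unless the namespace URI has been registered with an empty prefix through ET.register_namespace. Registering it should make the writer emit a plain xmlns="..." on the root element and leave the child tags unprefixed. The sketch below is only an illustration under that assumption. NAMESPACE_URI is a hypothetical placeholder (the real URI is not visible in the post), and serialize_with_default_namespace is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder: substitute the real namespace URI of the document.
NAMESPACE_URI = "http://example.com/ressarcimento"


def serialize_with_default_namespace(xml_str, namespace_uri=NAMESPACE_URI):
    """Parse xml_str and serialize it with namespace_uri as the default namespace."""
    # Registering the URI with an empty prefix tells the serializer to write
    # xmlns="..." instead of generating an ns0 prefix. Note that this changes
    # a module-wide registry, so it affects later serializations as well.
    ET.register_namespace("", namespace_uri)
    root = ET.fromstring(xml_str)
    return ET.tostring(root, encoding="unicode")


sample = (
    f'<enviOperacaoRessarcimento xmlns="{NAMESPACE_URI}">'
    "<versao>2.00</versao>"
    "</enviOperacaoRessarcimento>"
)
result = serialize_with_default_namespace(sample)
print(result)
```

If this holds for the real data, calling ET.register_namespace once before the loop may be enough for tree.write to keep the xmlns attribute, and the re.sub step in remove_namespace might no longer be needed.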