I have a file whose attributes I cannot read because the tag looks different from usual. I have tried to use lxml as follows:
from lxml import etree
xDoc = etree.parse(xmlFile)
folioFiscal = xDoc.find('tfd:TimbreFiscalDigital')
print(folioFiscal)
The XML file is as follows:
<?xml version="1.0" encoding="utf-8"?>
<cfdi:Comprobante xsi:schemaLocation="http://www.sat.gob.mx/cfd/4 http://www.sat.gob.mx/sitio_internet/cfd/4/cfdv40.xsd" Version="4.0" Serie="AA" Folio="147" Fecha="2025-01-01T17:51:12" Sello="[alfanumeric_values]" FormaPago="03" NoCertificado="[numeric_values]" Certificado="[alfanumeric_values]" SubTotal="###.00" Moneda="MXN" Total="###.00" TipoDeComprobante="I" Exportacion="01" MetodoPago="PUE" LugarExpedicion="#####" xmlns:cfdi="http://www.sat.gob.mx/cfd/4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<cfdi:Emisor Rfc="[A-Z]{4}[0-9]{6}[A-Z]{3}" Nombre="[alfanumeric_values]" RegimenFiscal="###" />
<cfdi:Receptor Rfc="[A-Z]{4}[0-9]{6}[A-Z]{3}" Nombre="[alfanumeric_values]" DomicilioFiscalReceptor="#####" RegimenFiscalReceptor="###" UsoCFDI="D01" />
<cfdi:Conceptos>
<cfdi:Concepto ClaveProdServ="########" Cantidad="1.00" ClaveUnidad="ACT" Unidad="Actividad" Descripcion="Honorarios Medicos" ValorUnitario="1100.00" Importe="1100.00" ObjetoImp="02">
<cfdi:Impuestos>
<cfdi:Traslados>
<cfdi:Traslado Base="1100.00" Impuesto="002" TipoFactor="Exento" />
</cfdi:Traslados>
</cfdi:Impuestos>
</cfdi:Concepto>
</cfdi:Conceptos>
<cfdi:Impuestos>
<cfdi:Traslados>
<cfdi:Traslado Base="1100.00" Impuesto="002" TipoFactor="Exento" />
</cfdi:Traslados>
</cfdi:Impuestos>
<cfdi:Complemento>
<tfd:TimbreFiscalDigital xmlns:tfd="http://www.sat.gob.mx/TimbreFiscalDigital" xsi:schemaLocation="http://www.sat.gob.mx/TimbreFiscalDigital http://www.sat.gob.mx/sitio_internet/cfd/TimbreFiscalDigital/TimbreFiscalDigitalv11.xsd" Version="1.1" UUID="[A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12}" FechaTimbrado="2025-01-01T17:51:34" RfcProvCertif="SAT[A-Z0-9]{8}" SelloCFD="[alfanumeric_values]" NoCertificadoSAT="[0-9]{20}" SelloSAT="[alfanumeric_values]" />
</cfdi:Complemento>
</cfdi:Comprobante>
I want to obtain the value of the UUID attribute, which looks like [A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12}.
- While asking a question, you need to provide a minimal reproducible example: Please edit your original question and provide the following: (1) Well-formed XML file sample with all relevant namespaces. (2) What you need to do, i.e. logic, and your code attempt trying to implement it. (3) Desired output based on the sample data in #1 above. – Yitzhak Khabinsky Commented yesterday
- I have added the full xml, the output needed is mentioned at the end of the question – Rafa Barragan Commented yesterday
3 Answers
There are two points to consider when parsing this kind of XML:
- Provide a complete namespace definition.
- Use find() if you are sure the element is a direct child. Use xpath() if the element is nested or the namespaces are dynamic.
from lxml import etree
xmlFile = "ex.xml"
xDoc = etree.parse(xmlFile).getroot()
ns = xDoc.nsmap.copy()
# Find and add the tfd namespace
for elem in xDoc.iter():
    if elem.tag.startswith("{"):
        uri = elem.tag.split("}")[0].strip("{")
        if "TimbreFiscalDigital" in elem.tag and uri not in ns.values():
            ns["tfd"] = uri
# print("Extracted Namespaces:", ns)
# Use XPath instead of find()
folioFiscal = xDoc.xpath("//tfd:TimbreFiscalDigital", namespaces=ns)
if folioFiscal:
    uuid = folioFiscal[0].get("UUID")
    print("UUID:", uuid)
else:
    print("TimbreFiscalDigital not found!")
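To make the find() versus xpath() point concrete, here is a minimal sketch on a trimmed-down inline document. The SAMPLE string, the placeholder UUID value, and the variable names are made up for illustration; only the two namespace URIs come from the question.

```python
from lxml import etree

# Trimmed-down stand-in for the invoice; the UUID value is a placeholder.
SAMPLE = b"""<cfdi:Comprobante xmlns:cfdi="http://www.sat.gob.mx/cfd/4">
  <cfdi:Complemento>
    <tfd:TimbreFiscalDigital xmlns:tfd="http://www.sat.gob.mx/TimbreFiscalDigital" UUID="PLACEHOLDER-UUID"/>
  </cfdi:Complemento>
</cfdi:Comprobante>"""

root = etree.fromstring(SAMPLE)
ns = {
    "cfdi": "http://www.sat.gob.mx/cfd/4",
    "tfd": "http://www.sat.gob.mx/TimbreFiscalDigital",
}

# A bare path only looks at direct children of the root, so this is None.
direct = root.find("tfd:TimbreFiscalDigital", namespaces=ns)

# Spelling out each step from the root reaches the nested element.
stepwise = root.find("cfdi:Complemento/tfd:TimbreFiscalDigital", namespaces=ns)

# xpath() with // searches the whole tree and returns a list of matches.
anywhere = root.xpath("//tfd:TimbreFiscalDigital", namespaces=ns)

print(direct)
print(stepwise.get("UUID"))
print(anywhere[0].get("UUID"))
```

Note that find() returns a single element or None, while xpath() always returns a list, so the two results need different "not found" checks.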
This one is a bit easier than what @Hermann12 already kindly pointed out.
When working with XML files that use namespaces, you need to tell lxml about those namespaces so it can correctly find the elements.
In your case, the <tfd:TimbreFiscalDigital> element is in the namespace http://www.sat.gob.mx/TimbreFiscalDigital. You can modify your code to include a namespace mapping and then search using an XPath expression. Try this:
from lxml import etree
# Parse the XML file
doc = etree.parse('xmlFile.xml')
# Define namespace mapping for the tfd prefix
namespaces = {'tfd': 'http://www.sat.gob.mx/TimbreFiscalDigital'}
# Use XPath to find the TimbreFiscalDigital element
timbre = doc.find('.//tfd:TimbreFiscalDigital', namespaces=namespaces)
if timbre is not None:
    # Get the UUID attribute
    uuid = timbre.get('UUID')
    print("UUID:", uuid)
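If you would rather not pass a prefix map at all, the same lookup can be written with the namespace URI embedded in the tag name, in the {uri}localname form that lxml uses internally for tags. This is only a sketch: the inline SAMPLE document, its placeholder UUID, and the TFD_NS constant name are assumptions for illustration.

```python
from lxml import etree

# Namespace URI taken from the xmlns:tfd declaration in the question.
TFD_NS = "http://www.sat.gob.mx/TimbreFiscalDigital"

# Trimmed-down stand-in for the invoice; the UUID value is a placeholder.
SAMPLE = b"""<cfdi:Comprobante xmlns:cfdi="http://www.sat.gob.mx/cfd/4">
  <cfdi:Complemento>
    <tfd:TimbreFiscalDigital xmlns:tfd="http://www.sat.gob.mx/TimbreFiscalDigital" UUID="PLACEHOLDER-UUID"/>
  </cfdi:Complemento>
</cfdi:Comprobante>"""

root = etree.fromstring(SAMPLE)

# ".//" searches all descendants; "{uri}name" names the element without a prefix.
timbre = root.find(".//{%s}TimbreFiscalDigital" % TFD_NS)

# find() returns None when nothing matches, so guard before reading the attribute.
uuid = timbre.get("UUID") if timbre is not None else None
print("UUID:", uuid)
```

The prefix written in the file (tfd here) does not matter for matching; only the namespace URI does.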
This is probably not the best way to do this but it does work for the data as given in the question.
Just iterate over all elements of the document and examine the tag name but, effectively, ignore the namespace.
Like this:
from lxml import etree
import re
PATTERN = re.compile(r"^{.*}(.*)$")
FILENAME = "foo.xml"
doc = etree.parse(FILENAME)
for e in doc.iter():
    m = PATTERN.match(e.tag)
    if m and m.group(1) == "TimbreFiscalDigital":
        print(e.get("UUID"))
        break
Output:
[A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12}
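The same "ignore the namespace" idea can also be expressed in a single XPath query using the standard local-name() function, which avoids the regular expression and the explicit loop. As before, this is a sketch: the inline SAMPLE document and its placeholder UUID are made up for illustration.

```python
from lxml import etree

# Trimmed-down stand-in for the invoice; the UUID value is a placeholder.
SAMPLE = b"""<cfdi:Comprobante xmlns:cfdi="http://www.sat.gob.mx/cfd/4">
  <cfdi:Complemento>
    <tfd:TimbreFiscalDigital xmlns:tfd="http://www.sat.gob.mx/TimbreFiscalDigital" UUID="PLACEHOLDER-UUID"/>
  </cfdi:Complemento>
</cfdi:Comprobante>"""

root = etree.fromstring(SAMPLE)

# local-name() compares only the part of the tag after the namespace,
# and /@UUID selects the attribute value directly.
uuids = root.xpath("//*[local-name() = 'TimbreFiscalDigital']/@UUID")

print(uuids[0] if uuids else "TimbreFiscalDigital not found!")
```

Matching on the local name alone will also pick up a TimbreFiscalDigital element from any other namespace, so the namespace-aware lookups are safer when the URI is known.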