python - How to get attributes values from an xml file (CFDI - SAT Mexico) - Stack Overflow

I have a file whose attributes I cannot read because the tag looks different than usual. I have tried to use lxml as follows:

from lxml import etree

xDoc = etree.parse(xmlFile)
folioFiscal = xDoc.find('tfd:TimbreFiscalDigital')
print(folioFiscal)

The XML file is as follows:

<?xml version="1.0" encoding="utf-8"?>
<cfdi:Comprobante xsi:schemaLocation="http://www.sat.gob.mx/cfd/4 http://www.sat.gob.mx/sitio_internet/cfd/4/cfdv40.xsd" Version="4.0" Serie="AA" Folio="147" Fecha="2025-01-01T17:51:12" Sello="[alfanumeric_values]" FormaPago="03" NoCertificado="[numeric_values]" Certificado="[alfanumeric_values]" SubTotal="###.00" Moneda="MXN" Total="###.00" TipoDeComprobante="I" Exportacion="01" MetodoPago="PUE" LugarExpedicion="#####" xmlns:cfdi="http://www.sat.gob.mx/cfd/4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <cfdi:Emisor Rfc="[A-Z]{4}[0-9]{6}[A-Z]{3}" Nombre="[alfanumeric_values]" RegimenFiscal="###" />
  <cfdi:Receptor Rfc="[A-Z]{4}[0-9]{6}[A-Z]{3}" Nombre="[alfanumeric_values]" DomicilioFiscalReceptor="#####" RegimenFiscalReceptor="###" UsoCFDI="D01" />
  <cfdi:Conceptos>
    <cfdi:Concepto ClaveProdServ="########" Cantidad="1.00" ClaveUnidad="ACT" Unidad="Actividad" Descripcion="Honorarios Medicos" ValorUnitario="1100.00" Importe="1100.00" ObjetoImp="02">
      <cfdi:Impuestos>
        <cfdi:Traslados>
          <cfdi:Traslado Base="1100.00" Impuesto="002" TipoFactor="Exento" />
        </cfdi:Traslados>
      </cfdi:Impuestos>
    </cfdi:Concepto>
  </cfdi:Conceptos>
  <cfdi:Impuestos>
    <cfdi:Traslados>
      <cfdi:Traslado Base="1100.00" Impuesto="002" TipoFactor="Exento" />
    </cfdi:Traslados>
  </cfdi:Impuestos>
  <cfdi:Complemento>
    <tfd:TimbreFiscalDigital xmlns:tfd="http://www.sat.gob.mx/TimbreFiscalDigital" xsi:schemaLocation="http://www.sat.gob.mx/TimbreFiscalDigital http://www.sat.gob.mx/sitio_internet/cfd/TimbreFiscalDigital/TimbreFiscalDigitalv11.xsd" Version="1.1" UUID="[A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12}" FechaTimbrado="2025-01-01T17:51:34" RfcProvCertif="SAT[A-Z0-9]{8}" SelloCFD="[alfanumeric_values]" NoCertificadoSAT="[0-9]{20}" SelloSAT="[alfanumeric_values]" />
  </cfdi:Complemento>
</cfdi:Comprobante>

I want to obtain the value of the UUID attribute, which has the form [A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12}.

Asked (and later edited) by Rafa Barragan
  • While asking a question, you need to provide a minimal reproducible example. Please edit your original question and provide the following: (1) a well-formed XML sample with all relevant namespaces; (2) what you need to do, i.e. the logic, and your code attempting to implement it; (3) the desired output based on the sample data in #1. – Yitzhak Khabinsky
  • I have added the full XML; the required output is described at the end of the question. – Rafa Barragan

3 Answers

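There are two points to consider when parsing this kind of XML:

  1. Provide a complete namespace definition, including the tfd prefix, which is declared on a nested element rather than on the root.
  2. A plain find() path only matches direct children. Use xpath() (or a find() path starting with .//) if the element is nested or the namespaces are dynamic.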
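from lxml import etree

xmlFile = "ex.xml"
xDoc = etree.parse(xmlFile).getroot()
# Namespaces declared on the root (cfdi, xsi). XPath rejects an empty
# (default-namespace) prefix, so skip it if the document has one.
ns = {prefix: uri for prefix, uri in xDoc.nsmap.items() if prefix}

# Find and add the tfd namespace, which is declared on a nested element
for elem in xDoc.iter(etree.Element):  # elements only, no comments/PIs
    qname = etree.QName(elem)
    if qname.localname == "TimbreFiscalDigital" and qname.namespace:
        ns["tfd"] = qname.namespace
        break

# print("Extracted Namespaces:", ns)

# Use XPath instead of find(); an unmapped "tfd" prefix would raise an error
folioFiscal = xDoc.xpath("//tfd:TimbreFiscalDigital", namespaces=ns) if "tfd" in ns else []

if folioFiscal:
    uuid = folioFiscal[0].get("UUID")
    print("UUID:", uuid)
else:
    print("TimbreFiscalDigital not found!")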
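Point 1 can also be handled without searching for one particular tag: in lxml, each element's `nsmap` property contains every prefix in scope for that element, so merging the `nsmap` of all elements also picks up `tfd`, even though it is declared on a nested element. A minimal sketch, assuming no prefix is reused for different URIs within the document (a later declaration would silently overwrite an earlier one); the helper name `collect_namespaces` and the UUID value are hypothetical.

```python
from lxml import etree

# Trimmed-down CFDI sample; the UUID value is made up for illustration
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<cfdi:Comprobante xmlns:cfdi="http://www.sat.gob.mx/cfd/4"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Version="4.0">
  <cfdi:Complemento>
    <tfd:TimbreFiscalDigital xmlns:tfd="http://www.sat.gob.mx/TimbreFiscalDigital"
        Version="1.1" UUID="ABCD1234-AB12-CD34-EF56-ABCDEF123456"/>
  </cfdi:Complemento>
</cfdi:Comprobante>"""


def collect_namespaces(root):
    """Hypothetical helper: merge the prefix -> URI maps of every element.

    The default namespace (prefix None) is skipped because xpath()
    does not accept an empty prefix in its namespaces argument.
    """
    ns = {}
    for elem in root.iter(etree.Element):  # elements only, no comments/PIs
        for prefix, uri in elem.nsmap.items():
            if prefix:
                ns[prefix] = uri
    return ns


root = etree.fromstring(SAMPLE)
ns = collect_namespaces(root)
# The trailing /@UUID makes xpath() return the attribute values directly
uuids = root.xpath("//tfd:TimbreFiscalDigital/@UUID", namespaces=ns)
print(uuids[0])
```

The same `ns` dictionary can then be reused for any other prefixed lookup in the document, e.g. `//cfdi:Emisor/@Rfc`.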

This is a bit simpler than what @Hermann12 already kindly pointed out.

When working with XML files that use namespaces, you need to tell lxml about those namespaces so it can correctly find the elements.

In your case, the <tfd:TimbreFiscalDigital> element is in the namespace http://www.sat.gob.mx/TimbreFiscalDigital. You can modify your code to include a namespace mapping and pass it to find() together with a .// path, which searches all descendants. Try this:

from lxml import etree

# Parse the XML file
doc = etree.parse('xmlFile.xml')

# Define namespace mapping for the tfd prefix
namespaces = {'tfd': 'http://www.sat.gob.mx/TimbreFiscalDigital'}

# find() takes an ElementPath expression (a subset of XPath);
# './/' searches all descendants, not just direct children
timbre = doc.find('.//tfd:TimbreFiscalDigital', namespaces=namespaces)

if timbre is not None:
    # Get the UUID attribute
    uuid = timbre.get('UUID')
    print("UUID:", uuid)
else:
    print("TimbreFiscalDigital not found")
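The tfd: prefix is only an alias: lxml stores each tag in "Clark notation", `{namespace-URI}localname`, so the element can also be found without any prefix mapping by spelling out the URI. The sketch below, using a trimmed sample with a made-up UUID, also shows that a path without `.//` only inspects the root's direct children and therefore misses the nested element.

```python
from lxml import etree

# Trimmed-down CFDI sample; the UUID value is made up for illustration
SAMPLE = b"""<cfdi:Comprobante xmlns:cfdi="http://www.sat.gob.mx/cfd/4">
  <cfdi:Complemento>
    <tfd:TimbreFiscalDigital xmlns:tfd="http://www.sat.gob.mx/TimbreFiscalDigital"
        UUID="ABCD1234-AB12-CD34-EF56-ABCDEF123456"/>
  </cfdi:Complemento>
</cfdi:Comprobante>"""

TFD_NS = "http://www.sat.gob.mx/TimbreFiscalDigital"
# Clark notation: "{namespace-URI}localname", no prefix mapping required
TFD_TAG = f"{{{TFD_NS}}}TimbreFiscalDigital"

root = etree.fromstring(SAMPLE)

# Without ".//" find() only looks at direct children of the root
direct = root.find(TFD_TAG)
# With ".//" it searches all descendants
timbre = root.find(f".//{TFD_TAG}")

print(direct)  # None
print(timbre.get("UUID"))
```

Which form to use is mostly a matter of taste: a prefix mapping keeps the paths short, while Clark notation avoids keeping a separate dictionary around.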

This is probably not the best way to do it, but it does work for the data given in the question.

Just iterate over all elements of the document and compare only the local part of each tag name, effectively ignoring the namespace.

Like this:

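from lxml import etree
import re

# Matches a namespaced tag such as "{uri}local" and captures the local name
PATTERN = re.compile(r"^\{[^}]*\}(.*)$")
FILENAME = "foo.xml"

doc = etree.parse(FILENAME)

for e in doc.iter(etree.Element):  # elements only; comment tags are not strings
    m = PATTERN.match(e.tag)
    if m and m.group(1) == "TimbreFiscalDigital":
        print(e.get("UUID"))
        break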

Output:

[A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12}
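The same namespace-agnostic idea can be pushed into XPath itself: `local-name()` returns the part of a tag name after the prefix, so neither a prefix map nor a regular expression is needed. A minimal sketch with a trimmed, made-up sample; like the loop above, it would also match a TimbreFiscalDigital element from any other namespace.

```python
from lxml import etree

# Trimmed-down CFDI sample; the UUID value is made up for illustration
SAMPLE = b"""<cfdi:Comprobante xmlns:cfdi="http://www.sat.gob.mx/cfd/4">
  <cfdi:Complemento>
    <tfd:TimbreFiscalDigital xmlns:tfd="http://www.sat.gob.mx/TimbreFiscalDigital"
        UUID="ABCD1234-AB12-CD34-EF56-ABCDEF123456"/>
  </cfdi:Complemento>
</cfdi:Comprobante>"""

root = etree.fromstring(SAMPLE)

# local-name() ignores the namespace, so no prefix mapping is required;
# the trailing /@UUID makes xpath() return the attribute values directly
uuids = root.xpath("//*[local-name()='TimbreFiscalDigital']/@UUID")

for uuid in uuids:
    print(uuid)
```

For a file on disk, the tree returned by `etree.parse(FILENAME)` has its own `xpath()` method that accepts the same expression.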