
docx - How to preserve text styles (bold/italic) and extract footnotes from a Word document using Python? - Stack Overflow


I’m working on a Python script to extract content from a Word document (.docx) and insert it into a SQL Server database. The challenge is that I need to preserve text styles like bold and italic, as well as handle line breaks and footnotes from the Word document.

Currently, I'm using the python-docx library to process the document. Line breaks are already transferred successfully as <br>, but text styles (bold/italic) and footnotes are not included in the output.

Here's what I’ve attempted so far:

1. For text styles:

I tried looping through paragraph.runs to detect run.bold and run.italic. However, the styled text doesn’t appear in my database output.

2. For footnotes:

I tried extracting footnotes using a custom function with doc.footnotes or checking for the style Footnote Text. While the function doesn’t raise errors, footnotes don’t appear in the final output.
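Regarding item 1, one likely cause: in python-docx, run.bold and run.italic are tri-state. They return True or False only when the run itself carries direct formatting, and None when the value is inherited, for example from a character style such as "Strong" or from a heading's paragraph style. A test like "if run.bold:" therefore misses text that is bold only through a style. The sketch below is a simplified resolver under that assumption: effective_flag is a hypothetical helper name, it reads the python-docx attributes run.style, paragraph.style, style.font and style.base_style, and it ignores document-level defaults and Word's toggle-property rules, so treat it as an approximation rather than Word's exact logic.

```python
def effective_flag(run, paragraph, attr="bold"):
    """Resolve a tri-state run property ("bold" or "italic") to True/False.

    Hypothetical helper: works on python-docx Run/Paragraph objects or any
    objects exposing the same attributes.
    """
    # 1. Direct formatting on the run wins (run.bold / run.italic).
    value = getattr(run, attr)
    if value is not None:
        return value
    # 2. Otherwise check the run's character style, then the paragraph
    #    style, walking each one up its base_style chain.
    for style in (run.style, paragraph.style):
        while style is not None:
            value = getattr(style.font, attr)
            if value is not None:
                return value
            style = style.base_style
    # 3. Nothing set the flag explicitly; document defaults are ignored here.
    return False
```

In the loop from the question, "if run.bold:" would become "if effective_flag(run, paragraph, 'bold'):", and likewise for italic.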

Here’s the snippet of my code for processing styles and footnotes:

text_with_style = []
if paragraph.runs:
    for run in paragraph.runs:
        styled_text = run.text.strip()
        if run.bold:
            styled_text = f"<b>{styled_text}</b>"
        if run.italic:
            styled_text = f"<i>{styled_text}</i>"
        text_with_style.append(styled_text)

formatted_text = " ".join(text_with_style).replace("\n", "<br>")
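Two details in this snippet probably also interfere. run.text.strip() removes the spaces that separate words across runs and can drop a "\n" at the start or end of a run, and " ".join(...) then inserts spaces that were not in the document, including in the middle of words that Word happened to split into several runs. In addition, any "<" or "&" in the source text will be read as markup once the stored HTML is rendered. A minimal sketch, assuming the runs are python-docx Run objects (only .text, .bold and .italic are read, so any object with those attributes works); runs_to_html is a hypothetical name:

```python
import html


def runs_to_html(runs):
    """Turn a paragraph's runs into an HTML fragment with <b>/<i> tags.

    Hypothetical helper: each run only needs .text, .bold and .italic,
    which python-docx Run objects provide.
    """
    parts = []
    for run in runs:
        text = run.text
        if not text:
            continue  # skip empty runs so they don't become "<b></b>"
        # Escape &, < and > first, then map Word line breaks ("\n") to <br>.
        fragment = html.escape(text, quote=False).replace("\n", "<br>")
        if run.italic:
            fragment = f"<i>{fragment}</i>"
        if run.bold:
            fragment = f"<b>{fragment}</b>"
        parts.append(fragment)
    # No strip() and no " " separator: runs already contain their own spacing.
    return "".join(parts)
```

It would be called once per paragraph, e.g. runs_to_html(paragraph.runs). Adjacent runs with identical formatting produce back-to-back tags such as <b>a</b><b>b</b>; that is valid HTML, and merging them is an optional cleanup.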

For footnotes:

def extract_footnotes(doc):
    footnotes_text = []
    if hasattr(doc, 'footnotes'):
        for footnote in doc.footnotes:
            footnotes_text.append(footnote.text.strip())
    return footnotes_text
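For footnotes, the hasattr check is a plausible reason nothing appears: as far as I know, released versions of python-docx do not provide a Document.footnotes property, so hasattr(doc, 'footnotes') is False and the function silently returns an empty list. Likewise, paragraphs styled "Footnote Text" are stored in a separate part of the .docx package, word/footnotes.xml, so they never show up in doc.paragraphs. That part can be read with only the standard library. A sketch under that assumption (footnotes_from_xml and extract_footnotes are hypothetical names; the separator entries Word keeps in the same file are skipped via their w:type attribute):

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"  # Clark-notation prefix for WordprocessingML tags/attributes


def footnotes_from_xml(xml_bytes: bytes) -> dict[int, str]:
    """Map footnote id -> plain text, given the bytes of word/footnotes.xml."""
    root = ET.fromstring(xml_bytes)
    notes = {}
    for fn in root.iter(f"{W}footnote"):
        # Separator / continuation entries are layout placeholders, not notes.
        if fn.get(f"{W}type", "normal") != "normal":
            continue
        note_id = int(fn.get(f"{W}id"))
        # Concatenate the w:t text of each paragraph inside the footnote.
        paragraphs = ["".join(t.text or "" for t in p.iter(f"{W}t"))
                      for p in fn.iter(f"{W}p")]
        notes[note_id] = "\n".join(paragraphs).strip()
    return notes


def extract_footnotes(path) -> dict[int, str]:
    """Read footnotes straight from the .docx zip; {} if the part is absent."""
    with zipfile.ZipFile(path) as zf:
        if "word/footnotes.xml" not in zf.namelist():
            return {}
        return footnotes_from_xml(zf.read("word/footnotes.xml"))
```

The superscript numbers in the body text are w:footnoteReference elements in word/document.xml whose w:id matches these keys, which is how a footnote could be tied back to the paragraph that cites it.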

What am I missing? How can I reliably preserve bold/italic styles and extract footnotes so they’re included in the output that gets inserted into SQL Server? Any advice or working examples would be greatly appreciated.
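The question does not show the SQL Server insert itself, so this is only a guess at one more possible failure point: if the INSERT statement is built with string formatting, apostrophes or other characters in the converted text can break or alter the statement. A minimal sketch with a parameterized executemany, assuming a DB-API connection that uses ? placeholders (pyodbc, a commonly used SQL Server driver for Python, does). The table DocParagraphs, its columns and insert_paragraphs are hypothetical; an NVARCHAR(MAX) column keeps long, non-ASCII HTML intact.

```python
from typing import Sequence

# Hypothetical target table (SQL Server syntax):
#   CREATE TABLE DocParagraphs (
#       DocName NVARCHAR(255), Position INT, Html NVARCHAR(MAX));
INSERT_SQL = "INSERT INTO DocParagraphs (DocName, Position, Html) VALUES (?, ?, ?)"


def insert_paragraphs(conn, doc_name: str, html_paragraphs: Sequence[str]) -> int:
    """Insert one row per converted paragraph; returns the number of rows.

    conn is any DB-API connection using "?" placeholders, e.g. one from
    pyodbc.connect(...). Nothing is executed if there are no rows.
    """
    rows = [(doc_name, position, text)
            for position, text in enumerate(html_paragraphs)]
    if not rows:
        return 0  # skip the call: some drivers reject an empty parameter list
    cursor = conn.cursor()
    # Values are passed as parameters, never formatted into the SQL string,
    # so tags, "&" and apostrophes are stored exactly as given.
    cursor.executemany(INSERT_SQL, rows)
    conn.commit()
    return len(rows)
```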
