
python - Why validate the `href` attribute twice? - Stack Overflow


I found the following web-scraping code in Web Scraping with Python by Ryan Mitchell:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

pages = set()

def getLinks(pageUrl):
    global pages
    html = urlopen("http://en.wikipedia.org" + pageUrl)
    bsObj = BeautifulSoup(html, "html.parser")
    for link in bsObj.findAll("a", href=re.compile("^(/wiki/)")):
        if 'href' in link.attrs:
            if link.attrs['href'] not in pages:
                # We have encountered a new page
                newPage = link.attrs['href']
                print(newPage)
                pages.add(newPage)
                getLinks(newPage)

getLinks("")

I believe the findAll() call has already retrieved every tag object whose href attribute matches the pattern. Why do we still need to check afterward whether the tag has an href attribute at all?

In my opinion, the line `if 'href' in link.attrs:` can simply be deleted. Am I right?
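To see what findAll() actually returns for this kind of filter, a small standalone experiment helps (the sample HTML and variable names below are my own, not from the book):

```python
from bs4 import BeautifulSoup
import re

# Three anchors: one matching /wiki/, one with no href at all,
# and one whose href does not match the pattern.
html = '<a href="/wiki/Python">ok</a><a>no href</a><a href="/talk/X">other</a>'
soup = BeautifulSoup(html, "html.parser")

# findAll with href=re.compile(...) only matches tags that HAVE an
# href attribute AND whose value matches the regex, so every tag in
# the result is guaranteed to carry an href.
links = soup.findAll("a", href=re.compile("^(/wiki/)"))
print([link.attrs["href"] for link in links])  # prints ['/wiki/Python']
```

This suggests the inner `'href' in link.attrs` check is indeed redundant for this particular filter, though it would matter if the filter were relaxed (e.g. `findAll("a")` with no attribute constraint).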
