
python - Why validate the `href` attribute twice? - Stack Overflow


I found the following web-scraping code in Web Scraping with Python by Ryan Mitchell:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

pages = set()

def getLinks(pageUrl):
    global pages
    html = urlopen("http://en.wikipedia.org" + pageUrl)
    bsObj = BeautifulSoup(html, "html.parser")
    for link in bsObj.findAll("a", href=re.compile("^(/wiki/)")):
        if 'href' in link.attrs:
            if link.attrs['href'] not in pages:
                # We have encountered a new page
                newPage = link.attrs['href']
                print(newPage)
                pages.add(newPage)
                getLinks(newPage)

getLinks("")

I believe the findAll() call has already retrieved every tag object whose href attribute matches the pattern. Why do we still need to check afterward whether the tag has an href attribute at all?

In my opinion, the line `if 'href' in link.attrs:` can simply be deleted. Am I right?
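To see what findAll() actually returns for this kind of filter, a small standalone experiment helps (the sample HTML and variable names below are my own, not from the book):

```python
from bs4 import BeautifulSoup
import re

# Three anchors: one matching /wiki/, one with no href at all,
# and one whose href does not match the pattern.
html = '<a href="/wiki/Python">ok</a><a>no href</a><a href="/talk/X">other</a>'
soup = BeautifulSoup(html, "html.parser")

# findAll with href=re.compile(...) only matches tags that HAVE an
# href attribute AND whose value matches the regex, so every tag in
# the result is guaranteed to carry an href.
links = soup.findAll("a", href=re.compile("^(/wiki/)"))
print([link.attrs["href"] for link in links])  # prints ['/wiki/Python']
```

This suggests the inner `'href' in link.attrs` check is indeed redundant for this particular filter, though it would matter if the filter were relaxed (e.g. `findAll("a")` with no attribute constraint).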
